ASEArch - Algorithms and Software for Emerging Architectures

Flagship project

At their core, many scientific computing problems require the solution of either

Ax=b

or

min xᵀQx + cᵀx,   subject to Ax ≤ 0.

However, such problems are frequently sparse and ill-conditioned. Because of the relative complexity of the algorithms involved and their associated memory access patterns, little work has been undertaken to fully support these important kernels on multicore architectures.
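To make the first of these problems concrete, the following is a minimal textbook sketch of solving a dense system Ax = b by Gaussian elimination with partial pivoting. The function name `solve` and the pure-Python data layout are illustrative choices only; the project's actual libraries (HSL, SPRAL) use sparse direct factorizations that exploit the structure of A rather than this dense approach.

```python
def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    A dense textbook sketch (illustrative only). Sparse direct
    solvers instead compute a factorization of A that avoids
    storing and operating on the zero entries.
    """
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    A = [row[:] for row in A]
    x = b[:]
    for k in range(n):
        # Partial pivoting: swap up the largest entry in column k
        # to control growth on ill-conditioned systems.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        x[k], x[p] = x[p], x[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            x[i] -= m * x[k]
    # Back substitution on the resulting upper-triangular system.
    for k in range(n - 1, -1, -1):
        x[k] = (x[k] - sum(A[k][j] * x[j] for j in range(k + 1, n))) / A[k][k]
    return x
```

For sparse A, the elimination above would fill in zero entries; sparse direct methods order the matrix first to limit that fill, which is where much of the algorithmic complexity mentioned above arises.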

With the assistance of the CCP, STFC's Numerical Analysis Group is applying its expertise in sparse direct methods and optimization to the creation of native libraries that support the solution of these problems. This currently involves porting key software to CUDA and developing new algorithms and techniques to ensure full utilization of the available computational resources.

An early phase of this research addressed the level-2 BLAS operation _trsv, which performs a dense triangular solve and is often used in the solve phase of a direct solver following a matrix factorization. With new manycore architectures significantly reducing the cost of compute-bound operations, memory-bound kernels such as this one have become increasingly important. Through a careful analysis of communication requirements, a new implementation was written that exploits NVIDIA's memory fence operations to give a speedup of up to 15 times over the best previous implementations.
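The operation in question can be sketched in scalar form as forward substitution, shown below for the non-transposed, lower-triangular case of _trsv (the function name `trsv_lower` is our own). Each entry of the solution depends on all earlier entries, which is why a parallel GPU version needs ordered synchronization between blocks, and why the kernel is memory-bound: each row is read once but contributes only one multiply-add per element read.

```python
def trsv_lower(L, b):
    """Solve L x = b for dense lower-triangular L by forward
    substitution (a scalar sketch of one case of _trsv).

    The loop-carried dependence -- x[i] needs every x[j] with j < i --
    is what forces inter-block synchronization in a GPU version.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # O(i) memory reads for O(i) flops: roughly one flop per
        # element loaded, hence memory-bound on modern hardware.
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x
```

In a blocked GPU implementation, diagonal blocks are solved in sequence while off-diagonal updates proceed in parallel; memory fences can enforce that ordering without returning to the host between blocks.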

Articles:

Presentations:

Software:

  • The SPRAL open-source library for sparse linear algebra and associated algorithms.
  • The HSL library for sparse linear algebra.
  • The GALAHAD library for optimization.
  • ASEArch BLAS on CCPForge, also incorporated into NVIDIA's cuBLAS library.