At their core, many scientific computing problems require the solution of either

$$Ax = b$$

or

$$\min_{x} \; x^{T} Q x + c^{T} x \quad \text{subject to} \quad Ax \le 0.$$

However, many such problems can be sparse and ill-conditioned. Because of the relative complexity of the algorithms and their associated memory access patterns, little work has been undertaken to fully support these important kernels on multicore architectures.

With the assistance of the CCP, STFC's Numerical Analysis Group is looking to apply its expertise in sparse direct methods and optimization to the creation of native libraries to support the solution of these problems. This currently involves the porting of key software to CUDA and the development of new algorithms and techniques to ensure full utilization of available computational resources.

An early phase of this research addressed the Level 2 BLAS operation _trsv, which performs a dense triangular solve and is often used in the solve phase of a direct solver following a matrix factorization. With new manycore architectures significantly reducing the cost of compute-bound computations, memory-bound operations such as this kernel have become increasingly important. Through a careful analysis of communication requirements, a new implementation was written that exploits NVIDIA's memory fence operations to give a speedup of up to 15 times over the best previous implementations.
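To illustrate where a triangular solve sits in a direct solver, the following is a minimal serial sketch in Python. It is not the CUDA implementation described above. It assumes a small dense symmetric positive definite matrix and a Cholesky factorization, whereas the solvers discussed here target sparse and often indefinite systems. The function names (`cholesky`, `forward_solve`, `backward_solve_transposed`, `solve_spd`) are hypothetical and chosen only for this example.

```python
import math


def cholesky(a):
    """Factorize a symmetric positive definite matrix as A = L L^T.

    Returns the lower-triangular factor L as a list of rows.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s / l[j][j]
    return l


def forward_solve(l, b):
    """Solve L y = b for lower-triangular L (a dense triangular solve)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # y[i] needs every earlier y[k]; this dependency chain is what
        # makes the operation hard to parallelize.
        s = b[i] - sum(l[i][k] * y[k] for k in range(i))
        y[i] = s / l[i][i]
    return y


def backward_solve_transposed(l, y):
    """Solve L^T x = y, reading only the stored lower triangle L."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))
        x[i] = s / l[i][i]
    return x


def solve_spd(a, b):
    """Solve A x = b: factorize once, then apply two triangular solves."""
    l = cholesky(a)
    return backward_solve_transposed(l, forward_solve(l, b))


if __name__ == "__main__":
    a = [[4.0, 2.0], [2.0, 3.0]]
    b = [8.0, 8.0]
    print(solve_spd(a, b))
```

In each triangular solve, every stored entry of the factor is read once and used in a single multiply-add, so the run time is governed by memory traffic rather than arithmetic. This is why the kernel is described as memory-bound.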

**Articles**:

- "Compressed threshold pivoting for sparse symmetric indefinite systems", J.D. Hogg and J.A. Scott, STFC Preprint RAL-P-2013-007
- "Achieving bit compatibility in sparse direct solvers", J.D. Hogg and J.A. Scott, STFC Preprint RAL-P-2012-005
- "A fast Dense Triangular Solve in CUDA", J.D. Hogg, SIAM J. Scientific Computing 35(3), 2013
- "On the efficient scaling of sparse symmetric matrices using an auction algorithm", J.D. Hogg and J.A. Scott, STFC Preprint RAL-P-2014-002
- "A sparse symmetric indefinite direct solver for GPU architectures", J.D. Hogg, E. Ovtchinnikov and J.A. Scott, STFC Preprint RAL-P-2014-006

**Presentations**:

- "A GPU sparse direct solver for A*x=b" at NVIDIA's GPU Technology Conference (GTC 2014)
- "A GPU sparse symmetric indefinite solver with pivoting" at Sparse Days 2014, CERFACS, June 2014

**Software**:

- The SPRAL open-source library for sparse linear algebra and associated algorithms.
- The HSL library for sparse linear algebra.
- The GALAHAD library for optimization.
- ASEArch BLAS on CCPForge, also incorporated into NVIDIA's CUBLAS library.