The Oxford e-Research Centre GPU Seminar Series Michaelmas 2011
Presentations from the OeRC CUDA/GPU seminar series (Michaelmas 2011)
Seminar schedule for Michaelmas 2011
A lot has happened in GPU computing in the last 6-9 months. In this talk I will try to cover the highlights, including:
• NVIDIA's Kepler, Maxwell and Project Denver developments
• CUDA 4.0 and GPUDirect
• PGI compilers
• Intel's Sandy Bridge and MIC processors, with AVX vector units
• OpenCL developments
• new supercomputers
• MATLAB and AccelerEyes software
Talk content: http://www.oerc.ox.ac.uk/research/many-core-general/computing-links
Flamingo is a general-purpose framework for auto-tuning program parameters. It has been developed with a view to choosing how a large problem can be broken down to be solved in parallel on a GPU, by tuning the thread block size, for example. The software has already been used within OeRC to successfully tune an airfoil simulation distributed with OP2 and is being explored by Fujitsu Labs and NAG. I will explain how Flamingo can be used to tune a parametrised program, what types of parameters can be tuned, and how the results can be examined to guide further optimisations. I will also show some results from tuning an airfoil simulation, showing how the optimal parameter settings vary across different GPUs and CPUs.
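As an illustration of the kind of parameter tuning described above, here is a minimal sketch of an exhaustive search over a small parameter space. It is not Flamingo's interface: the function `tune`, the parameter names `block_size` and `partition_size`, and the cost model `toy_cost` are all hypothetical. In practice the cost function would run and time the real program.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, int]


def tune(space: Dict[str, Sequence[int]],
         cost: Callable[[Config], float]
         ) -> Tuple[Config, float, List[Tuple[Config, float]]]:
    """Evaluate every combination in `space` and return the cheapest one.

    Also returns the full list of (config, cost) pairs so the results
    can be examined afterwards.
    """
    names = list(space)
    results: List[Tuple[Config, float]] = []
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        results.append((config, cost(config)))
    best_config, best_cost = min(results, key=lambda item: item[1])
    return best_config, best_cost, results


def toy_cost(config: Config) -> float:
    """Hypothetical stand-in for a measured run time (arbitrary units).

    A real tuner would launch the program with these settings and time it.
    """
    block = config["block_size"]
    part = config["partition_size"]
    return 1.0 + abs(block - 128) / 32 + abs(part - 256) / 64


# Hypothetical parameter space: names and values are for illustration only.
space = {
    "block_size": [32, 64, 128, 256, 512],
    "partition_size": [64, 128, 256, 512],
}
best, best_cost, results = tune(space, toy_cost)
print(best, best_cost, len(results))
```

Exhaustive search is only practical for small spaces such as this one (20 combinations). Keeping every (config, cost) pair is what allows the results to be inspected to guide further optimisation.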
My work is on efficient stochastic simulations of chemical reaction systems on GPUs. To widen its reach, the work is tightly integrated with MATLAB. I will talk about the challenges I faced doing stochastic simulation on GPUs in general, especially integrating GPU code with MATLAB, and how to debug such code.
We use a coarse-grained model of DNA to study DNA nanotechnology systems. The model captures basic mechanical, structural and thermodynamic properties of DNA and allows us to study systems of around 10,000 nucleotides. We will describe the molecular dynamics implementation of the model in CUDA and then show examples of different systems that we simulate: DNA self-assembling tiles, DNA logic gates, DNA origami and molecular motors based on DNA.
Astrophysical radio transients are excellent probes of extreme physical processes originating from compact sources in our Galaxy and beyond. Radio-frequency signals emitted by these objects provide a means to study the intervening medium through which they travel, via the processes of dispersion, scintillation and scattering. Next-generation radio telescopes are designed to explore the vast unexplored parameter space of high time resolution astronomy, but they require HPC solutions to process the enormous volumes of data they produce. We have chosen to use GPU technology, which we have applied to the problem of removing interstellar dispersion in order to detect millisecond radio bursts from astronomical sources. We have developed a combined software/hardware solution for real-time searches for millisecond radio transients, code-named ARTEMIS, which we have been optimising using the international Low Frequency Array (LOFAR) station at Chilbolton, UK. I will give a brief introduction to GPUs and their application to this problem, followed by a description of the LOFAR instrument we are using. I will then discuss the physical process of dispersion, along with techniques for removing the dispersive effects of the interstellar medium, traditionally applied to observations of radio pulsars. I will describe the ARTEMIS project, its aims and current status, and present optimisations and modifications of early GPU implementations of the dedispersion algorithms. Finally, I will present results from two brute-force algorithms. The first is a GPU-based algorithm designed to exploit the L1 cache of the NVIDIA Fermi GPU. The second is CPU-based and exploits the new AVX units in Intel Sandy Bridge CPUs.
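To make the brute-force dedispersion idea concrete, here is a minimal pure-Python sketch. It is not the ARTEMIS code and says nothing about the GPU or AVX optimisations; the function names, channel frequencies, sampling time and DM trial values are assumptions chosen for illustration. It uses the standard cold-plasma dispersion delay, roughly 4.148808 × 10^3 s × DM × (f^-2 − f_ref^-2), with frequencies in MHz and DM in pc cm^-3. For each trial DM, every channel is shifted by its delay and the channels are summed.

```python
from typing import List, Sequence, Tuple

K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3


def delay_samples(dm: float, freq_mhz: float, ref_mhz: float,
                  tsamp: float) -> int:
    """Dispersion delay of `freq_mhz` relative to `ref_mhz`, in samples."""
    delay_s = K_DM * dm * (freq_mhz ** -2 - ref_mhz ** -2)
    return int(round(delay_s / tsamp))


def dedisperse(data: Sequence[Sequence[float]], freqs_mhz: Sequence[float],
               tsamp: float, dm: float) -> List[float]:
    """Shift each channel by its delay at this DM and sum over channels."""
    ref = max(freqs_mhz)  # highest frequency arrives first
    shifts = [delay_samples(dm, f, ref, tsamp) for f in freqs_mhz]
    n_out = len(data[0]) - max(shifts)
    return [sum(chan[t + s] for chan, s in zip(data, shifts))
            for t in range(n_out)]


def search(data: Sequence[Sequence[float]], freqs_mhz: Sequence[float],
           tsamp: float, dm_trials: Sequence[float]) -> Tuple[float, float]:
    """Try every DM (brute force) and return (best_dm, peak_value)."""
    peaks = [(dm, max(dedisperse(data, freqs_mhz, tsamp, dm)))
             for dm in dm_trials]
    return max(peaks, key=lambda item: item[1])


# Synthetic, noise-free test signal (all values are illustrative assumptions).
freqs = [150.0, 140.0, 130.0, 120.0]  # channel centre frequencies in MHz
tsamp = 0.01                          # sampling time in seconds
true_dm, t0, n_samples = 10.0, 50, 500
data = [[0.0] * n_samples for _ in freqs]
for chan, f in zip(data, freqs):
    chan[t0 + delay_samples(true_dm, f, max(freqs), tsamp)] = 1.0

best_dm, best_peak = search(data, freqs, tsamp, [0.0, 5.0, 10.0, 15.0, 20.0])
print(best_dm, best_peak)
```

Only at the correct trial DM do the pulses from all four channels line up in the summed time series, which is why the peak is largest there. The cost grows with the product of DM trials, channels and time samples, which is what makes the problem a natural fit for parallel hardware.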
1. Introduction: What is a radical pair?
2. Rotary-RYDMR: A radio-frequency field radical pair experiment.
3. Theoretical Description: Calculation of the quantum mechanical evolution of radical pairs.
4. Simulation with MATLAB and CUDA: Fitting experimental data using MATLAB with CUDA.