GPU Computing

From ICPWiki
Revision as of 10:25, 9 September 2011 by Wmueller (talk | contribs)

In the last couple of years, the computational power of graphics cards has grown much faster than that of conventional CPUs, while at the same time graphics cards have become true general-purpose processors. Recent graphics processors reach speeds of up to a teraflop on a single PCIe board. With the introduction of easy-to-use programming languages for GPU hardware, this computational power can be harnessed for many applications. We use GPU computing to accelerate our computer simulations.


All our development uses NVidia's CUDA, which allows one to program (NVidia) GPUs in a C++-like language with only a few GPU-specific extensions. Learning CUDA is therefore not difficult, provided one has some background in C/C++ programming and can think in terms of code that runs in many thousands of parallel instances.
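To give a flavor of those few GPU-specific extensions, here is a minimal kernel sketch (an illustration of the CUDA syntax, not part of any code mentioned on this page): the __global__ qualifier marks a function that runs on the GPU, and built-in variables tell each of the many parallel instances which piece of data is its own.

```cuda
// Sketch of a CUDA kernel: scale every element of an array by a factor.
// Each of the thousands of parallel instances (threads) handles one element.
__global__ void scale(float *data, float factor, int n)
{
    // each thread computes its global index from built-in block/thread variables
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)            // guard: the grid may be larger than the array
        data[i] *= factor;
}

// The kernel is launched from the host with the triple-angle-bracket
// extension, here with enough 256-thread blocks to cover n elements:
//   scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
```

Apart from the qualifier, the built-in index variables, and the launch syntax, the kernel body is plain C/C++.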

The best way to learn CUDA programming is probably to look at some simple examples. While the NVIDIA SDK comes with many examples, they are often not as simple as they could be, since they are optimized for performance. Therefore, we have compiled a collection of simple, GPL'ed examples that show how CUDA works. These examples are not tuned to the last bit, and the codes could easily be tuned to run twice as fast. However, they are easy to read, and still perform an order of magnitude faster than comparable CPU code.
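A complete CUDA program in this simple, untuned style (our own sketch, not one of the examples from the collection) typically consists of allocating device memory, copying the input over, launching a kernel, and copying the result back:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each parallel instance adds one pair of elements.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    float a[n], b[n], c[n];
    for (int i = 0; i < n; ++i) { a[i] = i; b[i] = 2 * i; }

    // allocate device memory and copy the input to the GPU
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));
    cudaMalloc(&d_c, n * sizeof(float));
    cudaMemcpy(d_a, a, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, n * sizeof(float), cudaMemcpyHostToDevice);

    // launch one thread per element, 256 threads per block
    vec_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // copy the result back to the host and print one entry
    cudaMemcpy(c, d_c, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("c[10] = %g\n", c[10]);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

For large arrays, a tuned version would overlap transfers with computation and reuse device buffers, but the naive form above is far easier to read.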

Lattice Boltzmann


Lattice-based methods, such as the lattice Boltzmann algorithm for fluid dynamics, are particularly well suited to the massively parallel architecture of GPUs. Therefore, recent versions of ESPResSo can make use of a CUDA-accelerated lattice Boltzmann implementation. ESPResSo uses lattice Boltzmann to mediate hydrodynamic interactions between particles. While the Molecular Dynamics of the particles is calculated on the conventional front-end CPU, the more expensive hydrodynamics is off-loaded to the GPU. As a result, many simulations that previously required a 32-node compute cluster can now be performed on a desktop computer.
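The reason lattice Boltzmann maps so well onto the GPU is that every lattice node can be updated independently, so one GPU thread can handle one node. The following is a highly simplified sketch of the BGK collision step for a D2Q9 lattice (our illustration of the principle; it is not ESPResSo's actual implementation, and the names, the array layout, and the relaxation parameter omega are assumptions for this sketch):

```cuda
// Naive BGK collision kernel for a D2Q9 lattice: relax each of the 9
// populations of every node towards its local equilibrium value.
// Populations are assumed to be stored as f[q * n_nodes + node], so that
// neighboring threads touch neighboring memory addresses (coalesced access).
// feq holds the precomputed equilibrium populations, omega the relaxation rate.
__global__ void bgk_collide(float *f, const float *feq,
                            float omega, int n_nodes)
{
    int node = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per node
    if (node >= n_nodes)
        return;
    for (int q = 0; q < 9; ++q) {
        int idx = q * n_nodes + node;
        f[idx] += omega * (feq[idx] - f[idx]);
    }
}
```

Since no node reads another node's data during collision, the kernel needs no synchronization at all; only the subsequent streaming step exchanges data between neighboring nodes.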