Petr Klapetek - Graphics card computing

Graphics card computing

Graphics cards can be used very efficiently for scientific computing. As is known from many fields of numerical physics (e.g. molecular dynamics), using a graphics card instead of the computer processor can lead to speedups of up to several hundred times, which can be a breakthrough for many scientific problems.

Besides attempts to run different calculations on graphics cards, such as classical molecular dynamics (see A. Campbellová, P. Klapetek, M. Valtr, Meas. Sci. Technol. 20 (2009) 84014), I am interested mainly in the use of graphics cards for the Finite Difference in Time Domain (FDTD) method.

The Finite Difference in Time Domain (FDTD) method is a standard method for electromagnetic computations over a broad range of frequencies, covering nearly all electromagnetic field related topics in industry and science. FDTD is based on an iterative numerical solution of the Maxwell equations, simulating wave propagation in a sequence of very short time steps.
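To illustrate what one such time step looks like, here is a minimal sketch of a one-dimensional FDTD update in C. It is not the code used for the results on this page. It assumes vacuum, fields in normalized units (so that a single Courant factor c*dt/dx appears in both updates) and a staggered grid. The function name fdtd1d_step and its parameters are chosen here for illustration only.

```c
#include <stddef.h>

/* One leapfrog time step of a 1D FDTD scheme in vacuum (illustrative
   sketch, normalized units). ez holds n samples of the electric field,
   hy holds n - 1 samples of the magnetic field located halfway between
   the ez samples. courant is the assumed factor c*dt/dx, which should
   not exceed 1 for a stable 1D scheme. */
void fdtd1d_step(double *ez, double *hy, size_t n, double courant)
{
    /* Update the magnetic field from the spatial difference of E. */
    for (size_t i = 0; i + 1 < n; i++)
        hy[i] += courant * (ez[i + 1] - ez[i]);

    /* Update the electric field from the spatial difference of H.
       The two end points of ez are not updated, so if they start at
       zero they behave like perfectly conducting walls. */
    for (size_t i = 1; i + 1 < n; i++)
        ez[i] += courant * (hy[i] - hy[i - 1]);
}
```

Calling fdtd1d_step repeatedly on the same arrays advances the fields in time. Each inner loop iteration touches only neighbouring samples, which is what makes this kind of update a natural candidate for running many grid points in parallel.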

For computation on the graphics processing unit (GPU), the NVIDIA CUDA environment was used. This environment provides common drivers and an API for a broad range of NVIDIA products and runs on both Windows and Linux operating systems. Both the data processing model and the memory model of a GPU are completely different from those of a computer processor, and the part of the code that is to run on the GPU (called a kernel) must be written to meet these conditions. In general, a GPU is equipped with several multiprocessors, each consisting of a large number of processors. Many hundreds of threads (parallel executions of the kernel), grouped in thread blocks, can therefore be processed simultaneously on the GPU. The memory available on the GPU can be divided into global memory, accessible by all the multiprocessors; shared memory, accessible by the processors within one multiprocessor; and local memory, accessible by a single processor. All of these memories are limited by the hardware, differently for each type of GPU.
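The grouping of threads into blocks can be sketched on the CPU in plain C. The example below only emulates, with two nested loops, how each thread in a one-dimensional launch would find the array element it is responsible for. It mirrors the common CUDA expression blockIdx.x * blockDim.x + threadIdx.x, but it is serial code, not a CUDA kernel, and the helper names global_index, blocks_needed and scale_emulated are hypothetical.

```c
#include <stddef.h>

/* Global element index of one thread in a 1D launch; mirrors the usual
   CUDA expression blockIdx.x * blockDim.x + threadIdx.x. */
size_t global_index(size_t block_idx, size_t block_dim, size_t thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Number of thread blocks needed to cover n elements (rounded up). */
size_t blocks_needed(size_t n, size_t block_dim)
{
    return (n + block_dim - 1) / block_dim;
}

/* Serial emulation of a kernel launch that multiplies every element
   by factor. On a GPU the two loops would not exist: each
   (block, thread) pair would run the loop body concurrently. */
void scale_emulated(double *data, size_t n, double factor, size_t block_dim)
{
    size_t grid_dim = blocks_needed(n, block_dim);

    for (size_t b = 0; b < grid_dim; b++) {
        for (size_t t = 0; t < block_dim; t++) {
            size_t i = global_index(b, block_dim, t);
            if (i < n)              /* the last block may overhang the data */
                data[i] *= factor;
        }
    }
}
```

The bounds check matters whenever the number of elements is not a multiple of the block size: for 10 elements and blocks of 4 threads, three blocks are needed and the last two threads of the third block have nothing to do.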

The following speedups were observed after simply rewriting the C code of FDTD to fit the CUDA programming model. The table gives the computation time for the same number of FDTD steps on cubic computational volumes of different sizes:

cube edge size	computer processor (CPU)	graphics card (GPU)
180	20 min 42 s	25 s
240	48 min 15 s	47 s
280	77 min 30 s	68 s
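The speedup for each row follows from dividing the processor time by the graphics card time. A small C sketch of that arithmetic is given below; the function names to_seconds and speedup are introduced here only for illustration.

```c
/* Convert a time given as minutes and seconds to seconds. */
double to_seconds(double minutes, double seconds)
{
    return 60.0 * minutes + seconds;
}

/* Speedup as the ratio of processor time to graphics card time,
   both given in seconds. */
double speedup(double cpu_seconds, double gpu_seconds)
{
    return cpu_seconds / gpu_seconds;
}
```

For the three rows above, speedup(to_seconds(20, 42), 25.0), speedup(to_seconds(48, 15), 47.0) and speedup(to_seconds(77, 30), 68.0) give speedups of roughly 50, 62 and 68, respectively.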

More can be found in recent publications:

P. Klapetek et al.: Rough surface scattering simulations using graphics cards, Applied Surface Science 2010, 256, pp. 5640-5643
P. Klapetek et al.: Near field optical microscopy simulations using graphics processing units, Surface and Interface Analysis 2010, 42, pp. 1109-1113