OpenCL accelerated black hole simulations using GPUs and Cell B.E.
We use the OpenCL framework to accelerate a numerical modeling application in gravitational physics on two hardware accelerators: the Cell BE and the Nvidia Tesla CUDA GPU. OpenCL allows us to execute identical source code on both of these many-core architectures and still obtain order-of-magnitude performance gains comparable to those achievable with native SDKs such as CUDA and the Cell SDK.
The specific application that we consider in this work evolves the gravitational waves generated by a compact object (such as a star the size of our own Sun) in a decaying orbit around a supermassive black hole. Such large black holes -- often more than a million times as massive as our Sun -- lurk at the center of most galaxies and routinely devour smaller stars and black holes. Such processes are commonly referred to as extreme mass-ratio inspirals (EMRIs) in the literature.
Our EMRI modeling code is essentially a solver for an inhomogeneous, hyperbolic PDE (a wave equation in Kerr space-time geometry) that includes a rather complicated source term. The numerical algorithm used by this code is a standard Lax-Wendroff finite-difference scheme. Because of the computational complexity of the source term, its evaluation is often the most numerically intensive part of the whole evolution. On a many-core processor, it is precisely this part of the computation that is "farmed out" to the many compute cores for parallel execution. All computations are carried out in double-precision floating point.
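To make the structure of such a solver concrete, here is a minimal Python sketch of one Lax-Wendroff step. It is not our EMRI code: it assumes a much simpler model problem, the 1D advection equation u_t + a u_x = s(x, t) on a periodic grid, and the names `source_term` and `lax_wendroff_step` are hypothetical, as is the simple explicit treatment of the source. The point is only to show where a per-grid-point source evaluation sits in the update, since that is the independent work that can be handed to many compute cores.

```python
import math


def source_term(x, t):
    """Hypothetical stand-in for an expensive source term (not the EMRI source)."""
    return math.exp(-x * x) * math.sin(t)


def lax_wendroff_step(u, a, dx, dt, t, source=None):
    """Advance u_t + a u_x = s(x, t) by one time step on a periodic grid."""
    n = len(u)
    c = a * dt / dx  # Courant number; the scheme is stable for |c| <= 1

    # Each grid point's source value is independent of all the others, so
    # this is the kind of loop an accelerator could evaluate in parallel.
    if source is None:
        s = [0.0] * n
    else:
        s = [source(i * dx, t) for i in range(n)]

    new_u = []
    for i in range(n):
        left = u[i - 1]          # index -1 wraps around: periodic boundary
        right = u[(i + 1) % n]
        new_u.append(
            u[i]
            - 0.5 * c * (right - left)                    # centered advection term
            + 0.5 * c * c * (right - 2.0 * u[i] + left)   # second-order correction
            + dt * s[i]          # simple explicit source update (an assumption)
        )
    return new_u


if __name__ == "__main__":
    # Evolve a single bump for a few steps with the toy source switched on.
    u = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    t, dt, dx = 0.0, 0.05, 0.1
    for _ in range(3):
        u = lax_wendroff_step(u, a=1.0, dx=dx, dt=dt, t=t, source=source_term)
        t += dt
    print(u)
```

With the source switched off and a Courant number of exactly 1, this update reduces to shifting the profile by one grid cell per step, which is a handy sanity check for the stencil.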
The final outcome of our work (see plot below) is very similar on both Tesla GPU and Cell BE -- we obtain well over an order-of-magnitude gain in overall application performance! Our results also suggest that an OpenCL-based implementation delivers comparable performance to that based on native SDKs such as CUDA and Cell SDK. Moreover, the OpenCL source-code is identical for both these hardware platforms, which is a non-trivial benefit -- it promises tremendous savings in parallel code-development and optimization efforts.
We use the following hardware for our performance tests. For the Cell BE tests, we use an IBM QS22 blade system with two Cell BE processors clocked at 3.2 GHz and 16 GB of main memory. For the GPU tests, we use a system that hosts an Nvidia Tesla C1060 CUDA GPU, with a 2.5 GHz AMD Phenom (9850 quad-core) processor as its main CPU and 4 GB of memory. Both systems run Fedora Linux as the primary operating system. We use the vendor-supplied (IBM, Nvidia) OpenCL libraries and compilers on both systems.



Comments
delivers comparable
delivers comparable performance to that based on native SDKs
Did you actually do any explicit testing of this? Any numbers? (i.e. is there still a few % to be gained by going native?)
Thanks,
pete
Native SDK comparison
Yes, we did a pretty extensive comparison with native SDKs like CUDA and Cell SDK. We got near-identical speed-ups with CUDA and about 10% higher speed-ups with the Cell SDK.
There may perhaps be other, more compelling reasons to go native. For example, OpenCL kernels don't yet support C++ features or other languages such as Fortran. The performance seems pretty good already. Of course, this is our experience with our application (YMMV).
I hope this helps!
"Black holes are where God divided by zero." - Steven Wright
What is CPU timing?
Nice to see. I was just wondering what the CPU timing represents. Is it the single core performance, or performance for all 4 cores? If the latter, was that the OpenCL performance on the CPU, or was the program parallelized for the CPU using some other parallel library?
Drew
---------------------------
Drew McCormack
http://www.mentalfaculty.com
http://www.macanics.net
http://www.macresearch.org
On the CPU
Thanks, Drew.
On the CPU, you're seeing single-core performance. AMD's OpenCL implementation doesn't support double precision yet. And oddly, OpenMP yields very little performance gain across all 4 cores for this application.
GK