Science on a G5/PS3/MacPro - Benchmarks

As promised last week, I have here some benchmarks that compare the performance of double-precision floating-point operations on a PowerMac (2.5 GHz G5), a PlayStation 3 (3.2 GHz Cell) and a Mac Pro (2.66 GHz Xeon). The PowerMac and Mac Pro are running Mac OS X Tiger. The PS3 is running Yellow Dog Linux 5.0. The codes I used are simple, serial and single-threaded, so the numbers below are for a single core on each system. I used a variety of compilers and flags on these platforms, including some commercial ones. GCC is available for all these systems, IBM XLC (an alpha version) is available for the Cell and Intel's ICC is available for the Mac Pro. Note that for now, I am only using the PPU on the PS3 for these tests, so I'm really using only a fraction of the PS3's available hardware.

The "Astrophysics Simulation" is based on one of my own research codes. It's essentially a linear, hyperbolic PDE solver with a complicated source term. Sorry, the source code is not publicly available. The other two codes compute "Pi" using different approaches. Those codes can be found easily on the internet. Now, on with the numbers...


Astrophysics Simulation

G5     Cell   Xeon   Compiler
12     41     9      gcc -O3
8      X      9      gcc -fast
X      19     4      xlc/icc

Compute Pi (Monte Carlo)

G5     Cell   Xeon   Compiler
18     36     8      gcc -O3
8      X      8      gcc -fast
X      18     3      xlc/icc

Compute Pi (Integral)

G5     Cell   Xeon   Compiler
16     27     6      gcc -O3
5      X      6      gcc -fast
X      13     3      xlc/icc

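Time in seconds. Smaller is better. An X marks a platform and compiler combination that was not run. In the xlc/icc row, the Cell column is XLC and the Xeon column is ICC.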
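For readers who have not seen the two "Pi" benchmarks, here is a minimal sketch of what such codes typically look like. This is not the exact code that was timed: the function names, the midpoint rule, and the small random number generator (`lcg_uniform`, a hypothetical helper chosen so that every platform draws the same sequence) are all assumptions made for illustration.

```c
#include <stdint.h>

/* Midpoint-rule estimate of pi, using pi = integral over [0,1] of 4/(1+x^2).
   n is the number of slices. */
double pi_integral(long n)
{
    const double h = 1.0 / (double)n;
    double sum = 0.0;
    for (long i = 0; i < n; i++) {
        const double x = ((double)i + 0.5) * h;  /* midpoint of slice i */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}

/* Hypothetical helper: a 64-bit linear congruential generator that
   returns a double in [0,1). Using our own generator instead of rand()
   keeps the sample sequence identical across platforms. */
static double lcg_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Keep the top 53 bits and scale by 2^-53. */
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0);
}

/* Monte Carlo estimate of pi: the fraction of random points in the unit
   square that fall inside the quarter circle approaches pi/4. */
double pi_monte_carlo(long n, uint64_t seed)
{
    uint64_t state = seed;
    long inside = 0;
    for (long i = 0; i < n; i++) {
        const double x = lcg_uniform(&state);
        const double y = lcg_uniform(&state);
        if (x * x + y * y <= 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)n;
}
```

Both loops are dominated by double-precision multiplies, adds and divides, which is why codes of this kind are a reasonable stand-in for the floating-point comparison above. The integral version converges quickly, while the Monte Carlo version needs many samples for even a few correct digits.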

Of course, feel free to draw your own conclusions, but here are my thoughts. I won't say much about the Mac Pro: we know very well that the combination of hardware and mature compilers that Intel offers today is hard to beat. Between the G5 and the Cell, my results essentially confirm what Geek Patrol found: the PS3 performs at about half the level of a high-end PowerMac G5. While this may not sound impressive, recall that I'm only using a fraction of the PS3's potential, since I'm only using the PPU and completely ignoring the 8 SPUs (UPDATE! See below). Also, note that IBM's XLC compiler is still in alpha and already produces code that runs about 2X faster than GCC's. Moreover, IBM is working on a special compiler for the Cell, called the Octopiler, which is supposed to be able to optimize regular code to take advantage of the Cell architecture, including the SPUs. So, there is a lot to look forward to on this front in the near future. Finally, if you consider the low cost of the PS3, even with all these caveats, it still delivers exceptional "bang-per-buck". So, if you're one of those people who invested in the PowerPC architecture, maybe for certain very specific tasks (e.g. involving AltiVec), I highly recommend getting a PS3 and playing with it.

UPDATE: I just learned the very basics of SPU programming and ran the same tests on a single SPU of my PS3. Crudely, the performance of a single SPU appears to be in the same ballpark as that of a high-end G5 processor. This is quite impressive, especially if you note that the PS3's Cell has 6 of these available for crunching.

UPDATE II: IBM recently released an alpha version of XL Fortran for the Cell. It appears to work very well on my PS3 running YDL. My codes run about 2X faster than with gfortran, which is very impressive. It is worth noting that this Fortran compiler only creates executables for the PPU. You can't write code for the SPUs using Fortran (yet). More details here: IBM XLF for Cell.

Comments

Idea?

I think, to add to this, you should have also run Yellow Dog Linux on the Power Mac. That setup is used in many scientific ventures, and I think it would have been good to see it up against the PS3 and the PowerMac running Mac OS X too.

Double precision performance

Double precision performance on the SPEs is much, much worse than single precision. Also, the arithmetic is done in round-to-zero mode, which is perhaps not what would be desired for scientific calculation. Round-to-nearest-even is standard. GPUs have similar problems.

I would be extremely wary of gcc -fast. Among other things, it promises that infinities and NaNs don't happen. Things like isnan() will always return false, even when presented with a NaN. Were I reviewing papers, I would review any work that uses gcc -fast with extreme prejudice.

I agree that Cell is impressive, but I think that scientists should be sure of what they are getting into if they plan to do much work with the SPEs. Given a choice between an 8-core Cell and an 8-core workstation, you are probably going to be much better off with the workstation, at least for performance. I'm not sure what the respective prices would be. ...and of course the Cell performance/watt is very good.
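To make the isnan() concern above concrete: if a build uses flags that let the compiler assume NaNs and infinities never occur (as the comment says gcc -fast does), one possible workaround is to inspect the IEEE 754 bit pattern directly instead of calling isnan(). The sketch below assumes 64-bit IEEE 754 doubles; the function names `is_nan_bits` and `is_finite_bits` are hypothetical. It is a hedge rather than a guarantee, since an aggressive optimizer may still transform the arithmetic that produced the value.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if x is a NaN, judged purely from its bit pattern:
   exponent field all ones and a nonzero mantissa.
   Assumes double is a 64-bit IEEE 754 value. */
int is_nan_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* well-defined type pun */
    const uint64_t exponent = (bits >> 52) & 0x7FFULL;
    const uint64_t mantissa = bits & ((1ULL << 52) - 1);
    return exponent == 0x7FFULL && mantissa != 0;
}

/* Returns 1 if x is neither NaN nor +/-infinity
   (exponent field is not all ones). */
int is_finite_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return ((bits >> 52) & 0x7FFULL) != 0x7FFULL;
}
```

A check like `is_finite_bits(result)` at the end of each time step is cheap and gives at least some warning that a fast-math build has silently produced garbage.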

Thanks for these comments.

Thanks for these comments. Yes, as a matter of practice, I like to run a few test cases of my codes on totally different hardware and compilers just to be sure of my results. Over the years, I have come across a few situations in which a certain combination of hardware and compiler has yielded wrong (!) results.
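As an illustration of that kind of cross-check, here is a minimal sketch that compares two result arrays, say the same test case run on two different machines or compilers, and reports the largest relative difference. The function name `max_rel_diff` and the idea of comparing against a tolerance are assumptions for illustration, not a description of the actual procedure used here.

```c
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Largest relative difference between corresponding entries of a and b.
   Entries that are both exactly zero contribute 0.
   Note: NaN entries compare false everywhere and are therefore NOT
   flagged by this function; check for them separately. */
double max_rel_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double mag_a = abs_d(a[i]);
        const double mag_b = abs_d(b[i]);
        const double scale = mag_a > mag_b ? mag_a : mag_b;
        const double rel = scale > 0.0 ? abs_d(a[i] - b[i]) / scale : 0.0;
        if (rel > worst)
            worst = rel;
    }
    return worst;
}
```

Results from different compilers rarely agree bit for bit, so in practice one would compare the returned value against a tolerance suited to the problem rather than test for exact equality.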

I agree, an 8-core CPU is more usable (definitely, much easier to program!). But, indeed, it's the cost that needs to be factored in, which makes the PS3 very interesting...

PS3 SPE performance

An update on using the PS3 for scientific computing. I just spent some time learning the very basics of SPU programming and ran the same tests on a single SPU of my PS3. Crudely, the performance of a single SPU appears to be in the same ballpark as that of a high-end G5 processor. This is quite impressive, especially if you note that the PS3's Cell has 6 of these available for crunching ..

Compilers... and vector coding ?

What about using XLC for the PPC on OS X? I remember it was available at some point, and it showed massive improvements over GCC (version 3 at the time).

GCC is known to be pretty bad on the PPC. I remember reading a white paper explaining why. Basically, it is difficult to map the PPC structure onto GCC.

Also... keep in mind that you are using double-precision FP ops. If your code could use the vector units of the G5 or Cell... the Xeon would be left in the dust.

Oh, don't get me wrong, I don't want to launch the Intel vs PPC war again. But the PowerPC (and Cell) architecture has a unique and fantastic vector unit implementation that can provide massive power for scientific calculations, when the code can use it, of course.
