Computing Grid in a Box with an 8-core Mac Pro
Of course an 8-core Mac Pro is nice for "mine is bigger than yours" taunts, but if you're going to spend the money to buy a machine with 8 cores and 16 GB of RAM, you had better figure out how you are going to use all that horsepower. In a previous article I demonstrated how to use R + LAM/MPI to turn an 8-core Mac Pro (a personal machine I've dubbed the MacZilla) into a statistics supercomputer. In an effort to further justify the purchase of the MacZilla to the PI of our lab, I experimented with turning it into an all-in-one computing grid. To accomplish this I employed the VMware Fusion virtualization software and Sun Grid Engine. I chose VMware Fusion over Parallels since VMware Fusion can fully utilize 16 GB of RAM and plays very nicely with 64-bit operating systems.
The setup is simple. I created seven Linux (Fedora Core) virtual machines using VMware Fusion and gave each of them a single CPU core. VMware Fusion offers the ability to give two CPUs to a VM; however, I opted not to do this due to the nature of my computations. Each VM was given 2 GB of RAM and 20 GB of disk space. I then installed and configured Sun Grid Engine on the host OS X operating system (acting as the head node) and the seven Linux guest VMs (acting as the compute nodes). I tested the setup using qsub to send out a large BioRuby-based sequence analysis task. VMware Fusion handled the task flawlessly, although my hard disk seemed to be under heavy abuse, and the machine put out enough heat to roast marshmallows.
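For anyone sizing a similar box, the allocation above can be checked with a few lines of Python. This is only a rough sketch: the numbers come straight from the description above, while the function name grid_budget and its parameters are made up for illustration, and it ignores whatever overhead the virtualization layer itself adds.

```python
def grid_budget(host_cores=8, host_ram_gb=16, vm_count=7,
                vm_cores=1, vm_ram_gb=2, vm_disk_gb=20):
    """Summarize what the guest VMs claim and what is nominally left for the host."""
    return {
        "vm_cores": vm_count * vm_cores,
        "vm_ram_gb": vm_count * vm_ram_gb,
        "vm_disk_gb": vm_count * vm_disk_gb,
        "host_cores_left": host_cores - vm_count * vm_cores,
        "host_ram_left_gb": host_ram_gb - vm_count * vm_ram_gb,
    }


# Defaults mirror the setup described above: seven 1-core, 2 GB, 20 GB guests.
budget = grid_budget()
for key, value in budget.items():
    print(f"{key}: {value}")
```

On paper, seven guests leave one core and 2 GB of RAM for OS X and the SGE head node, and they claim 140 GB of disk between them.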
This setup has come to be quite useful for me, as our lab is quickly running out of HPC capacity. When SGE queues are backed up on our main cluster, I simply load up my analyses on my MacZilla grid-in-a-box solution without leaving the comfort of my office (which is kept nice and toasty with the MacZilla firing on all cores).
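To give a flavor of what sending work to the grid looks like, here is a minimal Python sketch that builds an SGE qsub command line for an array job with one task per compute node. It is not my actual submission script: the script name analysis.sh, the job name seqscan, and the helper build_qsub_command are hypothetical, and the command is only printed, not executed. The qsub options used (-N for the job name, -cwd to run in the current directory, -t for an array-job task range, -q for a queue) are standard SGE options.

```python
import shlex


def build_qsub_command(script, job_name, tasks, queue=None):
    """Return the argument list for an SGE array-job submission (not executed)."""
    cmd = ["qsub", "-N", job_name, "-cwd", "-t", f"1-{tasks}"]
    if queue:
        # Optional: pin the job to a named queue (the queue name is an assumption).
        cmd += ["-q", queue]
    cmd.append(script)
    return cmd


# One array task per compute VM; script and job names are placeholders.
cmd = build_qsub_command("analysis.sh", "seqscan", 7)
print(shlex.join(cmd))
```

Inside the job script, SGE sets the SGE_TASK_ID environment variable for each array task, which a script can use to pick its slice of the input.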



Comments
Xgrid?
Seems like it would be easier to set up the computer as an Xgrid controller, and have it distribute jobs to each of the cores. The advantage of your setup is that SGE works with many OSes, but for local jobs, Xgrid should work fine.
RE: Xgrid?
Xgrid would work in this case, but 1) I did not want to install OS X Server on my machine, and 2) I want to be able to integrate my VM nodes into our main SGE cluster when I'm not using them. Also, all of my scripts are written to be used with SGE's qsub (they could easily be changed to run with Xgrid; I just don't have the time).
Virtual machine overhead
I know it is hard to tell, but do you know how much overhead you have from running 7 instances of Linux? There's got to be a significant amount of memory and CPU used for the various processes needed to run a whole OS. Did you somehow use a bare-bones version of Linux, specifically designed for that kind of computing node usage?
On the other hand, do you think your setup might make things better than having all the processes running under OS X (e.g. Xgrid-type or MPI), by forcing memory and CPU "splitting", hence removing some overhead from the kernel handling it? I don't know how VMware works, but it seems it can "hijack" the kernel and isolate specific CPUs and memory to the hosted system(s). That could be a win? Note: It is highly likely I misunderstand how virtual hosting works, since I have no idea what I am talking about!!
charles
Point (2) makes sense -- if
Point (2) makes sense -- if you have significant SGE infrastructure already, then it makes sense to integrate the Mac as much as possible.
As to point (1), you don't need to install OS X Server to run an Xgrid controller. Apple's server tools are downloadable from http://www.apple.com/support/downloads/serveradmintools104.html , and you can start the Xgrid controller by typing
sudo xgridctl c start
at the command line (there's a GUI app to do this too, but I forget what it's called). It works really nicely on my little 9-node Mac/Linux cluster, with no OS X Server anywhere!
Can't you use SGE for Mac?
I'm still confused here Joel. There's already Sun Grid Engine for Mac:
http://gridengine.sunsource.net/downloads/61/download.html
So why do you need to create all these VM instances (with Linux) to act as SGE clients? Couldn't you just create a queue on your box and run qsub locally?
RE: Can't you use SGE for Mac?
Ahh yes. I forgot to mention in the article that my analyses use some software that will not run under OS X (it only runs under Fedora/Red Hat). Hence the Linux VMs.
Using compatible executables (perhaps)
Hi,
of course I can't answer for Joel, but I would guess one advantage of running a virtual operating system via VMware or Parallels is that you can - by wisely choosing the installation OS and options - then use the same executables as for the rest of the cluster.
I already find it annoying to have to manage distinct 32- and 64-bit executables for our mixed cluster (of course in most cases you can just run the 32-bit ones under the 64-bit OS, but that's another story), so adding OS X executables would create even more hassle.
RE: Virtual machine overhead
Charles,
Anecdotally, I was surprised that the overhead was not much above the resources I allocated to the VMs. Of course my hard disk did seem to take a good thrashing. I forgot to mention that I need to use software that won't run on OS X, therefore the Linux VMs were necessary vs Xgrid.
Xgrid Lite
it is called Xgrid Lite
it is free.
it is open source.
look at
look at http://www.macworld.com/2007/09/reviews/vmwarefusion/index.php?lsrc=mwrss
from that article...
"Fusion can run multiple virtual machines with much less RAM usage than you might expect. It does this by sharing the RAM that’s used for identical features among the opened virtual machines. So if you’re running two different Windows XP virtual machines, much of the RAM is shared, saving memory for additional virtual machines. In practice, I was able to run three Windows virtual machines and a Fedora Core Linux virtual machine simultaneously on my 4GB Mac Pro without running into any notable slowdowns."
Anyone have any experience with this?