Statistics Supercomputing on a Mac with R and LAM/MPI

Introduction

R is a venerable, free software environment for statistical computing and graphics. It has a large user community and an impressive number of add-on packages. Notable among these is the Bioconductor family of packages, which constitutes a comprehensive statistical computing environment for bioinformatics (with a focus on microarray data).

Fortunately, R is very well supported on OS X. In fact, the R for Mac installation provides a very Cocoa-esque editing and execution interface for R, with such Mac-specific amenities as Quartz-based plotting windows, all wrapped up in a nice OS X application bundle. You can download the OS X installer for R from any of the CRAN mirrors.

One particularly useful R package is the Simple Network of Workstations, or SNOW, package. SNOW provides parallelized versions of several R functions that piggyback on distributed computing back ends such as MPI or Parallel Virtual Machine (PVM). For a quick reference to the SNOW functions, I recommend the SNOW Simplified website.

A Case Study

Recently I needed to perform hierarchical clustering with multiscale bootstrap analysis on a massive microarray data set. Thankfully there is an R package called pvclust that makes such analyses trivial to set up. The non-trivial part was the CPU power needed to run them with 10000 bootstrap permutations. A quick test on my MacBook Pro laptop indicated it would take days to execute on a single machine. Luckily the pvclust package provides a parallelized bootstrapping function based on SNOW, so I turned to our lab's Linux cluster, but shuffling data across the network to the compute nodes slowed things down too much (this cluster has only gigabit Ethernet interconnects).

After wasting so much time with the Linux cluster, I realized that I had overlooked one of the newest members of my computing arsenal. I recently acquired a Mac Pro workstation with dual quad-core 3.0 GHz Xeon processors and 16 GB of RAM, which I have dubbed the MacZilla. Foolishly, I had ignored the baby supercomputer sitting next to my desk.

I installed SNOW on the MacZilla using the package manager offered by the R GUI for Mac. I decided to use the MPI back end for SNOW and installed the LAM/MPI package (preferred by SNOW) using MacPorts. To use the MPI back end with SNOW you also need to install the Rmpi package. If you installed LAM/MPI using MacPorts, you will need to specify the following configure flag when you install Rmpi:

--with-mpi=/opt/local/
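
From within R, one way to pass this flag is the configure.args argument of install.packages(). This is only a sketch: it assumes a source install and the MacPorts prefix shown above, so adjust the path if LAM/MPI lives somewhere else.

```r
# Build Rmpi from source so that the configure flag is actually used.
# Assumption: MacPorts installed LAM/MPI under /opt/local/.
install.packages(
  "Rmpi",
  type = "source",
  configure.args = "--with-mpi=/opt/local/"
)
```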

This ensures that the Rmpi library is linked using the correct MPI implementation. With Rmpi in place you have all you need to use the parallel SNOW functions. The first step in any parallel SNOW code is to create a cluster object:

cl <- makeCluster(16, type="MPI")

The above command creates a 16-node MPI cluster object. This cluster object is passed as the first argument to any SNOW-based function. For example:

> A <- matrix(1:10, 5, 2)
> A
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

> parApply(cl, A, 1, sum)
[1]  7  9 11 13 15

When you no longer need the cluster object you must call stopCluster() to dispose of it:

stopCluster(cl)
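
Putting the pieces together, a minimal sketch of the whole create/use/dispose cycle might look like the following. It assumes the parallel package bundled with current versions of R, which provides makeCluster(), parApply() and stopCluster() under the same names as SNOW, and it uses a small local socket cluster so that it runs without an MPI installation. The tryCatch() wrapper is an addition of this sketch, not something SNOW requires; it just makes sure the workers are shut down even if the computation fails.

```r
library(parallel)  # assumption: stands in for library(snow); same function names

# Two local socket workers. On a machine with Rmpi set up, this would be
# makeCluster(16, type = "MPI") as shown earlier.
cl <- makeCluster(2)

A <- matrix(1:10, 5, 2)

row_sums <- tryCatch(
  parApply(cl, A, 1, sum),    # row sums, computed on the workers
  finally = stopCluster(cl)   # always dispose of the cluster
)

print(row_sums)
```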

Using my MacZilla workstation, I created a 16-node MPI cluster and carried out my bootstrap analyses in a single hour, instead of the many hours, if not days, they might have taken on our Linux cluster (due to the data shuffling over the network). The main benefit is having so many CPU cores on the same memory and CPU buses. Given the many high-level MPI interfaces for R and other scripting languages (Perl, Python, Ruby, etc.), the 8-core Mac Pro gives you a lot of computing bang for your buck.
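
For readers who want to try something similar, here is a rough sketch of a parallel pvclust run. It is not the exact code used for the analysis above. It assumes that pvclust's parPvclust() function is the parallelized bootstrapping function in question, uses random numbers as a hypothetical stand-in for the microarray matrix, and runs on a small local socket cluster from the parallel package rather than on MPI. The requireNamespace() guard simply skips the run if pvclust is not installed.

```r
library(parallel)  # assumption: stands in for snow; the post used makeCluster(16, type = "MPI")

res <- NULL
if (requireNamespace("pvclust", quietly = TRUE)) {
  cl <- makeCluster(2)

  # Hypothetical data: 200 "genes" (rows) by 8 "samples" (columns).
  # pvclust clusters the columns.
  set.seed(1)
  expr <- matrix(rnorm(200 * 8), nrow = 200, ncol = 8,
                 dimnames = list(NULL, paste0("sample", 1:8)))

  res <- tryCatch(
    pvclust::parPvclust(cl, expr, nboot = 100),  # the post used 10000 bootstraps
    finally = stopCluster(cl)
  )
}
```

If the run succeeds, plot(res) draws the dendrogram annotated with the bootstrap p-values.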

Comments

MPI clarification

Sorry I am not superfamiliar with MPI. If I understand, you are using MPI on just one machine, to distribute the load over the cores. Did you actually set it up to use 16 nodes, even though you have just 8 cores? Or was that just an example?

charles

Charles-

I think you are mixing up MP with MPI. MPI (via LAM, Open MPI, MPICH, etc.) can be used on one machine with one or more processors, but it can also be used across multiple networked machines (each of which can have one or more processors). MP, on the other hand, is designed for use on one machine with multiple processors.

-Mark

SNOW clusters with PVM

I've used SNOW a bit with rpvm (the R package with PVM bindings), also installed through MacPorts.

To get PVM working properly, I needed to put the following in .bashrc (or the appropriate configuration file) on each of the computers running SNOW nodes and the master machine running PVM:

export PVM_ROOT=/opt/local/lib/pvm/
export PVM_ARCH=DARWIN
export PVM_RSH=/usr/bin/ssh

This is OS X/MacPorts specific and would differ on other platforms. The final line means that networked nodes are accessed via SSH rather than anything else. One starts PVM from the terminal, adds nodes (or not, if the nodes will run on the local machine), then starts up a cluster as specified in the parent post, except that the type argument of makeCluster becomes "PVM" (or just let SNOW choose for you).

One of the advantages of SNOW is that it doesn't seem to care what the other nodes are running on. So, you can set up an ad hoc cluster with Intel and PPC Macs (as I've tried), and I would guess (but haven't tried) that anything else running the same versions of R and pvm would work, too.

A second advantage is that any R objects can be copied to SNOW nodes prior to running repeated computations on that data, thereby avoiding most of the communications overheads when the master machine is not where the SNOW nodes reside. To do this in the case study, permutations would have to be generated locally on nodes -- I guess pvclust does not or cannot do this or it wouldn't be so slow on the Linux cluster?
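
As a rough illustration of that second point, clusterExport() copies a named object to every node once, after which each task only needs to send small arguments. The sketch below assumes the parallel package bundled with R (same function names as SNOW for these calls, plus clusterSetRNGStream(), which is specific to parallel) and a local socket cluster. The data and the resampling function are made up for the example and are not what pvclust does internally.

```r
library(parallel)  # assumption: same API as snow for makeCluster/clusterExport/parSapply

cl <- makeCluster(2)

# Hypothetical data set, copied to each node once.
big_data <- matrix(rnorm(1000 * 20), nrow = 1000, ncol = 20)
clusterExport(cl, "big_data", envir = environment())

# Give each node its own random number stream so the resamples differ.
clusterSetRNGStream(cl, iseed = 42)

# Each task resamples rows on the node itself: only the replicate index
# goes out over the wire and a single number comes back.
boot_means <- parSapply(cl, 1:100, function(i) {
  rows <- sample(nrow(big_data), replace = TRUE)
  mean(big_data[rows, 1])
})

stopCluster(cl)
length(boot_means)
```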

Cheers, Ben.

RE: MPI clarification

I did use MPI with 16 nodes on a single machine. Even though the machine has only 8 cores, the processors have hyperthreading, so I thought I would be able to get good performance out of 16 nodes over 8 hyperthreading cores. I have no empirical comparison of the two cases, but 16 MPI nodes on a machine with 8 hyperthreading cores seemed to perform very well.

Re: MPI clarification

OK, good. This is exactly what I had understood, so I am not completely off... I did not know what hyperthreading meant, so I looked it up on Wikipedia. Wow. This is interesting too. It is almost like having 16 processors. Or, to put it another way, the design of MacZilla makes it really efficient at running 16 simultaneous threads.

This multicore + hyperthreading thing makes the shift to multithreaded programming even more relevant and useful. Programmers nowadays are in even more trouble than I thought...

Thanks for the post, joel.

MP and MPI

I did realize it was the same MPI that you can use on networked machines. But I did not know about MP, thanks for the pointer!

Re: MPI clarification

From what I read, the Woodcrest and Clovertown processors found in Mac Pros do not have hyperthreading, in which case you might find that 8 nodes is more efficient. Eight hyperthreaded cores should mean 16 logical processors, which should be easy enough to check.
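
One way to check from R, assuming the parallel package bundled with current R versions: detectCores() reports logical processors by default and, on OS X, physical cores when called with logical = FALSE. If the two numbers match, the processors are not hyperthreaded.

```r
library(parallel)

logical_cpus  <- detectCores(logical = TRUE)   # what the OS schedules on
physical_cpus <- detectCores(logical = FALSE)  # honoured on OS X; may be ignored on other platforms

cat("logical:", logical_cpus, " physical:", physical_cpus, "\n")
```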

Ben.