Lab Journal: Installing Sun Grid Engine on Xserve Cluster

Author: Marcus Hanwell
Website: http://blog.cryos.net/

[Editor's Note:] Marcus is currently working as a postdoc in the Hutchison group at the University of Pittsburgh. He offered this tutorial as part of the Lab Journal: Mac Only Lab series.

I recently started a postdoctoral position and one of my first tasks was to get Sun Grid Engine up and running on our new Xserve cluster. I downloaded the sge-6.1u2 darwin binaries from http://gridengine.sunsource.net/ and then began the installation. The installation scripts are very picky about DNS and hostnames.

The first step in installing Sun Grid Engine is to install the qmaster. In this case the head node acts as the master and exports the Sun Grid Engine installation directory to the nodes via NFS. OS X needed some prodding before it would behave as expected.

Don't use the automounter via LDAP; for whatever reason it does not work. I created a /Cluster directory and unpacked the Sun Grid Engine tarballs to the /Cluster/sge directory. Share the /Cluster directory on the head node via NFS, but be sure to take the tick out of the three boxes for NFS export and stop sharing via other methods. The path to the root SGE directory needs to be the same on all nodes.

Once the directory is exported, issue the following commands on each node. I used KDE's Konsole application to send input to all sessions at once; the Apple Remote Desktop tool can also send the commands to each node. The commands should be run as root or prefixed with sudo.

mkdir /Cluster
nicl . -create /mounts/head:\\/Cluster
nicl . -append /mounts/head:\\/Cluster type nfs
nicl . -append /mounts/head:\\/Cluster opts ""
nicl . -append /mounts/head:\\/Cluster dir /Cluster
kill -1 `cat /var/run/automount.pid`

These commands tell the automounter about the share and make it reread its database so that the share is mounted. The next steps are interactive, so Konsole's "send input to all sessions" feature was the most appropriate tool. (I didn't find a similar program for Mac OS X. Apple Remote Desktop can send scripts, but not interactively, so I used my Gentoo laptop with KDE.)

The sge_qmaster and sge_execd services must be defined in /etc/services. Add the following lines to /etc/services on the master and all compute nodes:

sge_qmaster 536/tcp
sge_execd 537/tcp
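
Because these two lines have to be added on every machine, it can help to make the edit repeatable. The following is a minimal sketch, not part of the SGE distribution: add_service is a hypothetical helper, and SERVICES_FILE defaults to a scratch file (an assumption made here so the sketch can be tried without touching the real /etc/services).

```shell
#!/bin/sh
# add_service is a hypothetical helper: append "name port" to a services-format
# file only if no line already starts with that service name.
add_service() {
    file=$1 name=$2 port=$3
    if grep -Eq "^${name}[[:space:]]" "$file"; then
        echo "$name already present in $file"
    else
        printf '%s\t%s\n' "$name" "$port" >> "$file"
        echo "added $name $port to $file"
    fi
}

# Defaults to a scratch file so the sketch is safe to try out.
SERVICES_FILE=${SERVICES_FILE:-/tmp/services.sketch}
touch "$SERVICES_FILE"

# Port numbers are the ones used in this tutorial.
add_service "$SERVICES_FILE" sge_qmaster 536/tcp
add_service "$SERVICES_FILE" sge_execd 537/tcp
```

To apply it for real, run it as root with SERVICES_FILE=/etc/services. Running it a second time should leave the file unchanged.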

Because the master node has two interfaces and uses both an internal and an external hostname, it is necessary to add the hostnames to the /etc/hosts file and perform some extra configuration on the qmaster. Add the following to the head node's /etc/hosts file:

10.1.1.1 external external.dns.example
192.168.1.100 head head.cluster.private
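
Since the installer is picky about hostnames, it may be worth confirming that every name is present before going further. This is only a sketch: hosts_lookup is a hypothetical helper that reads a hosts-format file directly, so it checks the file contents rather than what the system resolver actually returns.

```shell
#!/bin/sh
# hosts_lookup is a hypothetical helper: print the address that a hosts-format
# file assigns to a name (canonical name or alias), ignoring comments.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
if [ -r "$HOSTS_FILE" ]; then
    # The names below are the example names used in this tutorial.
    for name in external external.dns.example head head.cluster.private; do
        addr=$(hosts_lookup "$HOSTS_FILE" "$name")
        echo "$name -> ${addr:-NOT FOUND}"
    done
fi
```

Any name reported as NOT FOUND is a candidate for the hostname complaints mentioned in this tutorial.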

Then run the following:

cd /Cluster/sge
export SGE_ROOT=/Cluster/sge
./install_qmaster

This script sets up the qmaster interactively. The node name is "external"; the other defaults can be accepted. I chose classic spooling and the standard scheduler. All nodes were added at this step, but more can easily be added later. Once complete, the installer will probably complain a little about hostnames not matching.

Once the installation was complete, the file /Cluster/sge/external/common/host_aliases was created with the following line:

head external

This simply tells SGE that the two hostnames refer to the same machine. I then ran /Cluster/sge/external/common/sgemaster to restart the master. Next, each node must be configured. Log in to the node, become root, and issue the following commands:

source /Cluster/sge/external/common/settings.sh
cd /Cluster/sge
./install_execd

Follow the prompts and accept the defaults, but be sure to define a local spool directory for each node; I used /Volumes/Scratch/sge. Once the installer finishes, each node is added to the grid and can be seen by running qhost on the head node.
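
With more than a handful of nodes, scanning the qhost listing by eye gets tedious. The sketch below compares the listing against a list of expected names. missing_hosts is a hypothetical helper, node01 and node02 are placeholder names, and the parsing assumes the usual qhost layout (a HOSTNAME header, a line of dashes, a "global" row, then one row per host).

```shell
#!/bin/sh
# missing_hosts is a hypothetical helper: given qhost-style output on stdin and
# expected node names as arguments, print each name missing from column one.
missing_hosts() {
    # Keep the first column, skipping the header, the dashes and the global row.
    seen=$(awk 'NF && $1 != "HOSTNAME" && $1 !~ /^-+$/ && $1 != "global" { print $1 }')
    for node in "$@"; do
        printf '%s\n' "$seen" | grep -qxF -- "$node" || echo "$node"
    done
}

# Only query the grid when qhost is actually on the PATH.
if command -v qhost >/dev/null 2>&1; then
    # Placeholder node names; substitute the names qhost reports on your cluster.
    qhost | missing_hosts node01 node02
fi
```

No output means every expected node was listed. Note that qhost may report short or fully qualified names depending on the setup, so the expected names need to match that form.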

qconf -as head

This command adds the head node as a submit host. It is then not necessary to log into any node to run tasks, as qsub will submit tasks to the nodes and monitor them for completion. You should also add some manager accounts; these are user accounts that have full access to SGE.

qconf -am admin,user123

The next step is to set up OpenMPI and the parallel environment. Issue the following command:

qconf -ap mpi

Then I used the following configuration for the parallel environment:

pe_name mpi
slots 64
user_lists NONE
xuser_lists NONE
start_proc_args /Cluster/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args /Cluster/sge/mpi/stopmpi.sh
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
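
The same definition can also be loaded without the interactive editor, which is handy when rebuilding the cluster. This is a sketch under a few assumptions: the PE_FILE path is made up for illustration, and the field list is the one shown above for this SGE version (later versions may expect additional fields). qconf -Ap adds a parallel environment from a file.

```shell
#!/bin/sh
# Write the parallel environment definition to a file. The quoted 'EOF'
# delimiter stops the shell from expanding $pe_hostfile and $pe_slots,
# which must reach SGE literally.
PE_FILE=${PE_FILE:-/tmp/mpi_pe.conf}   # hypothetical location
cat > "$PE_FILE" <<'EOF'
pe_name           mpi
slots             64
user_lists        NONE
xuser_lists       NONE
start_proc_args   /Cluster/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args    /Cluster/sge/mpi/stopmpi.sh
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
EOF

# Load the definition only on a machine where qconf is available.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap "$PE_FILE"
else
    echo "qconf not found; definition left in $PE_FILE"
fi
```

Afterwards, qconf -sp mpi should print the stored definition for comparison.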

With the mpi parallel environment defined, jobs can be submitted to the grid using the following command, which requests four processors:

qsub -R y -pe mpi 4 test.sh

This job reserves its resources and uses four slots on a single node, because the allocation rule is $pe_slots. The request can be changed to reserve two slots or just a single slot.
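
The contents of test.sh are not shown in this tutorial, so here is a hypothetical stand-in to illustrate what such a job script might look like. The job name and the use of hostname as the parallel program are assumptions. NSLOTS and PE_HOSTFILE are environment variables that SGE sets for parallel jobs, and lines starting with #$ are read by qsub as embedded options.

```shell
#!/bin/sh
# Hypothetical test.sh: a stand-in job script, not the author's original.
#$ -N mpi_test
#$ -cwd
#$ -j y

# NSLOTS is set by SGE to the number of granted slots; fall back to 1 so the
# script can also be run by hand outside the grid.
slots=${NSLOTS:-1}
echo "Running on $(uname -n) with $slots slot(s)"

# Run the parallel step only inside an SGE parallel job with mpirun available.
if command -v mpirun >/dev/null 2>&1 && [ -n "$PE_HOSTFILE" ]; then
    mpirun -np "$slots" hostname
else
    echo "mpirun not found or not running under SGE; skipping the parallel step"
fi
```

When submitted with the qsub command shown earlier, the job should print the node's hostname once per slot. The -cwd and -j y options place a single combined output file in the directory the job was submitted from.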

Comments

SGE

Yes, you could do this, or you could get the iNquiry package from BioTeam (bioteam.net) and save yourself a ton of installation trouble and troubleshooting time. All of my clusters at McGill University were set up with iNquiry, and I would recommend it without reservation.

iNquiry Cost

Yes, I've seen demos of iNquiry and it seems like a great system. Certainly if you're running biology or bioinformatics, it's also a great way to have a fully set-up cluster.

But the cost of iNquiry seems a bit pricey:
http://web.bioteam.net/metadot/index.pl?iid=2187

That's about $10k for a commercial license and $2k for an academic license. For that cost, I'm paying a significant fraction of the cost of another Xserve node (as an academic).

It's your money. I think the point of this tutorial is that you can set up SGE yourself.

Re: iNquiry Cost

lol, yes, you can also build a car yourself.....

SGE

I think this tutorial presents a valid alternative for people wishing to deploy a grid. Other commercial products may be right for you and your work. I like to know how things work and have the freedom to modify them to fit my needs, which is one of the reasons I got involved with open source projects in the first place. Setting up a grid in this way has worked well and has given me more experience with SGE, how it works, and methods of debugging it.

I hope that people find this tutorial useful. It is good to point out other alternatives as you have here.

LDAP

Don't use the automounter via LDAP, for whatever reason it does not work.

Does this mean that we cannot use Workgroup Manager to create user accounts across the cluster?

SGE and OS X

Hi folks,

Full disclosure -- I work for BioTeam which has been mentioned in previous comments. This comment is not about my employer though ...

In my spare time I run the http://gridengine.info blog and wiki site. Stuff I learn from personal experimentation and commercial consulting gigs often ends up posted there in howto or article form. Expect to see a significant amount of OS X 10.5 Leopard info to start appearing there because clustering under 10.5 is turning out to be quite a bit different than doing the same thing under 10.4.

One specific comment about this tutorial -- Grid Engine has long since been assigned official IANA service port numbers. If the SGE entries are not already in your /etc/services file (depends on the age of your OS) then please consider using the "officially" assigned TCP ports so that you don't clobber any other service:

sge_qmaster 6444/tcp # Grid Engine Qmaster Service
sge_execd 6445/tcp # Grid Engine Execution Service

Regards,
Chris