The Xgrid Tutorials (Part I): Xgrid Basics

Author: Charles Parnot
Website: Xgrid@Stanford

At MacResearch we have some big plans for Xgrid in the coming months. Stay tuned for announcements. But to warm you up, we have a real coup: Charles Parnot, creator of one of the largest Xgrids on the planet, has agreed to write a series of tutorials for MacResearch. To kick off the series, Charles covers basic command line use. [MR]

You have a few thousand Xserves in your basement and you don’t know what to do with them. Or more realistically, there are a bunch of Macs in your lab, old and new, big and small, desktops and laptops, idle all night and most of the day. And you are thinking: “What a waste! All the things we could do with these machines!” Well, waste no more! As you may know, Apple has the solution for you: they call it Xgrid.

In this tutorial, I will try to guide you through all the steps necessary to get something “real” running with Xgrid. I decided to use as an example Fasta, a program used by biologists to identify DNA sequences. This is quite a good fit for Xgrid: relatively little data, long computations, easily parallelizable. I plan to have at least 3 installments for this tutorial, of increasing complexity. This first installment is aimed at scientists not necessarily familiar with Xgrid or with parallel computation, and will include a quick introduction to the command-line environment. Hopefully, it won’t be too boring for scientists who are already very familiar with some of these concepts but are eager to learn more about Xgrid.

Think Xgrid

Before learning how to use Xgrid, you should determine if it even makes sense to use it and if you will actually be able to take advantage of it. On the xgrid-users mailing list, many ask if Xgrid can boost applications like iMovie, Adobe Photoshop or Final Cut Pro. The answer, in short, is “No”. Unfortunately, Xgrid is not yet the silver bullet that can magically take an existing program such as iMovie, cut it in slices and run it on all the Macs it can find in the neighborhood. The “cut in slices” part is still your job, or in the case of iMovie, it would be iMovie’s job. It is difficult to give a simple answer to the simple question “Can Xgrid help to run my calculations?”. It depends not only on the type of tasks you want to run but also on the scale of the project (how many tasks?). In this section, I will show you how to distinguish between what Xgrid can and can’t do.

Let’s start with a non-computational analogy, with something that scientists do a lot: writing papers. If you work alone on a paper, maybe it will take 1 week to write 10 pages (working really hard!). You could ask a graduate student to write it, but that would probably take 2 weeks. So here is the solution: graduate students are cheap, just get 10 graduate students to work on it, one page each, it will take 2 weeks / 10 = 1.4 days. Problem solved! Problem solved?… Well, I hope you have realized 2 things: (1) you should have more respect for graduate students; (2) it won’t work. Writing papers is not amenable to parallelization. Xgrid is not good at the computational equivalent of writing papers, which requires tight integration between the different parts of the job [1].

Next, imagine you are a world-wide expert on nicotinic receptors and you have to write a 150-page review. Before writing it, you need the full-text versions of all the relevant publications on the subject, but this is going to take ages because there are 800 of them, buried in various journals scattered throughout the library. Well, the graduate students can help! If that task takes 2 weeks for one person, it might take just 1 or 2 days if 10 graduate students work on it. Here, I hope you have realized 2 things: (1) you should still have more respect for graduate students; (2) it will work (maybe by adding free pizza too). Making photocopies of articles at the library is amenable to parallelization. Xgrid is very good at the computational equivalent of sending a horde of graduate students into a library.

If we go back to “real” computational tasks, here are some examples of calculations where Xgrid can help, because they can be broken down into small independent tasks that can each be computed on separate machines, in parallel:

  • Animation movies (think Pixar!), where each task is the calculation for just one frame of the movie (the image corresponding to 1/24th of a second)
  • Drawing Mandelbrot fractals, where each task is a small piece of the full fractal
  • Analyzing radio signals from space for signs of intelligence, where each task deals with the signal from a small amount of time or a small area of the sky (or both), also known as SETI@home

Mandelbrot fractals

Xgrid will be best at what are called “embarrassingly parallel” problems. You have to be able to divide your computation into dozens or hundreds of smaller tasks, where each task would take a few minutes to a few hours on one processor [2].

Here now are some examples of calculations that are not Xgrid-friendly:

  • Simulating the molecular movements of a protein, because you can only calculate the next step when the previous step is finished
  • Calculating the Fibonacci series, because each new value is dependent on the previous ones: u(n+2)=u(n+1)+u(n)
  • Compressing an image into JPEG format, because it requires processing information about the whole image

However, even in the case of non-parallelizable tasks like the ones above, Xgrid can be useful if you need to perform the same tasks over and over again with different conditions. For instance, to model the molecular dynamics of a protein (first example above), it is quite common to run several simulations with different starting positions. This makes it possible to explore the conformational space of the protein (see also [3]). It is trivial to run several independent simulations, one on each machine, a configuration where Xgrid can be useful. Similarly, while compressing one image might be hard to parallelize, compressing hundreds of images is easy to do in parallel on several machines. These examples show that Xgrid can be useful even for tasks that are not parallelizable to begin with. The whole project itself might be amenable to parallelization because you want to run many of these tasks.

One more thing… Before you can run a task on a computer, you need to start by sending some files: the program itself, but also the data to be processed. When the task is done, the results need to be sent back too. In the example of image compression, the compression algorithm is usually relatively fast. If transferring the image files back and forth takes as much time as running the program itself, you have not gained anything. This additional time needed to run a job, and wasted on data transfer, is called “network overhead”. The network overhead is smaller for molecular dynamics simulation: the amount of data needed to describe a protein structure is relatively small, while the simulation itself is processor-intensive and takes a while.
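To put rough numbers on this, here is a small back-of-the-envelope sketch that you can paste into the Terminal. The function name overhead_fraction and all the file sizes, network speeds and run times below are made-up values chosen for illustration, not measurements:

```shell
# overhead_fraction SIZE_MB SPEED_MB_PER_S COMPUTE_S
# Prints the fraction of the total time spent on data transfer,
# assuming SIZE_MB counts everything sent to and from the agent.
overhead_fraction() {
    awk -v size="$1" -v speed="$2" -v compute="$3" 'BEGIN {
        transfer = size / speed
        printf "%.2f\n", transfer / (transfer + compute)
    }'
}

# Hypothetical image compression: 50 MB moved at 10 MB/s, 5 s of computation
overhead_fraction 50 10 5

# Hypothetical simulation: 1 MB moved at 10 MB/s, 1 hour of computation
overhead_fraction 1 10 3600
```

With these invented numbers, the first case prints 0.50 (half of the time goes into moving files around) and the second prints 0.00 (the transfer is negligible). Plugging in your own estimates is a quick way to guess whether a project is worth sending to the grid.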

In conclusion, Xgrid will be most useful if:

  • The calculations are naturally amenable to parallelization (e.g. fractals)
  • And/or you have to repeat the same calculations many times (e.g. animation movies)
  • The network overhead is not too big

In all cases, you will need at some point to have a good understanding of the computations you want to perform and of the programs you want to run with Xgrid. There is no way around that.

Start Xgrid

I won’t spend too much time explaining the Xgrid architecture or how to get Xgrid running, as there is a wonderful tutorial to get you started. I will try to cover the basics here so you can get started quickly. The Xgrid terminology is very simple:

  • the “client” sends jobs to the “controller”
  • the “controller” queues the jobs and sends them to the “agent”
  • the “agent” runs the job

Of course, the whole point of Xgrid is to have several agents, all working in parallel, each on a different job. Similarly, there can be several clients. For instance several members of your lab could submit jobs, each from their own computer. And this is basically all there is to understand about Xgrid! Now, let’s get these different pieces in place on your machine.

Xgrid Architecture

If you already have an Xgrid controller running with agents connected, you don’t need my help here. You can just skip to the next section. But if you have never used Xgrid before, I want to make the setup as simple as possible, so you can run the tutorial now and build a full-fledged Xgrid solution later. To get started immediately, download this package. After unzipping, you should have a folder with 3 applications in it: ‘xgrid-start’, ‘xgrid-stop’ and ‘xgrid-clean’.

Downloading xgrid_setup

All 3 will behave the same when double-clicked: a window will open in Terminal and you will need to enter your administrator password before the automagical script gets executed.

xgrid-start Running in Terminal

The first application ‘xgrid-start’ will start a controller and an agent on the machine you are using. The second application ‘xgrid-stop’ will stop them. For security reasons, it is highly recommended to run ‘xgrid-stop’ when done with the tutorial. The third one ‘xgrid-clean’ will give you a fresh start, with a brand new controller that doesn’t remember any of the previous jobs or agents it has seen (some people like to clean things).

To follow this tutorial, simply run ‘xgrid-start’ and you can then use your machine as the client, controller and agent, all three at the same time! It works just fine for testing purposes, even if it sounds a bit silly: your computer submits a job to itself, then it decides to assign that job to itself, because there is no one else around.

Finally, to check visually how things are going, I strongly recommend you install the free Xgrid Admin program developed by Apple. It will display all the agents and all the jobs attached to your controller. It is very easy to use, so I will let you play with it on your own.

After you have a controller and an agent running, you can submit jobs as a “client”. As you realize, the client is the most interesting and creative part: you, the scientist! The rest of this tutorial will be all about understanding the client.

Your friend the Terminal

Before we talk more about Xgrid, we need to talk about the Terminal and the “command-line environment”. For some, it might be scary, and if you are afraid of typing words instead of clicking buttons, now is the time to overcome that fear. There is nothing really hard about the command-line. All you do is type a command and the computer executes that command. When it is done, the computer patiently waits until you enter the next command.

Terminal.app icon

To get started, double-click the application “Terminal” (in /Applications/Utilities). A new window should open that welcomes you with this very interesting message (or something similar):

mycomputer:~ jdoe$

This is called the “prompt”, and it lets you know that the Terminal is waiting for you to type a command (for clarity, I won’t include the prompt in the examples). So let’s type a command! Drop your mouse, grab your keyboard, and try this (hit return to validate):

ls /Applications/Utilities

The Terminal will understand that you want to “list” the contents of your Utilities folder (“ls” is a shortcut for “list” because, you know, a four-letter word, this is too much typing). When execution of the command is finished, you get the prompt again. Next, you could abuse your new power and order:

say "hello"

I hope you find that last one fun! OK, I admit this is quite useless (somebody in the back said “pathetic”?). But the Terminal can in fact be extremely powerful (and the more curious may want more details). Every time you type a command, you actually start a program on your machine, just like you would start iTunes or Excel. Except the program is invisible! It runs in the background, without taking any space on your screen. The command we typed above did not take very long to complete: the program “say” was only running for a few instants. However, some other invisible program could run long and complicated calculations, save the results to disk and send you an email when done. You could even log out of the computer and the invisible program would keep running.

I hope you are now very excited about those faceless programs… You’d better be, because they are the only programs that Xgrid can run. You cannot ask Xgrid to open Safari or iMovie. Xgrid will only understand “ls /Applications/Utilities” or other more interesting commands.

Your buddy Fasta

Now that you are all excited about the Terminal, let me introduce the program that we will use with Xgrid. The name is “Fasta” [4]. The main purpose of Fasta is to identify in a large collection of sequences the pieces that are similar to a given sequence. You might want to think of Fasta as Google for DNA. For instance, you may have just found a mutation that makes worms able to solve second-degree equations. Then you wonder if there is a similar piece of DNA in humans. What do you do? You Google it? Errr… Nooo… You Fasta it! You run Fasta using the piece of worm DNA as the “query”, and the human genome as the “library”, and boom! You find the human gene for second-degree equation solving.

But before starting a revolution in genetics, let’s install all we need for this Xgrid/Fasta tutorial. I have prepared a little package for you. After downloading, move it to the folder called “Shared” that you will find in the “Users” folder [5], then double-click on it to extract all the files. You should now have a folder called “fasta-tutorial” that contains the “fasta” program (lowercase ‘f’), as well as all the human chromosomes [6] and several test sequences. The reason we put these files in this Shared folder is that Xgrid only has access to a limited number of places in the system, for security reasons, and “Shared” is one of them. If you have several agents, you will need to repeat that installation for each one of them (in a later installment, we will explore alternatives to such a tedious process; see also [7]).

Update: with Leopard 10.5, access to the filesystem has been considerably restricted, and the xgrid agent now runs in a 'sandbox', which reduces access to certain folders, including /Users/Shared. The tutorial would need to be updated to take that into account, for instance by using /tmp instead. In the meantime, sorry about the trouble!

Shared Folder contents

We are now ready to type our first fasta command using our friend the Terminal:

fasta magic-worm-gene.seq chromosomeY.fa

What this does is ask the computer to run the program “fasta” using “magic-worm-gene.seq” as the query, and “chromosomeY.fa” as the reference library [8]. In other words, look for the magic-worm-gene sequence in the Y chromosome. However, the above command returns an error message… Terminal, you are not my friend anymore! What happens is your computer is really dumb (not you). Even though you just put these files on your hard drive, and it should be obvious that the next thing you do is use these files, you still have to tell the Terminal where those files are. Instead of just saying “fasta” you should say “/Users/Shared/fasta-tutorial/fasta” so that your dumb computer knows where to look. The command you should write is then much longer, so make sure all the following is entered as one single command (it is shown here on 2 lines):

/Users/Shared/fasta-tutorial/fasta \
-q /Users/Shared/fasta-tutorial/magic-worm-gene.seq /Users/Shared/fasta-tutorial/chromosomeY.fa

In this command, I also added a “-q”, which will prevent fasta from asking for additional information that is irrelevant for this tutorial (see documentation via ftp). Because the command is quite long, it has been spread over two lines using the '\' continuation character, but you can also write it all on one line if you choose (without the '\', of course). If all is right, after you hit return, you first get some message that describes the query. Then the program starts scanning the Y chromosome and searching for something that looks like the piece of DNA you provided. While this happens, you do not get the prompt, and nothing happens on the screen. But your invisible buddy Fasta is running! When the calculation is actually finished, you get a bunch of interesting lines on your terminal display that show you all the sequences similar to the query “magic-worm-gene.seq”, and then finally the prompt. Your buddy Fasta is done, and your friend the Terminal is waiting for the next command.

Use Xgrid

If you look at the result above, the gene was not found in the Y chromosome, so we have to scan more of the human genome. It took several seconds to scan the Y chromosome, but this is one of the smallest chromosomes (yet it makes half of the world quite different from the other half…). There are 22 more, and then the X chromosome. Scanning all of these would probably take a while. What if you wanted to also scan the rat genome, the worm genome, the rice genome and the drosophila genome? Or every sequence known out there? And what if you have more than one query? Maybe you want to compare every single worm gene to all the other genomes? This is going to take a while even on a Mac Pro Dual processor 3.0GHz Dual-Core Intel Xeon. What would you do if you had instead a room full of computers? To run the above query on every human chromosome, you could sit 24 graduate students in front of 24 computers, tell them to open the Terminal and have them type a different query on each machine:

machine1: fasta magic-worm-gene.seq chromosomeX.fa
machine2: fasta magic-worm-gene.seq chromosomeY.fa
machine3: fasta magic-worm-gene.seq chromosome1.fa
machine4: fasta magic-worm-gene.seq chromosome2.fa
machine5: fasta magic-worm-gene.seq chromosome3.fa
...
machine24: fasta magic-worm-gene.seq chromosome22.fa

As you will see, using Xgrid is very similar to this approach, but without the graduate students. You can just sit alone at your desk and type everything from your own machine. We are now going to act as an Xgrid client, by sending a job to our Xgrid controller. If you have set up your machine using the ‘xgrid-start’ provided with this tutorial, then the controller is actually the same machine as the client. In this case, the address of your controller is simply ‘localhost’ (it is the computer way of saying ‘me’ when talking about itself). In the Terminal, we will first tell xgrid about that:

export XGRID_CONTROLLER_HOSTNAME=localhost

Then, to submit the chromosome Y job, you would type the following command (again, be careful to enter it as one single command, even though it is shown here on 2 lines):

xgrid -job submit /Users/Shared/fasta-tutorial/fasta \
-q /Users/Shared/fasta-tutorial/magic-worm-gene.seq /Users/Shared/fasta-tutorial/chromosomeY.fa

There! You just did it! You sent an xgrid job!! If you look carefully, this is exactly the same command as we typed to start fasta, except it has an additional “xgrid -job submit” in front of it. Those few words “xgrid -job submit” tell xgrid to send whatever command follows to whatever agent is available. The command description is sent from the client to the controller, then to the first agent available, which will run the command exactly as written above. But you don’t have to care about what the controller and the agents are doing. As the client, you simply get a response from the xgrid command that looks like this:

{jobIdentifier = 231; }

Xgrid has acknowledged the submission, has put it in its queue and has given it an identifier that you can use to refer to that job later (‘231’ is just an example; the first identifier you will receive is ‘0’ and that number will go up with each subsequent submission). Before doing anything else with this job, let’s just submit another one immediately, this time scanning chromosomeX:

xgrid -job submit /Users/Shared/fasta-tutorial/fasta \
-q /Users/Shared/fasta-tutorial/magic-worm-gene.seq /Users/Shared/fasta-tutorial/chromosomeX.fa

And we get back the identifier for this job:

{jobIdentifier = 232; }

I encourage you to continue like this with the other 22 chromosomes (you are allowed to use copy and paste!). Pretty soon, you should have your controller quite busy, and probably busy for a while if you have only a few agents. This is where you should realize that Xgrid actually does 2 things: it distributes the jobs and it queues them. The distribution is about splitting the workload between multiple processors. The queuing part makes sure a new job is started as soon as one finishes (or a new agent is available). While the distinction might seem purely academic, the queuing feature makes Xgrid useful even on a single-machine grid. If you have 200 tasks to run, you probably don’t want to start them all at once and let them compete for the processor resources. Xgrid will instead run them in succession, one after the other. This is the simplest “queuing” you can think of. To take full advantage of your computer, the controller will even decide to run 2 tasks in parallel if it finds that you have 2 processors (maybe your Mac will run 8 tasks and use all its 8 cores). In other words, Xgrid can also help to efficiently schedule jobs on a single machine.
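If copy and paste gets tiring, the Terminal can do the repetition for you. The following is only a sketch: the function name submit_all and the DRY_RUN variable are my own inventions, and it assumes the files are named chromosome1.fa to chromosome22.fa, chromosomeX.fa and chromosomeY.fa, as in the tutorial package. By default it only prints the 24 commands, so you can check them before anything is sent:

```shell
# Folder used throughout this tutorial
DIR=/Users/Shared/fasta-tutorial

# Dry run by default: commands are printed, not executed.
# Type "DRY_RUN=" (nothing after the equal sign) to really submit.
DRY_RUN=${DRY_RUN-yes}

# Submit (or print) one fasta job per human chromosome
submit_all() {
    for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        if [ -n "$DRY_RUN" ]; then
            echo xgrid -job submit "$DIR/fasta" -q "$DIR/magic-worm-gene.seq" "$DIR/chromosome$chr.fa"
        else
            xgrid -job submit "$DIR/fasta" -q "$DIR/magic-worm-gene.seq" "$DIR/chromosome$chr.fa"
        fi
    done
}

submit_all
```

When run for real, each submission should answer with its own job identifier, just like the two jobs above, so keep track of which identifier goes with which chromosome. This also resubmits chromosomes X and Y, which you may want to remove from the list if you have already sent them.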

As you have just seen, submission is a breeze. Of course, the second part of the process is to retrieve the results. But first, you should check that a job is done. This will be apparent in the Xgrid Admin tool (in the Jobs tab). Or you can simply ask your friend Terminal:

xgrid -job attributes -id 231

which should return something like this:

{
    jobAttributes = {
        activeCPUPower = 0; 
        applicationIdentifier = "com.apple.xgrid.cli"; 
        dateNow = 2006-12-08 16:44:16 -0800; 
        dateStarted = 2006-12-08 12:46:46 -0800; 
        dateStopped = 2006-12-08 12:47:03 -0800; 
        dateSubmitted = 2006-12-08 12:42:05 -0800; 
        jobStatus = Finished; 
        name = "/Users/Shared/fasta-tutorial/fasta"; 
        percentDone = 100; 
        taskCount = 1; 
        undoneTaskCount = 0; 
    }; 
}
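If you only care about the status, you can ask the Terminal to filter the output for you. This is a minimal sketch, assuming the output always has the layout shown above; the function name job_status is made up for this example:

```shell
# job_status: reads the output of "xgrid -job attributes" on standard input
# and prints only the value of jobStatus (for instance: Finished)
job_status() {
    sed -n 's/^[[:space:]]*jobStatus = \(.*\);[[:space:]]*$/\1/p'
}

# Try it on a shortened copy of the output shown above
job_status <<'EOF'
{
    jobAttributes = {
        dateStopped = 2006-12-08 12:47:03 -0800;
        jobStatus = Finished;
        percentDone = 100;
    };
}
EOF

# With a running controller, the same function would be used like this:
# xgrid -job attributes -id 231 | job_status
```

The test in the middle prints the single word Finished.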

The important bit here is that jobStatus = Finished. When a job is finished, the agent sends the results back to the controller, and any trace of the job is then removed from the agent. But the controller will keep those results on disk, where they will stay until you specifically ask to delete the job. The results won’t be coming back automatically to the client (the client here is you playing with your friend Terminal). You have to ask the controller for the results. The magic incantation for that is quite simple. Here are 2 examples of how you could do it:

xgrid -job results -id 231
xgrid -job results -id 232 > ~/results-chromosomeX.txt

The first command will directly show you the results of job 231 in the Terminal, which is a bit overwhelming. The second command is more useful, in that it will save the results of job 232 to a file called “results-chromosomeX.txt” in your home folder (also known as “~”). In both cases, you simply provide the job identifier and xgrid knows which job you are talking about. The second command adds a terminal trick that transfers the output to a file instead of the display (this is what the ‘>’ is all about). After you repeat the above for all the chromosomes, you are ready to sift through the results and use your brains for real science again.
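Repeating this for 24 jobs can also be left to the Terminal. The sketch below assumes that your jobs received consecutive identifiers (231 to 254 in this example), which is only true if nobody else submitted a job to the same controller in between. The function name fetch_results, the DRY_RUN variable and the file names are my own choices. As written, it only prints the commands it would run:

```shell
# Dry run by default: commands are printed, not executed.
# Type "DRY_RUN=" (nothing after the equal sign) to really fetch the results.
DRY_RUN=${DRY_RUN-yes}

# fetch_results FIRST LAST
# Saves the results of every job from FIRST to LAST
# in a file named results-job-<identifier>.txt in your home folder
fetch_results() {
    id=$1
    last=$2
    while [ "$id" -le "$last" ]; do
        out="$HOME/results-job-$id.txt"
        if [ -n "$DRY_RUN" ]; then
            echo "xgrid -job results -id $id > $out"
        else
            xgrid -job results -id "$id" > "$out"
        fi
        id=$((id + 1))
    done
}

fetch_results 231 254
```

Remember to check that the jobs are finished before fetching their results.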

Conclusion

In this installment, I hope to have guided you through all the steps for a first-level understanding of Xgrid, starting with the very basic question of when and if you can really use Xgrid for your research, all the way to a first real-world example of how you could use Xgrid to get computations processed faster and automatically on a grid. Fasta makes for a great example of that, in that we were able to split a long computation into smaller pieces that can then easily be sent out on separate processors.

But you may already feel like you have done too many repetitive tasks that the computer should be able to do for you: loading all that data on the agents, submitting all those very similar jobs, retrieving the data, saving it in a convenient location as soon as a job is done. We will explore more of that in the next installments of this tutorial. You will also hear more about the problems you may encounter in maintaining an Xgrid cluster.


Notes

[1] Going back to the paper analogy, writing your manuscript is still going to be more effective if several people work on it: proofread, format, write a piece of the introduction, write the methods, work on the figures,… However, this requires lots of synchronization and back-and-forth discussion, and can’t be called “embarrassingly parallel”. The same is true of computation: when embarrassingly easy parallelization is not possible, there are still possibilities for accelerated computation if things are done in a tightly concerted manner. The code name for that would be MPI, a topic recently covered on MacResearch. I will deliberately ignore this technology in the rest of this tutorial.

[2] Embarrassingly parallel tasks have different granularities. In the case of animation movies, the smallest task you can go down to is the rendering of one frame, which can take several hours, and you can’t divide that task further without significant complications. If you have just one frame to calculate, it is not worth using Xgrid. Fortunately, animation movies need much more than one frame (at least 1440 frames per minute, and many more if you add motion blurring). In the case of a Mandelbrot fractal, you can divide your task as much as you want, because the calculation for each pixel is completely independent of all the other pixels.

[3] Some scientists in that field are interested in large conformational changes, such as those that happen during the folding process, when the protein shape changes radically. These events are “rare”, because they only happen every microsecond or so. Simulations only proceed in femtosecond steps, which means it will take on average a billion steps to see a change. On one single computer, you may have to run a simulation for decades or centuries before you see that kind of event. However, by randomizing the starting conformation, and running simulations on tens of thousands of computers in parallel, the probability of these events is increased proportionally to the number of processors, and it becomes feasible to identify these transitions in a time much more compatible with the lifetime of a scientist. This is the idea behind folding@home (note that they don’t use Xgrid!).

[4] Pearson WR, Rapid and sensitive sequence comparison with FASTP and FASTA, Methods Enzymol. 183:63-98, 1990, Entrez Pubmed

[5] For the unix geeks: you could use instead some subdirectory in /usr or /var or some other invisible directory, and make the files world-readable

[6] Actually not the whole chromosomes, just the first 80,000 bases for each of them (only the Y chromosome is provided full-length in the tutorial package). The size of the genome is 3 billion bases and there are 4 possible bases at each position. The human genome would thus use around 500 MB, and that would be a little unfair for the MacResearch web servers. You are encouraged to download the full chromosomes, for instance at http://hgdownload.cse.ucsc.edu/goldenPath/hg17/chromosomes/, and run the tutorial with them.

[7] You may be wondering why we install these files on the agents and why we cannot let Xgrid install them for us when it needs them. The reason is that Xgrid is not very good at reusing files: it does not do any “caching”. We could let Xgrid package the fasta program and the human genome together with each job, but the whole thing would have to be uploaded again every time a new job is started. If you want to reuse files for several jobs, it is best to have them pre-installed on each agent, or set up your own cache system (this topic will be explored in a later installment).

[8] Note that the queries run in this tutorial are not something real biologists would normally do. In general, the libraries against which the query is run are not just one big stretch of DNA like a whole human chromosome, but are instead huge lists of genes or of various other pieces of DNA. For instance, the GenBank library contains more than 60 million records, for a total of more than 60 billion bases.

Comments

Thanks guys!

I'm glad you two are putting this together. I'm sure Drew's MacDevCenter tutorial has been a starting point for many of us, but it seems due for a refresh and extension. Will you be dealing with much of the XgridFoundation / GridEZ framework stuff in this series? I'm just starting to learn some Cocoa programming to try to build a more sophisticated job manager for our Xgrid here at MIT. Regards,

Adam

GridEZ

Hi Adam!
I am planning to get into GridEZ in Part III of the tutorial, with Part II more about GridStuffer, file caching techniques and general grid management tips and tricks.
If you are interested, I have an updated version of GridEZ.framework in the testing phase that has a MetaJob class that I think can be very useful and powerful. Let me know if you want the current (brittle) version of GridEZ to play with it :-)

charles

Sweet!

This is very nice; I look forward to further installments. Ah, where were you two weeks ago! I spent Christmas break setting up my Apple Workgroup Cluster and am in my second draft of the article I wish I'd found last month, haha. The Stanford web page, DM at MacDevCenter, WU-SL and a couple of unix books later, and all's mostly well, but lots of room for growth. Happy New Year.

excellent tutorial

This is perfect. Getting some medical scientists to understand the basics of xgrid has been a challenge but I anticipate this work will make the task much easier.

jg

pushing data out?

Nice tutorial, I look forward to the next two.

The main thing I am wondering is whether Xgrid can push out blocks of data to the agents rather than simply issuing commands. Suppose I have a program that I'm going to want to run many times, but each time with a unique block of data (say, 1MB) -- I guess I could just pass Xgrid a URL for where the data can be found, and when the program itself runs on an agent then it downloads the data directly from my client machine, but it would seem that it would be neater to pass the data file to Xgrid and let it worry about serving it out to the agent(s).

Pushing data with Xgrid

Indeed, in the next installment, I will show some techniques to get data automatically downloaded to an agent, as needed. Unfortunately, Xgrid does not have that capability yet. All you can do is attach files to a job, which will ALWAYS be sent, even if they were sent before as part of a previous job.

Multiple independent steps

Hi, great tutorial! Thanks a lot. I wonder how one would proceed for a multi-step job with dependencies. As an example I'd like to render an image with povray using n independent jobs dispatched via xgrid, and once they are finished I'd like to combine the parts to the final picture. I guess I need to 1/ wait for all povray jobs to finish 2/ get their results 3/ run the compositing job. What's an elegant way to do this?

Povray

I am not sure it can be elegant in any way. Step 3 would probably be done outside of Xgrid, anyway, so basically, you have to wait until it is all done.

To submit a bunch of jobs and get the results back in an orderly fashion, you may want to consider GridStuffer, which might be as elegant as you may get with the current tools available. It would probably make step 2 much much easier, and likely step 1 as well. I will cover more of that in the next tutorial, and you may have a look at the recent entry in MacResearch:
http://www.macresearch.org/openmacgrid_is_easy_peasy

hope that helps!

charles
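
For readers who want to script steps 1 and 2 from the command line, here is a minimal sketch, not a tested recipe. It assumes the controller is already selected (for example through the XGRID_CONTROLLER_HOSTNAME environment variable), that `xgrid -job attributes` prints a line of the form "jobStatus = Finished;", and that "Failed" and "Canceled" are the other final status names. The function names, the results/ folder and POLL_SECONDS are made up for the illustration.

```shell
#!/bin/sh
# Sketch only: wait for a set of Xgrid jobs, then fetch their results.

# Print the status word of one job (e.g. "Finished").
# Assumes the attributes output contains "jobStatus = <word>;".
job_status() {
    xgrid -job attributes -id "$1" |
        sed -n 's/.*jobStatus = \([A-Za-z]*\);.*/\1/p'
}

# Block until the job reaches a final status.
# "Failed" and "Canceled" are assumed names for unsuccessful endings.
wait_for_job() {
    while :; do
        status=$(job_status "$1")
        case "$status" in
            Finished) return 0 ;;
            Failed|Canceled) echo "job $1: $status" >&2; return 1 ;;
        esac
        sleep "${POLL_SECONDS:-10}"
    done
}

# Steps 1 and 2: wait for every job id given, saving each job's
# output files into results/<id>/ (a hypothetical layout).
collect_all() {
    for id in "$@"; do
        wait_for_job "$id" || return 1
        mkdir -p "results/$id"
        xgrid -job results -id "$id" -out "results/$id"
    done
}
```

With the job identifiers returned by the submissions, something like `collect_all 101 102 103 && ./composite.sh results` would then run step 3 locally, where composite.sh stands for whatever compositing script you use.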

Thanks

This is a great tutorial and it helped me a lot.

But I've one question:
I understand that I have to install all the files on every agent for performance purposes.
But how can I tell xgrid to install/send/use the files which are on the controller (in the Users/Shared folder) and accessible to the users? The agents are not directly accessible to the users, so they can't install the files there.

Because we have just a command line environment, Gridstuffer is not suitable for us.

Regards

Rene

File sending

Rene, I need to address your very relevant question in a separate tutorial; I have that in mind. There are actually 2 answers to the question: (1) you can use Xgrid, but then files get sent every time with every job submission, and Xgrid is not very good at sending large files (above 10-50 MB is too much, at least with Tiger); (2) you can cache files in /tmp or /Shared on the agents, by having the script you send download them from a file server, e.g. a simple web server would do.

For (1), you want to look at the -in option for xgrid CLI (see man page), and/or at the tutorial on the job specification format (http://www.macresearch.org/the_xgrid_tutorials_part_iii_running_batch_jobs).

hope that helps!

charles
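
To make answer (2) a bit more concrete, here is one possible shape for the script sent with the job. It is only a sketch: the cache folder, the function name and the URL in the usage line are hypothetical, and it assumes curl is available on the agents and that the agent user can write to the cache folder.

```shell
#!/bin/sh
# Sketch: download a data file once per agent and reuse it afterwards.
# CACHE_DIR is a hypothetical location; pick one the agent can write to.
CACHE_DIR="${CACHE_DIR:-/tmp/xgrid-cache}"

# Print the local path of the cached copy of a URL, downloading it
# only if it is not in the cache yet.
fetch_cached() {
    url=$1
    file="$CACHE_DIR/$(basename "$url")"
    if [ ! -s "$file" ]; then
        mkdir -p "$CACHE_DIR" || return 1
        # Download under a temporary name so that an interrupted
        # transfer never leaves a truncated file in the cache.
        tmp="$file.$$"
        curl -fsS -o "$tmp" "$url" && mv "$tmp" "$file" ||
            { rm -f "$tmp"; return 1; }
    fi
    echo "$file"
}
```

The task script would then do something like `db=$(fetch_cached http://fileserver.example/chromosomeY.fa)` before running the real program on "$db". Only the first task on each agent pays for the download.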

Thank you for this tutorial

Thank you for this tutorial (and the following ones), which help me a lot.
Yet the command:
xgrid -job submit /Users/Shared/fasta-tutorial/fasta -q /Users/Shared/fasta-tutorial/magic-worm-gene.seq /Users/Shared/fasta-tutorial/chromosomeY.fa
gives no result. The xgrid 'results' command returns an empty line while the 'attributes' command doesn't report any error...
How can I see what goes wrong? Regards
GV

Update: Guillaume has pointed out the sandbox issue, see below. The tutorial needs to be updated for 10.5.

No results?

To Guillaume:

* first make sure the job did run: what do the attributes say? is the job 'finished'?
* second, you could check the stderr and stdout stream using the -so and -se flags, with a file name where to save those streams
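
The two checks above can be combined in a small helper. This is a rough sketch, assuming the controller is already selected (for example through XGRID_CONTROLLER_HOSTNAME) and that the job id is the one returned by 'xgrid -job submit'. The function name and the file names are made up.

```shell
#!/bin/sh
# Sketch: gather what is known about a job that returned nothing.
diagnose_job() {
    id=$1
    echo "== attributes for job $id =="
    xgrid -job attributes -id "$id"
    # Save both streams to files, so that an empty stdout can be told
    # apart from an error message that went to stderr.
    xgrid -job results -id "$id" -so "job-$id.stdout" -se "job-$id.stderr"
    for f in "job-$id.stdout" "job-$id.stderr"; do
        if [ -s "$f" ]; then
            echo "== $f =="
            cat "$f"
        else
            echo "== $f is empty =="
        fi
    done
}
```

Calling `diagnose_job 42` prints the attributes, then the contents of each stream, or a note that the stream is empty.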

No result

* the attributes say the job is finished,
* stderr and stdout streams are empty...
Xgrid works with a simple job like '/bin/echo "Hello world"', using the terminal or GridStuffer.
Fasta works (outside of Xgrid).
My operating system is Mac OS X v10.5 (Leopard); could that be the origin of the problem?

No results with Fasta on Leopard

Same problem as Guillaume. I've noticed that if I use relative paths, instead of full paths, xgrid works fine (even with fasta). I thought it was due to permission problems with the agents. However, if I use only my machine as agent and controller at the same time, I receive no results, no errors, nothing. Any clue?

Update: Guillaume has pointed out the sandbox issue, see below. The tutorial needs to be updated for 10.5.

Xgrid in Mac OS X 10.5 Leopard

See how Xgrid runs in Mac OS X 10.5 Leopard here:
http://lists.apple.com/faq/pub/xgrid_users/index.php?sid=16220&aktion=artikel&rubrik=001&id=26&lang=en

Editor's note: thanks for pointing that out, Guillaume! I will update the tutorial.

relative vs. absolute path

Could you please explain how you set up relative vs. absolute path on an XGrid batch file?

I'm using XGrid to test programs while I develop them, so it means I do not want to copy myself the new build to all the agents and controller every time I make a change. As far as I understand, it means I have to use relative path to have XGrid doing it for me. The Apple doc says "When a relative path is used, the executable and the input files or directories are copied to the agents".

I can not figure out what is wrong with this batch code:


{
    jobSpecification = {
        name = "My XGrid job";
        taskSpecifications = {
            task1 = {
                command = "model";
            };
        };
    };
}

It does not work. I believe I have written the relative path correctly? (I have also tried ./model, and it fails as well.)

This is strange because when I type


xgrid -job submit model

which should be the same, it works fine. When I do a shell script with the instruction ./model inside, it works as well. I do not understand why the same does not work with the batch file. Are relative paths not allowed in batch files?

Thanks for the help!!

G Chapron

relative path in a batch file [solved]

Hello again,

I have figured out how to have things working, so I will explain this below, as this might be of interest to some people.

The simple batch.plist does not send the file. We need to add an inputFiles to it. The way to do it is to run a simple xgrid simulation (i.e. not batch at all), like this:

xgrid -job submit model

This will return a job identifier.

Then we ask to get the plist file that corresponds to this just executed job. We do that by this command:

xgrid -job specification -id XXX > batch.plist

where XXX is the number corresponding to the job identifier.

If we look at the plist file, we can see that it does contain the inputFiles part:

{
    jobSpecification = {
        applicationIdentifier = "com.apple.xgrid.cli";
        inputFiles = {
            model1 = {
                fileData = cefaedfe 07000000 03000000 02000000..... etc.... this is quite long
                isExecutable = YES;
            };
        };
        name = model1;
        schedulerHints = {
            0 = "Mac Pro";
        };
        submissionIdentifier = abc;
        taskSpecifications = {
            0 = {
                arguments = (
                );
                command = model;
            };
        };
    };
}

Then all that is needed is to add more tasks under taskSpecifications, change their names, and add arguments.

The command

xgrid -job batch batch.plist

then runs the computation correctly.

Guillaume
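
Adding many tasks by hand gets tedious, so the taskSpecifications part can also be generated. Below is a minimal sketch that prints one task per argument, in the same old-style plist syntax as the file above. The function name is hypothetical, it only handles one argument per task, and it assumes the arguments contain no double quotes. The output is meant to be pasted over the taskSpecifications block of a batch.plist that already has its inputFiles section.

```shell
#!/bin/sh
# Sketch: print a taskSpecifications block with one task per argument.
# Usage: task_specs <command> <arg for task 0> <arg for task 1> ...
task_specs() {
    cmd=$1
    shift
    i=0
    echo "taskSpecifications = {"
    for arg in "$@"; do
        # Tasks are numbered 0, 1, 2, ... like in the file above.
        printf '    %s = {\n' "$i"
        printf '        arguments = ( "%s" );\n' "$arg"
        printf '        command = "%s";\n' "$cmd"
        printf '    };\n'
        i=$((i + 1))
    done
    echo "};"
}
```

For example, `task_specs model 10 20 30` prints three tasks named 0, 1 and 2, each running model with a different argument.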

xgrid-start

Nice tutorial.

But I am curious about these 3 applications: xgrid-start, -stop and -clean. What do they do exactly? Could we start and stop xgrid with the same results using the terminal only?

Thanks,
jrui

xgrid wrappers

xgrid-start, -stop and -clean are just applescript wrappers around the terminal commands. You can open them in Script Editor to see what they do. The same results can definitely be obtained with the terminal only.

unavailable

my agents have a status of 'unavailable' despite setting:

Agent accepts tasks: always

One agent became available when idle. Accepted and completed the job.
But when I submit the same job again it just fails!?

any ideas?

Both machines running 10.5. One machine acting as agent controller and client.

Please help

I'm new to xgrid, and I need to use it.
I made an application using java, and I want my application to run using xgrid.
Is it possible?

I tried this in the terminal, but nothing happens. (It is supposed to produce an output file.)
xgrid -job submit java test

test is my java application (test.class)

How do I submit java application jobs in xgrid?

Help with Java

You may want to give the full path to the java app, /usr/bin/java

If you have more problems, I would recommend trying the Apple mailing list, xgrid-users@lists.apple.com. There have been quite a few questions about Java there in the past; Java brings with it a few issues related to sandboxing and restricted access to the window server.