The Xgrid Tutorials (Part IV): Submit Jobs with Ruby
In this new (small) installment of the Xgrid tutorial series, we will explore how to use Ruby for flexible job submission. The techniques presented here will probably require a little more technical knowledge than using a graphical tool like GridStuffer. However, you may need more flexibility or have other constraints that rule out GridStuffer (for instance, maybe you only have Xgrid access in a command-line environment), and you may want something more user-friendly than direct calls to the xgrid command-line tool.
Just like the previous tutorials, I will assume you have downloaded and installed fasta and the DNA pieces we used in the first installment, because we will again use the same example. Also, I will assume you still remember your friend the Terminal, and your buddy Fasta.
Carving large jobs with Ruby
If you are a fan of Ruby (the programming language), you will be happy to know that Tetsuya Suzuki developed a Ruby-based interface to Xgrid, called rxgrid. A big advantage of a script-based solution is that you instantly get a highly flexible tool that you can fine-tune to meet all your goals and constraints.
All that rxgrid needs is a file defined using a "Ruby-based Batch Language for Xgrid", or RuBLX file. The file is basically Ruby, with three additional keywords used for Job, Task and File definitions. The full description of this interface is available on the rxgrid web site.
For this tutorial, I will just provide an example based on... surprise, surprise... our buddy Fasta! Here is the RuBLX file we could write:
    # Array containing all the chromosomes to scan
    chromosomes = (1..21).to_a.push('X', 'Y')

    # Array to keep the list of task names
    tasks = [ ]

    # Create one task per chromosome
    chromosomes.each do |c|
      task_name = "chr" + c.to_s
      tasks.push(task_name)
      # Next line is recognized by rxgrid as the definition of a task
      task task_name do |t|
        t.command = "/Users/Shared/fasta-tutorial/fasta"
        t.arguments = [ '-q',
                        '/Users/Shared/fasta-tutorial/magic-worm-gene.seq',
                        '/Users/Shared/fasta-tutorial/chromosome' + c.to_s + '.fa' ]
      end
    end

    # Define only one job, that contains all the tasks
    # Next line is recognized by rxgrid as the definition of a job
    job "magic-worm-gene" do |j|
      j.tasks = tasks.dup
    end
Save this file on your Desktop and name it 'job-fasta.rb'.
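Before submitting, it can help to check what the loop above will actually generate. The following is a minimal sketch in plain Ruby (no rxgrid involved) that rebuilds the same task names and argument lists and prints them. The variable names fasta_dir and preview are mine, for illustration only; the paths are the ones used in the RuBLX file.

```ruby
# Plain Ruby preview of the tasks defined in job-fasta.rb (rxgrid not needed).
# fasta_dir and preview are illustrative names, not part of rxgrid.
fasta_dir = '/Users/Shared/fasta-tutorial'

# Same expression as in the RuBLX file: 1 to 21, plus 'X' and 'Y'
chromosomes = (1..21).to_a.push('X', 'Y')

# Build one [task_name, arguments] pair per chromosome
preview = chromosomes.map do |c|
  name = "chr#{c}"
  args = ['-q',
          "#{fasta_dir}/magic-worm-gene.seq",
          "#{fasta_dir}/chromosome#{c}.fa"]
  [name, args]
end

# Print the command line each task would run
preview.each do |name, args|
  puts "#{name}: #{fasta_dir}/fasta #{args.join(' ')}"
end
puts "#{preview.size} tasks in total"
```

Running it with the ruby command should list 23 tasks, from chr1 to chrY, which is a quick way to catch a typo in a path before the job reaches the controller.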
Then, we would use the rxgrid command (which is actually a Ruby script) to submit our multi-task job as follows, in the Terminal:
rxgrid -h localhost -job batch job-fasta.rb
From that point, the job is on your Xgrid controller, and is just like any other Xgrid job. You can get its specifications, monitor its progress in Xgrid Admin, and retrieve the results using the xgrid command or any of the other techniques described later in this tutorial.
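If you prefer to stay in Ruby for the retrieval step as well, the xgrid command can be wrapped in a few lines. This is only a sketch: the helper name xgrid_results_command is hypothetical, the job ID 42 is a placeholder for whatever ID your controller assigned, and I am assuming the usual -job results, -id and -out options of the xgrid command-line tool.

```ruby
# Sketch: build the argument list for an `xgrid -job results` call.
# xgrid_results_command is a hypothetical helper, not part of rxgrid or Xgrid.
def xgrid_results_command(job_id, host: 'localhost', out_dir: nil)
  cmd = ['xgrid', '-h', host, '-job', 'results', '-id', job_id.to_s]
  cmd += ['-out', out_dir] if out_dir   # directory that receives output files
  cmd
end

# 42 is a placeholder job ID; use the one reported by your controller
cmd = xgrid_results_command(42, out_dir: 'results')
puts cmd.join(' ')

# To actually run it on a machine that can reach the controller:
# system(*cmd)
```

Building the command as an array and passing it to system with a splat avoids shell quoting problems if a directory name contains spaces.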
Harvesting the jewels
Alternatively, to retrieve your results, you could use rxgrid again, using the following command:
rxgrid -h localhost -job results -map job-fasta_map.csv -id magic-worm-gene
You might wonder where the file 'job-fasta_map.csv' comes from. This is a "map file" generated by rxgrid at the submission step, automatically created and named after the Ruby file 'job-fasta.rb'. The file keeps track of the correspondence between the job ID and the symbolic name 'magic-worm-gene' that we used in the submission file. If we had defined several jobs in 'job-fasta.rb' (for instance we could have iterated over several worm genes), we could have retrieved all the jobs at once with the following command:
rxgrid -h localhost -job results -map job-fasta_map.csv
The use of symbolic names for jobs and the ability to retrieve a bunch of results with just one command will likely make your life easier...
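To make the multi-job case a bit more concrete, here is a sketch of the nested loop such a submission file might contain, written in plain Ruby with hashes standing in for rxgrid's task and job definitions. Only 'magic-worm-gene' comes from this tutorial: 'other-worm-gene' and its .seq file are hypothetical, and prefixing task names with the gene name is my own assumption, to keep names distinct across jobs.

```ruby
# Plain Ruby sketch of "one job per gene, one task per chromosome".
# 'other-worm-gene' is a hypothetical second gene, for illustration only.
genes = ['magic-worm-gene', 'other-worm-gene']
chromosomes = (1..21).to_a.push('X', 'Y')
dir = '/Users/Shared/fasta-tutorial'

# jobs maps each symbolic job name to the list of its task descriptions
jobs = {}
genes.each do |gene|
  jobs[gene] = chromosomes.map do |c|
    { name: "#{gene}-chr#{c}",   # assumption: unique task name per job
      command: "#{dir}/fasta",
      arguments: ['-q', "#{dir}/#{gene}.seq", "#{dir}/chromosome#{c}.fa"] }
  end
end

# Summarize what would be submitted
jobs.each do |job_name, tasks|
  puts "#{job_name}: #{tasks.size} tasks, first is #{tasks.first[:name]}"
end
```

In a real RuBLX file, the inner loop would contain the task definition and each gene would get its own job definition, as in the single-job example earlier; the symbolic job names are then what ends up in the map file.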
Python
I am unfortunately less familiar with Python (in fact, I only really know Perl), but there is an alternative to rxgrid available to Python fans, called PyXG. Be sure to check it out: it has been around for a while and is well-tested.


