OpenMacGrid 2.0: What's next and what do you think?

We started the OpenMacGrid project almost a year and a half ago. 'We' = MacResearch, of course. Thanks to all your contributions, OpenMacGrid has gathered hundreds of Macs, and has been providing several hundred GHz of computing power to the various scientific projects submitted to us. OpenMacGrid is a relatively small grid, with a definitely small budget, close to $0 actually. Thanks to the Xgrid magic, in particular with Xgrid 2.0 and Leopard, administering the OpenMacGrid controller has taken relatively little time (which is good, because for all of us, this is a side project of a side activity). But we are proud that it has been able to achieve quite a lot, and has been running several scientific projects of a very diverse nature. Speaking of Nature, OpenMacGrid has even contributed to a recent publication in a high-profile journal, thanks to Ben Bond-Lamberty's efforts.

Could it be even better? Yes. Could we do more? Yes. At least, we can try... Read more below...

We have several ideas that have been just ideas for a little while, but at some point, we would like some of them to become realities. For marketing purposes, let's call this "OpenMacGrid 2.0". And to help make that a reality, and to help push it forward, we will make this a public discussion, get your feedback and steal your ideas.

In this first post, I will just focus on one of these ideas. One thing we noticed is that the grid is not used all the time, and actually has long periods of inactivity. This is not really a bad thing, since it means the grid is more readily available to the scientists interested in using it. But seeing all this untapped power is also a bit frustrating.

One of the reasons the grid may not be used to its full capacity is the barrier to entry of using Xgrid. One has to get familiar with Xgrid, has to be familiar with the command line, and then has to be familiar with whatever program is used for his or her computations. So we discussed the possibility of providing a built-in service for programs that are (1) computation-intensive, (2) easily amenable to parallelization, (3) not network-bound, i.e. with a small data/computation ratio, and (4) of interest to a large audience of scientists. That's already pretty restrictive, but we also want to pick only a project that we can manage with limited human resources, and that some of us are familiar with.

Our current pick is in silico ligand docking. The idea is to scan through libraries of small molecule drugs and see if any of them is predicted to bind a protein of interest. This protein of interest would of course be something involved in a human disease, and the drug hits could be used as the basis for new drug development. In addition, the hits can be valuable research tools, for in vitro studies or animal experiments.

Interested scientists could submit their protein(s) of choice and some simple docking parameters, and OpenMacGrid 2.0 would do the rest. In addition to the proteins submitted by users, we could also launch more systematic jobs on all the proteins for which a crystal structure exists. The results of these jobs would be freely available, and stored on MacResearch servers.

This type of computation appears to fit the 4 criteria outlined above pretty well. In addition, it has already been done with Xgrid and with OpenMacGrid, in Rob Yang's project on OMG. Rob has been and hopefully will be one of our special consultants on this project going forward. The current computation pipeline would be something like this: AutoDock for the docking software, Maybridge + NCI-diversity + ZINC for the ligand libraries, a web form for the submissions, and, of course, OpenMacGrid for the computations.
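
To illustrate why this kind of screen is "easily amenable to parallelization", here is a minimal Python sketch of how one protein-versus-library screen could be cut into independent batches, each small enough to run on a single agent. Everything in it is an assumption made for illustration: the function name, the job dictionary layout and the batch size are hypothetical, and nothing here calls Xgrid or AutoDock or reflects how OpenMacGrid would actually submit jobs.

```python
from typing import Dict, List


def make_docking_jobs(protein: str, ligands: List[str], batch_size: int) -> List[Dict]:
    """Split one protein-vs-library screen into independent batches.

    Each batch needs only the protein and its own ligands, so batches can
    run on different machines with no communication between them.
    (Hypothetical helper, not part of Xgrid, AutoDock or OpenMacGrid.)
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    jobs = []
    for start in range(0, len(ligands), batch_size):
        batch = ligands[start:start + batch_size]
        jobs.append({
            # Hypothetical job identifier: protein name plus batch index.
            "job_id": f"{protein}-{start // batch_size:04d}",
            "protein": protein,
            "ligands": batch,
        })
    return jobs


if __name__ == "__main__":
    # Placeholder ligand names standing in for entries of a real library.
    library = [f"ligand_{i}" for i in range(10)]
    for job in make_docking_jobs("target_protein", library, batch_size=4):
        print(job["job_id"], len(job["ligands"]))
```

Because each batch carries only a protein identifier and a short list of ligands, the data sent per job stays small compared with the docking time it triggers, which is the small data/computation ratio mentioned above.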

Thus, we already have an idea of what we want to start with. Sorry to the non-biologists, but well, the 'large audience' of criterion number 4 can only be so large. Of course, OpenMacGrid will still be accessible to other projects, for scientists who want to talk directly to Xgrid, and we would still love to hear about other ideas for "built-in" OpenMacGrid computations. Also, if you know of other ligand docking services, tell us how ours could be different in interesting ways. If you have more feedback on how we can implement things, please post in the comments. Then, in the next post (1-2 weeks from now, hopefully), the goal will be to get more specific about the details of the submission process and access to the results.