Amazon to Host Scientific Data Sets
Amazon announced that it will offer free hosting for several popular scientific data sets. I think this is a genius move for Amazon, as it eliminates the cost of uploading your large data sets to Amazon's system. With the data sets already staged on S3, all you need to do is fire up any number of compute nodes in EC2 and run the necessary computations on the data. Amazon will let you create your own private snapshot of the public data, which you can compute on and store in a personal Amazon EBS volume. The following scientific data sets will be available initially:
Biology
- Annotated Human Genome Data provided by ENSEMBL
Chemistry
- A 3D Version of the PubChem Library provided by Rajarshi Guha at Indiana University
- UGI Virtual Conformer Library provided by Rajarshi Guha at Indiana University: 80 GB of data in SD format on conformers for 500,000 molecules that can be used for virtual screening
Amazon seems to be open to suggestions for additional data sets, so please send them your ideas.
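The workflow described above (make a private EBS volume from a public data set snapshot, then attach it to an EC2 compute node) could be sketched roughly as follows. This is only an illustration, assuming the present-day boto3 library for Python rather than anything Amazon announced; the helper name `attach_public_dataset`, the device name `/dev/sdf`, and the IDs in the usage comment are hypothetical placeholders.

```python
def attach_public_dataset(ec2, snapshot_id, instance_id, availability_zone,
                          device="/dev/sdf"):
    """Create a private EBS volume from a public snapshot and attach it.

    `ec2` is expected to behave like a boto3 EC2 client. The helper name and
    the default device name are illustrative assumptions.
    """
    # The volume has to be created in the same availability zone as the
    # instance it will be attached to.
    volume = ec2.create_volume(SnapshotId=snapshot_id,
                               AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]

    # Wait until the new volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                      Device=device)
    return volume_id


# Example usage (needs boto3, AWS credentials and a running instance;
# the snapshot and instance IDs below are placeholders, not real ones):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   attach_public_dataset(ec2, "snap-PLACEHOLDER", "i-PLACEHOLDER", "us-east-1a")
```

Once the volume is attached, it still has to be mounted inside the instance before the data can be read.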



Comments
Costs
Can anybody comment on the costs of using this facility?
The coolest thing is that
The coolest thing is that Amazon gives you free access to the data sets. You mount the data set as a block storage volume on your EC2 instance, and data transfer between the two is free. You only have to pay for the use of your EC2 instances. In other words, access is free; you only pay for the compute time.
More information on EC2, including pricing, can be found here: http://aws.amazon.com/ec2/
To get a better idea, you can use their calculator to estimate the cost of different scenarios:
http://calculator.s3.amazonaws.com/calc5.html
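As a back-of-the-envelope version of the same estimate, here is a minimal Python sketch. It assumes, as described above, that only EC2 instance time is billed and that access to the data sets is free. The function name and the $0.10 per instance-hour rate are made-up placeholders, not Amazon's actual prices; check the pricing page for real rates.

```python
def estimate_compute_cost(instances, hours, hourly_rate_usd):
    """Estimate the EC2 bill for a job that reads a free public data set.

    Only instance time is counted; data set access is assumed to be free.
    """
    if instances < 0 or hours < 0 or hourly_rate_usd < 0:
        raise ValueError("all inputs must be non-negative")
    return instances * hours * hourly_rate_usd


# Hypothetical scenario: 10 instances for 5 hours at a placeholder rate
# of $0.10 per instance-hour (not a real Amazon price).
cost = estimate_compute_cost(instances=10, hours=5, hourly_rate_usd=0.10)
print(f"Estimated compute cost: ${cost:.2f}")
```

Other charges, such as storage for any volumes you keep around, are not included in this sketch.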
Mathematica's Curated data
In this context, it's worth noting that Mathematica directly integrates an expanding set of curated data sets:
http://reference.wolfram.com/mathematica/guide/NewIn70ComputableData.html
Also see the link to "Computable Data" on
http://reference.wolfram.com/mathematica/guide/Mathematica.html
David Reiss
http://Scientificarts.com/worklife