Amazon S3 for Your Scientific Data Storage
By drewmccormack at Thu, Apr 10 2008 1:40pm
Amazon's S3 data storage web service has been quite a hit. For a small fee, Amazon will take your data off your hands and keep it safe. The O'Reilly site has an article on how you can use Amazon S3 from Perl, and there are also libraries for other languages, like Python and Ruby. Services like S3 could be of interest for scientific data storage, because they are cheap and safe. Is anyone out there already using S3 for their scientific data?



Comments
CPAS on S3
We have a project that includes developing and deploying CPAS on EC2 and S3. CPAS is a proteomics solution for managing data from tandem mass spectrometry analyses. For more information on CPAS, see https://www.labkey.org. If you are interested in testing CPAS hosted on S3, contact Insilicos (www.insilicos.com).
Send data to a 3rd party?
Send data to a 3rd party? You must be a scientist and not in IT. :-)
This would probably violate the terms of most grants.
Patrick Gallagher
Emory College Computing Support
http://patgmac.blogspot.com
bandwidth limitation
Some things make me a bit queasy, like trusting someone else to look after my data. It's amazing how often our data is already entrusted to companies we aren't even aware of. I recently got a notice in the mail saying that my personal information may have been on the hard drives of computers stolen from some random company associated with my student loan provider.
Anyway, long aside aside, the one thing that I think limits the usefulness of web services for data is bandwidth. At my institution, our upload speed peaks at no more than 1 MB/s. This means that backing up large datasets takes considerably longer than even an equivalent local backup to mixed media. With hard drive prices where they are now, bandwidth would have to improve significantly before I'd consider web storage.
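To put the bandwidth point in rough numbers, here is a back-of-the-envelope sketch in Python. The dataset sizes are hypothetical, "MB" and "TB" are taken as decimal units, and it assumes the link sustains its 1 MB/s peak for the whole transfer, which is optimistic, so real uploads would take longer.

```python
# Back-of-the-envelope upload times over a link peaking at 1 MB/s.
# Assumptions (hypothetical): decimal units, and the peak rate is
# sustained for the entire transfer, so these are best-case figures.

SECONDS_PER_DAY = 86_400
MB = 10**6   # decimal megabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes


def upload_days(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Return the transfer time in days for size_bytes at a constant rate."""
    seconds = size_bytes / rate_bytes_per_s
    return seconds / SECONDS_PER_DAY


# Hypothetical dataset sizes, in terabytes.
for size_tb in (0.1, 1, 5):
    days = upload_days(size_tb * TB, 1 * MB)
    print(f"{size_tb:>4} TB at 1 MB/s: {days:5.1f} days")
```

At that rate a single terabyte ties up the uplink for roughly 11.6 days, whereas copying it to a local drive is typically a matter of hours, which is the gap the comment is pointing at.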
3rd Party
Hi Patrick,
Most of my scientific data is generated on the clusters of private companies working for the government. They also take care of backing the data up on tape robots.
So I take your point about passing data off to 3rd parties, but where do you keep it then? On hard disks in your office? What if there is a fire?
I don't think public researchers have much to fear. Companies are perhaps another matter...
Drew
---------------------------
Drew McCormack
http://www.maccoremac.com
http://www.macanics.net
http://www.macresearch.org