Redundant Array of Independent Clouds (tahoe-lafs.org)
Edit: maybe Random Array of In-dependent Clouds (RAIdC)?
* how many of the supported options boil down to Amazon S3?
* is this the new "upload their important stuff on ftp, and let the rest of the world mirror it"? https://groups.google.com/group/linux.dev.kernel/msg/76ae734...
* whoever put together this newsletter is clearly doing a great job for that community
That's a really good question. Diego's experiment used:
• memopal: I don't know if it uses S3
• SugarSync: I don't know if it uses S3
• syncplicity: I don't know if it uses S3
• googledrive: not using S3
• UbuntuOne: yes, it uses S3
• DropBox: yes, it uses S3
By the way, my startup, Least Authority Enterprises, is working on a future product which also goes by the codename "Redundant Array of Independent Clouds". Our project has no relation to Diego Righi's experiment, except that perhaps we inspired him by talking about it.
We've received a research grant from DARPA to implement it. The backends we're developing for are all guaranteed to be separate backends from each other -- none of them turn out to be front-ends for another one!
• Amazon S3
• OpenStack Swift/Rackspace Cloud Files
• Microsoft Azure Blob Storage
• Google Storage for Developers
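To make the "array of independent clouds" idea concrete, here is a minimal sketch, not Least Authority's or Tahoe-LAFS's actual design. It swaps in plain single-parity XOR (RAID-5 style) instead of the zfec erasure coding that Tahoe-LAFS uses, and the four backends are only labels on an in-memory dict; `encode`, `decode`, `upload`, and `download` are hypothetical names, and no real cloud API is called.

```python
from functools import reduce
from typing import Dict, Iterable, List, Optional, Tuple

# Labels only: stand-ins for four independent providers, no real API calls.
BACKENDS = ["s3", "swift", "azure", "google"]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> Tuple[List[bytes], int]:
    """Split data into k equal-size data shares plus one XOR parity share."""
    share_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(share_len * k, b"\0")
    shares = [padded[i * share_len:(i + 1) * share_len] for i in range(k)]
    shares.append(reduce(xor_bytes, shares))  # parity share goes last
    return shares, len(data)


def decode(shares: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data when at most one share (marked None) is missing."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one missing share")
    repaired = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        # XOR of all surviving shares equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]  # drop parity, strip padding


def upload(data: bytes) -> Tuple[Dict[str, bytes], int]:
    """Place one share on each backend (here: one dict entry per label)."""
    shares, length = encode(data, k=len(BACKENDS) - 1)
    return dict(zip(BACKENDS, shares)), length


def download(stored: Dict[str, bytes], length: int,
             unavailable: Iterable[str] = ()) -> bytes:
    """Read back the data, treating the named backends as unreachable."""
    down = set(unavailable)
    shares = [None if name in down else stored[name] for name in BACKENDS]
    return decode(shares, length)
```

With these assumptions, `download(stored, length, unavailable={"azure"})` still returns the original bytes, while losing two backends at once raises `ValueError`. A real k-of-n code such as zfec tolerates more than one loss, which is the point of using it.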
It's too bad that you weren't aware of, or didn't cite, Tahoe-LAFS when you wrote that paper! Even though you used my zfec library, which I created (by copying Luigi Rizzo's feclib) for Tahoe-LAFS's use. Heh heh heh.
I tried to get Tahoe-LAFS's existence registered in the official academic research world by publishing this: http://scholar.google.com/scholar?cites=7212771373747133487&... but it didn't really work. Most of the subsequent research that probably should have cited Tahoe-LAFS still didn't.
Perhaps that 5-page paper was too telegraphic to communicate a lot of the important properties. For example, it does not spell out the fact that Tahoe-LAFS includes a kind of proof-of-storage/proof-of-retrievability protocol. Also, perhaps I chose too obscure a venue to publish it in. I'm not sure.
For your reading pleasure here is a big rant by me on my blog, whining that Tahoe-LAFS is deserving of more attention than HAIL (which you do cite):
https://lafsgateway.zooko.com/uri/URI:DIR2-RO:d73ap7mtjvv7y6...
"It is frustrating to me that the authors of HAIL are apparently unaware of Tahoe-LAFS, which elegantly solves most of the problems that they set out to solve, which is open source, and which was deployed to the public, storing millions of files, and summarized in a peer-reviewed workshop paper before the HAIL paper was published."
This was back when Google was providing "unlimited" Gmail storage and the FUSE gmail-fs had just been released. I looked for another FUSE filesystem, like a hotmail-fs, but did not push hard on it. I could not find one, so I let my idea die.
It would have been hard and time-consuming to do, and the marginal gain would have been small. Also, I'm a lazy system administrator. I hate coding :)
This should just go toward reinforcing our understanding that ideas are the easy part. Follow-through is the hard part.
My idea for backup would use something like DIBS (http://www.mit.edu/~emin/source_code/dibs/) but, instead of peers, it would use many free, salami-slice-sized chunks of storage from cloud/hosting platforms.
Other than that, interesting project.
my understanding is that tahoe-lafs is meant to be used as a live filesystem. how does the redundancy configuration affect latency? i would guess a cloned volume ("RAID 1") would be faster than a distributed volume (e.g. "RAID 5" or "RAID 6").
https://tahoe-lafs.org/trac/tahoe-lafs/attachment/wiki/Perfo...
The different colors of the samples there are for three different settings of how many shares the file was erasure-coded (RAIDed) into: 3, 30, or 60 shares. This type of file ("MDMF" type) seems to go about as fast at any of those three levels of distribution, but this older and more common type -- https://tahoe-lafs.org/trac/tahoe-lafs/attachment/wiki/Perfo... -- ("CHK" type, which is for immutable files) performs much worse for larger levels of distribution. There's probably just some dumb bug which causes this slowdown. This page has some ideas as to what's causing it: https://tahoe-lafs.org/trac/tahoe-lafs/wiki/Performance/Sep2...
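As a rough way to reason about the "RAID 1" versus "RAID 5/6" comparison asked about above, the sketch below works out the storage arithmetic of a generic k-of-n erasure code. It is only arithmetic, not a latency model and not taken from the benchmark pages; `k` (shares needed), `n` (total shares), and the `profile` helper are assumptions introduced for illustration, and the thread does not say which `k` the 3/30/60-share runs used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    share_size: int        # bytes in each share
    total_stored: int      # bytes stored across all n shares
    expansion: float       # total stored divided by (padded) file size
    tolerated_losses: int  # shares that can vanish without losing the file


def profile(file_size: int, k: int, n: int) -> Profile:
    """Storage arithmetic for a k-of-n code: any k of n shares rebuild the file."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    share_size = -(-file_size // k)  # ceiling division
    return Profile(
        share_size=share_size,
        total_stored=share_size * n,
        expansion=n / k,
        tolerated_losses=n - k,
    )


# Hypothetical comparison for a 1 MB file:
mirrored = profile(1_000_000, k=1, n=3)   # three full copies ("RAID 1"-like)
spread = profile(1_000_000, k=3, n=10)    # any 3 of 10 shares suffice
```

Under these assumptions a reader of the mirrored layout fetches one full-size share from one server, while a reader of the 3-of-10 layout fetches three shares of about a third of the file each, from three servers. Which one is faster in practice depends on per-request overhead and parallelism, which is what the measurements linked above are probing.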