ceph - http://docs.ceph.com/docs/master/radosgw/s3/
swift - https://wiki.openstack.org/wiki/Swift/APIFeatureComparison#A...
manta - https://www.joyent.com/manta
SWIFT is indeed analogous to S3.
Unfortunately, Basho has been so successful with their TSDB and KV products that they have basically put S2 in maintenance mode. They are still "supporting" it, but with no new features. I was hoping this Minio tool could do something similar, but a single daemon is a single point of failure. Unacceptable for serious deployments.
Everyone's got their s3 of choice, always good to have more options on the table.
Smart.
It's a good strategy but not one that I see being exercised frequently enough.
There pretty much isn't anything you can do to improve their internet connections: cables to remote places are always getting dug up, with repair times of a week or more, so you need something that can run locally for long periods. Ships have a different problem: very slow links mean you can only transmit the absolute minimum off the ship while it's out at sea (in port they typically have normal internet connections for bulk-dumping data).
Easy to set up with Vagrant, and linking / sharing the Minio shared folder to the host makes it quite convenient to quickly check the files without going to the UI [2].
[1] - https://github.com/jubos/fake-s3
[2] - It stores the files as-is in the local filesystem (files in folders, unchanged), as opposed to having it 'wrapped' like Fake S3 does.
Proprietary forks are OK with us. It will be too expensive to maintain branches of their own and catch up with the upstream.
Currently Infinit.sh has my attention the most, but it's quite young still.
edit: https://news.ycombinator.com/item?id=12125344 this thread seems to be talking about what I want. With that said, I'm not yet sure if `mc mirror` supports Backblaze, which (given the price point) is my prime need.
We[2][3] tend to agree with that.
One reason it might not work for you is that we are an order of magnitude more expensive than B2, so perhaps that's a better bet for you. On the other hand, $7.20 per year for our smallest borg account is almost as close to zero as your B2 minimum order would be, so ... who knows.
One upside of choosing our service is that you can choose your location (US, Zurich, HK, etc.)
[1] https://www.stavros.io/posts/holy-grail-backups/
[2] rsync.net
> If you're not sure what this means, our product is Not For You.
Please don't do that, it's childish and unimpressive.
Currently minio supports:
- pure FS backend with a single disk
- pure erasure-coded backend with multiple disks on a single node (like ZFS)
For more information you can read here - https://docs.minio.io/docs/minio-erasure-code-quickstart-gui...
We do not do any sort of replication; erasure coding handles disk failures, and we also implement transparent bit-rot protection.
To replicate one setup to many you can use 'mc mirror -w', which watches for events and replicates continuously.
Relevant docs can be found here
https://docs.minio.io/docs/minio-client-complete-guide#mirro...
- Spin up a bunch of droplets on DigitalOcean, because I want reliability, etc.
- What's the best way to share drive space across these to create a single Minio storage volume, so if one DO node goes away I don't lose my stuff?
The minio available today for production use can export single disk or aggregate multiple disks on the same machine using erasure coding.
If you want a backup of this, you can use the github.com/minio/mc tool to mirror; more help here https://docs.minio.io/docs/minio-client-complete-guide#mirro...
> A volume may only be attached to one Droplet at a time. However, up to five volumes can be attached to a single Droplet.
Looks like you would have to roll your own solution.
How does something like this behave with really large files? Video files in the 100s of gigabytes, for example. I'm asking because if one could set up resilient online (online as in available) storage with fat pipes like this, it could be used as a platform to build a centralized video hub for editing. How much sense that would make over a filesystem is another question, though.
I'm not sure about the point either. Maybe if you embedded a small player it would be zoomed out and fullscreen would show the native style.
Our current stable version can export a single disk or multiple disks (using erasure coding, which provides protection against disk failures). As it is very easy to get started with (single binary, thanks to Go), people find it attractive for dev/test environments.
To replicate for HA (even for the single server version), use the "mc mirror -watch SOURCE TARGET" command to pair them up. If you have multiple drives (JBOD), you can eliminate RAID or ZFS and use Minio's erasure code to pool them up. The distributed version is also in dev/testing at the moment. It should be out in a month.
https://github.com/restic/restic/blob/master/run_integration...
also, failure and backup modes.
* Minio erasure code setup also provides protection against "bit-rot".
As you can tell from the other comments, there are plenty of alternatives to pick from, and if you're going to dive into the code yourself the language may be a deciding factor.
Edit: I'm going to elaborate, because people are calling me naïve. Full disclosure: I work at a cloud provider on a storage team.
For most people and applications, you simply don't get good value for your money by using filesystems and hard drives directly. We've tried to make things more reliable and durable with backup policies, RAID, and ZFS but the fact is all of these things come with operational and capital expenditures that compare unfavorably with common cloud storage options. There are some good technical reasons why cloud storage is better: basically technologies like RAID and ZFS are attempts to make each layer of your storage stack completely durable and available, but this approach is not competitive with the way cloud storage is typically implemented, which is to build a reliable distributed service on top of cheap hardware. Consider RAID 1, for example. This gives you N+1 redundancy at the drive level for an individual computer. This worked in the 1990s but drives are bigger and RAID failure modes suck with larger drives—it's worrying how common it is to see errors when rebuilding a degraded RAID array, and at N+1 that means that your data is lost from that computer. Essentially, with modern drive sizes (4+ TB seems pretty common these days) a RAID 1 array should always be considered N+0 instead of N+1.
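A back-of-the-envelope sketch of the rebuild risk described above. It assumes an unrecoverable read error rate of one per 10^14 bits read, a figure often quoted on consumer drive datasheets and not something stated in this thread, and it treats errors as independent, which is a simplification. The function name is made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// rebuildFailureProb estimates the chance of hitting at least one
// unrecoverable read error while reading an entire drive of the given size.
// urePerBit is an assumed per-bit error rate, not a measured value.
func rebuildFailureProb(driveBytes float64, urePerBit float64) float64 {
	bits := driveBytes * 8
	// P(at least one error) = 1 - (1 - p)^bits, computed in a
	// numerically stable way via log1p/expm1.
	return -math.Expm1(bits * math.Log1p(-urePerBit))
}

func main() {
	const assumedURE = 1e-14 // assumption: one bad bit per 1e14 bits read
	for _, tb := range []float64{0.5, 1, 4, 8} {
		p := rebuildFailureProb(tb*1e12, assumedURE)
		fmt.Printf("%4.1f TB drive: %.1f%% chance of a read error during rebuild\n", tb, p*100)
	}
}
```

Under that assumed rate, the risk grows roughly in proportion to drive size for small drives and becomes substantial at multi-terabyte sizes, which is the intuition behind treating a degraded mirror as having no redundancy left.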
Cloud storage is implemented much more intelligently. If you have distributed storage, you can simply spread files across computers in different DCs and use error correction codes to increase the redundancy. You can get more nines of durability and availability for less money this way. You end up with something like 33% overhead on disk space instead of 300% overhead, and you're also off the hook for a big chunk of your capacity planning and various other operational expenditures.
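The overhead figures above can be reproduced with simple arithmetic. This sketch assumes a 12+4 erasure code and four full copies as the comparison points; those parameters are chosen here only because they yield roughly 33% and 300%, and the thread does not say which schemes the author had in mind.

```go
package main

import "fmt"

// erasureOverhead returns the extra space, as a fraction of the payload,
// for a code that stores dataShards data shards plus parityShards parity shards.
func erasureOverhead(dataShards, parityShards int) float64 {
	return float64(parityShards) / float64(dataShards)
}

// replicationOverhead returns the extra space, as a fraction of the payload,
// for keeping the given number of full copies.
func replicationOverhead(copies int) float64 {
	return float64(copies - 1)
}

func main() {
	// Assumed example parameters, not taken from the thread.
	fmt.Printf("12+4 erasure code: %.0f%% overhead, tolerates 4 lost shards\n",
		erasureOverhead(12, 4)*100)
	fmt.Printf("4 full copies:     %.0f%% overhead, tolerates 3 lost copies\n",
		replicationOverhead(4)*100)
}
```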
These days I would consider starting from "this file is in cloud storage, and we have a local cache" rather than "this file is in local storage, but we have a cloud backup". That's really all I'm saying.
It also won't always be competitive. Sometimes cloud storage is more expensive than regular filesystems, depending on how you're using it. If you're a big company you can sometimes amortize the costs of doing it yourself better. That's all I mean by "default"—I'm going to put my data in cloud storage unless I have a compelling reason to store it some other way.
Local filesystems and/or volume managers won't go away anytime soon. Internally, a system like S3 needs unified access to the storage, which is provided by the filesystem.
I think we are going to see the emergence of new filesystems that are much simpler in design compared to ZFS (as reliability is left to an upper layer in the stack) for use in the Cloud. Somewhat similar to the trend toward lightweight OSes built for the cloud (CoreOS, Project Atomic, etc.). Many features that were in the realm of the operating system are now delegated to upper layers in the stack.
I may sound like I'm playing dumb, but I'm really struggling to see what's compelling about this in its current state, aside from the fact that it's one tool as opposed to a RAID + filesystem + something to make the data available.
As for distributed object stores, I would expect them to work great for video editing, since they can saturate any link given enough servers. But not out of the box: you would need a client designed for it, one that splits files into chunks and transfers them in parallel, etc.
The currently available minio is resilient to disk failures using erasure coding (similar to RAID).
I really think this could be useful to build something like Avid Interplay on top of.
http://www.slideshare.net/rakutentech/scaling-and-high-perfo...
Haha, you guys are awesome! You've totally figured it out. Stay awesome!
Relevant docs here https://docs.minio.io/docs/using-pre-signed-urls-to-download...
```
package main

import minio "github.com/minio/minio/cmd"

func main() {
	// Start the Minio server in the background.
	go minio.Main()

	// ... do your stuff ...
}
```
Regular rsync.net accounts have full, unlimited support provided by a US-based engineer. As in, an honest to god unix engineer. Sometimes, but rarely, me.
I've used the official JS aws-sdk and boto3 as clients.
- The restart times of the Riak process ranged from 10 minutes to 3+ hours, during which time the cluster was basically useless. Not a single suggestion from support sped up this process.
- Every single night from 0800 - 0900 UTC, the cluster would grind to a halt (as measured by canaries measuring upload/download cycle times). This continued even after we migrated all customer data and traffic off of the cluster.
- Riak-CS ships with garbage collection disabled despite it being a critical feature. I inherited a cluster that had been run for some months without gc enabled. Turning it on caused the cluster to catastrophically fail. Basho Support, over a period of close to a year, was unable to find a single solution that would get our cluster back to health. If our cluster were a house on a show like Hoarders, the garbage in it would be considered load bearing.
- We attempted to upgrade our way out of our un-garbage-collect-able mess, but the transfer crashed. Every. Single. Time.
- Even had transfers worked, all of the bloated manifests have to be copied in their entirety, so you can't gc the incoming data on the new cluster.
- Even while babying the cluster, it would become unusable at least once a month, requiring a restart of all nodes. The slowest node took 3+ hours to start, followed by another 3+ hours of transferring data. This was 6+ hours of system downtime every month.
- During these monthly episodes, we attempted to engage with support and try to debug the processes (we were a team of seasoned Erlang developers). We could attach Observer and/or use the REPL to grab stats, but not a single support resource was able or willing to engage.
- For giggles, once we had migrated all users off of the cluster, we attempted to let gc run. It never completed. Not once. We let this go on for a few months before nuking the entire cluster.
Now, I absolutely realize that we got ourselves into that mess by running the cluster without gc for an extended period. But in the grand scheme of things, this cluster wasn't storing a very large amount of data -- tens of TB spread over tens of millions of objects. Having the cluster get into a state where gc can never run and where this causes snowballing instability is unacceptable.
We switched to Ceph. We've never looked back.
If you use Riak in production you probably do want their (Basho) support. Their product, when it works, works great, but when there is a problem it's a bit hard to troubleshoot without knowing Erlang and being familiar with Riak's source code.
That gives it some credibility. Especially ability to deal with tough challenges they'll encounter in this domain. Helps to have encountered most already. ;) I'll look at it in more detail later on. I'm also more interested in it if it has many-node, HA/SSI support. What's ETA on that feature?
We are also about to finish the distributed server functionality; you can track the work here https://github.com/minio/minio/tree/distributed
Occasionally, there are laws that also mandate certain controls that cloud providers in general did not have. That is also becoming rarer as time goes on.
There are cons, don't get me wrong, but to somehow claim that AWS is the be-all and end-all of hosting choices is demonstrably wrong.
For example - You want to develop a financial exchange with a 100 microsecond average response time, peaks of 10Gbit traffic, and 5 9s of uptime. Do you host that on AWS? I wouldn't.
Another example - If I were a medium+ sized company (say 20+ employees), I would want my source control 100% on prem (excluding backup). Internet connections are too flaky, and GitHub gets DDoSed too often. I could not stake my entire business on GitHub.
It's not the primary reason for On Premises, but it's one less thing to worry about: "our tech team has full control, instead of yet another company having some control that we can't see"
I think GlusterFS (battle-tested, but file-based, and it assumes an administrator with access to everything) or infinit.sh (robust ACLs, but young and not open source) better addresses those use cases.
Also, when adding files yourself, those files become pinned by default. Getting files won't pin them automatically, however. You have some more information about pinning here: https://github.com/ipfs/examples/tree/master/examples/pinnin...
https://github.com/ncw/rclone/issues/219#issuecomment-239695...
a) have a very competent dev and ops team b) have a business where you are the provider of an SLA.
For many companies with 5K+ employees, they are already distributed, already have multiple data centers, have workers all over the globe, and, when they are not primarily in the IT delivery business, tend to have IT departments that have limited budgets, lack of training and poor organizational awareness.
This leads to poor security practice, poor cost analysis, and long/nonexistent upgrade cycles on many behind-the-scenes workloads.
You are of course right, there are many diverse reasons to be on-prem, but at some point, many of those reasons go away with sufficient size and differing priorities. Things change when a company's business involves the consumption of IT services, rather than the delivery of IT services.
I can understand why on-premises git is better in some ways.
But you're overstating the frequency of GitHub outages.
And it doesn't exactly kill the business when it's down for an hour.
Git is distributed after all.
It's a little off topic, but I don't think it's overstated. People complain about the stability of something like HipChat all the time, but GitHub is unavailable more often, in our area at least.
GitHub is a huge target, and outages are extremely disruptive for companies.