Will Cloudflare R2 Win Customers from Amazon S3? (taloflow.ai)
I'm pretty sure R2 is what I was looking for to solve a specific problem.
I think it was the creator of C++ who said something to the effect of "There are languages people hate, and languages people don't use", which I think is very applicable here!
After some bad experiences and reading about other people's bad experiences, I won't use AWS anymore for anything that's not yet big enough to have expensive lawyers and people monitoring expenses daily. So I'm very interested to know how Cloudflare compares on this front.
If Cloudflare really delivers R2 with free egress and a global CDN you'd have to be crazy not to switch.
S3 is faster than other cloud products such as Dropbox or GCP.
Will R2 be as fast, durable, reliable, etc, with same features, eg, all sorts of encryption key management etc?
Just co-locate your stuff, please.
No thanks, too time consuming and too many liabilities.
at best it makes it a compelling alternative, but the story ends here
I'm not even talking backups here, if you need a backup at all, you probably need it ASAP. This is a governance product.
Cloudflare brings R2 to the edge.
Archive is great for archiving. S3 standard tier is not for archiving.
Egress is the cost that prevents that.
So for backups, stuff you don't need often (AND don't need to restore quickly), it's fine. Not so for most other use-cases.
... Ok, after writing that, I realized that I should probably look up R2's offering first. And then my jaw hit the floor at "free egress bandwidth".
Free egress bandwidth? Yes please.
I will instantly convert.
Then I was worried about price per GB. $0.015/GB is incredibly competitive.
Good lord, where do I get early access to this thing? I'd transition all our infra today.
> That's why Cloudflare plans to eliminate egress fees, deliver object storage that is at least 10% cheaper than S3, and make infrequent access completely free for customers
A few elements of concern. It says they "plan to", so we don't know if this is near term or some moonshot type of goal. It also states earlier in the press release that the free egress fees are offered via the "Bandwidth Alliance", which removes bandwidth charges between member providers. While it is noteworthy that GCP (Google Cloud Platform) and Microsoft Azure are part of this alliance, AWS is not! So does this mean that egress fees will be charged if an AWS server requests the file?
I also don't know what to think about the statement that they plan to make infrequent access completely free... does that mean that I can throw files onto R2 for archive purposes and not pay anything? Because that is what it sounds like by that statement, but it obviously sounds impractical or too good to be true.
Original Press Release: https://www.cloudflare.com/press-releases/2021/cloudflare-an...
For archival use-cases, you do still pay us for data storage. We're referring to not charging for operations for infrequent access - we'll likely drop the stored data charge down too, eventually, but the current pricing is complex enough.
The bandwidth alliance means when Cloudflare requests a file from GCP, GCP won't charge you egress. Cloudflare will then deliver your file to your customer for free.
> Backblaze B2 transfer to Cloudflare is free egress due to bandwidth alliance and Cloudflare CDN is free egress. So you already kinda have that.
Hmm... What does it mean to transfer to Cloudflare? That's interesting.
I want free egress to my Hetzner servers. (4x16TB for $79EU/mo is unbeatable, primarily because Hetzner also has unlimited free egress bandwidth.)
But if Cloudflare offers servers, I should look into that. Do they? Even if they do, what's their egress pricing?
Thanks for the tip!
(I've been wondering about B2's mysterious "computing partners" -- their computing partners get free egress, so it seems entirely plausible that Cloudflare might be one such computing partner. I just didn't realize that Cloudflare might do computing at all -- in my head, they were a proxy, not a server farm.)
If you want free egress, you can check Wasabi: https://wasabi.com/cloud-storage-pricing/#three-info
Another nice thing about B2 is that they have a 1GB free daily egress allowance, which is handy for backup programs. HashBackup (I'm the author) uses that for downloading and repacking backup data to optimize storage efficiency.
They haven't clarified their file operations costs yet though. That could get pricy but will more than likely be cancelled out by the egress savings for most use cases.
CF already talked about egress pricing here https://blog.cloudflare.com/aws-egregious-egress/
I would say that when they talk unlimited, they actually mean it.
They were also acquired very recently[2] though no clue how that might impact things.
I'm fine with all those except the pre-loading $50 just to test, though that's as I want to test it personally. If I were using a business account that's not as much an issue.
[1]: https://console.rstor.space/pricing
[2]: https://www.yahoo.com/now/packetfabric-announces-acquisition...
For example GCP has a much better Kubernetes offering than AWS, but everything's native to S3. So you get stuck using crappier products. If R2 offered free egress, I'd move the entire data lake there just to sidestep this problem.
With Cloudflare you'd have to pay for the CDN separately for any real traffic amount.
Here's a test I recently ran comparing S3, B2, and Storj. The test uploads a 700MB backup, downloads it and rebuilds a database, then removes it. The test was run on a small 1 CPU, 512M Vultr VPS.
S3 upload: 20.6s
S3 download: 26s
S3 remove: 1.2s
B2 upload: 28.2s
B2 download: 28.7s
B2 remove: 4.0s
Storj upload: 34.7s
Storj download: 75.3s
Storj remove: 20.6s
B2 does have higher latency than S3, so for short operations like removing objects, S3 has higher performance. If your egress load is for a lot of small objects, I could see B2 being a lot slower. This shows up in my tests where removing 14 backup files is more than 3x slower on B2 than on S3. But even for this, I've never seen 10x slower. A footnote: the Storj tests are with a new driver so those numbers are preliminary results.
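For a rough sense of scale, the timings above can be turned into throughput figures and ratios. This is a minimal sketch, assuming the 700MB backup size applies to every provider; the function names are made up for illustration. Since the download step also rebuilds a database, the download rates are a lower bound on raw transfer speed.

```python
# Timings in seconds, copied from the test results above.
timings = {
    "S3":    {"upload": 20.6, "download": 26.0, "remove": 1.2},
    "B2":    {"upload": 28.2, "download": 28.7, "remove": 4.0},
    "Storj": {"upload": 34.7, "download": 75.3, "remove": 20.6},
}
BACKUP_MB = 700  # assumed to be the same for every provider


def throughput_mb_s(seconds: float, size_mb: float = BACKUP_MB) -> float:
    """Average rate in MB/s for one timed transfer."""
    return size_mb / seconds


def slowdown(provider: str, op: str, baseline: str = "S3") -> float:
    """How many times slower `provider` is than `baseline` for operation `op`."""
    return timings[provider][op] / timings[baseline][op]


if __name__ == "__main__":
    for name, t in timings.items():
        print(
            f"{name}: upload {throughput_mb_s(t['upload']):.1f} MB/s, "
            f"download {throughput_mb_s(t['download']):.1f} MB/s, "
            f"remove {slowdown(name, 'remove'):.1f}x the S3 time"
        )
```

On these numbers the B2 remove step comes out a little over 3x the S3 time, which matches the "more than 3x slower" remark.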
inbound traffic (ingress) is free, almost universally.
outbound traffic (egress) is $insane, almost universally.
It's where most of the cloud providers make most of their money, as far as I can tell.
I'm a fan of murdering egress fees, so therefore I am a fan of whatever Cloudflare R2 turns out to be. As long as they get rid of egress, I'll cheerlead them for life.
(To answer your question more directly: "yes, I think so. At least in my experience.")
> when that data is retrieved from the cloud, providers will then charge large fees; this is what’s known as a data egress.
Many cloud providers seem not to charge for egress.
"After specifying an existing storage bucket, R2 will serve requests for objects from the existing bucket,"
I've been evaluating R2, but this migration path makes it dead simple to use. I just point my code to read/write from/to R2 instead of S3, and I'll get egress fees from AWS during the migration, but then that's it?
This is probably worse for CloudFront than for S3 itself.
Double paying for storage is still going to be cheaper than egress fees overall.
[1] The Services are offered primarily as a platform to cache and serve web pages and websites. Unless explicitly included as part of a Paid Service purchased by you, you agree to use the Services solely for the purpose of (i) serving web pages as viewed through a web browser or other functionally equivalent applications, including rendering Hypertext Markup Language (HTML) or other functional equivalents, and (ii) serving web APIs subject to the restrictions set forth in this Section 2.8. Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service.
"For example, Backblaze B2’s free operations threshold is capped at 2,500 a day, and if R2 simply let you make a GetObject request every second of every day, that would be something like 86,400 FREE daily Gets, which would cost ~$335 daily on B2 or $122,000 a year."
Backblaze B2 Class B operations are priced at $0.004 per 10k operations which the author did not take into account, so the calculation above is off by a factor of 10,000. The real cost for 1 GET per second for a whole year on Backblaze B2 is $12.59.
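The correction can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted above ($0.004 per 10,000 Class B operations, 2,500 free operations a day); the function names are hypothetical, and whether the free allowance is subtracted is an assumption about how the bill is computed.

```python
SECONDS_PER_DAY = 86_400
FREE_CLASS_B_PER_DAY = 2_500   # free daily threshold quoted from the article
PRICE_PER_10K_OPS = 0.004      # USD per 10,000 Class B operations, per the comment


def b2_get_cost_per_day(gets_per_day: int, free_per_day: int = FREE_CLASS_B_PER_DAY) -> float:
    """Daily cost in USD of GET operations beyond the free allowance."""
    billable = max(gets_per_day - free_per_day, 0)
    return billable / 10_000 * PRICE_PER_10K_OPS


def mistaken_cost_per_day(gets_per_day: int, free_per_day: int = FREE_CLASS_B_PER_DAY) -> float:
    """The same calculation with $0.004 applied per operation instead of per 10,000."""
    billable = max(gets_per_day - free_per_day, 0)
    return billable * PRICE_PER_10K_OPS


if __name__ == "__main__":
    daily = b2_get_cost_per_day(SECONDS_PER_DAY)
    wrong = mistaken_cost_per_day(SECONDS_PER_DAY)
    print(f"per day:  ${daily:.5f}  (mistaken: ${wrong:.2f})")
    print(f"per year: ${daily * 365:.2f}  (mistaken: ${wrong * 365:,.0f})")
```

Charging per operation reproduces the article's roughly $335 a day, while charging per 10,000 operations gives a few cents a day and somewhere between $12 and $13 a year. The exact yearly figure depends on whether the free allowance is applied and how the days are counted.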
With S3 I've never had a problem with data loss. Eventual consistency stuff was totally annoying back when that was the way they did things, though that was documented. Dev support is strong, if you are paying for it, when you have a config issue. WORM support is great in key areas (regulatory, ransomware protection for backup durability etc).
They usually don't go down during that time, but sometimes they do and their support told us to always expect outages during this time.
If you need your B2 data available during those windows, you'll need to also set up a failover data source for those times.
With S3 I can write a bucket policy that references roles that can be assumed by instances, Lambda functions, people via SSO etc.
With R2, I now need to worry about how to securely distribute credentials to access the buckets, how to rotate those credentials, how to ensure they don't leak. It's perfectly do-able, but it means AWS are always going to have the upper hand.
I'm excited to see some real competition in this space, but switching to R2 has some significant blockers at present. If I was working for Cloudflare on this product, I'd be focusing my effort on how to make integrating R2 with AWS services as simple as possible. I definitely think it's do-able, but it's going to need a team all of its own.
That being said, in that case you get what you pay for. S3 is not just storage, it's integrated with the rest of the AWS ecosystem.
But if you write a small service/product that just needs to store and transfer a lot of data 1) IAM is not needed and almost more a burden but 2) you can have it cheaper elsewhere e.g. R2 now
There are lots of different use cases for lots of different things. Since Cloudflare doesn't have the compute-type services that AWS has, rather is more specifically aimed at web-serving type services, I think their design here makes a lot of sense.
As a user that Cloudflare would target (someone who works on web sites and web-based apps), IAM is an annoyance to me that I would gladly not use on AWS if I could just toggle it off and replace it with a simpler permissions system.
For those unaware, the primary infrastructure cost of these businesses is serving up mp3.
Much like what YouTube does for video.
Using B2 and CF workers to keep costs low currently but if R2 is as good as it sounds I have no reason to write any overages billing code. Lovely!
I'm curious to hear what factors HN readers think I missed and if you think Cloudflare R2 will be a threat in the cloud object storage market place.
It's basically not possible to offer something like this on AWS or GCP, you have to run your own server somewhere with flat bandwidth pricing. If I were still in the data business, I'd absolutely be moving everything to R2.
I know you can already host websites on Cloudflare using another service but I want to see R2 get enabled.
Right now I'm using Netlify and the Netlify CLI lets me upload my static site post-build from CI (which is hooked up to my repos). I've got CF in front of that, but I'd much prefer to keep it all in CF.
I looked into Cloudflare Pages first, which on the face of it would be a better fit - but AFAICS it can only deploy from code hosted in a GitHub repo. Not even git in general, but specifically only with GitHub - and I'm using Azure DevOps.
I'd love to see additional deployment mechanisms for Pages - ideally support would be added to Wrangler, so I could deploy from a CD pipeline hosted anywhere, just like I can for Workers.
You just made my day!
This plus CloudFlare Workers might replace quite a few AWS S3/Lambda projects.
What about using SeaweedFS gateway to remote object store? https://github.com/chrislusf/seaweedfs/wiki/Gateway-to-Remot...
Basically you can think it is a cache. You operate with normal S3 APIs, and the data will be uploaded to remote storage asynchronously (most vendors have free uploading).
It's not a magic pill for all users, but it should be very fast, and cut down the API costs, egress costs.
Of course, I could be completely wrong. I would certainly welcome a correction.
Cloudflare has a business model where they can basically handle any network load you throw at them.
They are monetizing their advantage here by allowing you to make full use of their network as long as you make use of their paid storage as well.
The margins in the cloud business and AWS especially are breathtakingly large and yet, considered affordable by enterprises so far because the alternative of standing up your own compute and networking capacity is not just significantly expensive, there just isn’t enough talent available in the world to go around for every enterprise to have a decently staffed team doing so.
This is evidently the core of Cloudflare's business strategy, emerging in practically everything they do tactically; from Matthew Prince having an epic public sook earlier this year about AWS's network team telling him to jump in a lake, to this. Peering drives down the marginal cost of bandwidth, and this is super relevant since Cloudflare's main value-add product is DDoS mitigation. Unless you're an incumbent tier 1 carrier (which is the hereditary royalty of networking), the classic ways to obtain more/better peering are either have the most eyeballs, or the most content. A typical access ISP takes the former approach, a CDN like Cloudflare takes the latter.
As usual, free anything means you are the commodity being traded on.
Similarly they can use slow storage for most files and use their existing caching solutions for storing frequently accessed files.
They'll probably lose some money on customers who use lots of bandwidth, and make lots of money on everyone else.
Funny thing, paying for egress bandwidth is not something you do if you handle your own peering and other internet infrastructure. Paying for bandwidth is something that the cloud providers came up with to add further margin. So they likely pay $0/TB for that bandwidth.
The migration bit alone is going to be enough to motivate PMs and devs to switch. The free egress seems insane from a business perspective but if Cloudflare wants to do it then I'm all for it.
Very cool.
"I would have synergized harder, but I got tired of clicking the button!"
> Except you can put R2 in front of S3 and set it to "slurp" mode. That way as objects are requested through the normal course of use they'll be stored in R2. You can then keep S3 as a backup, or delete the objects that have moved over. Being a proxy is cool.
https://news.ycombinator.com/item?id=28703464
Note: I think it's pretty obvious that they are very pro AWS. I saw a lot of positive comments and very few negative comments here. Reading the blog post makes it look like it was the inverse.
Relying on Cloudflare to do tiered-caching / transfer-acceleration afforded by tight integration between its CDN and R2 would lead us to drop S3 altogether for our workloads.
We've also experimented using Workers KV as a blob store (it has a cap of 25MB per key; costs $5 per million writes, $0.5 per million reads; $5 per 10GB of storage; zero egress fee) and its pricing comes out cheaper than S3. We'd have moved to KV already if R2 hadn't been announced. Now, we think it is prudent to bide our time till R2 goes public.
But: we are a rather tiny tech shop and agonize over bill items like egress; most enterprises worry more about data security, compliance, and integration with big-data tools (like EMR, Athena, Firehose, RedShift etc). So, I am not sure if there'd be an exodus off-the-bat (at least not until Cloudflare has equivalent integrations / services in-place [0]). Though, I can see why companies like smugmug (who have been using S3 since 2006!) may move.
The killer here is, R2 will sweep away dev shops at the low-end of the market. If anyone's starting a bootstrapped SaaS company, not only does Cloudflare become the place for them to prototype an MVP (as opposed to AWS/Azure/GCP) but also an integration point for their offerings (consider: a Tableau competitor + R2) [1]. As noted by Ben Thompson a few days ago, there's little AWS can do despite knowing what's in store, other than cannibalize their own business (which they're not afraid to do!) [2]. We are entering uncharted waters here: Two companies fully drowning in HBS credos going after each other. Wonder what Clayton Christensen would have thought of that.
[0] https://en.wikipedia.org/wiki/Complex_contagion
I also bet it will not change anything for B2. They will still be cheaper than R2
It's always funny to see people on HN thinking that price for enterprise companies is a critical factor. It is for some, but for most of them (especially the ones using AWS) it is not.
Wasabi gets close, but their 1:1 storage to egress ratio makes it great for backups and a small subset of use-cases, but not much else.
I don't mind services that cut price dramatically in exchange for only supporting some (not all) use cases. That can be a very attractive offer, but does Backblaze explicitly position B2 for non-interactive use cases?
The fact of it is evident from the Cloudflare blog, but I don't particularly recommend giving that the time of day, since as with all content marketing it is also self-serving humbug.
That said I'm looking to see if just using R2 is a big improvement. It'll cost more but the reliability and performance might be worth it for us.
You only get "free" egress for the size of your storage.
The short answer is that at AWS/Cloudflare scale you’re paying a fixed cost for networking capacity, not for transit per byte. So their cost doesn’t scale anywhere near the way AWS egress costs do, and if they want it can just be free.
Now, AWS chooses to price their network usage as a utility, ie, you pay per unit of data transmitted. This is ridiculous for enterprise scale.
Cloudflare uses their purchased capacity as a strategic differentiator by basically letting you use as much as you want because they have so much purchased, you wouldn’t ever make a dent.
Any services behind this differentiator are what they can charge for. Like I said above, the margins in cloud are ridiculously high. This is why these companies are amongst the few in the world valued over a trillion dollars. Throw in an upstart who has the strategic advantage and technical competence that Cloudflare has and boy, do we have a winner. The next few years will be very interesting.
CF may be great technically, but I personally wouldn't use them without an enterprise agreement in place. Bandwidth should be cheap, but cheap does not equal free.
Unless I had an enterprise agreement in place I'd rather work with a vendor that has a well defined usage-based pricing. I have a low appetite for risk, and usage-based pricing aligns incentives properly IMHO.
In our case, one of our games' DAU went pretty crazy last Christmas which resulted in a huge increase in players (who all need to download hundreds of MB of data). Maybe if it'd continued for many months the situation would be different and that angry email from CF would have eventually arrived.
That would depend on the use case I'd assume.
The full excerpt [0]
> Most leading cloud providers allow their customers to input data into the cloud for free. However, when that data is retrieved from the cloud, these providers will then charge large fees; this is what’s known as a data egress.
Here is what I found out about ingress [1]
> Egress in the world of networking implies traffic that exits an entity or a network boundary, while Ingress is traffic that enters the boundary of a network.
[0] https://wasabi.com/help/glossary-of-terms/egress-charges-def...
[1] https://aviatrix.com/learn-center/cloud-security/egress-and-...
Here's the rule of thumb: https://news.ycombinator.com/item?id=28775836
> inbound traffic (ingress) is free, almost universally.
> outbound traffic (egress) is $insane, almost universally.
so think of it like, teleport yourself to S3's servers. Any data that comes in, you charge $0. Any data that goes out, you charge $massive.
This seems to be true for almost every provider I've found. Hetzner is one of the rare exceptions. If you need a server, get a Hetzner dedicated box, because it's unlimited traffic (both ingress and egress). It powers https://battle.shawwn.com/ (big dump of files).
I'd love to store things in S3 or GCE, but it's a non-starter, because transferring from GCE to Hetzner would cost $0.12/GB downloaded. 12 cents per GB! It doesn't sound like a lot till you do the math on 22TB.
Hence, R2 is incredibly appealing. I'd love $0.015 per GB storage cost + free egress, because it means I can download as much as I want to my hetzner server. Meaning, I don't need to worry about my hetzner drives failing.
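To make "do the math on 22TB" concrete, here is a small sketch using only the prices quoted in this comment ($0.12/GB egress, $0.015/GB storage). It assumes decimal terabytes (1 TB = 1000 GB) and flat per-GB rates with no tiering; the function names are illustrative.

```python
GB_PER_TB = 1000  # decimal terabytes, an assumption


def transfer_cost_usd(size_tb: float, price_per_gb: float) -> float:
    """One-off cost of downloading `size_tb` at a flat per-GB egress rate."""
    return size_tb * GB_PER_TB * price_per_gb


def storage_cost_usd_per_month(size_tb: float, price_per_gb_month: float) -> float:
    """Monthly cost of keeping `size_tb` stored at a flat per-GB rate."""
    return size_tb * GB_PER_TB * price_per_gb_month


if __name__ == "__main__":
    size_tb = 22
    print(f"one full download at $0.12/GB:   ${transfer_cost_usd(size_tb, 0.12):,.2f}")
    print(f"one full download at $0.00/GB:   ${transfer_cost_usd(size_tb, 0.0):,.2f}")
    print(f"storage at $0.015/GB per month:  ${storage_cost_usd_per_month(size_tb, 0.015):,.2f}")
```

Under those assumptions a single full download of 22TB at $0.12/GB costs about $2,640, which is roughly eight months of storage at the quoted $0.015/GB.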
- Set up the S3 mirror into R2
- Migrate your code to read from R2.
- Set up SQS to populate with S3 Create events. SQS listeners just make a GET request to R2 for that file.
- Generate S3 events and populate SQS by running List operations or by abusing S3 Lifecycle Management.
- Let it process.
- Switch writes to R2.
This all assumes you can’t delete from S3 till R2 is fully ready. Depending on the application, you could switch over writes to R2 in a different step and also possibly delete the S3 file in the SQS processor.
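The steps above lean on read-through behaviour: a GET for an object that is not yet in the new bucket pulls it from the old one and keeps a copy. Below is a toy in-memory model of that idea, not R2's or S3's actual API. The class and method names are hypothetical, plain dicts stand in for both buckets, and `warm` plays the role of the SQS listeners issuing GETs.

```python
class ReadThroughBucket:
    """Toy model of a bucket that lazily copies objects from an origin bucket."""

    def __init__(self, origin: dict) -> None:
        self.origin = origin      # stands in for the existing (old) bucket
        self.local = {}           # stands in for the new bucket
        self.origin_reads = 0     # each of these would incur origin egress once

    def get(self, key: str) -> bytes:
        """Serve from the new bucket, copying from the origin on the first miss."""
        if key not in self.local:
            self.local[key] = self.origin[key]  # raises KeyError if absent from both
            self.origin_reads += 1
        return self.local[key]

    def put(self, key: str, data: bytes) -> None:
        """Writes go straight to the new bucket once writes are switched over."""
        self.local[key] = data

    def warm(self, keys) -> None:
        """Touch every key once so that it gets copied across."""
        for key in keys:
            self.get(key)

    def fully_migrated(self) -> bool:
        """True once every origin key has a copy in the new bucket."""
        return all(key in self.local for key in self.origin)


if __name__ == "__main__":
    old_bucket = {"a.bin": b"alpha", "b.bin": b"beta"}
    new_bucket = ReadThroughBucket(old_bucket)
    new_bucket.get("a.bin")              # normal traffic migrates hot objects
    new_bucket.warm(old_bucket.keys())   # the backfill covers the rest
    print(new_bucket.fully_migrated(), new_bucket.origin_reads)
```

In this model each object is read from the origin at most once, which is why the origin's egress is paid only during the migration.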
Where did you find the reference to 1TB per day?
On the page it says:
“1 TB of data egress for every 1 TB storage capacity used”
I would have assumed this is per month.
It's a shame as I love aspects of this type of storage service, even with the caveats, but they're not useful if there's no way to pay more for excess transfer. Luckily I think R2 fits that requirement (though paying for more operations vs paying for more transfer).
If you’re using AWS, S3 (and CloudFront) will just be a tiny part of your infrastructure, so why would you move that out of AWS and lose the integration coming from having everything in AWS?
It's a good idea to split origin and CDN, this way requests can be served from cache while origin is down or overloaded.
Plus I'd challenge the suggestions that S3 and CDN are a tiny portion of the infrastructure. For a lot of sites it's a significant chunk, things remaining wouldn't be many, maybe compute, DNS, and CI.
In this case, for most AWS/CF customers, that diversification is of the "downtime on either represents downtime for their customer" variety. (I'm still a fan of this offering and the pressure it puts on the current high price of egress bandwidth, of course.)
because of this?
If you don't care and just need some blob storage, this could be great.
> There are no charges for outbound data transfer when using a dedicated network connection to the RSTOR network and direct peering is established. These connections are charged separately. To find out more please contact us.
I would imagine for enterprise customers you might either be using direct peering / similar or they're able to charge based upon ballpark bandwidth usage. Otherwise I'm unsure as it does seem quite in line with Wasabi otherwise.
Makes zero sense to keep public assets on S3 in this case. Only when you need ACL integration or glacier would S3 make sense.
If R2 was down could you re-route back to S3?
Diversifying infrastructure provides additional options to mitigate outages.
If R2 was down and you didn’t rely on anything else that relied on R2, sure, you could repoint DNS and/or issue S3-direct URLs and then clean it up on the way back out.
“Additional options” doesn’t come without “additional complexity” and that complexity has a cost.
R2’s a great cost-saving move here; I don’t think it’s a reliability improvement generally.
I am an AWS administrator/architect for work so we are always trying to weigh pros and cons. AWS S3 is notoriously overpriced. This is a well-established fact. There are many other providers that offer comparable solutions (or even superior) such as the new R2. But we feel the effect of vendor lock-in because of S3's integration with other AWS services, which is what keeps a lot of people over-paying for S3. I think the auto-migration feature is potentially one of the best arguments for switching to R2.
R2 is undeniably a better value than S3. S3 requires me to select a region and optionally even limit an availability zone (if I need to keep costs low). CDN/edge locations are all extra cost via AWS Cloudfront. And the reality is many people are already using Cloudflare as CDN in front of S3 storage. So R2 just becomes a no-brainer at that point. I think it will be a successful launch. I am excited to try it.
Very informal survey ... but I wonder if you are familiar with the 'rclone' tool:
... just curious ...
I suppose my only skepticism is "but how fast can I egress?" -- if the bandwidth is 100x slower than GCP, it might dampen my enthusiasm a little bit. But honestly I'd still take a 100x slowdown if it means I can do long term archival without paying $200 just to download the data, soooo....
Anyway, cheers, and thanks for doing impactful work!
Not necessarily.
As NearlyFreeSpeech noted when they changed to a charge-by-incoming-bandwidth model, links are still often bought symmetrically and for them the thing that eats the most bandwidth is incoming DDoS attacks, not legitimate traffic.
It seems to me that Cloudflare are likely to be in this situation except even more so.
Backblaze (the consumer thing, not B2) has always had unlimited backups, and still does. One of their execs (Brian) posted a graph showing a backup space utilization histogram for all customers in 2018. The first few data points are:
30% use <100GB
15% use <200GB
9% use <300GB
...
So ~55% of their customers are using <300GB and paying $6/mo. On B2, their cloud storage product, $6 will buy 1.2TB of storage. Way more than half of their unlimited customers are paying 4x more for unlimited storage than they would if they paid for metered storage.
Interestingly, on that same histogram, are the last 2 customers: one using 293TB and one using 430TB. While IMO they are misusing the service, they are also an extreme minority. If that last guy was paying for that space with B2, it would cost $2150/month. Backblaze has said they just don't care. Now if it were half their users doing it, of course they would.
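The figures in this comment can be reproduced with a short sketch. The per-GB price is inferred from the comment's own statement that $6 buys 1.2TB, not taken from Backblaze's price list, and decimal units (1 TB = 1000 GB) are assumed; the function names are illustrative.

```python
UNLIMITED_PLAN_USD = 6.00          # monthly price of the consumer backup plan, per the comment
B2_USD_PER_GB_MONTH = 6.00 / 1200  # inferred from "$6 will buy 1.2TB", an assumption
GB_PER_TB = 1000                   # decimal terabytes, an assumption


def b2_monthly_cost_usd(stored_gb: float) -> float:
    """Metered monthly cost of storing `stored_gb` at the inferred B2 rate."""
    return stored_gb * B2_USD_PER_GB_MONTH


def overpay_factor(stored_gb: float) -> float:
    """How many times more the flat plan costs than metered storage would."""
    return UNLIMITED_PLAN_USD / b2_monthly_cost_usd(stored_gb)


if __name__ == "__main__":
    print(f"300GB metered:  ${b2_monthly_cost_usd(300):.2f}/month "
          f"({overpay_factor(300):.0f}x less than the flat plan)")
    print(f"430TB metered:  ${b2_monthly_cost_usd(430 * GB_PER_TB):,.2f}/month")
```

At that inferred rate a 300GB user would pay $1.50 metered, a quarter of the flat $6, while the 430TB user would pay $2,150 a month, consistent with the numbers in the comment.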
I can tell you one thing: R2 with free bandwidth is going to be a gift for the porn industry!
https://blog.cloudflare.com/introducing-r2-object-storage/#:...
In my opinion, expect 3-6 months for the beta, then maybe 2 years for GA. Storage at scale isn't easy, especially if they want to be comparable to aws in speed, SLA, data protection (ie. duplicated to 3+ physical data silos), etc.
I'm using Azure, rather than AWS, and I hope R2 is available for Azure eventually too.
In the meantime, I came across a Cloudflare blog post from earlier this year that said you can use Azure's preferred routing feature to point to Cloudflare, which will result in "substantially cheaper" egress bandwidth fees. Sounded good, but when I looked at Azure's bandwidth pricing it looks like egress routed through Cloudflare is barely any cheaper, only 9% or so :(
The question: am I missing something, or is that paltry reduction really all Azure is doing?
E.g. https://developers.cloudflare.com/cloudflare-one/connections...
https://gist.github.com/selcuk/90ce1ce8b8d46c869efd6da24cde1...
I am not planning on doing either, but I am just curious what you would do about it?