Feeding data to 1000 CPUs – comparison of S3, Google, Azure storage (blog.zachbjornson.com)

216 points by ranrub 10 years ago | 72 comments

oavdeev 10 years ago |

Stock Ubuntu needs the SR-IOV driver to reach the actual bandwidth limit on EC2, and it makes a lot of difference. We routinely get ~2 Gbps down from S3 with that setup (using the largest instance types).

edit: Gbps not GBps
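Since 1 GB/s is 8 Gb/s, the unit slip matters. Here is a quick sketch of what ~2 Gbps per instance implies for the 150 GB data set discussed in the article. It assumes decimal units, no protocol overhead, and perfectly even sharding. The 10-second deadline is borrowed from a later comment about the author's latency target.

```python
import math

def transfer_seconds(size_gb: float, rate_gbps: float) -> float:
    """Time to move size_gb gigabytes at rate_gbps gigabits per second (8 bits per byte)."""
    return size_gb * 8 / rate_gbps

def nodes_for_deadline(size_gb: float, per_node_gbps: float, deadline_s: float) -> int:
    """Smallest node count whose combined bandwidth moves size_gb within deadline_s,
    assuming the data is split evenly and each node sustains per_node_gbps."""
    aggregate_gbps = size_gb * 8 / deadline_s
    return math.ceil(aggregate_gbps / per_node_gbps)

print(transfer_seconds(150, 2))        # 600.0 (one instance at 2 Gbps)
print(transfer_seconds(150, 2 * 8))    # 75.0 (if it really were 2 GB/s)
print(nodes_for_deadline(150, 2, 10))  # 60 (instances needed for 150 GB in 10 s)
```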

rmcpherson 10 years ago | |

That's true, although the latest stock Ubuntu HVM AMIs (14+, I believe) already include the SR-IOV driver and use it by default. Older AMIs need to have it installed and enabled. I believe enhanced networking is only available on HVM AMIs.
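One way to check which NIC driver an instance is actually using is `ethtool -i eth0`. With enhanced networking on the instance types discussed here, the driver is `ixgbevf`. Newer instance types use the Elastic Network Adapter driver, `ena`. Running `modinfo ixgbevf` shows the installed module version, which is relevant to the "driver too old" problem mentioned below.

The sketch below parses that `ethtool` output. The interface name `eth0` is an assumption, and the driver set only covers those two names.

```python
import re
import subprocess
from typing import Optional

# Driver names used for EC2 enhanced networking: the Intel 82599 VF driver
# and the Elastic Network Adapter driver.
ENHANCED_NETWORKING_DRIVERS = {"ixgbevf", "ena"}

def parse_driver(ethtool_output: str) -> Optional[str]:
    """Extract the value of the 'driver:' line from `ethtool -i <iface>` output."""
    match = re.search(r"^driver:\s*(\S+)", ethtool_output, re.MULTILINE)
    return match.group(1) if match else None

def uses_enhanced_networking(ethtool_output: str) -> bool:
    """True if the reported driver is one of the enhanced-networking drivers."""
    return parse_driver(ethtool_output) in ENHANCED_NETWORKING_DRIVERS

def check_interface(iface: str = "eth0") -> bool:
    """Run ethtool on a live Linux host (requires the ethtool binary)."""
    out = subprocess.run(["ethtool", "-i", iface],
                         capture_output=True, text=True, check=True).stdout
    return uses_enhanced_networking(out)
```

Keeping the parser separate from the `subprocess` call makes the check easy to test against saved output from several instances.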

oavdeev 10 years ago | | |

This problem definitely existed with the official 14.04 (HVM) AMI. I haven't re-tested recently, so they may have fixed it. It did have some kind of SR-IOV driver, but it was too old.

hrez 10 years ago | |

Good point for "enhanced networking" instances. I didn't see the OS specified in the article. AMZN Linux would have the SR-IOV driver by default. PV vs. HVM might also have an impact.

zbjornson 10 years ago | | |

Per the comment here [1] and the linked twitter convo, I'll retest S3 with Amazon Linux soon. These tests used Ubuntu 14.04 on all providers, and did use HVM. My understanding is that this will possibly increase the network throughput of the VM, but the benchmarks stayed below the VM's capacity (which was the reason I included the charts of VM throughput).

[1] https://news.ycombinator.com/item?id=10846497

lowbloodsugar 10 years ago |

If you are pulling large files from S3 we have found that they can be sped up by requesting multiple ranges simultaneously. It is easy to hit 5Gb/s or 10Gb/s on instances with the necessary bandwidth, accessing a single file, or multiple files. We have not encountered a limit on S3 itself. YMMV.
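Here is a minimal sketch of the multi-range technique. It splits the object into byte ranges, fetches them concurrently, and reassembles them in order. The fetch function is passed in so the sketch stays library-agnostic. The part size and worker count are arbitrary defaults, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

def byte_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) pairs, as used by the HTTP Range header."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]

def parallel_get(fetch_range: Callable[[int, int], bytes], size: int,
                 part_size: int = 8 * 1024 * 1024, workers: int = 16) -> bytes:
    """Fetch every range concurrently and join the parts in their original order."""
    ranges = byte_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so the parts can be joined directly.
        return b"".join(pool.map(lambda r: fetch_range(*r), ranges))
```

With boto3, `size` can come from `s3.head_object(Bucket=bucket, Key=key)["ContentLength"]`. `fetch_range` can be `lambda s, e: s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={s}-{e}")["Body"].read()`.

Threads are adequate here because the work is I/O-bound. For very large files, writing each part to its offset in a file avoids holding everything in memory.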

hrez 10 years ago | |

Excellent https://github.com/rlmcpherson/s3gof3r is my tool of choice for "fast, parallelized, pipelined streaming access to Amazon S3."

If you want to saturate network bandwidth with S3 that's the one tool I know that can do it.

jedberg 10 years ago |

AWS has a limit on the total throughput any one account can have to S3, so the more CPUs OP adds, the worse OP's performance will be on each one. I suspect the other providers have the same restriction.

I either missed it or OP didn't specify how many instances they were using at once to run the benchmark, but the more instances they used, the worse it would be per node.
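Here is a sketch that makes the concern concrete. Suppose a shared account-wide cap exists; jedberg reports it from conversations with AWS, not from published documentation. Then each node gets the smaller of its own instance bandwidth and an even share of the cap. Both numbers below are hypothetical placeholders.

```python
def per_node_gbps(nodes: int, instance_gbps: float, account_cap_gbps: float) -> float:
    """Throughput each node can expect if all nodes share one account-wide cap.

    Assumes the cap is split evenly and each node is otherwise limited
    only by its own instance bandwidth.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return min(instance_gbps, account_cap_gbps / nodes)

# Hypothetical: 10 Gbps instances and a made-up 100 Gbps account cap.
for n in (1, 10, 50, 100):
    print(n, per_node_gbps(n, 10.0, 100.0))
```

Under this model a single-instance benchmark never sees the cap, which matches the EDIT below.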

This did not seem to be accounted for.

EDIT: OP says below it was from one instance, so what I said doesn't apply to this writeup.

BrandonY 10 years ago | |

This is not the case with Google Cloud Storage. I cannot speak to the other providers.

Google Cloud Storage does not limit read or write throughput with the exception of our "Nearline" product (and even Nearline's limiting can be suspended for additional cost, a feature called "On-Demand I/O").

jedberg 10 years ago | | |

That's good to know, and it adds credence to my opinion that networking is the area where Google is clearly winning the Cloud Wars(tm).

zbjornson 10 years ago | |

All the benchmarks were from a single instance.

(Note that I have done some testing from AWS Lambda, where we had 1k lambda jobs all pulling down files from S3 at once. That's a bit harder to benchmark...)

jedberg 10 years ago | | |

Hi OP, nice writeup! I hope my comment wasn't construed as dismissing the work, just a criticism of one small part.

It sounds like that wouldn't have been a factor, except for the cap you seem to have discovered on Amazon that you called out.

My only suggestion then is you may want to make it explicit that you ran the benchmarks from a single instance.

hrez 10 years ago | | |

Any comments on how it worked out with Lambda?

colechristensen 10 years ago | |

Do you have any sources or more information about the per-account S3 limits?

jedberg 10 years ago | | |

I don't have any published sources, it's something they told me, but it's hinted at here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-...

They explicitly mention the RPS per account limit in that doc, which is related.

nostromo 10 years ago | | |

Take into account the parent commenter's former jobs. I imagine if anyone would run into such a limit, it would be Reddit or Netflix.

lowbloodsugar 10 years ago | |

If such a limit exists, it would not have been hit on such a small benchmark. However, I am unaware of any such limit and it has never been raised in any discussion I have had with them. I am responsible for a large compute and data storage platform backed by S3.

Is this a limit that is hit anywhere near the 150GB discussed in this article, or is it something you hit only if you are Netflix? We have terabytes in S3 and have not observed any limit other than EC2 instance bandwidth.

jedberg 10 years ago | | |

The amount of data one has in S3 isn't really relevant to the discussion, only how quickly you're trying to pull it into your instances.

ChuckMcM 10 years ago |

When I see things like "data set size 150GB" and "1000 CPUS" I just naturally assume they are all in memory and never come from disk :-)

zbjornson 10 years ago | |

That's one of many data sets on the server, so unfortunately we can't keep them all in memory at once. :(

ChuckMcM 10 years ago | | |

Let's assume that when you say "CPU" you mean "core", and that your typical server-class machine has 24 of those. 1000 "CPUs" is then about 42 machines (1000 / 24 ≈ 41.7). If each donates 32GB to the cause[1], that is 1.3TB worth of data, only a few microseconds away from any core.

I'm not sure why anyone would build a server with less than 96GB these days, so it's not at all unreasonable. Your service provider may jerk you around, but you can run two racks of machines (48 machines) in a data center with specs like that for about $25K/month (including dual gigabit network pipes to your favorite IP transit provider). So it isn't even all that huge an investment.
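Here is ChuckMcM's back-of-the-envelope sizing as code. It uses the figures from the comment: 24 cores and 32 GB donated per machine, and $25K/month for 48 machines. Rounding 1000 / 24 ≈ 41.7 up gives 42 machines and about 1.3 TB of pooled RAM, enough for several copies of a 150 GB data set. The sketch ignores replication and per-process memory overhead.

```python
import math

def pooled_memory(total_cores: int, cores_per_machine: int, gb_per_machine: int):
    """Return (machines needed to supply total_cores, GB of RAM they donate in total)."""
    machines = math.ceil(total_cores / cores_per_machine)
    return machines, machines * gb_per_machine

machines, pool_gb = pooled_memory(1000, 24, 32)
print(machines, pool_gb)   # 42 1344
print(pool_gb // 150)      # 8 whole copies of a 150 GB data set fit
print(25_000 / 48)         # rough monthly cost per machine from the quoted rack price
```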

[1] Consider your typical 'memcached' type service where data is named as a function of IP and offset.
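The footnote's addressing idea can be sketched as follows. The sketch assumes each node holds one contiguous slice of the pooled data set, and it uses GiB for the slice size. The node IP list is hypothetical, and a real memcached-style service would add replication and a network protocol on top.

```python
from typing import List, Tuple

def locate(global_offset: int, node_ips: List[str], bytes_per_node: int) -> Tuple[str, int]:
    """Map a byte offset in the pooled data set to (node IP, offset within that node)."""
    if not 0 <= global_offset < len(node_ips) * bytes_per_node:
        raise IndexError("offset outside the pooled memory")
    node_index, local_offset = divmod(global_offset, bytes_per_node)
    return node_ips[node_index], local_offset

GiB = 2 ** 30
# Hypothetical private addresses for 42 machines, each donating 32 GiB.
nodes = [f"10.0.0.{i}" for i in range(1, 43)]
print(locate(100 * GiB, nodes, 32 * GiB))  # ('10.0.0.4', 4294967296)
```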

jacquesm 10 years ago | |

I think that data set is too small to constitute a good benchmark for the setup.

JoachimSchipper 10 years ago | | |

You're not wrong, but apparently such a short burst is what they're actually doing in their application.

ranrub 10 years ago |

With kernel tuning, S3 performance improves (and will probably improve on Google Cloud and Azure as well). Also, the author uses Ubuntu 14.04 (see https://twitter.com/Zbjorn/status/684492084422688768), which doesn't use AWS "Enhanced networking" by default. It would be interesting to see results for tuned systems.

skywhopper 10 years ago |

Very interesting comparison, glad to see it. I don't have a comment on the content itself but I do have a note on the presentation.

The colors used for S3 and Azure Storage in the graphs are nearly indistinguishable to me, as I have moderate red-green colorblindness. They're easier to tell apart on the bar graphs, since the patches of color are much larger, although I still have to work at it and rely on hints from the labels. On the line graphs, it's basically impossible to tell them apart. A darker shade of green would solve the problem for me personally, but I'm not all that bad a case, nor an expert on the best shades to pick for general color-blindness accessibility.

Just something to think about when presenting data like this.

mistermann 10 years ago | |

Color blind here as well, I had to zoom in incredibly close to distinguish the difference.

zbjornson 10 years ago | | |

Thanks for pointing this out, and my apologies! Will fix that going forward.

jen20 10 years ago |

Has the author (if they are reading here) considered using Joyent's Manta to take the processing to the data instead?

vgt 10 years ago | |

There are plenty of architectures that do exactly this. EMR-on-S3, Google Dataproc on GCS, Snowflake-on-S3, BigQuery-on-GCS, etc etc.

The bigger point in the article is that these exact "take processing to the data" architectures operate exceedingly well on S3, GCS, Azure.

And, as a biased observer, these architectures operate on GCS the best due to great performance measured in the article, quick VM standup times, low VM prices, and per-minute billing.

zbjornson 10 years ago | | |

I'm still trying to parse the docs and Manta source code to see what it actually does, but it seems unique if the data storage nodes are also the data processing nodes and no data transfer happens from some storage service before the job begins. The other key factor is having neither startup time nor the cost of a perpetually running cluster. Per my comment below [1], we have used Lambda with S3 to get something like this, as well as our own architecture built on plain EC2/GCE nodes.

[1] https://news.ycombinator.com/item?id=10846514

justinsaccount 10 years ago | | |

Are you sure you understand what "take the processing to the data" means?

EMR-on-S3 is the "copy the data to the processing nodes" variety.

linc01n 10 years ago | |

I think Manta is better if the result set is smaller than the input set, so network performance won't matter that much. Per-second pricing is also better, since the author needs the result in 10 seconds.

Spinning up a cluster of VMs, using them for 10 seconds, and being charged a minimum of 1 hour seems expensive to me.
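The rounding effect described above can be quantified. The three schemes below are illustrative:

- per-hour rounding (how EC2 billed at the time)
- per-minute rounding with a 10-minute minimum (Compute Engine's scheme at the time, the per-minute billing mentioned earlier in the thread)
- pure per-second billing

No prices are involved, only billed time.

```python
import math

def billed_seconds(runtime_s: float, increment_s: int, minimum_s: int) -> int:
    """Seconds charged when usage is rounded up to increment_s with a floor of minimum_s."""
    return max(minimum_s, math.ceil(runtime_s / increment_s) * increment_s)

# A 10-second job under three billing models (illustrative, not a price list).
hourly = billed_seconds(10, 3600, 3600)   # per-hour rounding
per_minute = billed_seconds(10, 60, 600)  # per-minute with a 10-minute minimum
per_second = billed_seconds(10, 1, 1)     # per-second
print(hourly / 10, per_minute / 10, per_second / 10)  # 360.0 60.0 1.0
```

The printed values are how many times the actual runtime each node is billed for. For short, bursty jobs this rounding factor can dominate the compute cost.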

dharbin 10 years ago | |

I don't know about Manta, but this is the entire point of HDFS. It's easier to move code than data.

zeristor 10 years ago | | |

Indeed, but they're having such fun. Let's leave them be.

zbjornson 10 years ago | |

Hadn't heard of it, looks cool. Thanks for the tip :)

rmcpherson 10 years ago |

In S3 tests on c3.8xlarge instances, I've seen 8 Gbps throughput on both uploads and downloads using parallelized requests. Testing with iperf between two of the same instances maxed out about 8 Gbps as well so the throughput limitation is likely EC2 networking rather than S3.

These tests were done over a year ago so bandwidth limitations on EC2 may have changed since.

This testing was with https://github.com/rlmcpherson/s3gof3r

zbjornson 10 years ago | |

That's really cool. Wonder if the same technique (parallel streams) would help for Azure and GCS. I know GCS has some built-in capabilities for composite uploads/downloads, which might achieve a similar effect.

imperialdrive 10 years ago |

Thanks for sharing your research. I've been up to my neck in EC2 migrations and trying to benchmark as I go... S3 is the next chunk of work. Rock on!

hrez 10 years ago |

What's missing from the description is the network setup. Is it EC2-Classic or VPC? Is EC2 reaching S3 through an internet gateway (IG)? Hopefully not through NAT. There is also a VPC endpoint for S3. These may all have different performance profiles, especially with multiple instances.

zbjornson 10 years ago | |

Network was VPC. The EC2 instance had an IG attached, yes, but I'm not sure if you're asking if an internal vs. external URL for S3 was used? Are you saying there's a better endpoint than s3-<region>.amazonaws.com for S3 requests from EC2?

hrez 10 years ago | | |

I meant http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-en...

It's a private connection to AWS services, including S3. You'd use the same URL, since it's basically a routing change. I have no idea whether VPC endpoints would be better than an IG, though. P.S. I just tested it and get about half the latency through the VPC endpoint.
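An S3 gateway endpoint is created per VPC and attached to route tables as a route to S3's prefix list. That is why clients keep using the same S3 URL. The sketch below builds the request for boto3's EC2 `create_vpc_endpoint` call. The region, VPC ID, and route table IDs are placeholders.

```python
from typing import Dict, List

def s3_gateway_endpoint_request(region: str, vpc_id: str,
                                route_table_ids: List[str]) -> Dict:
    """Keyword arguments for EC2 create_vpc_endpoint, creating an S3 gateway endpoint.

    Each listed route table gets a route for S3's prefix list, so traffic to
    the regular S3 hostname goes through the endpoint instead of the IG or NAT.
    """
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }
```

Usage, with placeholder IDs: `boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**s3_gateway_endpoint_request("us-east-1", "vpc-0123", ["rtb-0456"]))`.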

dwelch2344 10 years ago |

I'd be interested to see how AWS' Elastic File System (EFS) compares (though I'd imagine it's not great, given it's mounted via NFS)

jaytaylor 10 years ago | |

No hard numbers for you, but FWIW I ran tests about 4 months ago and the performance was /very/ low compared with what is achievable with S3 and even a normal NAS.

zbjornson 10 years ago | |

I've been on the list to get into their preview program for a while so I can benchmark it, actually! Part 3 of the blog post is going to include some NFS stuff either way.

acdha 10 years ago | | |

When you do, it would be really useful to include the classic fio/bonnie/etc. stuff to break down performance by the type of operation (e.g. file creation / deletion, streaming read/write, random read/write) and block size.

EFS supports NFSv4 so it should avoid being as routinely limited by server round-trip latency as NFSv3 tends to be but it'd be nice to see how well that works in practice.

frik 10 years ago |

How reliable is Azure? For example the story of Gitlab on Azure was a disaster: https://news.ycombinator.com/item?id=10781263 Something like that wouldn't happen on AWS, GC, Softlayer, etc.

qaq 10 years ago |

WTF, why would one deploy such a thing in the cloud?

cottonseed 10 years ago | |

Because renting 1000 cores for a limited time is much cheaper than buying them outright?

qaq 10 years ago | | |

1000 cores of what? Vcore is marketing BS. Even if it weren't marketing BS, it's 28 2U 3-node boxes (with older CPUs) or 14 2U 3-node boxes (with more recent ones). Unless they have an extremely spiky workload, using AWS is pointless. Bandwidth-bound scientific apps ==> use an InfiniBand cluster.
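qaq's box counts can be reproduced with two assumptions not stated in the comment: 12 cores per node for older CPUs, 24 for newer ones, and three nodes per 2U chassis. Also, on EC2 instance types of that era, AWS described each vCPU as one hyperthread of a Xeon core. So 1000 vCPUs corresponds to roughly 500 physical cores.

```python
import math

def chassis_needed(total_cores: int, cores_per_node: int, nodes_per_chassis: int = 3) -> int:
    """Number of multi-node chassis needed to provide total_cores."""
    nodes = math.ceil(total_cores / cores_per_node)
    return math.ceil(nodes / nodes_per_chassis)

print(chassis_needed(1000, 12))  # 28 (84 nodes at an assumed 12 cores each)
print(chassis_needed(1000, 24))  # 14 (42 nodes at an assumed 24 cores each)
# If the 1000 "cores" are hyperthreaded vCPUs, about 500 physical cores are involved.
print(chassis_needed(500, 24))   # 7
```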

gtaylor 10 years ago | |

I spun up something like 200 "cores" to archive a large Cassandra cluster to Google Storage (Kubernetes cluster plus 200+ containers running the archive worker). Could have gone much bigger to get it done faster, but it wasn't necessary. ETL or archive jobs would be the most common case, to answer your question.