How We Partitioned Airbnb’s Main Database in Two Weeks (nerds.airbnb.com)
We later tried just running MySQL ourselves on big instances, RAIDing across a large number of EBS volumes... and we ended up running into other weird issues with that too. We would sometimes get terrible write latency spikes, which we were told were a result of "stuck blocks on the SAN". Apparently the backing SAN would sometimes have some blocks that performed very poorly (maybe a disk under high contention in one SAN cabinet?), and this would cause our overall RDS performance to plummet, but only irregularly. We would usually get on the horn, and after talking with someone it would either magically stop being slow ("we don't see any problems here!") or we would be told about some "stuck blocks" and they would do some type of remapping or migration of those blocks. It was not very transparent to us what was really going on. Sometimes we would just spin up new instances and EBS volumes and do perf testing on them until we got a set that performed consistently, until something goofy happened again a week or so later. Pretty awful. We tried local instance storage, but it just wasn't fast enough (primary reason), and it felt a bit dangerous, even though we backed up to an EBS volume pretty regularly.
We ended up bailing from AWS and saw huge performance improvements (reduction in latency, etc.) by using SSDs and real hardware. We even ended up with some cost savings! Not long after we left, Amazon came out with local SSD storage (high I/O instances, I think they were called?), which may have been workable, but by then we had migrated away (we still used S3 though, and still used on-demand instances for developers).