DoorDash manages high-availability CockroachDB clusters at scale(cockroachlabs.com) |
DoorDash manages high-availability CockroachDB clusters at scale(cockroachlabs.com) |
How does DoorDash get to 1.2 million queries per second. 1.2mqps * 10000 seconds in 3 hours = 12 billion queries to process 5 million orders? That's wild. Is it all analytics? This is highly suspect. 35m users isn't nothing, but it isn't exactly Facebook scale either.
2,400 queries per order? That's not that crazy IMHO. There might be significant database fan-out on each click (depending on how they do geographic lookups, search ranking / synonyms / sponsored stuff, the repeat your last order features, whether the ranked search returns the full object or a reference that then has to be individually queried, etc.). There might be many clicks per order because people browse a lot (both to find a restaurant then to find dishes within the restaurant), leave reviews, poll for delivery status updates, etc.
Isn't that off by at least an order of magnitude though? It forces them to operate a much larger cluster than should be necessary.
That said 1.2 million queries per second is wild. Would be interesting to see the breakdown.
A curious description for a platform which acts as a broker for transactions between users!
> About 2,300 total nodes spread across 300+ clusters.
> About 1.9 petabytes of data on disk.
> Close to 900 changefeeds.
> Largest cluster is currently 280 TB in size (but has peaked above 600 TB), with a single table that is 122 TB.
all of this yet my food still arrives cold af
kidding aside, I wonder if DD has the same problems as Uber or Lyft except with food delivery. Each new "change feed" is a specific region, county/municipality, or city. Federal, state, and local laws all handled delicately.
Ha.
The first thing I noticed and you almost got to it in your summary: at 1.2MM/2300 = 520 qps per node, this isn't a wild setup. I'm wrapping my head around how they're generating that amount of load. Seems like an easy task for any database to handle.
This is a summary of a recent conference talk:
https://youtu.be/jCjrfpF64Kc?si=Gf-gp_ixX2V6Qz8V
This was my team. We did and lived this. AMA.
It wouldn't be a good idea to take a large working PostgreSQL app and try to switch over to using CrDB. You'd spend all your time (unwittingly rewriting the entire app) speeding up and grouping a few queries at a time.
For the highest throughput endpoints we did make some changes to be more optimal on CRDB so we could run a smaller cluster, but it didn't require anything close to a rewrite.
Even after that, now your applications using the DB have to be aware of the sharding - interactions between users who are housed on different shards etc could require a lot of work at the application layer. If your customers can be easily be split into tenants which never interact with each other this isn't so bad but for a consumer app like DoorDash there isn't clear tenant boundaries.
We looked at all this for Kami and realised that it would be much easier for us to move from PostgreSQL to CockroachDB (we had exceeded the write capacity of a single PostgreSQL primary) than to shard Postgres, and it'd make future development much faster. We could have made sharding work if we had to... but it's not 2013 any more and we have distributed SQL databases, why not use them?
Also the majority of entities they're tracking (users, drivers) do not have fixed locations.
Maybe it's not as hard as I'm thinking. I guess you just have to accept that any query can span an arbitrary number of shards and the results need to be union'd.
I'm sure a lot of smart people have tackled this at the DoorDashes and Ubers of the world and maybe there's some optimal way of handling it. I would love to hear about that.
> How do you deal with entities who are near the boundaries and whose current operational data therefore spans >1 shards? (Imagine somebody at near the geographic intersection of like, five shards looking for pizza in a 10 miles radius or w/e)
Hitting 5 shards might not be that bad. I think you could divide the world into sufficiently large hexagonal tiles; you'd hit at most three shards then. Maybe each fixed-size tile is a logically separate database. Some would be much hotter than others, so you don't want to like back each by a fixed-size traditional DBMS or something; that'd be pretty wasteful.
> Also the majority of entities they're tracking (users, drivers) do not have fixed locations.
Yeah, you at least want a global namespace for users with consistent writes. The same email address belonging to different people in different regions is unacceptable. In theory the global data here could just be a rarely-updated breadcrumb pointing to which database holds the "real" data for that user. [1] So you can make the global database be small in terms of data size, rarely-written, mostly read in eventually consistent fashion, and not needed for geographically targeted queries. That could be worthwhile. YMMV.
[1] iirc Spanner and Megastore have a feature called (something like) "global homing" that is somewhat similar to this. For a given entity, the bulk of the data is stored in some subset of the database's replicas, and bread crumbs indicate which. If you get a stale bread crumb, you follow the trail, so looking up bread crumbs with eventually consistent reads is fine. [edit to add a bit more context:] One major use case for this is Gmail. It has tons of regions in total, but replicating each user's data to more than 2 full replicas + 1 witness would be absurdly wasteful.
[edit:] looks like CockroachDB has the concept of a per-row preferred region, which might also be vaguely similar. <https://www.cockroachlabs.com/docs/v23.1/table-localities#re...> I haven't used CockroachDB and only skimmed this doc section.
You could do it by market (eg. SFBA, Los Angeles, San Diego) or by state.
[0] https://www.cockroachlabs.com/docs/stable/topology-follow-th...
They could have just as easily dropped in Oracle. You pay for expensive DB up front, and can hire cheaper junior DBAs and developers going forward.
No, for all the reasons I just said?
> It forces them to operate a much larger cluster than should be necessary.
How much machine cost and operational effort do you imagine they would save if they reduced the qps by a factor of 10 without changing the number of regions, number of tables, or size of the data? How much SWE time do you imagine that'd take to do and maintain?
I've run a global Paxos-based database that received two orders of magnitude more qps than this. It cost less than you're probably imagining. I sometimes hunted down silly queries, but mostly leader ops, and mostly to mitigate hot spots or as a quixotic latency reduction effort...overall, this was the cheapest layer of the system.
A query to a well-implemented OLTP database is not like a request to some Python/PHP/Ruby app.
We could have modified our application layer to handle those cases, but it's a lot of extra complexity and room for error, and we'd have had to consider and solve for all of these cross-tenant situations as we add new functionality, so I was really keen to avoid that.
Also, there are some really big districts - NYCDOE has >1.1 million students and 1,800 schools. Even with them on a dedicated shard, it's quite possible that it'd get overloaded and we'd be spending more dev effort figuring out how to safely split them onto multiple shards.
When we looked at using distributed SQL database instead it was a clear win - from the application's perspective, it just looks like a really, really big PostgreSQL box, so we didn't need to change much. (the SQL support is very close to PG - The most annoying thing for us was the lack of trigram indexes, and Cockroach has now added those now). And in terms of the operational side, upgrading and maintaining CRDB has actually been easier than PG - version upgrades are easier to do without downtime, and schema migrations don't lock tables.
You could probably just do it by continent as no
one from SA would order anything from Europe
Imagine a luxury version of DoorDash that does work this way. As I awake in my luxury palace in Sao Paulo, I realize that I would like some fresh grapes from the champagne region of France. With a few taps on my Luxury Door Dash app, a plane is on its way with my grapes.I mean, I've seen conventional SQL databases handle ten million orders per hour on a single host. I find it hard to believe DoorDash is processing more than ten million orders per hour, even in a large city.
I suppose they might exceed what a single host can handle if they're, I don't know, recording every driver's location once per second?
I suppose they might exceed what a single host
can handle if they're, I don't know, recording
every driver's location once per second?
Even then, that's not that much data. You only need to retain the current location of the driver and you can aggressively prune data more than N seconds old.A quick Google suggests there are 2M Doordash drivers, but I'll assume that's "all drivers who have ever signed up for DoorDash, ever" and the number of DoorDash drivers actually working at any given moment is a small fraction of that.
If we assume that a max of 100,000 drivers are working at any moment, and a slightly more relaxed location update interval of 10 seconds, that's 10K updates per second which is not exactly super high performance stuff. Of course, tracking driver location is just once piece of their operations.