How Discord Stores Trillions of Messages (discord.com)

111 points by phineyes 3 years ago | 10 comments

Solid article. Is it fair to say that the "data services" layer is essentially a cache sitting in front of the database, or am I misunderstanding its function?

zorkian 3 years ago

(I work at Discord in infrastructure.)

We use data services to do "data related things" that make sense to do at a central proxy layer. This may include caching, coalescing, or other logic, but not always; it really depends on the particular use case for that data.

For messages, we don't really cache. Coalescing gives us what we want, and the hot channel buckets end up in memory on the database, which is NVMe-backed for reads anyway, so a cache wouldn't add much performance for this use case.

In other places, where a single user query turns into many database queries and we have to aggregate data, caching is more helpful.
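The comment doesn't say how that caching is done. As a rough illustration only, here is a Python sketch that assumes a simple time-to-live cache keyed by the user-facing query, placed in front of a hypothetical aggregation that fans out into one database query per guild. `TTLCache`, `user_guild_names`, `query_guild`, and the in-memory tables are all made up for the example; they are not Discord's code or schema.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple


class TTLCache:
    """Keep computed results for `ttl` seconds (a hypothetical policy)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: skip the expensive fan-out
        value = compute()
        self._entries[key] = (now, value)
        return value


# Hypothetical in-memory "database" and a log of queries issued against it.
GUILDS = {1: "guild-1", 2: "guild-2", 3: "guild-3"}
USER_GUILDS = {"alice": [1, 2, 3]}
db_queries: List[int] = []


def query_guild(guild_id: int) -> str:
    db_queries.append(guild_id)  # record one database round trip
    return GUILDS[guild_id]


def user_guild_names(user: str) -> List[str]:
    # One user-facing query fans out into one database query per guild.
    return [query_guild(g) for g in USER_GUILDS[user]]


cache = TTLCache(ttl=30.0)
first = cache.get_or_compute(("guild_names", "alice"), lambda: user_guild_names("alice"))
second = cache.get_or_compute(("guild_names", "alice"), lambda: user_guild_names("alice"))
# Only the first call reached the "database": len(db_queries) == 3
```

A TTL trades some staleness for fewer round trips; invalidating entries on writes would be the other common choice.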

archi42 3 years ago

In essence, yes, but strictly speaking, no. Instead of caching responses, that layer seems only to bundle identical requests.

So once a request is sent to the database, every other instance of the same request (e.g. "hey, fetch me all messages from server id 42") is put on hold. Once the initial request gets an answer from the database, that answer is distributed to the initial requester and to everyone who was on hold. If someone is late to the party, they will initiate a new request to the database, because the response is not cached.
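What's described above is often called a "single-flight" pattern. Below is a minimal Python/asyncio sketch of that behavior, not Discord's implementation: `RequestCoalescer` and `fetch_messages` are hypothetical names, and the database is simulated with a short sleep.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable, List


class RequestCoalescer:
    """Share one in-flight backend call among identical concurrent requests.

    Nothing is cached: the entry is removed as soon as the call finishes,
    so a request that arrives afterwards starts a new call.
    """

    def __init__(self, fetch: Callable[[Hashable], Awaitable[Any]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[Hashable, "asyncio.Task[Any]"] = {}

    async def _run(self, key: Hashable) -> Any:
        try:
            return await self._fetch(key)
        finally:
            # Forget the call the moment it completes (success or failure).
            self._in_flight.pop(key, None)

    async def get(self, key: Hashable) -> Any:
        task = self._in_flight.get(key)
        if task is None:
            # First requester for this key: start the single backend call.
            task = asyncio.ensure_future(self._run(key))
            self._in_flight[key] = task
        # Every requester waits on the same task; shield() keeps one
        # cancelled waiter from cancelling the call for everyone else.
        return await asyncio.shield(task)


async def demo() -> tuple:
    calls: List[int] = []

    async def fetch_messages(server_id: int) -> str:
        calls.append(server_id)
        await asyncio.sleep(0.01)  # simulated database latency
        return f"messages for server {server_id}"

    coalescer = RequestCoalescer(fetch_messages)
    # Five identical requests arrive while the first is still in flight.
    burst = await asyncio.gather(*(coalescer.get(42) for _ in range(5)))
    # A late request after the answer came back goes to the database again.
    late = await coalescer.get(42)
    return burst, late, calls


burst, late, calls = asyncio.run(demo())
# calls == [42, 42]: one call for the burst, one for the late request
```

Errors propagate the same way: if the single backend call raises, every waiter sees that exception, and the next request retries.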

I really like this, and this is a great article to share on HN :)

dantiberian 3 years ago

How much of the latency reduction was due to the data services layer with request coalescing and how much was from ScyllaDB? Did the schema and other factors stay the same when migrating to ScyllaDB?

ezekg 3 years ago

> We performed automated data validation by sending a small percentage of reads to both databases and comparing results, and everything looked great. The cluster held up well with full production traffic, whereas Cassandra was suffering increasingly frequent latency issues.

Cassandra was still suffering even with the new data services.
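The quoted validation step, mirroring a small percentage of reads to both databases and comparing the results, can be sketched roughly as below. This is a synchronous simplification under assumptions the article doesn't state: callers are served from the current database while the new one is only compared against, and `make_validating_reader` and the sample tables are hypothetical. A production version would likely issue the shadow read asynchronously so it can't add latency.

```python
import random
from typing import Callable, Hashable, List, Optional, Tuple

Reader = Callable[[Hashable], Optional[str]]


def make_validating_reader(
    primary: Reader,
    shadow: Reader,
    sample_rate: float,
    mismatches: List[Tuple[Hashable, Optional[str], Optional[str]]],
    rng: Callable[[], float] = random.random,
) -> Reader:
    """Serve reads from `primary`; for a sampled fraction, also read `shadow` and record differences."""

    def read(key: Hashable) -> Optional[str]:
        result = primary(key)  # the answer the caller actually gets
        if rng() < sample_rate:
            shadow_result = shadow(key)  # compared only, never returned
            if shadow_result != result:
                mismatches.append((key, result, shadow_result))
        return result

    return read


# Hypothetical (channel_id, message_id) -> content tables for the two clusters.
old_db = {(1, 100): "hello", (1, 101): "world"}
new_db = {(1, 100): "hello", (1, 101): "wrld"}  # simulated migration bug

mismatches: list = []
read = make_validating_reader(old_db.get, new_db.get, sample_rate=0.01, mismatches=mismatches)
for key in old_db:
    read(key)  # roughly 1% of reads are also checked against new_db
```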

parthdesai 3 years ago

>We elect to engage in some meme-driven engineering and rewrite the data migrator in Rust

Could you guys elaborate more on the data migrator by any chance?

jeremychone 3 years ago

As usual, great article from the Discord team.

Rust in the cloud will make more and more sense as companies focus on cutting operational costs without compromising scalability and quality.