Why Discord is switching from Go to Rust (blog.discordapp.com)

1582 points by Sikul 6 years ago | 642 comments

Looks like the big challenge is managing a large, LRU cache, which tends to be a difficult problem for GC runtimes. I bet the JVM, with its myriad tunable GC algorithms, would perform better, especially Shenandoah and, of course, the Azul C4.

The JVM world tends to solve this problem by using off-heap caches. See Apache Ignite [0] or Ehcache [1].

I can't speak for how their Rust cache manages memory, but the thing to be careful of in non-GC runtimes (especially non-copying GC) is memory fragmentation.

It's worth mentioning that the Dgraph folks wrote a better Go cache [2] once they hit the limits of the usual Go caches.

From a purely architectural perspective, I would try to put cacheable material in something like memcache or redis, or one of the many distributed caches out there. But it might not be an option.

It's worth mentioning that Apache Cassandra itself uses an off-heap cache.

[0]: https://ignite.apache.org/arch/durablememory.html [1]: https://www.ehcache.org/documentation/2.8/get-started/storag... [2]: https://blog.dgraph.io/post/introducing-ristretto-high-perf-...

dochtman 6 years ago | |

On the one hand, yes. On the other hand, all of this sounds much more complex and fragile. This seems like an important point to me:

"Remarkably, we had only put very basic thought into optimization as the Rust version was written. Even with just basic optimization, Rust was able to outperform the hyper hand-tuned Go version."

chubs 6 years ago | | |

I found something similar when I ported an image resizing algorithm from Swift to Rust: I'm experienced in Swift and so was able to write it idiomatically, but I have little Rust experience and so wrote it naively; yet the Rust version was still twice(!) as fast. And Swift doesn't even have a GC slowing things down!

novok 6 years ago | | |

A C app will tend to outperform a Java or Golang app by 3x, so it isn't too surprising.

stingraycharles 6 years ago | |

> The JVM world tends to solve this problem by using off-heap caches. See Apache Ignite [0] or Ehcache [1].

For those who care, I was interested how off-heap caching works in Java and I did some quick searching around the Apache Ignite code.

The meat is here:

- GridUnsafeMemory, an implementation of access to entries allocated off-heap. This appears to implement some common Ignite interface, and invokes calls to a “GridUnsafe” class https://github.com/apache/ignite/blob/53e47e9191d717b3eec495...

- This class is the closest to the JVM’s native memory, and wraps sun.misc.Unsafe: https://github.com/apache/ignite/blob/53e47e9191d717b3eec495...

- And this, sun.misc.Unsafe, is what it’s all about: http://www.docjar.com/docs/api/sun/misc/Unsafe.html

It’s very interesting because I did my fair share of JNI work, and context switches between JVM and native code are typically fairly expensive. My guess is that this class was likely one of the reasons why Sun ended up implementing their (undocumented) JavaCritical* etc functions and the likes.

sreque 6 years ago | | |

Unsafe lets you manipulate memory without any JNI overhead other than when allocating or de-allocating memory, and that is usually done in larger chunks and pooled to avoid the overhead at steady state. Netty also takes advantage of Unsafe to move a lot of memory operations off the java heap.

Unsafe was one of the cooler aspects to Java that Oracle is actively killing for, well, no good reason at least.

chrisseaton 6 years ago | | |

> context switches between JVM and native code are typically fairly expensive

Aren't these Unsafe memory read and write methods intrinsified by any serious compiler? I don't believe they're using JNI or doing any kind of managed/native transition, except in the interpreter. They turn into the same memory read and write operations in the compiler's intermediate representation as Java field read and writes do.

winrid 6 years ago | | |

The idea is that that call is still less expensive than going over the wire and MUCH less expensive than having the GC go through that heap now and then.

adev_ 6 years ago | |

> The JVM world tends to solve this problem by using off-heap caches. See Apache Ignite [0] or Ehcache [1].

Yeah, but I really don't buy your argument.

When you are reduced to doing manual memory management and fighting the GC of your language, maybe you should simply not use a language with a GC in the first place.

They are right to use Rust (or C/C++) for that. It's not for nothing that Redis (C) is so successful in the LRU domain.

> It's worth mentioning that Apache Cassandra itself uses an off-heap cache.

And still ScyllaDB (C++) is able to completely destroy Cassandra in terms of average latency [0].

[0]: https://www.scylladb.com/product/benchmarks/

tsimionescu 6 years ago | |

> I can't speak for how their Rust cache manages memory, but the thing to be careful of in non-GC runtimes (especially non-copying GC) is memory fragmentation.

As far as I know, a mark-and-sweep collector like Go's doesn't have any advantage over malloc/free when it comes to memory fragmentation. Am I missing some way in which Go's GC helps with fragmentation?

lossolo 6 years ago | | |

Go's GC implementation uses a memory allocator that was based on TCMalloc (but has diverged from it quite a bit). It uses free lists for multiple fixed allocation size classes, which helps reduce fragmentation. That's why Go's GC is non-copying.

wpietri 6 years ago | |

Maybe I've missed this, but why do they need a particularly large LRU cache? Surely this isn't all one process, so presumably they could reduce spikes by splitting the same load across yet more processes?

Techies4Trump 6 years ago | | |

Larger cache = faster performance and less load on the database.

I only glossed over the article but the problem they had with Go seems to be the GC incurred from having a large cache. Their cache eviction algorithm was efficient, but every 2 minutes there was a GC run which slowed things down. Re-implementing this algorithm in Rust gave them better performance because the memory was freed right after the cache eviction.

Splitting it across more processes will result in more cache misses and more DB calls.
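For anyone who hasn't written one, here is a minimal sketch of the kind of structure being discussed: an LRU cache in Go built from a map plus a doubly linked list. This is not Discord's code; the names (`LRU`, `entry`) and the string/int types are made up for illustration. Note that evicted entries simply become garbage, which only the next GC cycle reclaims, and that the map and list are full of pointers the collector has to trace.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is what each list element holds. The key is stored so the map
// entry can be deleted when the element is evicted.
type entry struct {
	key   string
	value int
}

// LRU is a fixed-capacity least-recently-used cache (not goroutine-safe).
type LRU struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> node in order
}

func NewLRU(capacity int) *LRU {
	return &LRU{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks the key as recently used.
func (c *LRU) Get(key string) (int, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a key, evicting the least recently used entry
// when the cache grows past its capacity.
func (c *LRU) Put(key string, value int) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	c := NewLRU(2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // "a" is now the most recently used
	c.Put("c", 3) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```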

pshc 6 years ago | |

Great comment and thanks for the reading material.

Now I'm wondering if there's a Rust library for a generational copying arena--one that compacts strings/blobs over time.

steveklabnik 6 years ago | | |

Generational arenas yes, but copying, I'm not aware of one. It's very hard to get the semantics correct, since you can't auto-re-write pointers/indices.

cft 6 years ago | |

> From a purely architectural perspective, I would try to put cacheable material in something like memcache or redis

You cannot use a caching server at that scale with those latency requirements. It has to be embedded

toxicafunk 6 years ago | | |

Something like rocksdb (https://rocksdb.org/) then

tlrobinson 6 years ago | |

> From a purely architectural perspective, I would try to put cacheable material in something like memcache or redis, or one of the many distributed caches out there. But it might not be an option.

Can you speak to why using something like memcache or redis may not be an option?

otterley 6 years ago | | |

For latency-sensitive services, having to traverse the network to access a shared cache may be too slow. To use the current story as an example, you'd be trading off an occasional 100-millisecond latency spike every 2 minutes for an added 1-2ms of latency for every request.

PeterCorless 6 years ago | |

Wow. We literally just published why to not put a cache in front of your server to mask its bad performance behind a layer of complexity. tl;dr: make sure you have a solid DB to begin with. (Forgive the gated asset, but it's a good read!)

https://go.scylladb.com/7-reasons-no-external-cache-database...

mrjn 6 years ago | |

Don't know enough about Rust, but I think Go would benefit immensely by allowing its users to disable GC and allow de-allocating memory by hand. GC is great for simpler applications, but more complex projects end up fighting so much with memory and GC in Go that all the benefits of automatic de-allocations are negated. Love every other aspect of Go.

rvcdbn 6 years ago |

Seems like you were hitting: runtime: Large maps cause significant GC pauses #9477 [0]

Looks like this issue was resolved for maps that don't contain pointers by [1]. From the article, sounds like the map keys were strings (which do contain pointers, so the map would need to be scanned by the GC).

If pointers in the map keys and values could be avoided, it would have (if my understanding is correct) removed the need for the GC to scan the map. You could do this, for example, by replacing string keys with fixed-size byte arrays. Curious if you experimented with this approach?

[0] https://github.com/golang/go/issues/9477

[1] https://go-review.googlesource.com/c/go/+/3288
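To make the suggestion concrete, here is a rough sketch of a map whose keys and values contain no pointers. The `Key` size of 16 bytes, the `ReadState` fields, and the `MakeKey` helper are all assumptions for illustration; I don't know what the real key format looks like.

```go
package main

import "fmt"

// Key is a fixed-size, pointer-free key. 16 bytes is an arbitrary choice.
type Key [16]byte

// ReadState is a hypothetical value type with no strings, slices, maps
// or pointers in it.
type ReadState struct {
	LastMessageID uint64
	Mentions      uint32
}

// MakeKey copies s into a Key, zero-padding short input. It reports false
// when s does not fit. Because of the padding, "a" and "a\x00" would map
// to the same key, so this only suits inputs without trailing NUL bytes.
func MakeKey(s string) (Key, bool) {
	var k Key
	if len(s) > len(k) {
		return k, false
	}
	copy(k[:], s)
	return k, true
}

func main() {
	cache := make(map[Key]ReadState)
	k, ok := MakeKey("user42:chan7")
	if !ok {
		panic("key too long")
	}
	cache[k] = ReadState{LastMessageID: 1001, Mentions: 3}
	fmt.Println(cache[k].Mentions)
}
```

Whether this actually removes the pauses would still need measuring on the Go version in question.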

carllerche 6 years ago |

Tokio author here (mentioned in blog post). It is really great to see these success stories.

I also think it is great that Discord is using the right tool for the job. It isn't often that you need the performance gains that Rust & Tokio provide, so pick what works best to get the job done and iterate.

_xrjp 6 years ago | |

Basically because of:

> Rust is blazingly fast and memory-efficient: with no runtime or garbage collector, it can power performance-critical services, run on embedded devices, and easily integrate with other languages.

Polyisoprene 6 years ago | |

No offense to Tokio and Rust, I really like Rust, but having someone rewrite their app because of performance limitations in their previous language choice isn’t necessarily someone picking the right tool for the job.

I’m not so sure they would have done the rewrite if the Go GC was performing better, and the choice of Rust seems primarily based on prior experience at the company writing performance sensitive code rather than delivering business value.

say_it_as_it_is 6 years ago | | |

Correct. They wouldn't have considered Rust if the GC was performing better. They also wouldn't have even adopted Go if Elixir was sufficient. This team seems to have an incredible talent pool who is willing to push further for the sake of, as you say, delivering business value. Improving UX, investing in capabilities for growth, are valid business reasons why they're iterating over so many solutions. It's really impressive to see what they're accomplishing.

acheron9383 6 years ago | | |

Right tool for the job should also take into account the experience of the devs you have at your disposal. For an omniscient Dev, is Rust the best tool for the job? Unsure. But for them with already significant rust experience? Sounds like it.

qaq 6 years ago | | |

Too much focus on "business value" often ends up with a codebase in a state that makes delivery of that business value pretty much impossible. Boeing was delivering a lot of business value with MAX ...

RcouF1uZ4gsC 6 years ago |

> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.

Collections are one of the big areas where Go's lack of generics really hurts it. In Go, if one of the built-in collections does not meet your needs, you are going to take a safety and ergonomic hit going to a custom collection. In Rust, if one of the standard collections does not meet your needs, you (or someone else) can create a pretty much drop-in replacement that has a similar ergonomic and safety profile.

correct_horse 6 years ago | |

I'm not sure what you mean by standard collections, but BTreeMap is in Rust's standard library.

pdpi 6 years ago | | |

I think the point the GP is trying to make is that there’s no reason why BTreeMap couldn’t be an external crate, while only the core Go collections are allowed to be generic.

A corollary to this is that adding more generic collections to Go’s standard library implies expanding the set of magical constructs.

zerr 6 years ago | | |

In Go, standard collections are compiler's magic while in Rust or e.g. C++ - they are implemented as libraries.

Cthulhu_ 6 years ago | |

I like to think it's a tradeoff; limit the language and standard library and you limit the amount of things you have to consider. That is, 99% of applications probably won't need a BTree.

(anecdotal: in Java I've never needed anything else than a HashMap or an ArrayList)

_ph_ 6 years ago |

If you have a problem at hand which does not really benefit from the presence of a garbage collector, switching to an implementation without a garbage collector has quite a potential to be at least somewhat faster. I remember running into this timed trigger for garbage collection long ago, though I don't remember why, and had mostly forgotten about it until I read this article. As also written in the article, even if there are no allocations going on, Go forces a GC every two minutes; it is set here: https://golang.org/src/runtime/proc.go#L4268

The idea for this is (if I remember correctly) to be able to return unused memory to the OS. As returning memory requires a GC to run, it is forced at time intervals. I am a bit surprised that they didn't contact the corresponding Go developers, as they seem to be interested in practical use cases where the GC doesn't perform well. Besides the fact that newer Go releases improved GC performance, I am a bit surprised that they didn't just increase this time interval to an arbitrarily large number and check whether their issues went away.

KMag 6 years ago | |

Not only is there good potential for a speed improvement, but languages built around the assumption of pervasive garbage collection tend not to have good language constructs to support manual memory management.

To be fair, most languages without GCs also don't have good language constructs to support manual memory management. If you're going to make wide use of manual memory management, you should think very carefully about how the language and ecosystem you're using help or hinder your manual memory management.

jrockway 6 years ago |

This seems like a nice microservices success story. It's so easy to replace a low-performing piece of infrastructure when it is just a component with a well-defined API. Spin up the new version, mirror some requests to see how it performs, and turn off the old one. No drama, no year-long rewrites. Just a simple fix for the component that needed it the most.

thijsvandien 6 years ago | |

You don't need microservices for that, though. One might as well have moved that piece into a library.

kccqzy 6 years ago | | |

And then deal with cross-language FFI boundaries and cross-language builds.

flafla2 6 years ago |

> After digging through the Go source code, we learned that Go will force a garbage collection run every 2 minutes at minimum. In other words, if garbage collection has not run for 2 minutes, regardless of heap growth, go will still force a garbage collection.

> We figured we could tune the garbage collector to happen more often in order to prevent large spikes, so we implemented an endpoint on the service to change the garbage collector GC Percent on the fly. Unfortunately, no matter how we configured the GC percent nothing changed. How could that be? It turns out, it was because we were not allocating memory quickly enough for it to force garbage collection to happen more often.

As someone not too familiar with GC design, this seems like an absurd hack. That this 2-minute hardcoded limitation is not even configurable comes across as amateurish even. I have no experience with Go -- do people simply live with this and not talk about it?
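For reference, the knob the quoted passage talks about is exposed by the standard library. Below is a small sketch of changing the GC percent at run time and forcing a collection; the wrapper names `setGCPercent` and `forceGC` are mine, and as far as I know the 2-minute forced period itself is not reachable through this API.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// setGCPercent changes the GC target percentage at run time and returns
// the previous setting, roughly what an admin endpoint would call.
func setGCPercent(percent int) int {
	return debug.SetGCPercent(percent)
}

// forceGC runs a blocking collection and returns how many GC cycles
// completed while it ran.
func forceGC() uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	// With 50, a collection is triggered once the heap has grown 50%
	// past the live heap left by the previous cycle.
	prev := setGCPercent(50)
	fmt.Println("previous GC percent:", prev)
	fmt.Println("cycles completed by forced GC:", forceGC())
}
```

The percent only matters when the heap is growing, which fits the article's observation that changing it did nothing for a service that allocates slowly.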

tiffanyh 6 years ago |

It should also be noted that Rust interoperates extremely well with Erlang, which is the basis of Discord (via Rustler).

https://github.com/rusterlium/rustler

https://blog.discordapp.com/scaling-elixir-f9b8e1e7c29b

_bxg1 6 years ago |

It's always good to see a case-study/anecdote, but nothing in here is surprising. It also doesn't really invalidate Go in any way.

Rust is faster than Go. People use Go, like any other technology, when the tradeoffs between developer iteration/throughput/latency/etc. make sense. When those cease to make sense, a hot path gets converted down to something more efficient. This is the natural way of things.

kardianos 6 years ago |

I'm glad they found a good solution (rust) to solve their problem!

Also note this was with Go 1.9. I know GC work was ongoing during that time; I wonder if this type of situation would still happen?

correct_horse 6 years ago |

I've heard lots of hot takes on "what Go really is". Here's mine.

Go is what would have happened if Bell Labs wrote Java.

kick 6 years ago | |

Minor nitpick: That already happened, Limbo is what happened when Bell Labs wrote Java.

monocasa 6 years ago | | |

And Go is very, very much derived from Plan 9. It could be considered a sibling of Limbo in a lot of ways.

correct_horse 6 years ago | | |

Huh. I managed to hear about Inferno, but not remember the Limbo part.

In that case, Go is Bell Labs' second attempt at Java.

anthk 6 years ago | | |

More like Limbo and Inferno.

_ph_ 6 years ago | |

Interesting comment, as 2 of the main Go creators (Ken Thompson and Rob Pike) did work at the Bell Labs. So while I doubt they tried to write Java, Go in a sense was written by the Bell Labs :).

(And Kernighan was their floor-mate too, that must have been a stunningly great environment)

correct_horse 6 years ago | | |

I was taking that into consideration when I authored my comment!

flavio81 6 years ago | |

>Go is what would have happened if Bell Labs wrote Java

And Unix is what happened when Bell Labs wrote an operating system -- something that was born outdated from the start.

Just like Golang.

makapuf 6 years ago | | |

Yet simple enough to be understood and its features actually used, widespread and that stood the test of time.

yippir 6 years ago |

I chose Rust over Go after weighing the pros and cons. It was an easy decision. I wouldn't consider using a high level language that lacks generics. The entire point of using a high level language is writing less code.

shdh 6 years ago | |

The syntax looks pedantic to me. Going to require some adjusting.

mperham 6 years ago |

Better title: "One Discord microservice with extremely high traffic is moving to Rust"

jhgg 6 years ago | |

This is one of multiple, we did not blog about this one, but switching a Python http service for analytics ingest that was purely CPU bound to rust resulted in a 90% reduction in compute required to power it. However, that's not too interesting because it's known that Python is slow haha.

We have 2 golang services left, one of them has a rewrite in rust in PR as of last week (as a fun side project an engineer wanted to try out.)

Additionally, as we move towards a more SOA internally, we plan to write more high velocity data services, and rust will be our language of choice for that.

Lev1a 6 years ago | | |

> but switching a Python http service for analytics ingest that was purely CPU bound to rust resulted in a 90% reduction in compute required to power it. However, that's not too interesting because it's known that Python is slow

Kinda like this: https://blog.sentry.io/2016/10/19/fixing-python-performance-... ?

onebot 6 years ago | | |

Think replacing elixir with Rust would ever be a consideration? Rust isn't there yet, but if you are NIF'ing a bunch of stuff, seems like it could make sense at some point?

okgood288 6 years ago | | |

Well sure when it’s a micro service that probably has more lines of infra config than biz logic LOC.

This isn’t exactly “Linux kernel: now in Rust!”

Glad you’re making tech for you all better.

We get to take up the externalized runtime costs of the mess that is the Electron app.

Engineers are super efficient at offloading the last mile of effort.

tybit 6 years ago | |

Given the rampant misuse of Microservices, this was a really nice read about a seemingly well designed system.

They were able to rewrite their hot spot in a new language without having to rewrite all their business logic in a new language. Not that there wouldn’t have been solutions with a monolith, but this certainly seems elegant and precise.

unlinked_dll 6 years ago |

It'd be cool to look at more signal statistics from the CPU plot.

It appears that Go has a lower CPU floor, but it's killed by the GC spikes, presumably due to the large cache mentioned by the author.

This is interesting to me. It suggests that Rust is better at scale than Go, and I would have thought that Go, with its mature concurrency model and implementation, would have been optimized for such cases while Rust would shine in smaller services with CPU-bound problems.

Great post!

arnsholt 6 years ago | |

My first guess for the slightly higher CPU floor of the Rust version is that the Rust code has to do slightly more work per request, since it will free memory as it gets dropped, whereas the Go code doesn't do any freeing per request, but then gets hit with the periodic spike every two minutes where the entire heap has to be traversed for GC.

jhgg 6 years ago | | |

Tokio 0.1 was definitely less efficient. When we compare Go to Tokio 0.2, Tokio uses less CPU consistently, even when compared to a cluster of the same size almost a year later, with our growth over the time since we switched over.

pixel_fcker 6 years ago | |

Go's CPU floor is lower compared to the naive Rust port (roughly 20% vs 23% from eyeballing). Their optimized Rust version is shown in the next series of graphs as being ~12%.

fmakunbound 6 years ago |

Why /does/ it run a GC every 2 minutes? I went looking and didn't find a reason in the code...

https://github.com/golang/go/search?q=forcegcperiod&unscoped...

Go's GC seems kind of primitive.

Thaxll 6 years ago |

Really interesting post; however, they're using a 2+ year old runtime. Go 1.9.2 was released 2017/10/25, so why did they not even try Go 1.13?

For me the interesting part is that their new implementation in Rust with a new data structure is less than 2x faster than an implementation in Go using a 2+ year old runtime.

It shows how fast Go is vs a very optimized language + new data structure with no GC.

Overall I'm pretty sure there was a way to make the spikes go away.

Still great post.

yazaddaruvala 6 years ago | |

The graphs were in different units. The final Rust version was over 100x faster.

Thaxll 6 years ago | | |

Which doesn't make any sense. Rust is not x100 faster than Go.

eric-hu 6 years ago |

I'm curious what the product-engineering landscape in the company looks like to allow for a language rewrite to happen. I feel like this would be a hard sell in all companies I've worked at. Was this framed as a big bug fix? Or was faster performance framed as a feature?

modo_mario 6 years ago | |

I think they're at a scale now where the cost of running it starts to become important as well. At least when we're talking about big performance increases like this.

meirelles 6 years ago |

The Twitch folks were facing a related situation with the GC. They developed a workaround that they called Ballast, reducing the overall latency and making it more predictable. Quite impressive results [0].

The Go's GC is groundbreaking in several aspects, but probably needs to provide ways to fine-tune it. Posts like this make me believe that one-size-fits-all settings are yet to be seen.

[0]: https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...
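For those who haven't read it, the ballast trick boils down to a few lines. Here is a minimal sketch: a large allocation that is never touched inflates the live heap, so the next GC target (which scales with the live heap) moves further away and collections happen less often. The 64 MiB size and the helper names are mine; a real ballast would be far larger, and keeping it in a package-level variable is just one way to keep it reachable.

```go
package main

import (
	"fmt"
	"runtime"
)

// ballast is held in a package-level variable so that it stays reachable
// for the life of the process. It is never read or written.
var ballast []byte

// initBallast allocates size bytes of ballast.
func initBallast(size int) {
	ballast = make([]byte, size)
}

// heapAlloc reports the bytes of allocated heap objects as seen by the
// runtime.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	initBallast(64 << 20) // 64 MiB, for illustration only
	fmt.Printf("heap after ballast: %d MiB\n", heapAlloc()>>20)
}
```

Note that this targets collections that run too often; as far as I can tell it would not by itself do anything about the forced two-minute cycle described in the Discord post.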

johnmc408 6 years ago |

Non-programmer here, but would it make sense to add a keyword (or flag) to Go to manually allocate a piece of memory (i.e. not use GC)? That way, for some use cases, you could avoid GC for the critical path. Then when GC happened, it could be very fast as there would be far less to pause-and-scan (in this use case example). Obviously this would have to be optional and discouraged... but there seems to be no way to write an intensive real-time app with a GC-based language. (Again, non-programmer who is writing this to learn more ;-)

kerkeslager 6 years ago |

Go is not a general-purpose language. It's a Google language designed to solve Google's problems. If you aren't Google, you probably have different problems, which Go isn't intended to solve.

EDIT: Currently at -4 downvotes. Would downvoters care to discuss their votes?

justadudeama 6 years ago |

> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.

Can someone explain to me how BTreeMap is more memory efficient than a HashMap?

afranchuk 6 years ago | |

A BTreeMap should typically have O(n) memory usage, whereas a HashMap (depending on load factor) will usually have O(kn) memory usage, where k > 1. This is because a HashMap allocates the table into which it will store hashed values upfront (and when the load is too great), so it can't anticipate how many values may be added nor what sorts of collisions may occur at this time. Yes, collisions are typically stored as some allocate-per-item collection, but the desire of a HashMap is to avoid such collisions. A BTreeMap allocates for each new value.

Note that this explanation is a bit handwavy, as both data structures have numerous optimizations in production scenarios.

nybble41 6 years ago | | |

There is no difference between O(n) and O(kn), if k is a constant. The notation deliberately ignores constant factors. (That's why you can say a BTreeMap requires O(n) memory independent of the size or type of data being stored, provided there is some finite upper bound on the sizes of the keys and values.)

cesarb 6 years ago | | |

> collisions are typically stored as some allocate-per-item collection

Rust's HashMap stores the collisions in the same table as the non-collisions (open addressing), not in a separate collection.

jhgg 6 years ago | |

This is a bit unclear. The root map is still a hash map, but it's a "map of maps": the inner map is a BTreeMap. This is for memory efficiency, as the inner map is relatively small and we don't have to deal with the growth factor of a hash map (or manage that manually), whereas the root hash map is pre-allocated to its max size.

reggieband 6 years ago |

When I see this kind of GC performance, I wonder why you wouldn't change the implementation to use some sort of pool allocator. I am guessing each Read State object is identical to one another (e.g. some kind of struct), so why not pre-allocate your memory budget of objects and just keep an unused list outside of your HashMap? In a way this is even closer to a ring where upon ejection you could write the object to disk (or Cassandra), re-initialise the memory and then reuse the object for the new entry.

I suppose that won't stop the GC from scanning the memory though ... so maybe they had something akin to that. I assume that a company associated with games and with some former games programmers would have thought to use pool allocators. Honestly, if that strategy didn't work then I would be a bit frustrated with Go.

I have to say, out of all of the non-stop spamming of Rust I see on this site - this is definitely the first time I've thought to myself that this is a very appropriate use of the language. This kind of simple yet high-throughput workhorse of a system is a great match for Rust.
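On the pool allocator idea, this is roughly what I have in mind, sketched in Go. The `ReadState` fields are guesses and the `Pool` type is hypothetical: all objects live in one pre-allocated slice, and a free list of indices tracks the unused slots. Since the struct contains no pointers, I'd expect the GC to have very little to trace here, though I haven't measured it.

```go
package main

import "fmt"

// ReadState is a guess at a pointer-free record.
type ReadState struct {
	UserID, ChannelID, LastMessageID uint64
	Mentions                         uint32
}

// Pool hands out slots from a fixed, pre-allocated slab via a free list
// of indices, so steady-state operation allocates nothing.
type Pool struct {
	slab []ReadState
	free []int32 // indices of unused slots; the last one is handed out next
}

func NewPool(n int) *Pool {
	p := &Pool{slab: make([]ReadState, n), free: make([]int32, n)}
	for i := range p.free {
		p.free[i] = int32(n - 1 - i) // so that slot 0 is handed out first
	}
	return p
}

// Acquire returns the index of a zeroed slot, or false if none is left.
func (p *Pool) Acquire() (int32, bool) {
	if len(p.free) == 0 {
		return -1, false
	}
	idx := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return idx, true
}

// Release re-initialises the slot and puts it back on the free list.
func (p *Pool) Release(idx int32) {
	p.slab[idx] = ReadState{}
	p.free = append(p.free, idx)
}

// At gives access to the object stored in a slot.
func (p *Pool) At(idx int32) *ReadState { return &p.slab[idx] }

func main() {
	p := NewPool(2)
	a, _ := p.Acquire()
	p.At(a).LastMessageID = 1001
	b, _ := p.Acquire()
	_, ok := p.Acquire()
	fmt.Println(a, b, ok) // 0 1 false
	p.Release(a)
	c, _ := p.Acquire()
	fmt.Println(c, p.At(c).LastMessageID) // 0 0
}
```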

geodel 6 years ago |

Makes sense: write the most efficient stuff for in-house use and give resource-hogging Electron apps to users.

jrockway 6 years ago | |

Discord pays for their servers, but not for their users' computers.

Hamuko 6 years ago | | |

That's fine as long as you ignore the fact that the users are the customers.

archi42 6 years ago |

Uhm, I'd suppose the service runs on one or more dedicated nodes, so there should be no competition for RAM (or if a node runs multiple services, then I'd expect a fixed memory amount to be available). In such an environment, each fixed-size LRU cache could just allocate a huge chunk of RAM for data + indices (index size is bound by data size). That's nothing to do with the ownership model, it's just manually managed memory.

Yes, reality is more complex since they probably have multi socket servers/NUMA, which might add memory access latencies and atomic updates to the LRU might require a locking scheme, which also isn't trivial (and where async Rust might be useful).

stiray 6 years ago |

And this brings me back to my years-old nag. "Ok, you got GC, fine. But DO give me the option to free specific memory by hand when I want to. I don't consider manual allocation and deallocation as much of a pain as GC going wild."

This doesn't only go for Go.

karma_daemon 6 years ago |

I wish the article would show a graph of the golang heap usage. I'm reminded of this Cloudflare article [0] from a while back where they created an example that seemed to exhibit similar performance issues when they created many small objects to be garbage collected. They solved it by using a pooled allocator instead of relying solely on the GC. Wonder if that would have been applicable here to the Go version.

[0] https://blog.cloudflare.com/recycling-memory-buffers-in-go/
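As a rough illustration of buffer recycling in Go, here is a sketch using the standard library's `sync.Pool`. I'm not claiming this is the mechanism the Cloudflare post uses; the `greet` function and `bufPool` variable are invented for the example. One caveat: the runtime may drop pooled items during a GC, so this reduces allocation churn rather than giving a fixed memory budget.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles byte buffers between calls instead of leaving each
// one for the garbage collector.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet builds a string in a pooled buffer and returns the buffer to the
// pool when done.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a recycled buffer may still hold old contents
	defer bufPool.Put(buf)
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // copies the bytes, so reuse of buf is safe
}

func main() {
	fmt.Println(greet("gopher"))
	fmt.Println(greet("rustacean"))
}
```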

mangatmodi 6 years ago |

Why would they switch to Rust, rather than upgrading from a 3-year-old version?

jhgg 6 years ago | |

This blog post is perhaps a bit "after the fact": we made the switch in mid-2019, and wanted to try out Rust for services like this as well, due to adoption elsewhere in the company. Also, after upgrading through 4 golang versions on this service and noticing it didn't materially change performance, we decided to just spend our time on the rewrite (for fun, and latency) and to get a head start in the asynchronous Rust ecosystem.

This blog post kinda internally matches our upgrade to std::futures and tokio 0.2, away from futures 0.1.

typical182 6 years ago | | |

Do you have any load tests or synthetic benchmarks that are still capable of producing this?

It would be interesting to see what a more modern Go would do given there have been a bunch of tail latency GC improvements since your older 1.9 Go version... and in an ideal world, it would be nice to file an issue on the tracker if you were still seeing this.

(Maybe that ends up later helping another one of your Go services, or maybe it just helps the community, or maybe it’s a topic for another interesting blog...).

In any event, thanks for taking the time to write up and share this one.

jamra 6 years ago | | |

This comment doesn’t make sense. Didn’t rust not have async back then?

The timelines don’t appear to fit

The_rationalist 6 years ago | | |

Out of curiosity, why didn't you choose Kotlin? It can reuse the Java ecosystem which allow you to save tons of money, and give you advanced features and scalability. It is a sexier and more ergonomic language too. And with e.g ZGC, you can have a GC that is fine tunable, and that has very low latency.

By choosing Rust you will suffer a great deal from the limitations of its poor, not production-ready ecosystem. I'm not even talking about the immaturity of the async/await support.

StreamBright 6 years ago |

Pretty amazing write up from Jesse. I really like how they maxed out Go first before even thinking about a rewrite in Rust. It turns out no-GC has pretty significant advantages in some cases.

Rapzid 6 years ago | |

Unsafe or it doesn't count ;)

yobert 6 years ago |

Not to participate in the flaming-- but I'd love to hear some stats about compile times for the two versions of the service. (Excellent write-up by the way! Thanks!)

dfee 6 years ago |

The one problem I’m curious as to how channel-based chat applications solve, to which my google-fu has never led me in the right direction: how do you handle subscriptions?

I imagine a bunch of front-end servers managing open WebSocket connections, and also providing filtering/routing of newly published messages. Alas, it’s probably best categorized as a multicast-to-server, multicast-to-user problem.

Anyways, if there’s an elegant solution to this problem, would love to learn more.

_qwfv 6 years ago | |

Not sure if this is exactly what you are looking for, but I'd do some digging into consistent hash rings.

dfee 6 years ago | | |

Oh, interesting: https://en.m.wikipedia.org/wiki/Consistent_hashing

> Consistent hashing maps objects to the same cache machine, as far as possible. It means when a cache machine is added, it takes its share of objects from all the other cache machines and when it is removed, its objects are shared among the remaining machines.

I guess the challenge here is that subscriptions are sparse: I.e. one ws connection can carry multiple channel subscriptions, thus undermining the consistent hash.
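
For readers who have not met the idea, here is a minimal sketch of a consistent hash ring in Python. It is only an illustration of the technique from the Wikipedia quote, not how Discord routes anything; the node names (`fe-1` and so on), the `HashRing` class, and the choice of 64 virtual nodes are all assumptions made up for the example.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string to a stable position on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Toy consistent hash ring; every name here is illustrative."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's position, wrapping around.
        i = bisect.bisect_right(self._points, _point(key)) % len(self._ring)
        return self._ring[i][1]


ring = HashRing(["fe-1", "fe-2", "fe-3"])
bigger = HashRing(["fe-1", "fe-2", "fe-3", "fe-4"])
channels = [f"channel-{n}" for n in range(1000)]
# Channels whose owner changed after adding fe-4; they can only have moved to fe-4.
moved = [c for c in channels if ring.node_for(c) != bigger.node_for(c)]
```

Adding a node only reassigns the keys that now fall just before one of its ring points, which is the "takes its share of objects" behaviour quoted above. It does not by itself solve the sparse-subscription problem raised in this comment.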

willvarfar 6 years ago |

This is a bit late to add, but from the description of the problem in the article, the way to make the program faster, regardless of language, is to use a big array rather than lists and trees. Carve the array up as necessary, and keep a mapping from users to the offsets in the array where their data is. Basically, be your own memory allocator, with all the loss of safety but the order of magnitude improvement in efficiency that that brings.
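
A rough sketch of that "one big array, carve it up yourself" idea, written in Python for brevity. The record layout (`last_read_id`, `mentions`) and the `Arena` class are hypothetical; the article does not describe the real Read State fields at this level of detail.

```python
import struct

# Hypothetical fixed-size record: last_read_id (u64), mentions (u32).
RECORD = struct.Struct("<QI")


class Arena:
    """One big preallocated buffer carved into fixed-size slots."""

    def __init__(self, capacity: int):
        self._buf = bytearray(RECORD.size * capacity)
        self._free = list(range(capacity - 1, -1, -1))  # unused slot numbers
        self._slot_of = {}  # user_id -> slot number

    def put(self, user_id: int, last_read_id: int, mentions: int) -> None:
        slot = self._slot_of.get(user_id)
        if slot is None:
            slot = self._free.pop()  # raises IndexError when the arena is full
            self._slot_of[user_id] = slot
        RECORD.pack_into(self._buf, slot * RECORD.size, last_read_id, mentions)

    def get(self, user_id: int):
        return RECORD.unpack_from(self._buf, self._slot_of[user_id] * RECORD.size)

    def evict(self, user_id: int) -> None:
        # Nothing is deallocated; the slot is simply handed back for reuse.
        self._free.append(self._slot_of.pop(user_id))
```

The data lives in a single flat buffer with no per-record objects, so there is nothing for a collector to trace inside it; the price is that a wrong offset silently reads someone else's record.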

thedance 6 years ago |

These kinds of posts would be much more interesting if they discussed alternatives considered and rejected. For example why did they choose Rust over C++?

therockhead 6 years ago | |

The article mentioned that they have already used Rust successfully in house, so when you consider that Rust is inherently safer than C++, it seems like they picked the right language.

luord 6 years ago |

Usually this kind of article is about "migrating from massively popular language to more niche language that we like better".

This is more from niche to niche. Thought that was interesting, yet the discussion here wasn't all that different from the usual. Guess it's flamewars always, regardless of popularity.

mister_hn 6 years ago |

Why not C++, if performance was an issue?

bluGill 6 years ago | |

Modern C++ is the right choice if you have an existing code base in C++, or you need to use features that only exist in a third party C++ library - there is a large collection of C++ libraries to choose from.

Their use case doesn't seem to have either consideration (note that even when these are considerations a hybrid of languages is often a good idea) so there isn't a compelling reason to choose C++. That doesn't mean C++ is wrong, just that there is nothing wrong with rust. Maybe a great C++ programmer can get a few tenths of a percent faster code (mostly because compiler writers spend more effort figuring out how to optimize C++ - rust uses the same llvm optimizer but it might sometimes do something less optimal because it assumed C++ input), but in general if the difference matters in your environment you are too close to the edge and need to scale.

Rust might be easier/faster to write than modern C++. If so that is a point in favor of rust. They seem to have people who know rust, which is important. There might be more people who know C++, but I can take any great programmer and make them good in any programming language in a few weeks in the worst case (worst case would be writing a large program in intercal or some such intentionally hard language) - not to be confused with expert which takes more experience.

loeg 6 years ago | |

Why would you pick C++ for a new codebase in 2019 or 2020 if Rust met your needs?

terminaljunkid 6 years ago | | |

Programmer productivity, library support.

nuclx 6 years ago | | |

Compilation times.

wmf 6 years ago | |

Why C++? Why would you want the same performance as Rust with less safety?

viraptor 6 years ago |

The next step I expected after LRU tuning was to do simple sharding per user, so that there are more services with smaller caches (cancelling out the impact), with smaller GC spikes, offset in time from each other. I'm curious if that was considered and not done for some reason.
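
As a sketch of what per-user sharding with staggered pauses could look like (Python, purely illustrative): `shard_for` and `gc_offset_seconds` are made-up helper names, and the 120-second default is only borrowed from the two-minute forced GC interval mentioned elsewhere in this thread.

```python
import zlib


def shard_for(user_id: int, num_shards: int) -> int:
    """Pick a shard from a stable hash of the user id (crc32 is stable across runs)."""
    return zlib.crc32(str(user_id).encode()) % num_shards


def gc_offset_seconds(shard: int, num_shards: int, period: float = 120.0) -> float:
    """Hypothetical stagger: spread each shard's periodic pause evenly over the period."""
    return shard * period / num_shards
```

With four shards the pauses would land at 0, 30, 60 and 90 seconds into each period, each one scanning roughly a quarter of the data.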

woah 6 years ago |

Switching to Rust is a good idea, but I was wondering- would it be possible to run two identical instances in parallel and return results from the fastest one? This would almost completely eliminate GC pauses from the final output.
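
The "ask two instances, take the first answer" idea can be sketched like this in Python. The replica functions are stand-ins invented for the example, with `time.sleep` playing the role of a GC pause; the sketch assumes the request is safe to process twice.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def first_response(replicas, request):
    """Send `request` to every replica and return whichever answer arrives first."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, request) for replica in replicas]
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower replica
    return next(iter(done)).result()


def fast_replica(request):
    time.sleep(0.01)
    return ("fast", request)


def paused_replica(request):
    time.sleep(0.3)  # stands in for a replica stuck in a GC pause
    return ("paused", request)
```

Calling `first_response([paused_replica, fast_replica], "ack 42")` returns the fast replica's answer without waiting for the paused one. The obvious costs are double the hardware and the need to keep both instances' state identical.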

highfrequency 6 years ago |

Curious about their definition of “response time” in the graph at the end. They’re quoting ~20 microseconds so I assume this doesn’t involve network hops? Is this just the CPU time it takes a Read State server to do one update?

jhgg 6 years ago | |

Correct. This is the internal time it takes to process the message. Once a node is "warm" thanks to its large cache, it's mostly in-memory operations and queueing for persistence, which happens in the background.

Sikul 6 years ago | | |

Also worth noting: Most requests to the service have to update many Read States. For instance, when you @everyone in the Minecraft server we have to update over 500,000 Read States.

mc3 6 years ago |

More accurately "Why Discord is switching a service from Go to Rust"

dennisgorelik 6 years ago |

> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.

Why would BTreeMap be faster than HashMap? HashMap performance is O(1), while BTreeMap performance is O(log N).

scott_s 6 years ago | |

This subthread explains why it's more memory efficient to use a tree-based structure: https://news.ycombinator.com/item?id=22239393. Short version is that in order to get good performance out of a hashtable-based structure, you want to have more than n slots.

Which brings me to my second point: hashtable based data structures are not worst-case O(1). They are worst-case O(n), because in the worst case, you will either have to scan every entry in your table (open addressing) or walk a list of size n (separate chaining). Of course, good hashtable implementations will not allow a situation with so many collisions, but in order to avoid that, they will need to allocate a new table and copy over the contents of the old, which is also a O(n) operation.

Given two kinds of data structures, one which is average-case O(1) but worst-case O(n), versus one which is best- and worst-case O(log n), which one you choose depends on what kinds of performance you're optimizing for, and how bad the constants are that we've been ignoring. If you care more about throughput, then you usually want average-case O(1), as the occasional latency spikes aren't important to you. But if you care more about latency, then you'll probably want to choose worst-case O(log n), assuming that its implementation's constants aren't too bad.
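
To make the worst-case argument concrete, here is a toy open-addressing table in Python that counts how many slots a lookup inspects, next to a plain binary search standing in for the O(log n) bound of an ordered structure. This is not how Rust's `HashMap` or `BTreeMap` are implemented; `ProbingTable` and `bisect_steps` are names invented for the illustration.

```python
class ProbingTable:
    """Toy open-addressing table with linear probing; counts slots inspected."""

    def __init__(self, slots, hash_fn=hash):
        self._slots = [None] * slots
        self._hash = hash_fn

    def insert(self, key):
        i = self._hash(key) % len(self._slots)
        while self._slots[i] is not None and self._slots[i] != key:
            i = (i + 1) % len(self._slots)
        self._slots[i] = key

    def probes(self, key):
        """Slots inspected before `key` is found (the key must be present)."""
        i, seen = self._hash(key) % len(self._slots), 1
        while self._slots[i] != key:
            i, seen = (i + 1) % len(self._slots), seen + 1
        return seen


def bisect_steps(sorted_keys, key):
    """Comparisons a binary search makes over a sorted list."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return steps


keys = list(range(1000))
spread = ProbingTable(2048, hash_fn=lambda k: k)   # no collisions, spare slots
clumped = ProbingTable(2048, hash_fn=lambda k: 0)  # pathological: all keys collide
for k in keys:
    spread.insert(k)
    clumped.insert(k)
```

With the well-spread hash a lookup touches one slot, but only because the table keeps about twice as many slots as keys; with the pathological hash, finding the last key walks all 1000 occupied slots, while the binary search never needs more than about ten comparisons.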

Jweb_Guru 6 years ago | | |

Cuckoo hashmaps are worst case O(1) when implemented correctly, up to resizing (however, they do need more space and perform worse in virtually all real benchmarks).

nemothekid 6 years ago | |

1. They never said it was faster, only that memory usage was better. Regardless, it could be the case that log N < C, if C is sufficiently large.

2. Memory usage on a hash map would be worse especially if the fill ratio is relatively low.

deepsun 6 years ago |

Wait, didn't the Go devs say they solved GC latency problems [1]?

(from 2015): "Go is building a garbage collector (GC) not only for 2015 but for 2025 and beyond: A GC that supports today’s software development and scales along with new software and hardware throughout the next decade. Such a future has no place for stop-the-world GC pauses, which have been an impediment to broader uses of safe and secure languages such as Go." [2]

[1] https://www.youtube.com/watch?v=aiv1JOfMjm0

[2] https://blog.golang.org/go15gc

terminaljunkid 6 years ago | |

That seems to be written by some manager with only a slight clue about tech, tbh.

Fire-Dragon-DoL 6 years ago |

I'll raise the same question here: can't a memory pool be used in this case?

In the gaming industry there are similar problems with GC, and they were solved with memory pools.

nemo1618 6 years ago |

I wonder if it would be feasible to rewrite the LRU cache (either fully or in part) in a way that does not require the GC to scan the entire cache.

kerkeslager 6 years ago | |

Yes, it's possible: that's generational garbage collection. But last I heard, Google decided writing a modern GC was too complicated.

They're probably right, because Google doesn't need it. But for everyone else who decided to use a language designed to solve Google's fairly-unique problems as if it were a general-purpose language: that kind of sucks, doesn't it?

terminaljunkid 6 years ago | | |

The fact seems to be that the Go team is not as well funded as it seems. Go is not Google's language in the sense that C# is MS's language or Java was Sun's language.

ssoroka 6 years ago | |

Check out this post, which describes exactly that process: https://blog.gopheracademy.com/advent-2018/avoid-gc-overhead...

musicale 6 years ago |

1st Law of Garbage Collection: Consistent speed and efficiency usually requires circumventing the garbage collector.

arjunbajaj 6 years ago |

Question for the Discord team: Was implementing the same service in Elixir an option? Did you try it/why not?

robocat 6 years ago | |

Discord also use Elixir - there are comments elsewhere above for why Elixir might be a bad choice in this case.

arjunbajaj 6 years ago | | |

Thanks!

donatj 6 years ago |

I feel like from the definition of the service, the entire thing could easily be replaced with a Redis cluster.

Sikul 6 years ago | |

We originally cached this data with a Redis cluster but we hit scaling issues. The Read States service only exists because Redis had issues.

hopia 6 years ago | | |

We use Elixir too and ironically, Redis became our bottleneck as well. What a useless dependency it is when running on BEAM.

donatj 6 years ago | | |

Hah, well now I feel like a dufus. Good info!

dis-sys 6 years ago |

I believe the problem described in this blog has been at least partially addressed in the Go 1.12 release.

sayusasugi 6 years ago |

Great, can the client be ported to Rust while you're at it? Electron is such a joke.

raverbashing 6 years ago |

> but because the garbage collector needed to scan the entire LRU cache in order to determine if the memory was truly free from references

Yeah please tell me again how GC is a superior solution to reference counting in cases when you know exactly when you don't need the object anymore.

(Hint: RC is not GC if the object is deallocating itself)

brylie 6 years ago |

What are some recommended resources for a gentle introduction to Rust?

fatbird 6 years ago | |

I read the Rust Programming Language book over Christmas and it's a very good introduction to it, probably one of the best I've seen for any language. It's got a good voice, and it's very good about putting enough context around Rust design decisions to understand the why as well as the how. But it's not so long that it feels like a slog.

brylie 6 years ago | | |

Link, for convenience:

https://doc.rust-lang.org/book/

steveklabnik 6 years ago | | |

Thank you!

crazypython 6 years ago |

In D, you may explicitly delete memory while having a GC.

pmarreck 6 years ago |

Is the Discord server-side still coded in Elixir?

joseluisq 6 years ago |

That's why the {blazing-fast} term is becoming popular.

Rust won again.

pkolaczk 6 years ago |

This is consistent with my observations of porting Java code to Rust. Much simpler and nicer to read safe Rust code (no unsafe tricks) compiles to programs that outperform carefully tuned Java code.

dang 6 years ago | |

We detached this subthread from https://news.ycombinator.com/item?id=22240978.

FisherGuy44 6 years ago |

This is not a fair comparison. Go 1.9.2 was released over 2 years ago. In that time they have fixed a lot of the GC stutter issues. Comparing rust nightly to a 2 year old compiler is unfair.

blackrock 6 years ago |

Would it have been better if they went with Elixir?

Write their code in a functional style. Get the benefits of the Erlang BEAM platform.

Their system runs over the web, so time sensitivity isn’t as important, in comparison to video games, VR, or AR.

Anyone ever done a performance comparison breakdown between something like Elixir vs. Rust?

jerf 6 years ago | |

"Would it have been better if they went with Elixir?"

No. It would have been unshippably bad. BEAM is generally fairly slow. It was fast at multitasking for a while, but that advantage has been claimed by several other runtimes in 2020. As a language, it is much slower than Rust. Plus, if you tried to implement a gigantic shared cache map in Erlang/Elixir, you'd have two major problems: One is that you'd need huge chunks of the map in single (BEAM) processes, and you'd get hit by the fact BEAM is not set up to GC well in that case. It wants lots of little processes, not a small number of processes holding tons of data. Second is that you'd be trading what in Rust is "accept some bytes, do some hashing, look some stuff up in memory" with generally efficient, low-copy operations, with "copy the network traffic into an Erlang binary, do some hashing, compute the PID that actually has the data, send a message to that PID with the request, wait for the reply message, and then send out the answer", with a whole lot of layers that expect to have time to make copies of lots of things. Adding this sort of coordination into these nominally fast lookups is going to slow this to a crawl. It's like when people try to benchmark Erlang/Elixir/Go's threading by creating processes/goroutines to receive two numbers and add them together "in parallel"; the IPC completely overshadows the tiny amount of work being done. (They mention tokio, but that's still going to add a lot less coordination overhead than Erlang messages.)

Go is a significantly better language for this use case than Elixir/Erlang/BEAM is, let alone Rust.

(This is not a "criticism" of Erlang/Elixir/BEAM. It's an engineering analysis. Erlang/Elixir/BEAM are still suitable for many tasks, just as people still use Python for many things despite the fact it would be a catastrophically bad choice for this particular task. This just isn't one of the tasks it would be suitable for.)

hopia 6 years ago | | |

Not to disagree with your analysis of the performance implications, but I don't think having all that data under a single or a few processes would be the right architectural pattern to handle this in Elixir.

The article says that the data is basically "per-user", indicating that the active client connection process could be used to store the data. It already hosts other data related to the client (connection) anyway. I think updating and querying it globally would be the trouble in that case.

Another option could be storing the data in mnesia, BEAM's internal mutable in-memory DB. Probably better, but still not ideal to solve this.

Anyway, you're right in that no matter how you'd try to solve this problem in pure Elixir you'd still be seeing some bottlenecks, because BEAM just isn't very well suited to this kind of problem, hence Rust.

But can you elaborate on what you mean by other platforms catching up with Elixir's inherent concurrency advantages? Which modern platforms give similar features?

blackrock 6 years ago | | |

Good points.

> It wants lots of little processes, not a small number of processes holding tons of data

Elixir/Erlang is good for handling a lot of little processes with a small amount of data. And not for a small number of processes, handling a large amount of data.

The little processes hold less data, and it just gets dropped after the function is done, instead of getting reclaimed by a garbage collector.

This is probably what makes Elixir/Erlang good for telecom equipment, like packet switching hardware, but not good for more complex software applications that may need to fetch and manipulate a lot of structured data in multiple stages.

In this case, does anyone know of Elixir’s maximum throughput?

sergiotapia 6 years ago | | |

>It was fast at multitasking for a while, but that advantage has been claimed by several other runtimes in 2020.

Such as?

steveklabnik 6 years ago | |

Discord is a heavy Elixir user, and even uses it with Rust via NIF: https://blog.discordapp.com/using-rust-to-scale-elixir-for-1...

The_rationalist 6 years ago |

Borrowed from a comment:

Garbage collection has gotten a lot of updates in the last 3 years. Why would you not take the exceedingly trivial step of just upgrading to the latest Go stable in order to at least try for the free win? From the go 1.12 release notes: “Go 1.12 significantly improves the performance of sweeping when a large fraction of the heap remains live. This reduces allocation latency immediately following a garbage collection.” ¯\_(ツ)_/¯ This sounds like “we just wanted to try Rust, ok?” Which is fine. But like, just say that.

moneywoes 6 years ago |

Another blow for Google.

buboard 6 years ago |

maybe next year : why discord is switching to C

adamnemecek 6 years ago |

Rust is maturing. I legit don't think there are too many good reasons to use Go over Rust. You can call Rust from Go but not vice versa.

steveklabnik 6 years ago | |

(You can call Go from Rust: https://blog.arranfrance.com/post/cgo-sqip-rust/ )

LaserToy 6 years ago |

I’m sorry, but isn’t this caching 101? Do not keep long-lived objects in GC-managed memory. And there are ways to do it in both Go and even Java.

jaten 6 years ago |

just use an off heap hash table. simple. https://github.com/glycerine/offheap

Also, as others have said, lots of big GC improvements were ignored by insisting on go1.9.2 and not the latest.

favorited 6 years ago | |

The graphs are from 1.9.2, but the author said they tried 1.8, 1.9, and 1.10 and saw the same thing.

nottorp 6 years ago |

Can someone wake me up when they switch from javascript to something native in the client?

I just checked and, as usual, I have an entry labeled "Discord Helper (Not Responding)" in my process list. I don't think I've ever seen it in a normal state.

zlynx 6 years ago | |

That is kind of bad Windows programming but easy to do when writing an app that doesn't need to handle Windows event messages. It probably sits in a loop waiting on socket events and doesn't care if you sent it a WM_QUIT or not. It would be easy to pump the message loop and ignore all, but why bother?

nottorp 6 years ago | | |

Lol it's a javascript thing that instantiates a copy of Chrome, not a Windows program. I doubt they know what a WM_QUIT is...

blazespin 6 years ago |

Confused, aren't they losing memory safety?

I get for certain core code situations, you want to manage all memory safety yourself (or use built in static GC), but beyond that it seems to me at a higher level you'd rather have the automatic GC. Why burden all of your developers rather than just a core few?

I don't think GC issues is a compelling argument to move everything to Rust. I'm not saying there aren't compelling arguments, but that just seems a bit odd that that's their main argument.

buzzerbetrayed 6 years ago | |

I’ve never heard the argument that moving to rust reduces memory safety. Isn’t memory safety what rust is known for?

Matthias247 6 years ago | | |

It is! But in Rust you still have an escape hatch in the form of the `unsafe` annotation which allows for mistakes which break memory safety. I don't think Go has something like that, unless you use the FFI. So saying that Go is at least as memory safe as Rust might not be too wrong of a statement.

However I think in total Rust is safer. E.g. Rust prevents a ton of race conditions in multithreaded code, which Go can not do.

echeese 6 years ago | |

Nah, guaranteed memory safety is actually one of Rust's main selling points

tonyferguson 6 years ago |

Wow, Rust is amazing, so fast! It is like these people never learnt c? Why did they spend all this time trying to optimise such a high level language? Surely they can afford a more experienced engineer who will tell them that is a path that isn't worth it? I jump straight to c when there is anything like this, although I guess Rust is an option these days.

fxtentacle 6 years ago |

Sounds like badly reinventing the wheel. If you need a large in-memory LRU cache, use memcached. Problem solved, because then Go doesn't need to allocate much memory anymore. And I'd wager that JSON serialization for sending a reply back to the client will dominate CPU load anyway, so that the overhead for Go to talk to Memcached will be barely noticeable.

shanev 6 years ago |

When a company switches languages like this, it's usually because the engineers want to learn something new on the VC's dime. They'll make any excuse to do it. As many comments here show, there are other ways to solve this problem.

dancemethis1 6 years ago |

Well, none of it matters since Discord is hostile software. No language will solve their privacy-trampling deeds.

harikb 6 years ago |

> Discord has never been afraid of embracing new technologies that look promising.

> Embracing the new async features in Rust nightly is another example of our willingness to embrace new, promising technology. As an engineering team, we decided it was worth using nightly Rust and we committed to running on nightly until async was fully supported on stable.

> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.

It is always an algorithm change

esjeon 6 years ago |

I wonder if they actually did their homework. Doesn't matter if they like it, but they could have avoided rewriting, if they wanted.

The thing is, you can allocate memory outside of Go, and GC will simply ignore such regions, since GC only scans regions known to it. (Mmap should work like a charm here.) A drawback is that pointers in such regions will not be counted, but it's easy to work around by copying whole data, which is encouraged by the language itself.
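
The copy-in, copy-out access pattern described here can be sketched with an anonymous memory mapping. The example is in Python rather than Go, so it shows the shape of the idea and says nothing about Go's collector; the record layout (`channel_id`, `last_read_id`) and the `OffHeapTable` class are assumptions for illustration.

```python
import mmap
import struct

# Hypothetical fixed-size record: channel_id (u64), last_read_id (u64).
RECORD = struct.Struct("<QQ")


class OffHeapTable:
    """Fixed-size records stored in an anonymous mmap region, copied in and out."""

    def __init__(self, capacity: int):
        self._mem = mmap.mmap(-1, RECORD.size * capacity)  # anonymous, zero-filled

    def store(self, slot: int, channel_id: int, last_read_id: int) -> None:
        # Copy the values into the region; no object references are kept there.
        RECORD.pack_into(self._mem, slot * RECORD.size, channel_id, last_read_id)

    def load(self, slot: int):
        # Returns a fresh tuple: a copy, never a reference into the region.
        return RECORD.unpack_from(self._mem, slot * RECORD.size)

    def close(self) -> None:
        self._mem.close()
```

Because only plain integers are copied in and out, the region never holds a pointer that a collector would have to follow, which is the workaround this comment is pointing at.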

TBH, Go sucks for storing a large amount of data. As you can see here, even the simplest cache can be problematic. The language is biased towards large datacenters, where the amount of available resources is less of a concern. Say, this problem can be solved by having external cache servers and extra nodes around them. Latency will not be ideal, but the service will survive with minimal changes.

h2odragon 6 years ago |

Excellent write up, and effective argument for Rust in this application and others. My cynical side sums it up as:

"Go sucked for us because we refused to own our tooling and make a special allocator for this service. Switching to Rust forced us to do that, and life got better"

staticassertion 6 years ago | |

I'm confused. Build a special allocator for Go you mean? That feels like going well beyond typical "own your tooling".

h2odragon 6 years ago | | |

I'm outdated. I used to have 4 different python interpreter builds, for different purposes, where the modern world would be using lua as a glue language. I had nothing like the scale, staff, or budget of Discord; all I had was need and tools that could bend to fill it.

I think this is a great write up of why they chose a different tool. I don't say it was the wrong decision, they make that argument pretty well too. I'm still surprised that either Go isn't malleable enough to have bent around the need, or they didn't feel it worth more effort than parameter tweaking to bend it so.

monocasa 6 years ago | |

They were already not allocating, they were just stuck with a GC cycle that'd scan, not find any garbage, and scan again in two minutes.

echopom 6 years ago |

This was an extremely interesting read.

I'm quite disappointed, though, that they did not update their Go version to 1.13 [0][1], which would normally have removed the spike issue and thus the latency, before they moved to Rust...

Rust seems more performant with proper usage (tokio + async), but I'm more worried about the ecosystem, which doesn't seem as mature as Go's.

We could quote the recent[2] Drama with Actix...

[0] https://golang.org/doc/go1.13#runtime

[1] https://golang.org/doc/go1.12#runtime

[2] https://github.com/fafhrd91/actix-web-postmortem

chc 6 years ago | |

Why would you want to bring up the Actix author's drama? That doesn't seem like something that should reflect on a language one way or the other.

Communitivity 6 years ago | | |

Agreed. One could argue that a level of drama in the community is a sign of growing maturity and wider interest in the language, because it is evidence there is no longer a niche monoculture of devs all thinking the same way.

In the words of Steve Klabnik "Rust has been an experiment in community building as much as an experiment in language building. Can we reject the idea of a BDFL? Can we include as many people as possible? Can we be welcoming to folks who historically have not had great representation in open source? Can we reject contempt culture? Can we be inclusive of beginners?" https://words.steveklabnik.com/a-sad-day-for-rust

The Actix issue was resolved, and Actix will continue under new maintainers (https://github.com/actix/actix-web/issues/1289). So I'd argue the answer to those questions is a 'yes'.

deweller 6 years ago | | |

As an outsider to both the Go and Rust cultures, I read the Actix news and walked away with the impression that the Rust ecosystem is less mature.

bradhe 6 years ago |

Replatforming to solve this problem was a bit silly in my opinion. The solution to the problem was "do fewer allocations" which can be done in any language.

staticassertion 6 years ago | |

A) They had spent a lot of time optimizing the Go service

B) They weren't allocating a lot, and Go was enforcing a GC sweep every 2 minutes, and it was spending a lot of time on their LRU cache. To "reduce allocations" they had to cut their cache down, which negatively impacted latency.

jhgg 6 years ago | |

Your reply misses the point. We were already doing so few allocations that the GC only ran because it "had to" at every 2 minute mark. The issue was the large heap of many long lived objects.

_ph_ 6 years ago | | |

Did you try to change that interval to a much larger time?

xyzzyz 6 years ago | |

You haven't read the post carefully. Their garbage collection in Go was spiking every 2 minutes precisely because they were doing too few allocations to have it run more often.

qidydl 6 years ago | |

They addressed this in the article:

> These latency spikes definitely smelled like garbage collection performance impact, but we had written the Go code very efficiently and had very few allocations. We were not creating a lot of garbage.

The problem was due to the GC scanning all of their allocated memory and taking a long time to do so, regardless of it all being necessary and valid memory usage.

nabla9 6 years ago | |

I wonder if they attempted manual memory allocation in Go?

In many languages with GC you can actually do manual memory management relatively easily with a few helper functions. You write your own allocate() and free() functions/methods. When you allocate, you check the free list first; if nothing is available, you do a normal allocation. When you call free, you add the object to a free list. If your memory management leaks, it triggers GC.

Usually you need to do that kind of thing in only a few places and a few data structures to cut GC by 90%.
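
A minimal sketch of the allocate()/free() pair described above, in Python. The `ReadState` fields and the `FreeListPool` name are invented for the example; whether this would have removed the two-minute spikes in the Go service is not something the article or this thread establishes.

```python
class ReadState:
    """Hypothetical record type; the fields are made up for illustration."""

    __slots__ = ("user_id", "last_read_id")

    def __init__(self):
        self.user_id = 0
        self.last_read_id = 0


class FreeListPool:
    """Hands out ReadState objects, reusing released ones before allocating."""

    def __init__(self):
        self._free = []

    def allocate(self) -> ReadState:
        # Check the free list first; fall back to a normal allocation.
        return self._free.pop() if self._free else ReadState()

    def free(self, obj: ReadState) -> None:
        # Reset the object and keep it for reuse instead of leaving it as garbage.
        obj.user_id = obj.last_read_id = 0
        self._free.append(obj)
```

Objects that are never passed back to `free` simply become ordinary garbage, which is the "if your memory management leaks, it triggers GC" safety net.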

romaniitedomum 6 years ago |

You're switching to Rust because Go is too slow? Colour me sceptical, but this seems more like an excuse to adopt a trendy language than a considered technical decision. Rust is designed first and foremost for memory safety, and it sacrifices a lot of developer time to achieve this, so if memory safety isn't high in your list of concerns Rust is probably not going to bring many benefits.

hajile 6 years ago | |

Did you read the article? The naive Rust version was better than the tuned golang version in every metric. The most important one (latency) simply wasn't fixable due to golang's GC (something that is a bit of a general GC issue I might add).

romaniitedomum 6 years ago | | |

Did you read my comment? I don't dispute that the Rust version is faster in every way. I am disputing that rewriting in Rust was a sensible technical decision, and in support of this I point you to where the author describes having to use a nightly build of the compiler to get async support. Given that they had to jump through a lot of hoops to make this work, I am saying they could have achieved the same speed increase with less effort using a stable C or C++ compiler. Hell, had they invested a fraction of the time spent rewriting in Rust in the Go version, I'll bet they could have improved it to the point where there was no need to rewrite it at all.

It's clear that Discord use Rust a lot, and that they are looking for any excuse to replace existing code with Rust code.

lllr_finger 6 years ago | |

The goals of Rust are stated boldly right on the official website - "Performance" is one of them. In Discord's case, the hit in productivity was worth avoiding the GC issues in Go. I read the article and didn't come to the same conclusion, so I'm curious which passages led you to believe this was done to "adopt a trendy language"?

romaniitedomum 6 years ago | | |

Performance is not the core raison d'etre of Rust, and there is no shortage of testimonials to the difficulty new developers have with it, not to mention the slowness of its compiler. Given that, it's too much of a leap for me to get from "GC is too slow in Go" to "rewrite in Rust", at least when considered as a purely technical decision. There is no mention, for instance, of what other languages were considered. My guess is none were considered. Finally, the author states that Discord pride themselves in embracing new things, and cites having to work with the nightly build of the compiler to get async. All of this tells me that they chose Rust for non-technical reasons and were prepared to jump through all kinds of hoops to make it work. Which is fine, it's their business to run however they want, but I find the premise that Rust is an obvious choice for speed entirely unpersuasive. In most businesses, introducing unstable nightly builds of compilers to build production services would be a major red flag.

anon4242 6 years ago | |

Turns out that that boring stuff - type theory - they throw at you in uni, can be quite useful. Not only can it help with things like memory safety but also with speed. This is why for instance C++ std::sort() is faster than C qsort(), better type information available to the compiler allows it to make better optimizations. In rust the type system is king.

romaniitedomum 6 years ago | | |

No, in Ada the type system is king. If type theory is the solution, then Ada and SPARK, Ada's restricted subset for extra safety, leave Rust in the dust.

iruoy 6 years ago | |

So decreasing workload on the servers and avoiding spikes in the read states queue is bad business?

The article also states that it was quite easy to port over and didn't need any quirky tuning.

smabie 6 years ago | |

What would you recommend that doesn’t have a GC? Zig? C? Rust is a fine choice. Besides if you really don’t care, just make the entire program unsafe and you’ll still reap benefits over C or C++.

romaniitedomum 6 years ago | | |

Whatever language and toolset gets the job done with the least amount of effort. Given the hoops that Discord had to jump through to get Rust working, that wasn't a good technical decision. They'd have got the same result with less pain with C++.

_--___-___ 6 years ago |

"We want to make sure Discord feels super snappy all the time" is hilarious coming from a program that is infamous for making you read 'quirky' loading lines while a basic chat application takes several seconds to start up.

Don't really know about Go versus Rust for this purpose, but don't really care because read states (like nearly everything that makes Discord less like IRC) is an anti-feature in any remotely busy server. Anything important enough that it shouldn't be missed can be pinned, and it encourages people to derail conversations by replying out of context to things posted hours or days ago.

penagwin 6 years ago | |

To be fair it really doesn't take that long, and often it's because it's auto-updating, but it's not more than a couple of seconds.

The big thing IMO is that once started I normally leave Discord running, and most actions within Discord itself feel very snappy, e.g. you click on a voice channel and you're instantly there. I think that's what they mean: they're trying to keep the delay for such an action low. Sometimes you click a voice channel and there's a few seconds of delay; those are for some reason more annoying than the long(ish) startup time.

anchpop 6 years ago | |

I don't see why that's hilarious. Lots of programs take a second or two to load and it only happens once on boot for me. "Read states" is just discord telling you which channels and servers you have unread messages in

wvenable 6 years ago | | |

Discord takes longer to start up than Microsoft Word.

Desktop development is a total wasteland these days -- there isn't nearly as much effort put into optimization as server side. They're not paying for your local compute, so they can waste as much of it as they want.