Why Discord is switching from Go to Rust (2020)(discord.com) |
If golang had this, then it might not ever need to run its GC because it could just fire up a new region for every request. The request will likely end and blast away its memory before it needs to collect, or it could choose to collect only when that particular goroutine/region is blocked.
Extra benefit: if there's an error in one region, we can blast it away and the rest of the program continues!
[0] https://tutorial.ponylang.io/types/actors.html#concurrent
[1] https://cone.jondgoodwin.com/fast.html
[2] https://verdagon.dev/blog/seamless-fearless-structured-concu...
edit: reading comprehension skills are lacking, please see comment below for why I'm wrong
Turns out no one on the team actually looked into issues in the Go repo to see if it was being addressed. Looks like they just wanted to write Rust, which is fine, Rust is cool, but let’s not deceive ourselves.
(Anyone know if they're still using Rust?)
It has been an incredible success. I plan to blog more about it in the coming months. Our usage of Rust is continuing to grow, and if you check out our jobs page, you might notice all backend / infra jobs list Rust in them now :)
I think probably 40% of requests are handled directly by Rust services now, with the rest involving one or more Rust services called from our Python API layer.
Makes it pretty hard to find stuff!
Yes, Rust provides a more predictable, faster memory management model than Go. At the expense of unpredictable, expensive memory leaks triggering application termination.
Curious how much time and effort was dedicated to improving gc, which is a useful endeavor in its own right.
Not saying this is the case here but highly likely.
That said, I've worked on several embedded systems, and the never allocate memory rule that most of them had for runtime was critical to maintaining real-time-like performance. One was written in C++, which meant that we basically couldn't make use of most of the stl and boost. We had to roll our own implementations of plenty of data structures used on the performance critical threads as a result. I couldn't imagine using a language with GC baked in for such a system. But the results spoke for themselves: microsecond level latencies and performance that scaled well with increased CPU core counts.
With Discord, I imagine a big reason why Rust was considered as an alternative to Go is that they already have a substantial Elixir codebase. Rust and Elixir have a very easy time communicating with one another via Erlang NIFs (Native Implemented Functions). You can embed languages like C/C++/Rust into Elixir without much overhead. While I've never personally tried to do such a thing with Go, I can't imagine it's a smooth experience. You'd probably need to use Ports or CNodes for Go simply for this reason.
I love Go myself, but one of the biggest turn-offs for the language is its FFI support for C and other C-connected languages. cgo is relatively expensive compared to the FFI in many comparable languages, and it should be avoided if possible.
We use Rust over Go, not only because of the garbage collection issues, but because it's truly a better language in almost every way (once you learn it!)
I will say, Go is much easier to pick up, but in exchange you pay in the long term having a language that actively works against you when you start working on more advanced programs, and a mountain of code that's accumulated over the years that you have to maintain.
We work on high concurrency systems here, and I very much enjoy not ever having to think about "is this thing thread safe" because the compiler is checking that for you. I love being able to use the type system to offer my co-workers powerful, but difficult to misuse libraries. I like having sensible abstractions around concurrent execution.
Like, for example, if you create a channel in Go, and for whatever reason, don't try to read from the channel, or give up (because you're racing a timeout), then the goroutine that tries to write to that channel will block forever and leak. In Rust, if you try to write to a channel where there is no longer a receiver, the write to the channel will return an error, which you can then choose to handle, or simply ignore depending on your use-case. Of course, you can be wise and allocate your channels with a capacity of 1, but you can also just completely forget that, and start a steady leak of goroutines for the lifetime of your program that the garbage collector won't save you from!
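To make the Rust half of that claim concrete, here is a minimal sketch using the standard library's `std::sync::mpsc` channel. That choice is an assumption for illustration; the comment does not say which channel implementation is meant, and async channels have their own APIs. The helper name `worker_send_succeeds` and its `receiver_gave_up` flag are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawns a worker that sends one value and reports whether the send succeeded.
/// `receiver_gave_up` (hypothetical knob) simulates a caller that stopped
/// waiting, e.g. because it lost a race against a timeout.
fn worker_send_succeeds(receiver_gave_up: bool) -> bool {
    let (tx, rx) = mpsc::channel::<u32>();

    // Dropping the receiver is how the "caller gave up" case looks in std.
    let rx = if receiver_gave_up {
        drop(rx);
        None
    } else {
        Some(rx)
    };

    // The worker does not block forever: `send` returns Err once the
    // receiver is gone, so the thread can finish and be joined.
    let worker = thread::spawn(move || tx.send(42).is_ok());
    let sent = worker.join().expect("worker panicked");

    // If the receiver is still alive, the value is waiting in the channel.
    if let Some(rx) = rx {
        assert_eq!(rx.recv().unwrap(), 42);
    }
    sent
}

fn main() {
    println!("receiver alive: send ok = {}", worker_send_succeeds(false));
    println!("receiver gone:  send ok = {}", worker_send_succeeds(true));
}
```

In both cases the worker thread terminates, which is the point being made: the sender learns that nobody is listening instead of parking indefinitely.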
Want to execute many futures with bounded concurrency in Rust and collect the results back into a Vec, but give up if any of the futures fail, or if a timeout elapses, and also make sure that all allocated resources are properly dropped and closed in the event that any errors happen? Just combine a futures::stream::StreamExt::{buffer_unordered, collect}, and a tokio::time::timeout, and in a few lines of code you've done it.
Want to do the same in Go? Spawn a pool of goroutines I guess, distribute two channels, one for sending them work, one for receiving work, and don't forget to throw in a WaitGroup, pass a context along to all the goroutines, make sure you don't forget any defers, if you are using a shared resource, make sure it's thread safe, or make sure you're locking/unlocking the appropriate mutex, make sure you size your result channel appropriately or you might leak goroutines and any allocations they hold if your main goroutine that's spawned all that work timed out waiting for the results to come in. Is there a library that does all this for you in Go? I googled "golang run many goroutines and collect their results" and looked at the first page of results, and it's basically the above...
It is no surprise then that we've chosen to use Rust pretty seriously. When you're looking to build reliable systems with serious speeds and massive concurrency, you pick the best tool for the job. That for us is Rust, not Go. And for our real-time distributed systems, we pick Elixir, because BEAM/OTP is just so dang good.
Rust’s borrow checker does defend against it as well in safe Rust, unless you are doing something very stupid. This is just false.
What about complexity? How does "enterprise Go" code compare to "enterprise Rust" code? And what about the tooling? The other threads here are dwelling on GC, and latency, and threading and so on.
One of Go's selling points is that it tends to force writing simple-to-read code.
My general experience in C++/C#/Java/Kotlin is once a code base gets beyond a certain size and number of developers, without any discipline, it becomes a hot mess.
I've used both languages extensively. I like both languages. I still reach for golang first because it's faster to develop with, simply due to the compile times.
Rust vs Go discussions are pretty silly. More folks should think of Rust + Go. There are a lot of glue-services that can take the trade-off of a GC, be magnitudes better than Python, and be maintained by a short-staffed team.
Plus, Rust is still plagued by the same async problem as Python (and others): it's easy to accidentally block the event loop. When you have a ton of tiny glue services, esp. if one has a lot of contributors / internal libs, it's an easy mistake. Critical path things are worth the attention to detail. Others... ¯\_(ツ)_/¯
Okay.
The documentation page for std::mem::forget goes through all the alternatives you should try before resorting to std::mem::forget.
Now, perhaps std::mem::forget should be marked unsafe. However, you don't just "accidentally" run std::mem::forget.
BTW, one of the problems with GC languages is the fact that you never know when your destructor might get run (ie. object gets reclaimed) so your GC thinks life is just fine but ... oops ... you just ran out of file descriptors because they are all waiting to be reclaimed.
It was in the past IIRC. The problem is that there are multiple ways to leak resources in safe Rust (e.g. creating `Rc` cycles), and the compiler cannot prevent them all. So leaking is safe, and because it's also useful sometimes, `std::mem::forget()` and friends (e.g. `Box::leak()`) are also safe. That being said, it is pretty hard to accidentally leak memory.
That's how they solve it, C# is my fav language, but it's probably because I've only really spent time in C++ and C#, the rest is just "scripting". So the syntax is familiar.
Leaking is pretty safe, not always desired but definitely not dangerous like random pointer derefs.
But as OP says, leaking accidentally is uncommon.
[0] https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html
Leaking in Rust requires either an explicit leaking function call, leaking in C/FFI code, or a type that is 1. recursive 2. refcounted 3. with interior mutability, and 4. a programming error in use of such type. If any of these four conditions aren’t met, it can’t leak.
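The four conditions listed above can be seen together in a small sketch: a recursive `Node` type, held through `Rc`, with a `RefCell` for interior mutability, and a deliberate "programming error" that closes the cycle. The `Node` type, the `drops` counter, and the `dropped_nodes` helper are hypothetical names invented for this illustration.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// A recursive, refcounted type with interior mutability.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    drops: Rc<Cell<u32>>, // shared counter so we can observe destructor runs
}

impl Drop for Node {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Builds two nodes and returns how many were dropped at scope exit.
/// With `make_cycle == true`, the nodes point at each other and leak.
fn dropped_nodes(make_cycle: bool) -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let a = Rc::new(Node {
            next: RefCell::new(None),
            drops: Rc::clone(&drops),
        });
        let b = Rc::new(Node {
            next: RefCell::new(Some(Rc::clone(&a))), // b -> a
            drops: Rc::clone(&drops),
        });
        if make_cycle {
            // The "programming error": a -> b closes the cycle, so neither
            // strong count can ever reach zero.
            *a.next.borrow_mut() = Some(Rc::clone(&b));
        }
    } // `a` and `b` go out of scope here
    drops.get()
}

fn main() {
    println!("no cycle: {} nodes dropped", dropped_nodes(false));
    println!("cycle:    {} nodes dropped", dropped_nodes(true));
}
```

All of this is safe Rust with no `unsafe` block and no call to `std::mem::forget`. The usual way to avoid the cycle is to make one direction a `std::rc::Weak` reference.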
The discussion at the time on Reddit [1] mentions this. The general discussion as well talked about if the improvements, which were big in many cases, would have even improved this particular case. We’ll never truly know.
That said it is important to recognize that Go’s GC has received significant upgrades over the years, and remember that what’s true in the past may not be true today.
1: https://www.reddit.com/r/programming/comments/eyuebc/why_dis...
That being said, it's not 100% guaranteed to fix the issue, so maybe they did test this and just didn't mention it in the blog.
But I still don't understand, because....
(NB: I'm not a GC expert, just a curious amateur, so my apologies if there are errors in the following, and the opportunity to be corrected in these errors is part of why I'm posting this.)
Regarding the "not much garbage => theoretically times would be shorter", my understanding is that this is actually not how GC works. The GC time is a function of the size of the GC pool, because GC works by walking ("tracing") the tree of live references. So the only way to make GC faster is to have not less garbage, but less stuff allocated at all.
Multi-generational GC works by dividing the whole pool into smaller pools, so that most GC passes only visit the high-churn nursery, but even then some GC passes need to read the
TFA mentions this, where they say "the spikes were huge not because of a massive amount of ready-to-free memory, but because the garbage collector needed to scan the entire [thing we were keeping track of]".
That is, they had virtually no garbage to collect, and that wasn't speeding up the GC. Which is consistent with how all tracing GC works, as far as I know.
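A toy mark phase illustrates the point about tracing cost. This is only a sketch under simplifying assumptions: the heap is a vector of objects that reference each other by index, there is a single root, and only marking is modeled (no sweep, no generations, no concurrency). It is not a model of Go's actual collector, and the names `Heap`, `build`, and `mark` are hypothetical.

```rust
/// Toy heap: object `i` holds the indices of the objects it references.
struct Heap {
    refs: Vec<Vec<usize>>,
}

/// Builds a heap with a chain of `live` objects reachable from index 0,
/// followed by `garbage` objects that nothing points to.
fn build(live: usize, garbage: usize) -> Heap {
    let mut refs = Vec::with_capacity(live + garbage);
    for i in 0..live {
        refs.push(if i + 1 < live { vec![i + 1] } else { vec![] });
    }
    for _ in 0..garbage {
        refs.push(vec![]);
    }
    Heap { refs }
}

/// Mark phase: follows references from the roots and returns how many
/// objects had to be traced.
fn mark(heap: &Heap, roots: &[usize]) -> usize {
    let mut marked = vec![false; heap.refs.len()];
    let mut stack: Vec<usize> = roots.to_vec();
    let mut traced = 0;
    while let Some(i) = stack.pop() {
        if marked[i] {
            continue;
        }
        marked[i] = true;
        traced += 1;
        stack.extend(heap.refs[i].iter().copied());
    }
    traced
}

fn main() {
    // Same live set, wildly different amounts of garbage: same tracing work.
    println!("{}", mark(&build(1_000, 0), &[0]));
    println!("{}", mark(&build(1_000, 100_000), &[0]));
    // Smaller live set: less tracing work, regardless of garbage.
    println!("{}", mark(&build(10, 100_000), &[0]));
}
```

In this toy, the number of objects traced depends only on what is reachable, which matches the quoted observation that having almost no garbage did not make the scan cheaper.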
Comments/corrections/clarifications are requested!!
Perhaps if the intent wasn't to convince their managers to let them write it in Rust, they would have tried using the latest Go version at the time?
Not to mention, the article made no effort to establish that it's describing the world 2 years prior to this being written