Using Rust in non-Rust servers to improve performance (github.com)

406 points by amatheus 1 year ago | 273 comments

jchw 1 year ago |

Haha, I was flabbergasted to see the results of the subprocess approach, incredible. I'm guessing the memory usage being lower for that approach (versus later ones) is because a lot of the heavy lifting is being done in the subprocess which then gets entirely freed once the request is over. Neat.

I have a couple of things I'm wondering about though:

- Node.js is pretty good at IO-bound workloads, but I wonder if this holds up as well when comparing e.g. Go or PHP. I have run into embarrassing situations where my RiiR adventure ended with less performance against even PHP, which makes some sense: PHP has tons of relatively fast C modules for doing some heavy lifting like image processing, so it's not quite so clear-cut.

- The "caveman" approach is a nice one just to show off that it still works, but it obviously has a lot of overhead just because of all of the forking and whatnot. You can do a lot better by not spawning a new process each time. Even a rudimentary approach like having requests and responses stream synchronously and spawning N workers would probably work pretty well. For computationally expensive stuff, this might be a worthwhile approach because it is so relatively simple compared to approaches that reach for native code binding.
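A minimal sketch of the "N long-lived workers" idea from the second point, assuming a line-based protocol that is not from the article: the server starts each worker once, writes one request per line to its stdin, and reads one response per line from its stdout. `handle_request` and `serve` are hypothetical names, and `handle_request` is only a stand-in for the expensive work.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical stand-in for the expensive work (e.g. QR code generation).
/// Here it just reverses the request text.
fn handle_request(request: &str) -> String {
    request.chars().rev().collect()
}

/// Serve requests until the input is closed: one request per line in,
/// one response per line out.
fn serve<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        let response = handle_request(&line?);
        writeln!(output, "{}", response)?;
        // Flush after every response so the parent is never left waiting
        // on data that is still sitting in a buffer.
        output.flush()?;
    }
    Ok(())
}
```

In a real worker binary, `main` would call `serve(io::stdin().lock(), io::stdout().lock())`, so the process start-up cost is paid once per worker rather than once per request.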

tln 1 year ago | |

The native code binding was impressively simple!

7 lines of rust, 1 small JS change. It looks like napi-rs supports Buffer so that JS change could be easily eliminated too.

jchw 1 year ago | | |

I used napi-rs a while ago, it's pretty awesome. That said, the main issue is that the Rust bindings story is not always that nice. It really depends. Internally, Node modules have quite a lot of complexity, and when you try to do more interesting things you could wind up facing some of the complexity of how it is implemented.

sunshowers 1 year ago | |

Depends on the situation, but posix_spawn is really fast on Linux (much faster than the traditional fork/exec), and independent processes provide fault isolation boundaries.

VMG 1 year ago | |

> You can do a lot better by not spawning a new process each time. Even a rudimentary approach like having requests and responses stream synchronously and spawning N workers would probably work pretty well

And with just a tiny bit of extra work you can give the worker an http interface... Wait a minute...

tialaramex 1 year ago | |

Caveman approach has several nice features - I think I'd be tempted even if it didn't have better performance.

eandre 1 year ago |

Encore.ts is doing something similar for TypeScript backend frameworks, by moving most of the request/response lifecycle into Async Rust: https://encore.dev/blog/event-loops

Disclaimer: I'm one of the maintainers

internetter 1 year ago | |

What's your response to this? https://github.com/encoredev/ts-benchmarks/issues/2

eandre 1 year ago | | |

I've published proper instructions for benchmarking Encore.ts now: https://github.com/encoredev/ts-benchmarks/blob/main/README..... Thanks!

uncomplexity 1 year ago | | |

not gp but first time seeing this encore ts.

i've been a user of uwebsockets.js, uwebsockets is used underneath by bun.

i hope encore does benchmark compared to encore, uwsjs, bun, and fastify.

express is just so damn slow.

https://github.com/uNetworking/uWebSockets.js

isodev 1 year ago |

This is a really cool comparison, thank you for sharing!

Beyond performance, Rust also brings a high level of portability and these examples show just how versatile a piece of code can be. Even beyond the server, running this on iOS or Android is also straightforward.

Rust is definitely a happy path.

jvanderbot 1 year ago | |

Rust deployment is a happy path, with few caveats. Writing is sometimes less happy than it might otherwise be, but that's the tradeoff.

My favorite thing about Rust, however, is Rust dependency management. Cargo is a dream, coming from C++ land.

krick 1 year ago | | |

Everything is a dream, when coming from C++ land. I'm still incredibly salty about how packages are managed in Rust, compared to golang or even PHP (composer). crates.io looks fine today, because Rust is still relatively unpopular, but 1 common namespace for all packages encourages name squatting, so in some years it will be a dumpster worse than pypi, I guarantee you that. Doing that in a brand-new package manager was incredibly stupid. It really came late to the market, only golang's modules are newer IIRC (which are really great). Yet it repeats all the same old mistakes.

csomar 1 year ago | | |

Cargo is also a fantasy dream coming from npm/yarn/etc.. whatever garbage they keep adding. Being able to go to docs.rs and get the method signature is invaluable.

xyst 1 year ago |

In my opinion, the significant drop in memory footprint is truly underrated (13 MB vs 1300 MB). If everybody cared about optimizing for efficiency and performance, the cost of computing wouldn’t be so burdensome.

Even self-hosting on an rpi becomes viable.

rwaksmunski 1 year ago |

Pretty sure Tier 4 should be faster than that. I wonder if the CPU was fully utilized on this benchmark. I did some performance work with Axum a while back and was bitten by Nagle's algorithm. Setting TCP_NODELAY pushed the benchmark from 90,000 req/s to 700,000 req/s in a VM on my laptop.
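For reference, a minimal sketch of what setting TCP_NODELAY looks like with only the Rust standard library. This is not Axum's API and not the article's code; `accept_nodelay` is a hypothetical helper name. The calls it uses (`TcpListener::accept`, `TcpStream::set_nodelay`) are from `std::net`.

```rust
use std::io;
use std::net::{TcpListener, TcpStream};

/// Accept one connection and disable Nagle's algorithm on it, so small
/// writes are sent immediately instead of being coalesced.
fn accept_nodelay(listener: &TcpListener) -> io::Result<TcpStream> {
    let (stream, _peer) = listener.accept()?;
    stream.set_nodelay(true)?; // sets the TCP_NODELAY socket option
    Ok(stream)
}
```

How the option is applied in a framework depends on the framework and its version, so treat this only as an illustration of the socket-level setting.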

pjmlp 1 year ago |

And so what we were doing with Apache, mod_<pick your lang> and C back in 2000, is new again.

At least with Rust it is safer.

ports543u 1 year ago |

While I agree the enhancement is significant, the title of this post makes it seem more like an advertisement for Rust than an optimization article. If you rewrite js code into a native language, be it Rust or C, of course it's gonna be faster and use less resources.

mplanchard 1 year ago | |

Is there an equivalently easy way to expose a native interface from C to JS as the example in the post? Relatedly, is it as easy to generate a QR code in C as it is in Rust (11 LoC)?

ports543u 1 year ago | | |

> Is there an equivalently easy way to expose a native interface from C to JS as the example in the post?

Yes, for most languages. For example, in Zig (https://ziglang.org/documentation/master/#WebAssembly) or in C (https://developer.mozilla.org/en-US/docs/WebAssembly/C_to_Wa...)

> Relatedly, is it as easy to generate a QR code in C as it is in Rust (11 LoC)?

Yes, there are plenty of easy to use QR-code libraries available, for pretty much every relevant language. Buffer in, buffer out.

AndrewDucker 1 year ago | | |

It's that simple in Rust because it's using a library. C also has libraries for generating QR codes: https://github.com/ricmoo/QRCode

(Obviously there are other advantages to Rust)

baq 1 year ago | |

'of course' is not really that obvious except for microbenchmarks like this one.

ports543u 1 year ago | | |

I think it is pretty obvious. Native languages are expected to be faster than interpreted or jitted, or automatic-memory-management languages in 99.9% of cases, where the programmer has far less control over the operations the processor is doing or the memory it is copying or using.

echelon 1 year ago |

Rust is simply amazing to do web backend development in. It's the biggest secret in the world right now. It's why people are writing so many different web frameworks and utilities - it's popular, practical, and growing fast.

Writing Rust for web (Actix, Axum) is no different than writing Go, Jetty, Flask, etc. in terms of developer productivity. It's super easy to write server code in Rust.

Unlike writing Python HTTP backends, the Rust code is so much more defect free.

I've absorbed 10,000+ qps on a couple of cheap tiny VPS instances. My server bill is practically non-existent and I'm serving up crazy volumes without effort.

Dowwie 1 year ago |

Beware the risks of using NIFs with Elixir. They run in the same memory space as the BEAM and can crash not just the process but the entire BEAM. Granted, well-written, safe Rust could lower the chances of this happening, but you need to consider the risk.

mijoharas 1 year ago | |

I believe that by using rustler[0] to build the bindings that shouldn't be possible. (at the very least that's stated in the readme.)

> Safety : The code you write in a Rust NIF should never be able to crash the BEAM.

I tried to find some documentation stating how it works but couldn't. I think they use a dirty scheduler, and catch panics at the boundaries or something? wasn't able to find a clear reference.

[0] https://github.com/rusterlium/rustler

junon 1 year ago | | |

I have no evidence of this but they may be liberally using catch_unwind: https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
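A minimal sketch of the `catch_unwind` pattern being guessed at here, using only the standard library. It is not Rustler's implementation, and `risky_divide` / `guarded_divide` are hypothetical names: the wrapper turns a panic into an `Err` at the boundary instead of letting it unwind into the host runtime.

```rust
use std::panic;

/// Hypothetical native function that may panic.
fn risky_divide(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// Boundary wrapper: converts a panic into an `Err` so it does not
/// unwind past this function.
fn guarded_divide(a: i32, b: i32) -> Result<i32, String> {
    panic::catch_unwind(|| risky_divide(a, b))
        .map_err(|_| String::from("native code panicked"))
}
```

Note that `catch_unwind` only catches unwinding panics. It does not help if the crate is built with `panic = "abort"`, or if `unsafe` code corrupts memory or segfaults.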

voiper1 1 year ago |

Wow, that's an incredible writeup.

Super surprised that shelling out was nearly as good as any other method.

Why is the average bytes smaller? Shouldn't it be the same size file? And if not, it's a different algorithm so not necessarily better?

djoldman 1 year ago |

Not trying to be snarky, but for this example, if we can compile to wasm, why not have the client compute this locally?

This would entail zero network hops, probably 100,000+ QRs per second.

IF it is 100,000+ QRs per second, isn't most of the thing we're measuring here dominated by network calls?

munificent 1 year ago | |

It's a synthetic example to conjure up something CPU bound on the server.

jeroenhd 1 year ago | |

WASM blobs for programs like these can easily turn into megabytes of difficult to compress binary blobs once transitive dependencies start getting pulled in. That can mean seconds of extra load time to generate an image that can be represented by maybe a kilobyte in size.

Not a bad idea for an internal office network where every computer is hooked up with a gigabit or better, but not great for cloud hosted web applications.

nemetroid 1 year ago | |

The fastest code in the article has an average latency of 14 ms, benchmarking against localhost. On my computer, "ping localhost" has an average latency of 20 µs. I don't have a lot of experience writing network services, but those numbers sound CPU bound to me.

bdahz 1 year ago |

I'm curious what if we replace Rust with C/C++ in those tiers. Would the results be even better or worse than Rust?

znpy 1 year ago | |

It should be pretty much the same.

The article is mostly about exemplifying the various levels of optimisation you can get by moving “hot code paths” to native code (irrespective of whether you write that code in rust/c++/c).

Worth noting that if you’re optimising for memory usage, rust (or some other native code) might not help you very much until you throw away your whole codebase, which might not be always feasible.

kelnos 1 year ago | |

It should be about the same, though the main differences are likely to be caused by the speed of the QR code generator, and the PNG compressor.

But assuming that the hypothetical C and C++ versions would be using generators and compressors of similar quality, their performance characteristics should be similar.

The big plus(es) to using Rust over C/C++ are a) the C and C++ versions would not be memory-safe, and b) it looks like Rust's WASM tooling (if that's the approach you were to use) is excellent.

(As someone who has written C code for more than 20 years, and used to write older-standard C++ code, I would never ever write an internet-facing server in either of those languages. But I would feel just as confident about the security properties of my Rust code as I would for my Java code.)

Imustaskforhelp 1 year ago | |

also maybe checking out bun ffi / I have heard they recently added their own compiler

jinnko 1 year ago |

I'm curious how many cores the server the tests ran on had, and what the performance would be of handling the requests in native node with worker threads[1]? I suspect there's an aspect of being tied to a single main thread that explains the difference at least between tier 0 and 1.

1: https://nodejs.org/api/worker_threads.html

pretzelhammer 1 year ago | |

As the article mentions, the test server had 12 cores. The Node.js server ran in "cluster mode" so that all 12 cores were utilized during benchmarking. You can see the implementation here (just ~20 lines of JS): https://github.com/pretzelhammer/using-rust-in-non-rust-serv...

tialaramex 1 year ago | |

Doesn't "the 12 CPU cores on my test machine" answer your question?

bhelx 1 year ago |

If you have a Java library, take a look at Chicory: https://github.com/dylibso/chicory

It runs on any JVM and has a couple flavors of "ahead-of-time" bytecode compilation.

bluejekyll 1 year ago | |

This is great to see. I had my own effort around this that I could never quite get done.

I didn’t notice this on the front page, what JVM versions is this compatible with?

evacchi 1 year ago | | |

Java 11+ :)

Already__Taken 1 year ago |

Shelling out to a CLI is quite an interesting path because often that functionality could be useful handed out as a separate utility to power users or non-automation tasks. Rust makes cross-platform distribution easy.

dyzdyz010 1 year ago |

Make Rustler great again!

demarq 1 year ago |

I didn’t realize calling to the cli is that fast.

kelnos 1 year ago | |

I doubt it's actually calling out to the CLI (aka the shell); presumably it's just fork()ing and exec()ing.

On Linux, fork() is actually reasonably fast, and if you're exec()ing a binary that's fairly small and doesn't need to do a lot of shared library loading, relocations, or initialization, that part of the cost is also fairly low (for a Rust program, this will usually be the case, as they are mostly-statically-linked). Won't be as low as crossing a FFI boundary in the same process (or not having a FFI boundary and doing it all in the same process) of course, but it's not as bad as you might think.
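A minimal sketch of spawning a child directly, with no shell in between, using `std::process::Command`. `run_child` is a hypothetical helper, not the article's code: it writes the request to the child's stdin and returns what the child printed on stdout.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Run `program` directly (no shell involved), write `input` to its stdin,
/// and return whatever it printed on stdout.
fn run_child(program: &str, args: &[&str], input: &[u8]) -> io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Write the request, then drop the handle so the child sees end-of-input.
    let mut stdin = child.stdin.take().expect("stdin was requested as piped");
    stdin.write_all(input)?;
    drop(stdin);

    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "child process failed"));
    }
    Ok(output.stdout)
}
```

For example, `run_child("cat", &[], b"hello")` should return the bytes of `hello` on a system where `cat` is on the PATH. For large payloads the write should happen on a separate thread, otherwise parent and child can block each other once the pipe buffers fill up.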

bebna 1 year ago |

For me a "Non-Rust Server" would be something like a PHP webhoster. If I can run my own node instance, I can possibly run everything I want.

bluejekyll 1 year ago | |

The article links to two PHP and Rust integration strategies, WASM[1] or native[2].

[1] https://github.com/wasmerio/wasmer-php

[2] https://github.com/davidcole1340/ext-php-rs