Something that stood out to me is the author’s conclusion that "Node.js wins." However, both the Node.js and C++ versions use the same library, uWebSockets! I suspect the actual takeaway is this:
"uWebSockets wins, and the uWebSockets authors know their library well enough that even their JavaScript wrapper outperforms my own implementation in plain C++ using the same library!" :-p
Makes me wonder if there’s something different that could be done in Go to achieve better performance. Alternatively, this may highlight which language/library makes it easier to do the right thing out of the box (for example, it seems easier to use uWebSockets in Node.js than in C++). TechEmpower controversies also come to mind, where "winning" implementations often don’t reflect how developers typically write code in a given language, framework, or library.
--
1: https://github.com/matttomasetti?tab=repositories&q=websocke...
Also, in the end, they didn't use the JSON payload. It would have been interesting if they had just written a static string. I'm curious how much of this is really measuring JSON [de]serialization performance.
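One rough way to check how much of the result is JSON work would be to benchmark the reply construction on its own, once through the JSON encoder and once as a static payload. The sketch below uses Go's `encoding/json` and `testing.Benchmark`; the `reply` struct and its field names are hypothetical and not taken from the benchmark repositories, and the numbers it prints are only indicative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// reply is a hypothetical message shape; the real benchmark payload may differ.
type reply struct {
	Count int   `json:"c"`
	Stamp int64 `json:"ts"`
}

// staticPayload is a fixed reply that needs no serialization work.
const staticPayload = `{"c":0,"ts":0}`

// jsonReply builds the reply through encoding/json on every call.
func jsonReply(count int, stamp int64) ([]byte, error) {
	return json.Marshal(reply{Count: count, Stamp: stamp})
}

// staticReply returns a copy of the fixed payload.
func staticReply() []byte {
	return []byte(staticPayload)
}

func main() {
	withJSON := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := jsonReply(i, 1700000000); err != nil {
				panic(err)
			}
		}
	})
	withStatic := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_ = staticReply()
		}
	})
	fmt.Println("json:  ", withJSON)
	fmt.Println("static:", withStatic)
}
```

If the two figures are far apart relative to the per-message time of the whole server, the language ranking is at least partly a ranking of JSON libraries.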
Finally, it's worth pointing out that WebSocket is a standard. It's possible that some of these implementations follow the standard better than others. For example, WebSocket requires that a text message be valid UTF8. Personally, I think that's a dumb requirement (and in my own websocket server implementation for Zig, I don't enforce this - if the application wants to, it can). But it's completely possible that some implementations enforce this and others don't, and that (along with every other check) could make a difference.
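To make the UTF-8 point concrete, here is a minimal sketch of what an optional check could look like in Go, using `utf8.Valid` from the standard library. The `checkText` function, the `strict` switch, and the error value are hypothetical names for illustration, and the sketch assumes the payload is a complete, already reassembled text message (a multi-byte sequence can be split across frames).

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// errInvalidUTF8 is a hypothetical error value for this sketch.
var errInvalidUTF8 = errors.New("text message is not valid UTF-8")

// checkText validates a complete text message payload.
// With strict set to false it accepts any bytes and leaves
// validation to the application.
func checkText(payload []byte, strict bool) error {
	if strict && !utf8.Valid(payload) {
		return errInvalidUTF8
	}
	return nil
}

func main() {
	good := []byte("héllo")
	bad := []byte{0xff, 0xfe}
	fmt.Println(checkText(good, true))
	fmt.Println(checkText(bad, true))
	fmt.Println(checkText(bad, false))
}
```

A server that runs this scan on every text message does strictly more work per message than one that skips it, which is the kind of difference that would show up in a throughput benchmark.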
To me, the whole paper is full of misunderstanding, at least the analysis. There's just speculation based on caricatures of the language, like "node is async", "c++ is low level" etc. The fact that their C++ impl using uWebSocket was significantly slower than Node, which used uWebSocket bindings, should have led them to question the test setup (they probably used threads, which defeats the purpose of uWebSocket).
Anyway.. The "connection time" is just HTTP handshake. It could be included as a side note. What's important in WS deployments are:
- Unique message throughput (the only thing measured afaik).
- Broadcast/"multicast" throughput, i.e. say you have 1k subscribers you wanna send the same message.
- Idle memory usage (for say chat apps that have low traffic - how many peers can a node maintain)
To me, the champion is uWebSocket. That's the entire reason why "Node" wins - those language bindings were written by the same genius who wrote that lib. Note that uWebSocket doesn't have TLS support, so whatever reverse proxy you put in front is gonna dominate usage because all of them have higher overheads, even nginx.
Interesting to note is that uWebSocket perf (especially memory footprint) can't be achieved even in Go, because of the goroutine overhead (there's no way in Go to read/write from multiple sockets from a single goroutine, so you have to spend 2 goroutines per connection for realtime r/w). It could probably be achieved with Tokio though.
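For readers who haven't seen it, the two-goroutines-per-connection layout looks roughly like the sketch below. It is only an illustration: it uses `net.Pipe` as an in-memory connection and newline-delimited text instead of real WebSocket frames, and `serve` and `roundTrip` are made-up names.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// serve handles one connection with two goroutines: the caller's goroutine
// blocks in reads, and a second goroutine owns all writes.
func serve(conn net.Conn) {
	defer conn.Close()
	out := make(chan string, 16)
	done := make(chan struct{})

	// Writer goroutine: the only place that writes to conn.
	go func() {
		defer close(done)
		failed := false
		for msg := range out {
			if failed {
				continue // keep draining so the reader never blocks on out
			}
			if _, err := fmt.Fprintln(conn, msg); err != nil {
				failed = true
			}
		}
	}()

	// Reader loop: one line in, one echo queued for the writer.
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		out <- "echo: " + sc.Text()
	}
	close(out)
	<-done
}

// roundTrip sends one line to a fresh in-memory connection and reads the reply.
func roundTrip(msg string) (string, error) {
	client, server := net.Pipe()
	go serve(server)
	defer client.Close()
	if _, err := fmt.Fprintln(client, msg); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(client).ReadString('\n')
	return strings.TrimSpace(line), err
}

func main() {
	reply, err := roundTrip("hello")
	fmt.Println(reply, err)
}
```

Each idle connection keeps both goroutines (and their stacks) alive, which is where the per-peer memory cost mentioned above comes from.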
- Rust is run in debug mode, by omitting the --release flag. This is a very basic mistake.
- Some implementations are logging to stdout on each message, which will lead to a lot of noise not only due to the overhead of doing so, but also due to lock contention for multi-threaded benchmarks.
- It states that the Go implementation is blocking and single-threaded, while it in fact is non-blocking and multi-threaded (concurrent).
- It implies the Rust implementation is not multi-threaded, while it in fact is because the implementation spawns a thread per connection. On that note, why not use an async websocket library for Rust instead? They're used much more.
- Gives VM-based languages zero time to warm up, giving them very little chance to do one of their jobs: runtime optimizations.
- It is not benchmarking websocket implementations specifically, it is benchmarking websocket implementations, JSON serialization and stdout logging all at once. This adds so much noise to the result that the result should be considered entirely invalid.
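On the warm-up point, a benchmark driver can discard a number of untimed rounds before it starts the clock. The sketch below shows that shape in Go; `measure` and its parameters are hypothetical, and the `work` closure is only a stand-in for one benchmark round against a server.

```go
package main

import (
	"fmt"
	"time"
)

// measure calls fn warmup times without timing it, then times rounds
// further calls and returns the mean duration of the timed calls.
func measure(warmup, rounds int, fn func()) time.Duration {
	for i := 0; i < warmup; i++ {
		fn()
	}
	if rounds <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < rounds; i++ {
		fn()
	}
	return time.Since(start) / time.Duration(rounds)
}

func main() {
	// Stand-in for one benchmark round; a real driver would talk to the server here.
	work := func() {
		s := 0
		for i := 0; i < 1000; i++ {
			s += i
		}
		_ = s
	}
	fmt.Println("mean per round:", measure(50, 500, work))
}
```

How many warm-up rounds a JIT-compiled runtime needs is an open parameter here, so it would have to be chosen by checking when the timings stop changing.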
> To me, the champion is uWebSocket. That's the entire reason why "Node" wins [...]
A big part of why Node wins is because its implementation is not logging to stdout on each message like the other implementations do. Add a console.log in there and its performance tanks.
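A fairer setup would keep stdout off the per-message path in every implementation, for example by counting messages and printing one summary at the end. A minimal Go sketch of that idea follows; the `stats` type, `onMessage`, and `simulate` are hypothetical names, and the goroutines merely simulate connections.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stats counts messages without touching stdout on the hot path.
type stats struct {
	received int64
}

// onMessage is a hypothetical per-message handler: it only bumps a counter.
func (s *stats) onMessage(_ []byte) {
	atomic.AddInt64(&s.received, 1)
}

// total returns the number of messages seen so far.
func (s *stats) total() int64 {
	return atomic.LoadInt64(&s.received)
}

// simulate runs conns goroutines that each deliver perConn messages.
func simulate(s *stats, conns, perConn int) {
	var wg sync.WaitGroup
	for c := 0; c < conns; c++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perConn; i++ {
				s.onMessage([]byte("ping"))
			}
		}()
	}
	wg.Wait()
}

func main() {
	s := &stats{}
	simulate(s, 8, 10000)
	// One summary line after the run instead of one line per message.
	fmt.Println("messages received:", s.total())
}
```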
The distinction is important because assuming HTTP implies WebSockets is a channel riding over an HTTP server. Neither the client nor the server cares if you provide any support for HTTP so long as the connection is achieved. This is easily provable.
It also seems you misunderstand the relationship between WebSockets and TLS. TLS is TCP layer 4 while WebSockets is TCP layers 5 and 6. As such WebSockets work the same way regardless of TLS but TLS does provide an extra step of message fragmentation.
There is a difference in interpreting how a thing works and building a thing that does work.
Also, this paper is from Feb 2021.
Well, they did use the standard library for that, so quite a bit, I suppose. That thing is slow. I've got no idea how fast those functions are in other languages, but you're right that it would ruin the idea behind the benchmark.
I had a blog post somewhere to show the testing and results, but can't seem to find it at the moment though.
It's also interesting that https://github.com/websockets/ws does not appear in this study, given that in the node ecosystem it is ~3x more likely to be used (not a perfect measurement but ws has 28k github stars vs uWebSockets 8k stars)
The title is misleading because exactly one implementation was chosen for each of the tested languages. They conclude “do not use Python” because the Python websockets library performs pretty poorly.
Each language is scored based on the library chosen. I have to believe there are more options for some of these languages.
As someone who is implementing an Elixir LiveView app right now, I was particularly curious to see how Elixir performed given LiveView’s reliance on websockets, but Elixir didn’t make the cut.
On the contrary, they tried autobahn and aiohttp as well:
For the Python websocket, a generic module is used which is simply named "websockets". ... This is most likely a module that offers the simplest of websocket functionality. Now, it was mentioned that this only partly explains the poor performance. While writing this report, it seemed unjust not to give Python a fighting chance. So, the websocket server has been rebuilt with the more trusted Autobahn library and the benchmark test has been rerun. This new server does lead to better results ... still unable to finish the benchmark test.... [T]he Python server is rebuilt one more time, this time with a library by the name of "aiohttp." At last, all 100 rounds of the benchmark are able to be completed, though not very well. Aiohttp still takes longer than Go, and becomes substantially unreliable after round 50, dropping anywhere from 30-50% of the messages. It can only be concluded that the reason for this dreadful performance is Python itself.
The author couldn't tell why he didn't manage to get the C or Python program to run, but figured the language is probably to blame for some obscure reason.
He also mentioned that he should have implemented multithreading in C++ to be comparable with Node, but meh, that's probably not his concern either, let's compare them as is ^^`
Also, he doesn't mention the actual language of the library used, but that would have voided the interest of the article, so I can quite understand that omission :P
But at the end, nothing can be learned from this and it is hard to believe it is what "research" can produce
* I was able to send a little under 11x faster than I could process the messages on the receiving end. I suspected this was due to accounting for processing of frame headers with consideration of the various forms of message fragmentation. I also ran both send and receive operations on the same machine which could have biased the numbers
* I was able to send messages on my hardware at 280,000 messages per second. Bun claimed, at that time, a send rate of about 780,000 messages per second. My hardware is old with DDR3 memory. I suspect faster memory would increase those numbers more than anything else, but I never validated that
* In real world practical use switching from HTTP for data messaging to WebSockets made my big application about 8x faster overall in test automation.
Things I suspect, my other assumptions:
* A WebSocket library can achieve superior performance if written in a strongly typed language that is statically compiled and without garbage collection. Bun achieved far superior numbers and is written in Zig.
* I suspect that faster memory would lower the performance gap between sending and receiving when perf testing on a single machine
TLDR; NodeJS is the clear winner, and Python far and away the worst of the bunch.
https://github.com/reactphp/event-loop
That library can use either select, libuv, libev or libevent if I'm not mistaken. Fibers are not used at this point, although other libraries have explored the idea (revoltphp).
If we're assuming the paper author installed a typical PHP, then it's using select for async I/O. It's the slowest implementation of the event loop. Using something like swoole would extract even more performance out of PHP for async io scenarios.
The issue with Python that the author faced takes 2 minutes to identify and fix: raise ulimits.
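For reference, "raise ulimits" usually means lifting the open-file limit (`ulimit -n` in a shell) so the process can hold more sockets. The same thing can be done from inside a process; the sketch below shows it in Go with `syscall.Getrlimit` and `syscall.Setrlimit`, assuming a Linux-like system. `raiseNoFileLimit` is a made-up helper name, and whether this was really the Python server's problem is the commenter's claim, not something the sketch verifies.

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseNoFileLimit lifts the soft open-file limit to the hard limit and
// returns the resulting soft limit. Assumes a Linux-like system.
func raiseNoFileLimit() (uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, err
	}
	lim.Cur = lim.Max // the soft limit may be raised up to the hard limit without privileges
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, err
	}
	return uint64(lim.Cur), nil
}

func main() {
	n, err := raiseNoFileLimit()
	if err != nil {
		fmt.Println("could not raise limit:", err)
		return
	}
	fmt.Println("open-file soft limit is now", n)
}
```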
Finally, one can question the value of such a benchmark in real-world applications, especially when the supporting article is so poorly researched, as others have already pointed out.
Also I was wrong about uWebSocket, they do have tls support so you can skip reverse proxy. It deals with raw tcp conns and thus to encrypt you need tls support there. It is also a barebones http/1.1 server because why not. The thing I misremembered is I confused tls with http/2 which it does not support. This is unrelated to WS.
What you would do in go is:
- either a new goroutine per message
- or installing a worker pool with a predefined goroutine size accepting messages for processing
The client fires off all the requests before waiting for a response: https://github.com/matttomasetti/NodeJS_Websocket-Benchmark-... so the comparison isn't quite apples to apples.
Edit to add: looks like the same goes for the C++ and Rust implementations. So I think what we might be seeing in this benchmark (particularly Node vs C++, since it is the same library) is that asynchronously handling each message is beneficial, and the Go standard library's JSON parser is slow.
Edit 2: Actually I think the C++ version is async for each message! Don't know how to explain that then.