Comparing Rust and JavaScript (chaotic.netlify.app)
Rust is a language that compiles to WASM. Sure, there are other options to compile to WASM, but they differ by interface (programming language, library, tooling) and (likely to a lesser degree) - performance.
From the user-experience perspective, none of the page loads without javascript, so it is all javascript from a customer's perspective.
And I'm not sure if we can assume that all languages that offer compilation to WASM would be roughly "rust fast": I wouldn't be surprised if we already had some examples that aren't really good matches for that tight inner loop case you might want to wasmify. Perhaps there are already some that just want to get the "runs in browser" checkbox ticked?
But I haven't looked into the details of how much difference that alone makes in this case.
I think it's also a testament to how fast JS has gotten.
> Rust is faster than JS in some cases, but those cases are rarer than we expected, and the performance gain is on the order of 2x some of the time, not 10x most of the time.
Separately, though, on very large applications wasm does have an advantage on startup times: there is no warmup period as the JIT learns the types, no sluggish first frames for the entire application or the first time you click on a feature that hits a new code path. For something like Photoshop or a game engine that can be a very big deal. But, again, this type of application is not the most common, so JS will remain the best option for most things.
I'm really not so sure about this (see my comment further down[1]) but I'd love someone more knowledgeable to chime in.
It's not a bad use case for JS either. I'm sure there are problems that favor one of the two languages more than this one does.
Although it's obvious that you're comparing JavaScript and WASM performance, the devil is in the details. What exactly are you comparing?
There's quite a bit of overhead in calling out from WASM to the DOM; how are you making WASM faster? How much "JavaScript" is involved in the WASM version? Are you manipulating a Canvas? Generating a bitmap?
If you check out the code, you will notice that wasm isn't talking to the DOM. Its purpose is to generate the image data as a Uint8Array, which is then passed to the canvas [0], the same way the JavaScript implementation does it.
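To make the division of labor concrete, here is a minimal sketch of the wasm side of such a design: Rust code that only fills an RGBA byte buffer and never touches the DOM. The `Frame` type and its methods are hypothetical names chosen for illustration, not the linked project's code; the JS side is assumed to wrap the bytes in an `ImageData` and draw them on the canvas.

```rust
/// An RGBA frame buffer (4 bytes per pixel). Hypothetical type for
/// illustration; it is not taken from the linked repository.
pub struct Frame {
    width: usize,
    height: usize,
    pixels: Vec<u8>,
}

impl Frame {
    /// Creates an opaque white frame: every channel starts at 255.
    pub fn new(width: usize, height: usize) -> Self {
        Frame {
            width,
            height,
            pixels: vec![255; width * height * 4],
        }
    }

    /// Darkens the pixel at (x, y) by one step per color channel.
    /// Points outside the frame are ignored.
    pub fn darken(&mut self, x: usize, y: usize) {
        if x >= self.width || y >= self.height {
            return;
        }
        let i = (y * self.width + x) * 4;
        // Touch R, G and B only; the alpha byte at i + 3 stays opaque.
        for channel in &mut self.pixels[i..i + 3] {
            *channel = channel.saturating_sub(1);
        }
    }

    /// The raw bytes that the JS side would copy or view as a Uint8Array.
    pub fn as_bytes(&self) -> &[u8] {
        &self.pixels
    }
}
```

With this split, the only traffic across the JS/wasm boundary per frame is one call plus one buffer, which is why DOM call overhead does not dominate this kind of benchmark.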
[0] - https://github.com/dmaynard/chaos-screen-saver/blob/93187f84...
That's why you need an "about" blurb.
On my 4k screen (which I run at 100% scaling) on Firefox on Linux I get around 2700 on ES and 7500 on WASM.
But if I reduce the window size to half the screen, I get 3000 for ES and 10000 for WASM.
JavaScript: https://github.com/dmaynard/chaos-screen-saver/blob/master/s...
Rust: https://github.com/dmaynard/rust-wasm-attractor/blob/master/...
1247 px/ms Chrome JS
1681 px/ms Firefox JS
5878 px/ms Chrome WASM
6382 px/ms Firefox WASM
Chrome
JS: 2900
WASM: 13000
Firefox
JS: 5000
WASM: 16000
Safari
JS: 5500
WASM: 18000
I would be more interested in a more realistic test with scenarios replacing web frameworks like React. This benchmark seems more like something you would do with shaders anyway.
If you have a few algorithms you need to run under the hood, preferably with few cross-domain calls, that can be expressed mostly using a small set of native types and not a lot of maps, strings, etc., then something low-level like WASM might help.
But that doesn't match most real world scenarios very well.
I'd love for someone who knows this problem a little better to chime in, but this doesn't look like that heavy a task to me. I've read through the code very briefly. The core of the attractor logic[1] seems to be a few trig functions with a lot of bookkeeping around it (e.g. measuring performance to know how many iterations will fit within each frame budget, I think?). But each iteration depends on the state of previous iterations anyway (stored in module-globals[2]) so there's a data dependency that a CPU isn't going to have a great time with.
I'd be interested to know where most of the time is being spent in this program, but I don't think it's really playing to the strengths of Rust/WASM.
[1]: https://github.com/dmaynard/attractor-iterator/blob/b9274289... [2]: https://github.com/dmaynard/attractor-iterator/blob/b9274289...
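If the bookkeeping really is about fitting iterations into a frame budget (which is a guess in the comment above, not something confirmed here), the idea can be sketched as a small feedback rule. `next_batch_size` is a hypothetical helper written for illustration; it is not taken from the linked code.

```rust
/// Scales the batch size so that the next batch should take roughly
/// `budget_ms`, based on how long the previous batch took.
/// Hypothetical helper, not from the linked repository.
pub fn next_batch_size(current: u32, elapsed_ms: f64, budget_ms: f64) -> u32 {
    if elapsed_ms <= 0.0 {
        // No usable measurement: double and let the next frame correct it.
        return current.saturating_mul(2).max(1);
    }
    let scaled = f64::from(current) * budget_ms / elapsed_ms;
    // Clamp so a single noisy frame cannot push the batch to zero
    // or past what fits in a u32.
    scaled.clamp(1.0, f64::from(u32::MAX)) as u32
}
```

For example, a batch of 1000 iterations that took 8 ms against a 16 ms budget would be followed by a batch of 2000. None of this work is the trig math itself, which fits the observation that a fair share of the code is bookkeeping.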
What exactly do you mean by that? WASM is assembly, the memory management is entirely manual, and it's currently a very bad target for garbage collection.
Anyway, Rust has more advantages than memory safety.
On my Linux laptop, Firefox: 1.7K vs 3K
On my Linux laptop, Brave: 1.5K vs 6K
It does seem to vary quite a bit based on the pattern being drawn, though. These ratios seem to roughly hold, but there's wide variance.
Atomics have universal support in modern browsers.
This is primarily a single-threaded problem. Each x and y update depends directly on the previous x and y values with the x and y being used to update pixels.
At most (barring changes to the algorithm), it looks like you could calculate these on one thread and update the image on another thread.
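A rough sketch of that dependency chain, assuming a Clifford-style trig map purely for illustration (the actual formula and parameters in the linked code may differ, and `step` and `iterate` are hypothetical names):

```rust
/// One step of a trig-based attractor map. The formula is a
/// Clifford-style map chosen for illustration only; it is an
/// assumption, not the formula from the linked repository.
pub fn step(x: f64, y: f64, p: [f64; 4]) -> (f64, f64) {
    let [a, b, c, d] = p;
    (
        (a * y).sin() + c * (a * x).cos(),
        (b * x).sin() + d * (b * y).cos(),
    )
}

/// Runs the map `n` times. Every step consumes the (x, y) produced by
/// the step before it, so the loop is one serial dependency chain and
/// cannot be divided among threads.
pub fn iterate(mut x: f64, mut y: f64, p: [f64; 4], n: usize) -> (f64, f64) {
    for _ in 0..n {
        let next = step(x, y, p);
        x = next.0;
        y = next.1;
    }
    (x, y)
}
```

The only independent work is what happens to each point after it is computed, such as updating the pixel it lands on, which is the part that could move to a second thread.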
IMO we need to start building UI frameworks that are powered by WASM, and then benchmark those as compared to JS equivalent.
The link below pre-filters to my subjective "picks" that I think give a birds-eye view of the wasm-to-js comparison for DOM manipulation:
My picks, ranked descending by performance: SolidJS > Vue > Sycamore-rs > Svelte > React > Yew
SolidJS is a truly reactive JS framework. Sycamore-rs can be thought of as the Rust-clone of SolidJS, and Yew can be thought of as the Rust-clone of React.
[0] https://krausest.github.io/js-framework-benchmark/current.ht...
See https://surma.dev/things/js-to-asc/
Not that relevant, but it looks pretty terrible on a hi-dpi screen; it's probably not accounting for the pixel density when creating the canvas. It looks much better on a normal screen.
In theory, static typing would correct the biggest performance issues in JS (use monomorphic functions, don't mutate objects, and limit arrays of just one primitive/object type).
In practice, TypeScript allows and encourages you to create types that are horrendous for performance.
I'd love to see a comparison using AssemblyScript (basically stricter TS for WASM). I'd bet it's nearly the same speed as Rust while still being a third of the size.
It was my understanding that the V8 GC frankly was rarely used, and that they generally just let memory pile up quite a lot before the GC runs, in the hopes that it may never have to run during the application's lifetime.
A quick profiling seemed to indicate that just a bit less than 10% of the JS time is being spent on the DOM rather than the calculations at hand. I wonder how much of that could be reclaimed simply by running the calculation in a web worker.
I suspect the bitwise AND operator every loop is another big performance issue. Normally, the JIT would leave the loop iterator as a 31-bit int, but because it is stored in a shared object, I suspect it must convert (f64 -> i31 -> AND -> f64) every time. A local variable that updates the object variable every 64ms and resets to zero would probably be faster.
The decPixel function should use a switch statement with values of 0, 1, 255, and default so it only needs to branch one time. This is probably a decent performance win too as around 15-20% of all time is spent here.
EDIT: I should praise the author for using a ternary instead of Math.max() as very few people know that the ternary is literally 100x faster. I wonder why this optimization was never made as it seems common enough.
I'm trying to work out how my interpretation of the calculations (in JS)[1] compares with the author's code, but measuring performance in CodePen is ... difficult to work out. My approach was to: 1. Run the CodePen with the inspector open; 2. Start recording performance; 3. Right click on the display panel and select 'Reload Frame'; 4. Stop recording performance after the images reappear.
... But when I look at the results nothing is making sense. Clearly my approach was wrong.
It would probably have to recognize that the _usage_ of this function can be translated into a local mutation without allocating additional arrays. But from just looking at the function locally it isn't clear whether that is a safe assumption.
Which will basically relegate WASM to games, cryptography and parallel data processing (i.e. SIMD).
Right now it's not looking like a DOM API will eventuate.
My picks: SolidJS > Vue > Sycamore-rs > Svelte > React > Yew
SolidJS is a truly reactive JS framework. Sycamore-rs can be thought of as the Rust-clone of SolidJS, and Yew can be thought of as the Rust-clone of React.
">" means "is faster than".
[0] https://krausest.github.io/js-framework-benchmark/current.ht...
It often is, but unless your code is purely numerical (in which case V8 is fast by default) you have to write very awkward code to make JavaScript fast. On the other hand you can often transliterate JS code 1:1 into Rust and it'll be 10x faster.
mraleph then went through a pretty epic bout of algorithmic improvement and engine-specific optimisations, reaching not quite parity with the rust/wasm version but close (Maybe you don’t need Rust and WASM to speed up JS).
Nick Fitzgerald was then able to relatively easily add the algorithmic improvements to the WASM version for another 3x gain (~10x total from the original) (Speed Without Wizardry).
One thing I appreciate about, say, Rust vs C# is that in both, using iterators to do operations on collections is the standard idiomatic thing to do. However, in C# there is always some overhead, and at big scale you have to question whether that overhead is OK here. Usually it's fine, but sometimes you have to drop back to for loops.
With Rust the iterators pretty reliably get compiled down to very efficient code. So you just do the usual easy thing and get the good performance, no need to worry about it.
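As a small illustration of the two styles, here is the same computation written as an iterator chain and as an explicit loop. Both function names are made up for this example. The sketch only shows that the two forms are equivalent in behavior; whether they end up as the same machine code depends on the compiler version and optimization level, so measure before relying on it.

```rust
/// Sum of the squares of the even values, as an iterator chain.
pub fn sum_even_squares_iter(values: &[u32]) -> u64 {
    values
        .iter()
        .filter(|v| **v % 2 == 0)
        // Widen to u64 before squaring so large inputs do not overflow.
        .map(|v| u64::from(*v) * u64::from(*v))
        .sum()
}

/// The same computation as an explicit for loop.
pub fn sum_even_squares_loop(values: &[u32]) -> u64 {
    let mut total = 0u64;
    for &v in values {
        if v % 2 == 0 {
            total += u64::from(v) * u64::from(v);
        }
    }
    total
}
```

The iterator adapters are generic and lazily evaluated, with no intermediate collection allocated between `filter`, `map`, and `sum`, which is what gives the optimizer room to collapse the chain into a single loop.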
One could equally say that there's value in "things will be general by default" vs "things can be general if you use obscure features of a language and thirty libraries to generalize the things that aren't general yet".
Yet most programs seem to be following the 80/20 rule, so the majority of code that has been ever written seems to prefer generality over raw speed. Of course one possible solution is to write the general parts and the fast parts of your program in different languages, but interoperation may be tedious.
You learned that Rust compiled into WASM is faster than JS. That won't be the case for every language. (We had an example yesterday on the front-page where it wasn't. Ironically, its title implied that it was a generic comparison with WASM.)
but the core loop in Rust is pretty straightforward. It could have been shortened and optimized further.
Also keep in mind that the larger the project, the harder it gets to keep JS in the performance sweet spot without tipping over any JIT heuristic, using GC, or accidentally causing a perf cliff, while the Rust has pretty stable and deterministic optimizations and keeps its memory management control at any scale.
This is a pretty good rundown of expected comparisons, but I doubt there will be any surprises here.
- Oxidising Source Maps with Rust and WebAssembly: https://hacks.mozilla.org/2018/01/oxidizing-source-maps-with...
- Maybe you don’t need Rust and WASM to speed up JS: https://mrale.ph/blog/2018/02/03/maybe-you-dont-need-rust-to...
- Speed Without Wizardry: https://fitzgeraldnick.com/2018/02/26/speed-without-wizardry...
https://www.measurethat.net/Benchmarks/Show/6528/0/mathmin-v...
JavaScriptCore has been compiled to WASM though:
https://mbbill.github.io/JSC.js/
People usually ask why, the answer is usually for fun or for sandboxed plugins.
If you don't have a runtime in Rust you only get core.
https://doc.rust-lang.org/core/
This has obvious effects (you don't have an allocator so you can't have String) and more subtle effects (your slices don't have a sort() method! However they do have a sort_unstable() method) but while it's probably a reasonable environment for the firmware inside a custom $25 gizmo it's not a very comfortable one for general purpose programming.
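A minimal sketch of what that looks like in practice: the function below uses only items that live in `core`, so it should also build in a `#![no_std]` crate with no allocator. The function name `top_three` is hypothetical, and the snippet is written as ordinary Rust rather than as a complete no_std crate.

```rust
// Only `core` items are used, so no allocator is required.
use core::cmp::Reverse;

/// Sorts the samples in place, largest first, and returns up to the
/// three largest as a sub-slice of the input.
pub fn top_three(samples: &mut [u32]) -> &[u32] {
    // sort_unstable_by_key is defined in core: it sorts in place and
    // needs no heap, unlike the stable sort() that the text mentions.
    samples.sort_unstable_by_key(|s| Reverse(*s));
    let n = samples.len().min(3);
    &samples[..n]
}
```

The result borrows from the caller's buffer instead of returning a `Vec`, which is the usual shape of an API when no heap is available.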
To deliver a bit more, for example an allocator, you're bringing in a bunch of platform specific code, which, just like the Garbage Collector for Go, you did not write.
If there is a salient distinction, it's whether the runtime is easily and practically distributed as part of the compiled artifact, which is true for C, Rust, Go, etc but not so much for Python or JS (nor probably Java or C#, although there are efforts for static, native compilation for those platforms which probably come with significant caveats).
Larger runtimes not only add bloat to the final WebAssembly, but they also can make interoperating harder because they often require more (i.e. any) bookkeeping when sending stuff back and forth.
Also worth mentioning is that this is why people build projects like TinyGo and MicroPython: they love the language, but can't work with the trade-offs that the designers chose.
It's not like small runtimes are fundamentally better in this regard. They work better for constrained environments, but they often make tradeoffs which are inappropriate for other use cases. It's hard to make a runtime that works for everything, and it's not necessarily even a worthwhile goal--you can use TinyGo when you're working in a constrained environment, use Go when you're not.
C's free standing mode is even thinner than Rust's core, because it doesn't supply a library of code, you just get the primitive types, the operators and the language features that don't involve any libraries. To give a very concrete example, Rust's core depends on memcmp() existing, Rust assumes your toolchain knows how to memcmp() on your target architecture inherently, but in C you could write memcmp() if you had to. You won't these days, because your C compiler invariably provides this feature, but in principle you could.
In principle C++ also defines a "free standing" mode, but it's a mess and so people don't write for C++ free standing mode in the real world. C++ in an environment where you can't have the actual standard library is likely to be specified in terms of which features from the standard library you can have and which you cannot, for each such environment. For example maybe you can have threading but no filesystem APIs, or you can have a heap allocator but no threads.