Weld: Accelerating numpy, scikit and pandas as much as 100x with Rust and LLVM (notamonadtutorial.com)
There is an important "contemporary history of computing" article to write about the evolution of the Spark project from "let's build a distributed filesystem for MapReduce in Java because we read those early Google papers" to "SQL is the right model for working with data so DataFrames" to "meet data scientists where they are: Python (and R)" to "make machine learning easy" and now to "LLVM, but for crunching big numeric arrays".
It has its own runtime, but it's not difficult to call Haskell code from C or ATS or whatever.
In addition to the JVM, Scala has had JS [1] and native (via LLVM) [2] targets for years.
(And that's not even mentioning any second-order compilations; e.g. Scala -> JVM bytecode -> native)
There's a number of reasons to not choose Scala, but portability is far from one of them.
https://github.com/tensorflow/mlir
https://www.youtube.com/watch?v=qzljG6DKgic
Exciting times for the future of parallel computing!
I think Julia is a more interesting language for this space, with the built in matrix support, easier prototyping, a REPL, etc...
I don't really know why you'd use Rust instead of a GC language from the ML family.
Have fun entertaining your Patreon
Numpy et al of course already have N python acceleration frameworks hammering at their doorsteps to integrate more closely...
Weld and XLA seem to have similar optimization steps though.
I also want to mention that this benchmark is from a while back (around 2017 I believe), so it's possible improvements in both XLA and Weld will make the numbers look different today :)
>> We chose Rust because:
>> It has a very minimal runtime (essentially just bounds checks on arrays) and is easy to embed into other languages such as Java and Python
>> It contains functional programming paradigms such as pattern matching that make writing code such as pattern matching compiler optimizations easier
>> It has a great community and high quality packages (called “crates” in Rust) that made developing our system easier.
They could've used an ML with GC and it would've been better (for a compiler).
It doesn't really have any functional programming paradigms. Pattern matching is present in imperative languages like past versions of ATS
With regard to Pandas this makes me pause slightly, since, while pandas contains lots of high quality and high performance implementations, the API of pandas in some places doesn’t feel well-designed (the most obvious example is indexing of data frames via square brackets and the various properties like iloc).
I think there might be something interesting for this strategy also in the WebAssembly space :)
Overall, we think that accelerating the kinds of data science apps Weld and Numba target will involve not only tricks such as compilation that make user-defined code faster, but also systems that can just schedule and call code that people have already hand-optimized in a more efficient and transparent way (e.g., by pipelining data).
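To make "pipelining data" concrete, here is a minimal sketch in plain Rust. It is only an illustration and not Weld's API: the function names and the `x * factor + 1.0` computation are made up. The first version materializes an intermediate vector after every step; the second streams each element through all steps in one pass, which is roughly the effect a pipelining runtime is after.

```rust
// Hypothetical illustration only; none of these names come from Weld.

/// Unpipelined: every step makes a full pass and allocates an intermediate vector.
fn scaled_sum_unfused(xs: &[f64], factor: f64) -> f64 {
    let scaled: Vec<f64> = xs.iter().map(|x| x * factor).collect(); // pass 1
    let shifted: Vec<f64> = scaled.iter().map(|x| x + 1.0).collect(); // pass 2
    shifted.iter().sum() // pass 3
}

/// Pipelined: each element flows through all steps in a single pass,
/// with no intermediate allocation.
fn scaled_sum_fused(xs: &[f64], factor: f64) -> f64 {
    xs.iter().map(|x| x * factor + 1.0).sum()
}

fn main() {
    let xs = [1.0, 2.0, 3.0, 4.0];
    let unfused = scaled_sum_unfused(&xs, 2.0);
    let fused = scaled_sum_fused(&xs, 2.0);
    // Both versions compute the same result; only the memory traffic differs.
    assert_eq!(unfused, fused);
    println!("{}", fused);
}
```

Both functions return the same value; the difference is how many times the data is traversed and how much temporary memory is allocated along the way.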
Rust is just an IPO driver of sorts here.
I'm not criticizing Numba btw, I use it regularly, but your comment seems a little off here, considering that Weld has a different goal in mind.
I don't really know why you'd write a project of this sort in C.
A lot of people would like to use Rust for data-analysis / machine learning, but there are not really any good batteries-included frameworks for getting started with this.
No, they originally wrote the runtime in C++, but ended up re-implementing it in Rust because the C++ runtime had too many bugs.
> They could've used an ML with GC and it would've been better (for a compiler).
They originally wrote the compiler in Scala with the JVM GC, and they said it was much slower and much harder to embed.
> It doesn't really have any functional programming paradigms. Pattern matching is present in imperative languages like past versions of ATS
When choosing a language for such a project, there are many engineering trade-offs that must be evaluated.
The Weld project has dozens of developers that need onboarding, documentation, examples, tooling, etc. Rust has a lower barrier to entry than ATS. One of the main things Weld does is interface with LLVM: this is also one of the main things the Rust compiler does, and Rust has great libraries for it. Another thing Weld does is interface with many dynamic languages (Python, R, etc.). Rust does not just have "C FFI". It also has _a lot_ of great tooling for generating and validating all the boilerplate automatically. Finally, performance and code size are among Weld's main advantages over the alternatives. Rust generates reasonable code with LLVM; ATS has its own machine code generator, which, while reasonable, isn't as good.
Finally, it is hard to find people who enjoy writing ATS code. They exist, but are not many. Even when you do find them, they often don't like collaborating with people (I only know one ATS user, vmchale on github, and they don't really like interacting with others). OTOH it is trivial to find lots of people that enjoy writing Rust with others. It doesn't matter if this is due to technical reasons, marketing, or hype, but it's a fact that you have to consider if you want a project to grow fast.
It’s really asking the same questions over and over again. Can we do better? Does this tool allow me to be efficient and write safer and faster code? How good are the adjacent libraries and ecosystem? What other kinds of problems does it make it possible to solve?
If you are a professional, you will use the most effective tool for the job - to get results. What tool will produce the best results - schedule, budget, quality, maintainability, scalability, portability, etc.?
Other than outliers that will crush your productivity, or multiply it, your feelings are pretty irrelevant.
Similarly, when you get into a racecar, your feelings about your preferred driving style are irrelevant - if you can change the setup to accommodate your style without slowing it down, great, but if not, your job is to adapt to the situation and reliably get the best possible result.
Either way, if you have fun and produce a crap result, you will not be congratulated (or re-hired), and if you have little fun and produce a great result, you'll get both.
If it's a hobby, do whatever you want.
Obviously, in terms of professional development, you want to use more forward looking tools, but what is the best measure of that - your feelings or results?
As somebody who uses both, I don’t understand the whole territorial conflicts in this space. Any C programmer can learn a ton from the paradigm Rust uses. Even if Rust were to fade into oblivion tomorrow, the lessons I learned by using it would remain valuable enough for me not to regret having learned it.
C is always going to be needed, given the size of the codebase and the amount of embedded stuff written in it. It comes – however – with a wagon full of dangerous traps and gotchas, and in practice very few people are good enough to always avoid them or mitigate the risks created by them. I don’t see any reason why C shouldn’t get better in these areas for the benefit of everyone involved.
I can relate to this, it is not unreasonable for me.
I've noticed that the younger people are, the more likely they are to be passionate, which I put down mainly to not knowing any better. Once one has more of the experience that places you higher up where you can see further, suddenly one's own plot of land doesn't seem so special.
RustPython seems to be modestly aiming for a reimplementation of CPython.
You would have to keep everything in variants, or wrap/unwrap manually all over the place to get similar functionality.
And C has tagged unions.
If you really want to know, spend one afternoon learning any programming language with built-in support for that (Rust, OCaml, Haskell, ...). ADTs are one of the first things one learns.
In Rust, the features you'd need to learn are enums, patterns, and pattern matching.
But be warned that using C++ variant and std::visit will feel like you are being forced to only write C instead of C++ for the rest of your life, knowing that life could be much better. Once you learn this, there is no way to un-learn it.
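For a first taste of enums, patterns, and pattern matching in Rust, here is a minimal, self-contained sketch. The `Shape` type and `area` function are hypothetical names picked for illustration. The point is that each variant carries its own data, and the compiler checks that the `match` covers every variant.

```rust
// Hypothetical example type: an ADT with three variants, each carrying different data.
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
    Unit,
}

/// The match destructures each variant in place. Removing any arm makes the
/// match non-exhaustive, which is a compile error rather than a runtime surprise.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
        Shape::Unit => 1.0,
    }
}

fn main() {
    let shapes = [
        Shape::Circle { radius: 1.0 },
        Shape::Rect { width: 2.0, height: 3.0 },
        Shape::Unit,
    ];
    for shape in &shapes {
        println!("{}", area(shape));
    }
}
```

Adding a fourth variant to `Shape` without updating `area` fails to compile, which is the property that is awkward to get from a hand-rolled tagged union.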
Haha, that's like saying that C has templates because it has the _Generic macro.
I choose the best tools for me, invest a lot of time in getting better at them, and choose jobs I can do with minimal changes to my toolset.
To expand your analogy, if I have spent the last few years of my life getting better at driving bulldozers, I will not take any job requiring me to get into a racecar.
Old, boring, mature tools where the limitations are well understood.
This isn’t mutually exclusive, surely. If anything you’d expect it to be quite closely correlated.
I'm actually looking for a different job for that reason.
We are required to use Java on K8s, Kafka & Cassandra for every single solution big or small because it is fashionable, not because it gets the job done well or for any other reason. I can even demonstrate how a couple of Python scripts and Pandas could do all the same work with far less overhead and achieve the same results. Crickets. Python is not sexy where I am, it is the language of peasants, apparently. Not sure what to make of it all, but that is my reality right now.
Also, I don't think you know anything at all about driving race cars. The driver has a tremendous amount of input into the car's setup because it's his life on the line out on the track. "Adapting to the situation" gets finishes, not wins.
Good reason to seek a new situation, since you have neither appropriate tools selected for you nor input to select better ones.
Racecars? Yeah, I've only won some SCCA super-regional championships. Yes, the driver does have a very large input into the setup, BUT it is within the constraint that the setup change and the improved driver feel together must make the car/driver combination faster. And yes, sometimes a change that makes the car technically a bit slower but gives the driver more confidence will result in faster net lap times -- and those are OK. But whatever the setup is, at the end of the test sessions, whether the car feels great or feels like crap, it's the driver's job to get the most out of it.
And I've had many situations both in the racecar and in international alpine ski racing where something felt weird/odd/unfamiliar/scary, but was fast as heck, so it was my job to adapt, rather than go back into my comfort zone.
Better to keep pushing outside your comfort zone, use tools/setups that get better results, and change your 'feel' to appreciate the better setup.
It's not at all helpful to say that there's something so complicated in these other languages that you can't possibly get the idea across without using them.
enum A { Foo { x: i32, y: f32 }, Bar(B), Baz([u32; 4]), Moo(i32), Mooz(i32) }
#[derive(Default)] // needed for `..Default::default()` below
struct B { z: f32, w: (f64, f32) }
let b = B { z: 42.0, ..Default::default() }; // create a b with z == 42 and default w
let B { w: (first, _), .. } = b; // get the b.w.0 field
let a = A::Foo { x: 42, y: 0.0 }; // enum variants must spell out every field
if let A::Bar(B { z, .. }) = a {
    // if a is an A::Bar, get the b.z field of A::Bar(b)
}
if let A::Baz([0, 1, 2, 3]) = a {
    // if a is an A::Baz containing an array
    // with value [0, 1, 2, 3]
}
match a {
    A::Moo(v @ 1..3) => {
        // if a is an A::Moo where x in A::Moo(x) is in range [1, 3), put the value in the local variable v
    }
    A::Moo(x) | A::Mooz(x) => {
        // either A::Moo or A::Mooz; gives you the value of x
    }
    // ERROR: I forgot to match some patterns (Foo, Bar, Baz), so the compiler rejects this match
}
foo(b);
fn foo(B { z, .. }: B) -> f32 {
    // get the z field of the first function argument
    z
}
What in Rust are one-liners that can be used anywhere (let bindings, constructors, match, if-let, while-let, function arguments, ...) is a pain to write in C++ using `std::visit` and `std::variant`. The error messages of `std::visit` + `std::variant` are quite bad as well. And then there are other fundamental problems with variant, like `variant<int>` having two variants, `variant<int, int>` having 3 variants, but you can't reach the second `int`, etc. You can translate all the code above to C++ to use std::visit + std::variant instead. I personally find that C++ is unusable for programming like this, and almost never use std::variant in C++ as a consequence, while I use ADTs in Rust all the time.
> dismissing people for being passionate is not
Seems I didn't say that well either. Passion that comes from knowledge and experience is good. If it comes from inexperience then maybe not so good because it isn't "Healthy discussion around facts" but around existing biases.
Edit: genuinely no offence intended.