Managing mutable data in Elixir with Rust (lambdafunctions.com)
For example: I wonder if it wouldn't be more "erlangy"/"elixiry" to model the mutable ops behind a GenServer that you send messages to. In the Elixir world it's perfectly normal to make GenServer.call/3 and expect the target PID to change its internal state in a non-deterministic way. It's one of the only APIs that explicitly blesses this. The ETS API is another.
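The "state behind a GenServer" idea can be sketched in C as a single owner that holds the mutable data and applies requests strictly one at a time from a mailbox; callers only enqueue messages and read replies. This is a single-threaded model with hypothetical names (`server`, `msg`, `handle_call`), not how BEAM processes or Rustler actually work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds a caller may send to the owner. */
enum msg_kind { MSG_SET, MSG_GET };

struct msg {
    enum msg_kind kind;
    size_t key;
    int value;              /* only used by MSG_SET */
};

#define SLOTS 16
#define MAILBOX 32

/* By convention only handle_call touches `slots`; callers just send. */
struct server {
    int slots[SLOTS];
    struct msg mailbox[MAILBOX];
    size_t head, tail;      /* ring-buffer read and write positions */
};

void server_send(struct server *s, struct msg m) {
    assert(s->tail - s->head < MAILBOX);   /* mailbox not full */
    s->mailbox[s->tail++ % MAILBOX] = m;
}

/* Process the oldest pending message and return its reply. */
int handle_call(struct server *s) {
    assert(s->head < s->tail);             /* something to process */
    struct msg m = s->mailbox[s->head++ % MAILBOX];
    assert(m.key < SLOTS);
    if (m.kind == MSG_SET) {
        int old = s->slots[m.key];
        s->slots[m.key] = m.value;         /* the only mutation site */
        return old;
    }
    return s->slots[m.key];
}
```

Because every mutation funnels through `handle_call`, requests are serialized in arrival order, which is the guarantee Elixir code gets from `GenServer.call/3`.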
Alternatively, you could have the ref store both a DB sequence and a ref ID (set to the last DB sequence), and compare them on operations. If you call FeGraph.set/2 with the same db ref two times, you compare the ref ID to the sequence and panic if they aren't equal. They always need to operate with the latest ref. That way, at least the local semantics are maintained.
Maybe this is less relevant for the FeGraph example, since Elixir libs dealing with data are more willing to treat the DB as a mutable thing (ETS, Digraph). But it's not universal. Postgrex, for example, follows the DB-as-PID convention. Defaulting to an Elixiry pattern for Rustler implementations is probably good practice.
The real code that this is based on is in fact hidden behind a GenServer for this exact reason -- to maintain the expectations of other Elixir code that has to interact with it. The advantage of the escape hatch, as another commenter mentions, is allowing efficient sparse mutations of a large chunk of data, without having to pay a copy penalty every time. I definitely wouldn't recommend sharing the db handle widely.
When you're presenting a GenServer-like message-passing interface, a port is a natural fit, with none of the risks related to linking a NIF into the VM itself.
(admittedly those risks are much lower with Rust than C)
It depends on the use case. For example, when creating a resource (basically a refcounted data structure), it might make sense to allow mutable access only through a process as the "owner" of the resource. But if you have only read-only data behind that resource, sharing the resource similar to ETS might be what you want.
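For the read-only case, a refcounted resource needs no owner at all: every holder may read, and the last release frees it. A sketch with hypothetical names (`resource_new`, `resource_retain`, `resource_release`); the counter is a plain `size_t`, so this version is single-threaded only and would need atomic increments before being shared across schedulers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted resource; `data` is written once at creation
   and, by convention, only read afterwards. */
struct resource {
    size_t refs;
    size_t len;
    unsigned char data[];   /* flexible array member */
};

struct resource *resource_new(const void *bytes, size_t len) {
    struct resource *r = malloc(sizeof *r + len);
    if (!r) return NULL;
    r->refs = 1;
    r->len = len;
    memcpy(r->data, bytes, len);
    return r;
}

/* Another holder shares the same memory; nothing is copied. */
struct resource *resource_retain(struct resource *r) {
    r->refs++;
    return r;
}

/* Returns 1 if this call dropped the last reference and freed it. */
int resource_release(struct resource *r) {
    assert(r->refs > 0);
    if (--r->refs == 0) {
        free(r);
        return 1;
    }
    return 0;
}
```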
Integrating Elixir and Rust has been delightfully straightforward and is a great choice for calling into libraries not available in Elixir, or offloading CPU intensive tasks.
[0]: https://www.doctave.com/blog/2021/08/19/using-rust-with-elix...
We had some inconsistent build results (ours is an umbrella app), but apart from forcing a compilation and losing the ability to cache the Rust builds, everything else has worked well, and we’re happy to get access to the massive Rust ecosystem.
Unfortunately, I haven’t had a project where I’ve needed to use Rustler yet, though.
So using C and Zig libraries without fully understanding them can be a death trap, while in Rust, as long as the library doesn't use unsafe code, you can feel pretty good about using it.
It's entirely the Rustler project's effort (and goal) to wrap any kind of Rust program so that it will not bring down the BEAM under any circumstance, and they have done a great job achieving that.
I don't normally see people consider (D)ETS tables as mutable, however.
It may not be the only way to get to scalability and robustness. But it certainly is the cornerstone of how Erlang gets there.
1. First, the way Erlang treats data ensures that every piece of data can be sent over the wire by default. This helps pave the way for another amazing characteristic of Erlang, and that is when you refer to and use an object, it's essentially transparent to your code whether that object is on this machine or another machine in the cluster. This would not be possible without the fact that all data structures are remotable, which is enabled by the immutable data. (See also side note below.)
2. The immutable data also leads to clean rollback semantics, making it easy to always have a self-consistent state of the system ready to use even after some kind of fault.
3. The immutable data also leads to very clean and easy ways to handle multithreading because you never have to worry about making object copies. You can be assured that it's ok for two threads to use the same memory object because there's no way either of them can change it.
Side note: Alan Kay, the inventor of OO, has said that people get the entire idea of what he was talking about all wrong. He said that object orientation isn't about objects, but it's about communication. He was talking about the idea of an object being more like what we'd call a web endpoint today, where when you instantiate it you communicate with it by sending it messages. It's funny to me that a functional language like Erlang best embodies that OO idea today. Go code can, too.
"I'm sorry that I long ago coined the term 'objects' for this topic because it gets many people to focus on the lesser idea. The big idea is 'messaging'" - Alan Kay <https://en.wikipedia.org/wiki/Alan_Kay>
He goes on in the original underlying document to say "OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things." All of these ideas are front-and-center in Erlang (and by extension Elixir).
Does this mean a web endpoint has to be immutable? If you send it the same parameters multiple times, is it required to respond with the same response every time? If not, does that not mean it is in fact mutable?
I read elsewhere that in Elixir programs, there is no difference in messaging a local "agent" or a remote one? The caller does not know whether the other party is remote or not. Is it still guaranteed to be immutable?
Just asking since I don't know much about Elixir.
https://discord.com/blog/using-rust-to-scale-elixir-for-11-m...
Personally I think that if you can stomach the additional complexity (which is a non-trivial "if", but a doable one), Rust's approach supersedes immutability. Full immutability was an interesting theory in the 1990s, and I mean that respectfully and not sarcastically, but in the end I think it was overkill and overcompensation. The correct thing to do is not to eliminate mutability, but to firmly, firmly control it. Rust has a sophisticated method for doing so, with compiler support. It may not be the only one, but it seems a very solid one. Immutability is another method of controlling it, but in my view, it's actually kind of a blunt instrument applied to a complex problem.
In my considered opinion, in the end, immutability isn't even important to Erlang. What matters in Erlang is that you can't send references across messages, so there is no way to witness mutation done by another process. It was not necessary within a given process to be immutable, and I suspect that has been a non-trivial ball and chain around Erlang's legs in terms of adoption even to this day. There was never any need for a newbie Erlang developer to also have to learn how to program immutably within a process.
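The "no references across messages" rule can be sketched as copy-on-send: the payload is duplicated into the message, so later writes to the sender's buffer are invisible to the receiver. `struct message` and `send_copy` are hypothetical; BEAM roughly does this by copying terms into the receiver's heap, with large binaries shared by reference instead (safe because binaries are immutable).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message: owns its own copy of the payload. */
struct message {
    size_t len;
    int *payload;
};

/* Copy the sender's data into a fresh buffer; no pointer into the
   sender's memory crosses the process boundary. */
struct message send_copy(const int *buf, size_t len) {
    struct message m = { len, malloc(len * sizeof *buf) };
    assert(m.payload != NULL || len == 0);
    if (len) memcpy(m.payload, buf, len * sizeof *buf);
    return m;
}
```

Whatever the sender does to its local buffer afterwards, including mutating it in place, cannot be witnessed through `m.payload`.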
Immutability is a tool, not a rule, and I am free to reject any assertion otherwise when those assertions provide no evidence, or shitty anecdotes.
Prove your claims.
Certainly, immutability is a foundation for performance problems.
Another provable rule in computing is that more lines of code = more bugs. Immutability uses more lines of code.
Another demonstrable fact is that Haskell based programs have just as many bugs as any other programming language whether you have immutability or not. Therefore, immutability is not a bastion of robustness.
You’re going to have significant difficulty proving to me that immutability = scalability and robustness when both are demonstrably not true just by taking measurements of things you expect to improve out of those foundations.
Immutability is not a silver bullet. It is a tool that is sometimes useful, but has significant drawbacks, including shitty performance, and significantly limiting how your data can be managed (without that limitation paying off in any significant way)
This is a concept called purity, and it's only loosely related to immutability. Immutability makes purity easier to implement and reason about, but does not guarantee it. Erlang/Elixir are not pure. For example, `DateTime.now("Etc/UTC")` will return different things at different times.
As a counter-example, Haskell functions are pure, so the `getCurrentTime` function cannot return a value directly as it would be different every time. Instead it uses the return type `IO UTCTime`, which acts like instructions on how to calculate the time, rather than the time itself.
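The same distinction in a small C sketch with hypothetical names: `elapsed_since` is pure because the current time is passed in, while `elapsed_now` reads the clock itself and can return different results for identical arguments. Neither function mutates anything, which is the point: purity and immutability are separate properties. Pushing the clock read out to the caller is loosely analogous to Haskell keeping it inside the `IO` action.

```c
#include <assert.h>
#include <time.h>

/* Pure: same inputs, same output, no hidden dependency on the world. */
double elapsed_since(time_t start, time_t now) {
    return difftime(now, start);
}

/* Impure but non-mutating: reads the system clock on every call. */
double elapsed_now(time_t start) {
    time_t now = time(NULL);
    assert(now != (time_t)-1);       /* clock unavailable */
    return elapsed_since(start, now);
}
```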
State or data that changes values is typically put in one of 3 different places:
1. On the stack. It's pretty typical to have a function like handle_message(app_state, request). The current state of the app is in the call parameters. At the end of handling the message, handle_message() calls itself again with the new, updated state. Somewhere else in the system we keep track of the last value of that state, and if handle_message crashes, we just use the last state.
2. Another place to hold state is in external storage somewhere.
3. The third main place to hold state is via references to other objects, which do #1 or #2 above.
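Option 1 can be sketched in C with hypothetical names: `handle_message` takes the current state by value and produces a new one without touching the old, and the driver loop (standing in for the tail-recursive call) commits the new state only when handling succeeds, so a failed message leaves the last good state in place.

```c
#include <assert.h>

/* Hypothetical application state and request. */
struct app_state { int balance; };
struct request { int amount; };

/* Returns 0 and writes *next on success; nonzero means the handler
   "crashed". The incoming state is a copy and is never modified. */
int handle_message(struct app_state state, struct request req,
                   struct app_state *next) {
    if (state.balance + req.amount < 0)
        return -1;                        /* reject: would go negative */
    next->balance = state.balance + req.amount;
    return 0;
}

/* The loop plays the role of the recursive call and keeps the
   last good state around for when a message fails. */
struct app_state run(struct app_state state,
                     const struct request *reqs, int n) {
    assert(n >= 0);
    for (int k = 0; k < n; k++) {
        struct app_state next;
        if (handle_message(state, reqs[k], &next) == 0)
            state = next;                 /* commit the new state */
        /* else: carry on with the last good state */
    }
    return state;
}
```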
Regarding whether there is a difference between messaging local versus remote objects-- there's an operator for sending messages to another object. It works the same for local and remote. I think it's possible to inspect the actor address and see where it is, but the mechanism works the same.
I agree that Joe was a great explorer of ideas. I'm not sure if he expressed thoughts on Rust, but he would probably look at it again from time to time.
> In my considered opinion, in the end, immutability isn't even important to Erlang. What matters in Erlang is that you can't send references across messages, so there is no way to witness mutation done by another process.
In some ways, you may be right. But you can always mutate the process dictionary if immutable data really bothers you. But even with the process dictionary, it's not possible to construct a self-referring data structure as an Erlang term, which is important! That makes garbage collection simple.
Also, functional programming makes Erlang processes effectively preemptive, even though they're built from cooperative user-space threads. A Tokio task can loop and tie up the OS thread it's running on, but an Erlang process will always reach a function call in finite time and can be descheduled at that point, so all runnable processes get a share of the CPU.
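A toy model of that preemption, assuming one decrement of `remaining` stands for one function call (BEAM counts these as reductions) and a hypothetical per-turn `budget`: each task is descheduled after `budget` steps, so a long-running task cannot starve a short one. The real BEAM scheduler is far more involved.

```c
#include <assert.h>

/* Hypothetical task: `remaining` units of work, one unit per call. */
struct task {
    int remaining;
    int finished_turn;   /* turn in which the task completed, or -1 */
};

/* Round-robin: every live task runs at most `budget` steps per turn,
   then yields whether or not it "wants" to keep looping. */
void schedule(struct task *tasks, int n, int budget) {
    assert(budget > 0);
    int live = n;
    for (int turn = 0; live > 0; turn++) {
        live = 0;
        for (int k = 0; k < n; k++) {
            struct task *t = &tasks[k];
            if (t->remaining <= 0) continue;
            for (int r = 0; r < budget && t->remaining > 0; r++)
                t->remaining--;          /* one reduction */
            if (t->remaining > 0) live++;
            else t->finished_turn = turn;
        }
    }
}
```

With a budget of 2, a 3-step task finishes in turn 1 even when it is queued behind a 100-step task.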
Edit to add: It's also important to note that Immutability is a property of Erlang (and Elixir), not a property of BEAM. The BEAM vm has opcodes for mutation, and the Erlang compiler will emit them in certain sequences --- if you never use the old value again, it's ok to mutate it rather than create a new modified value; you're most likely to see that with Tuples, IIRC.
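The "mutate if the old value is dead" idea, sketched with a runtime uniqueness check. The Erlang compiler decides this statically; the sketch swaps in a refcount test instead, similar in spirit to Rust's `Rc::make_mut`. `struct tuple`, `tuple_new` and `tuple_set` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted tuple. */
struct tuple {
    size_t refs;
    size_t arity;
    int elems[];
};

struct tuple *tuple_new(size_t arity) {
    struct tuple *t = calloc(1, sizeof *t + arity * sizeof(int));
    if (t) { t->refs = 1; t->arity = arity; }
    return t;
}

/* Value-semantics setelement. If the caller holds the only reference,
   nobody can observe the old value, so write in place; otherwise copy
   and leave the shared original untouched. Consumes the caller's
   reference to `t`; returns a tuple with refs == 1, or NULL on OOM. */
struct tuple *tuple_set(struct tuple *t, size_t idx, int value) {
    assert(idx < t->arity);
    if (t->refs > 1) {
        size_t bytes = sizeof *t + t->arity * sizeof(int);
        struct tuple *copy = malloc(bytes);
        if (!copy) return NULL;
        memcpy(copy, t, bytes);
        copy->refs = 1;
        t->refs--;               /* caller's reference moves to the copy */
        t = copy;
    }
    t->elems[idx] = value;
    return t;
}
```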
- https://discord.com/blog/how-discord-scaled-elixir-to-5-000-... — continually improving elixir
- https://discord.com/blog/how-discord-stores-trillions-of-mes... — moving to scyllaDB
Very nice work.
In my case the data I'm dealing with is more of a store than a single data item, so I'm leaning on the example of things like ETS. Also it's within a single application rather than being a large generally-available library, so the trade-offs are different. It would be interesting to know if they did tests though.
Because the rest of your post is pretty LOL-worthy in light of your opening sentence.
1) immutability has performance problems: source: literally every measurement of immutable vs not data structures ever performed.
Source 2: logic - copying data is slower than not copying it
Source 3: cache lines: modern CPUs rely pretty heavily on cache lines and branch prediction to improve performance. Immutability measurably harms both.
2) immutability requires more code and loc is the best predictor of defects
Clarification: runtime immutability requires more code
Source: it takes more lines of code to return deep copies of objects than to not do that.
Source: https://www.researchgate.net/publication/316922118_An_Invest...
Package densities are the best predictors of defects
3) Haskell projects have as many bugs as any other language
Source: the best evidence we have here is “the large scale study of programming languages on GitHub”, but I suggest that you look deeper here, as the authors' qualification of defects is somewhat questionable (a project that never fixes defects would have low defect rates in this study, and it doesn’t properly compare project sizes, among other things). Anyway, in responses that do have better controls in place (and hilariously even in this paper itself, where we see Haskell programs tend to see higher defects as projects go on while C projects tend to have fewer), we see that Haskell does absolutely no better than anything else for bugs and defects.
"Table 7: Functional languages have a smaller relationship to defects than other language classes where as procedural languages are either greater than average or similar to the average."
"The data indicates functional languages are better than procedural languages; it suggests that strong typing is better than weak typing; that static typing is better than dynamic; and that managed memory usage is better than un-managed."
You got owned by your own source.
As for your un-sourced claim that "copying data is slower than not copying it", I'd suggest learning how immutable-first languages practice data sharing between objects to minimize the amount of copying needed.
So let's disabuse your mistrust of immutability in another domain!
Here is some typical "go fast and mutable!" nonsense code:
  int foo(int i, int j) {
      /* adds i, i+1, ..., 9 to j (nothing if i >= 10) */
      while (i < 10) {
          j += i;
          i++;
      }
      return j;
  }
Let's compile it with https://godbolt.org/, turn on some optimisations and inspect the IR (-O2 -emit-llvm). Copying out the part that corresponds to the while loop:
  4:
    %5 = sub i32 9, %0, !dbg !20
    %6 = add nsw i32 %0, 1, !dbg !20
    %7 = mul i32 %5, %6, !dbg !20
    %8 = zext i32 %5 to i33, !dbg !20
    %9 = sub i32 8, %0, !dbg !20
    %10 = zext i32 %9 to i33, !dbg !20
    %11 = mul i33 %8, %10, !dbg !20
    %12 = lshr i33 %11, 1, !dbg !20
    %13 = trunc i33 %12 to i32, !dbg !20
    tail call void @llvm.dbg.value(metadata i32 poison, metadata !17, metadata !DIExpression()), !dbg !18
    tail call void @llvm.dbg.value(metadata i32 poison, metadata !16, metadata !DIExpression()), !dbg !18
    %14 = add i32 %1, %0, !dbg !20
    %15 = add i32 %14, %7, !dbg !20
    %16 = add i32 %15, %13, !dbg !20
    br label %17, !dbg !21
  17:
    %18 = phi i32 [ %1, %2 ], [ %16, %4 ]
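Reading the block at label 4 back into C gives a closed form, assuming the `i < 10` guard that branches to label 4 (the guard sits outside the copied snippet). `foo_closed` is an illustrative reconstruction, not compiler output: every value is defined exactly once, and the loop has disappeared.

```c
#include <assert.h>

/* The loop from the comment, unchanged. */
int foo_loop(int i, int j) {
    while (i < 10) { j += i; i++; }
    return j;
}

/* Same result with no reassignment; 64-bit math stands in for the
   widened i33 multiply the IR uses to avoid overflow. */
int foo_closed(int i, int j) {
    if (i >= 10) return j;                     /* loop body never runs */
    long long n = 9 - (long long)i;            /* %5 */
    long long lin = n * ((long long)i + 1);    /* %6, %7 */
    long long tri = n * (8 - (long long)i) / 2; /* %9 .. %13 */
    return (int)(j + i + lin + tri);           /* %14 .. %16 */
}
```

Algebraically this is j + i + (9 − i)(i + 1) + (9 − i)(8 − i)/2, which equals j plus the sum of i..9.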
Well, would you look at that! Clang decided (even in this hot loop) never to reassign any of the left-hand sides, even though my instructions were just: "mutate j in-place. mutate i in-place."
> Source: it takes more lines of code to return deep copies of objects than to not do that.
Defensive copying and deep copying is not a thing you have to do in immutable languages. Even under the covers, it's not happening the way you seem to think it is. If I had a large immutable map in use by some other process, and needed a version of it with an element changed or added, why would I deep copy it when I can just point to that same map instance, and add a pointer to the key-value pair I want to substitute [1]? I think this is a common reservation people have about immutable programming because they come into it with an OO mindset. At least, I know I did.
In a really simplified example, a = (1, 2, 3, ..., 100) and b = (2, 3, ..., 100) are not allocated as two full lists in memory space. a contains 1 followed by a pointer to b. Because you have guarantees that b will never change, the single instance of b can be recycled in other data structures (or passed to many other functions and threads) and you avoid the complexity of managing race conditions, mutexes, semaphores, which are a significant source of bugs in other languages.
See [2] for a more realistic implementation.
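The `a`/`b` example above as a minimal C sketch, with assumed names (`list_cons`, `length`, `nth`): building `a` allocates one new cell whose tail is `b` itself, so no element of `b` is copied, and any number of other lists can share `b` the same way. Memory reclamation is left out; an immutable-first runtime would hand that to its GC.

```c
#include <assert.h>
#include <stdlib.h>

/* Immutable cons cell: once built, head and tail never change. */
struct cons {
    int head;
    const struct cons *tail;
};

/* Prepend one element; the existing list is shared, not copied. */
const struct cons *list_cons(int head, const struct cons *tail) {
    struct cons *c = malloc(sizeof *c);
    if (!c) abort();
    c->head = head;
    c->tail = tail;
    return c;
}

size_t length(const struct cons *l) {
    size_t n = 0;
    for (; l; l = l->tail) n++;
    return n;
}

int nth(const struct cons *l, size_t i) {
    for (; i > 0; i--) { assert(l); l = l->tail; }
    assert(l);
    return l->head;
}
```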
You have posted nothing else besides your own assertions.
Not everything people say on a discussion board is some scientific claim, subject to scientific inquiry and in need of a thesis defense. But if you really off-the-cuff dismiss Joe Armstrong's opinion on a matter because it hasn't met your criteria of proof, despite you thinking you are somehow being the rational scientist here, you are actually revealing your own stupidity.
Bugs happen when you think you can program something correctly, but can't.
If you look at the implementations of transactions in any other language... Oh wait there aren't any!
People keep trying to implement it in their own languages, figure out it's a non-starter (because of uncontrolled mutation), and give up.
- Clojure doesn't enforce purity (it can't), but from what I hear its STM seems to work pretty well (aside from some perf issues possibly? haven't used it). That's because "mostly pure" functional programming is encouraged by both the language itself and the culture and ecosystem around it, so uncontrolled side effects are less likely to be a problem.
- I think STM can work "well enough" in unmanaged languages as long as you don't try to boil the ocean and make it perfectly transparent, safe, and fast under all circumstances (Microsoft, IBM, Intel and several others tried for years and failed). That means there will inevitably be huge footguns for non-expert programmers (e.g., any side effect might be invoked every time a transaction is optimistically and transparently retried). These footguns can be mitigated by affordances like commit/abort handlers and infallible transactions.
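The retry footgun in the last point can be sketched with a single-word optimistic update using C11 `<stdatomic.h>`: read a snapshot, compute, and commit with compare-and-swap, rerunning the whole body on conflict. This is not STM (it covers one variable, not a transaction over many), and `inject_conflict` is a hypothetical hook that stands in for another thread committing first.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int shared = 0;
static int side_effects = 0;     /* e.g. a log line or a network call */
static int inject_conflict = 0;  /* hypothetical: simulate a rival writer */

/* Optimistic single-word "transaction": on a failed CAS the body runs
   again, including its side effect. */
int tx_add(int delta) {
    for (;;) {
        int snapshot = atomic_load(&shared);
        side_effects++;                       /* runs once per attempt */
        int updated = snapshot + delta;
        if (inject_conflict) {                /* another thread wins */
            inject_conflict = 0;
            atomic_fetch_add(&shared, 100);
        }
        if (atomic_compare_exchange_strong(&shared, &snapshot, updated))
            return updated;
    }
}
```

One forced conflict is enough to run the side effect twice for a single logical update, which is why commit/abort handlers matter.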
I reject that claim.
In this comment, you simultaneously agree and disagree with me.
I don’t give a shit what Joe Armstrong says about immutability because the facts are the facts:
1) immutability cause performance problems
2) immutability significantly limits how you can manage data, which is counter to what computers are meant to do
3) immutability measurably does not reduce bugs in programs
I am not dismissing <insert name> off the cuff. I am dismissing them because their claim does not align with metrics you expect to improve as a result of their claim.
>it doesn’t have to be a scientific claim
When you are telling people to “make immutability a foundation of their programming”, you 100% are opening yourself to scientific scrutiny. If you cannot back up this claim with actual metrics, and you’re just going to say “hurr durr, just let me make claims without calling me out to providing evidence please”, why should anyone believe you?
Have you ever heard the saying, “assertions made without evidence can be dismissed without evidence”?
My experience differs.
Immutability has been the foundation of many of our large scale programs. It makes safe concurrent programming easier, and languages built around immutable data structures usually optimize memory handling in ways that are not available when simply writing “functional style” code in non-functional languages. ie under the hood they’re using persistent data structures, structural sharing, tail call optimization, etc.
I'd be curious to see how you back this claim up. Are you referring to something published that we can all go and read?
For your points:
1) Yes, immutability can cause performance problems in some contexts. However, it can also help in the whole. Mutability in concurrent systems requires all sorts of complications such as mutexes that slow things down considerably. In even single-threaded systems, mutability leads to defensive copying in practice. Furthermore, persistent data structures[0] exist for lists, dictionaries, etc., that achieve very good space and time performance by mutating internally while exposing an immutable interface.
At any rate, even if it is slower, most of the time the performance difference just doesn't matter.
2) How does it limit how you can manage data? It's still possible to mix immutable and mutable data if necessary, but immutable data can be transformed just as mutable data can.
3) You say it measurably does not reduce bugs in programs, again with no evidence. Immutability eliminates entire classes of commonly-encountered bugs, including many pernicious ones related to concurrency. These are bugs that happen commonly with mutable data, but simply don't for immutable data.
In addition, there is some limited empirical evidence to the contrary, which is rare for this kind of thing. Immutable-first Clojure had the lowest proportion of GitHub issues labeled as bugs, even beating out static languages. [1]
I'm not GP, but these traditions are usually not backed by any evidence but by cargo-culting and cult-of-personalities. Not to mention people who over-hype their favourite technologies to high heavens, poisoning the well for everyone else (no, most of the telecom industry doesn't run Erlang, Naughty Dog didn't ship Lisp on PlayStation 2, and Prolog didn't lead to fifth-generation computing).
I don't believe in blindly believing things without evidence either, especially if I have never encountered them before, but I also don't believe in blindly dismissing the experience of world-renowned experts in their field because they didn't provide me a point-by-point proof text of every claim they made (again, we aren't sitting here discussing a dissertation or mathematical proof). Their experience and what they've provided to the world is the evidence. We took this 19th-century German ultra-materialist philosophy too far here in the West, and that's what gave us postmodernism/poststructuralism with its disastrous consequences, but it still seems like we haven't learned anything from that.
The ancients had it right that there are different types of knowledge, and different ways of knowing things (and knowing them to be true, at least as far as it mattered). We here in the modern era with the most unfettered access to information have quite possibly the narrowest definition, ironically.
There’s a reason that the fastest Haskell game engine looks like ps2 graphics on modern hardware.
People hawking these traditions usually do for consulting money, not spiritual fulfilment. I am all for non-materialism, but only as long as it's not used to exploit me. Belonging to a post-Colonial country, I know exactly where that leads.
Immutability is measurably slower. Full stop.
The fact that you can come up with silly, overly simplistic, non-idiomatic anecdotes showing that sometimes a compiler will prefer calculation doesn’t change that. It is a commonly known fact in low level programming that just because you’re calculating something doesn’t make it slower by default.
When Haskell devs can produce a game engine that doesn’t look like PS2 on a 4090, we can chat again about how immutability is supposedly not slow.