Managing mutable data in Elixir with Rust

Managing mutable data in Elixir with Rust (lambdafunctions.com)

131 points by clarkema 2 years ago | 58 comments

mgdev 2 years ago |

Rustler is great. Though this gets me thinking about how you can maintain as many Elixir invariants and conventions as possible, even while escaping them under the covers. Being able to call FeGraph.set/2 and have db actually be mutated violates Elixir's common call patterns, even if it's technically allowed.

For example: I wonder if it wouldn't be more "erlangy"/"elixiry" to model the mutable ops behind a genserver that you send messages to. In the Elixir world it's perfectly normal to make GenServer.call/3 and expect the target PID to change its internal state in a non-deterministic way. It's one of the only APIs that explicitly blesses this. The ETS API is another.

Alternatively, you could have the ref store both a DB sequence and a ref ID (set to the last DB sequence), and compare them on operations. If you call FeGraph.set/2 with the same db ref twice, you compare the ref ID to the sequence and panic if they aren't equal. Callers always need to operate with the latest ref. Then at least the local semantics are maintained.
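A minimal Rust sketch of that sequence-check idea, for illustration only. The `Db`, `Handle`, and `SetError` names are hypothetical and not from the article, and it returns an error for a stale handle rather than panicking:

```rust
use std::sync::Mutex;

/// Hypothetical mutable store guarded by a sequence counter.
pub struct Db {
    inner: Mutex<Inner>,
}

struct Inner {
    seq: u64,
    values: Vec<i64>,
}

/// A handle that remembers the sequence number it was issued at.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Handle {
    seq: u64,
}

#[derive(Debug, PartialEq)]
pub enum SetError {
    /// The caller used a handle older than the latest mutation.
    StaleHandle { handle_seq: u64, db_seq: u64 },
    OutOfBounds,
}

impl Db {
    pub fn new(len: usize) -> (Db, Handle) {
        let db = Db {
            inner: Mutex::new(Inner { seq: 0, values: vec![0; len] }),
        };
        (db, Handle { seq: 0 })
    }

    /// Mutates in place, but only when called with the latest handle.
    /// On success the sequence advances and a fresh handle is returned.
    pub fn set(&self, handle: Handle, index: usize, value: i64) -> Result<Handle, SetError> {
        let mut inner = self.inner.lock().unwrap();
        if handle.seq != inner.seq {
            return Err(SetError::StaleHandle {
                handle_seq: handle.seq,
                db_seq: inner.seq,
            });
        }
        let slot = inner.values.get_mut(index).ok_or(SetError::OutOfBounds)?;
        *slot = value;
        inner.seq += 1;
        Ok(Handle { seq: inner.seq })
    }

    pub fn get(&self, index: usize) -> Option<i64> {
        self.inner.lock().unwrap().values.get(index).copied()
    }
}
```

Calling `set` twice with the same handle fails the second time, so code holding an old ref can't silently observe or clobber newer state; it has to thread the returned handle through, much like rebinding a variable in Elixir.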

Maybe this is less relevant for the FeGraph example, since Elixir libs dealing with data are more willing to treat the DB as a mutable thing (ETS, :digraph). But it's not universal. Postgrex, for example, follows the DB-as-PID convention. Defaulting to an Elixiry pattern for Rustler implementations is probably a good practice.

clarkema 2 years ago | |

That's an interesting point that I should perhaps have covered in the original article.

The real code that this is based on is in fact hidden behind a GenServer for this exact reason -- to maintain the expectations of other Elixir code that has to interact with it. The advantage of the escape hatch, as another commenter mentions, is allowing efficient sparse mutations of a large chunk of data, without having to pay a copy penalty every time. I definitely wouldn't recommend sharing the db handle widely.

rkangel 2 years ago | | |

Did you consider a port (written in Rust) instead of a NIF?

When you're presenting a GenServer-like message-passing interface, a port is a natural fit, with none of the risks related to linking a NIF into the VM itself.

(admittedly those risks are much lower with Rust than C)
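To make the port option concrete, here is a rough sketch of the Rust side, assuming the port is opened from Elixir with the `{:packet, 4}` option, so that every message is prefixed with a 4-byte big-endian length. The `serve` loop and its reverse-the-bytes reply are hypothetical placeholders for real request handling:

```rust
use std::io::{self, Read, Write};

/// Reads one `{packet, 4}` frame: a 4-byte big-endian length, then the payload.
/// Returns Ok(None) if input ends while reading the length header, which this
/// sketch treats as the port having been closed.
pub fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match input.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}

/// Writes one frame with the same 4-byte big-endian length prefix.
pub fn write_frame<W: Write>(output: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    let len = payload.len() as u32;
    output.write_all(&len.to_be_bytes())?;
    output.write_all(payload)?;
    output.flush()
}

/// Hypothetical request loop: replies to each request with its bytes reversed.
/// In a real port program, `input` and `output` would be stdin and stdout.
pub fn serve<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    while let Some(request) = read_frame(&mut input)? {
        let reply: Vec<u8> = request.iter().rev().copied().collect();
        write_frame(&mut output, &reply)?;
    }
    Ok(())
}
```

The tradeoff compared with a NIF is that every call pays for serialization and a trip through the OS pipe, in exchange for the Rust code living in a separate OS process that cannot take the VM down with it.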

lvass 2 years ago | | |

Have you measured performance? If mutating from Elixir like this can bring serious benefits, maybe there's a place for mutable versions of libraries like Explorer and Nx.

evnu 2 years ago | |

> For example: I wonder if it wouldn't be more "erlangy"/"elixiry" to model the mutable ops behind a genserver that you send messages to.

It depends on the use case. For example, when creating a resource (basically a refcounted datastructure), it might make sense to allow mutable access only through a process as the "owner" of the resource. But if you have only read-only data behind that resource, sharing the resource similar to ETS might be what you want.

NiklasBegley 2 years ago |

I also want to give a shout out to the Rustler folks for creating a great library! We use Rustler quite extensively at Doctave, and have written about our experiences with Rustler before [0] (though our architecture has advanced quite a bit since the article was written).

Integrating Elixir and Rust has been delightfully straightforward and is a great choice for calling into libraries not available in Elixir, or offloading CPU intensive tasks.

[0]: https://www.doctave.com/blog/2021/08/19/using-rust-with-elix...

atonse 2 years ago |

Getting Rustler up and running for us was very easy. Thank you to the team for making this excellent library.

We had some inconsistent build results (ours is an umbrella app), but apart from having to force a compilation and losing the ability to cache the Rust builds, everything else has worked so well that we’re happy to get access to the massive Rust ecosystem.

AlchemistCamp 2 years ago |

It’s exactly this use case that nudged me (primarily an Elixir dev) to start learning Rust a few years back.

Unfortunately, I haven’t had a project where I’ve needed to use Rustler yet, though.

doctor_phil 2 years ago |

Nice. I thought that Zig would be a nice language for writing NIFs - but of course Rust would be good too. Cool!

impulser_ 2 years ago | |

Rust is perfect for this because Rust code can be very reliable, which is needed for NIFs in Erlang because a NIF can crash the whole VM.

So using C and Zig libraries without fully understanding them can be a death trap, while in Rust, as long as it doesn't use unsafe code, you can feel pretty good about using it.

cybrox 2 years ago | | |

This has nothing to do with Rust itself. While the compiler does prevent a lot of common pitfalls, you can still write erroneous code with it.

It's entirely the rustler project's effort (and goal) to wrap any kind of Rust program so that it will not bring down the BEAM under any circumstance, which they have done a great job achieving.
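As a rough illustration of the kind of wrapping involved (this is not Rustler's actual code, just a standard-library sketch with hypothetical names `run_guarded` and `checked_div`), a panic can be caught at the boundary and turned into an ordinary error value instead of unwinding into the caller:

```rust
use std::panic;

/// Runs `f` and converts a panic into an `Err` carrying the panic message.
pub fn run_guarded<T>(f: impl FnOnce() -> T + panic::UnwindSafe) -> Result<T, String> {
    panic::catch_unwind(f).map_err(|payload| {
        // Panic payloads are usually a &str or a String.
        if let Some(s) = payload.downcast_ref::<&str>() {
            (*s).to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}

/// Hypothetical "NIF body" that panics on bad input.
pub fn checked_div(a: i64, b: i64) -> i64 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}
```

Note the limits: `catch_unwind` only helps when panics unwind (not with `panic = "abort"`), and it does nothing for memory errors in `unsafe` code or in C libraries being called, nor for a long-running call that blocks a scheduler.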

rubin55 2 years ago | |

Zigler! https://github.com/E-xyza/zigler

elbasti 2 years ago |

Cool writeup. A little ironic, since Erlang's `digraphs` are also mutable!

Miner49er 2 years ago | |

Erlang's digraphs are stored in an ETS table, so aren't they only mutable in the same way that ETS tables are mutable?

I don't normally see people consider (D)ETS tables as mutable, however.

filmor 2 years ago | | |

ETS tables are absolutely mutable, they even have specific functions to iterate over them while being mutated (https://www.erlang.org/doc/man/ets#safe_fixtable-2). I use them extensively to share data in a "lock-free" fashion with other processes (a `gen_server` that gets all messages and aggregates data in ETS tables, retrieval via direct reads on a known table name instead of gen_server:call). Mnesia is also (usually) ETS down below.

h0l0cube 2 years ago | | |

Yeah. I think even though the article doesn’t use it as an example, what’s really desirable about escape hatching to a systems language is the ability to in-place mutate lots of data. Specifically sparse mutations of a large chunk of data where a copy penalty would be wasteful. ETS is basically just swapping pointers (which I hope is mutation under the hood)
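A toy Rust sketch of the difference, under the simplifying assumption that the "immutable" path copies the whole buffer on every update (persistent data structures on the BEAM share structure and do better than this). The function names are made up for illustration:

```rust
/// Immutable-style update: clones the whole buffer for a single change.
pub fn set_copying(data: &[u64], index: usize, value: u64) -> Vec<u64> {
    let mut next = data.to_vec();
    next[index] = value;
    next
}

/// In-place update: touches only the slots being changed.
pub fn apply_sparse(data: &mut [u64], updates: &[(usize, u64)]) {
    for &(index, value) in updates {
        data[index] = value;
    }
}
```

With a large buffer and a handful of updates, `apply_sparse` does work proportional to the number of updates, while `set_copying` does work proportional to the size of the buffer each time it is called.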

hpeter 2 years ago |

This is super cool. I learn something new every day.

wredue 2 years ago |

Immutable data is not a “foundation of scalability and robustness”.
