Garbage collection without unsafe code(fitzgen.com) |
Garbage collection without unsafe code(fitzgen.com) |
[1] https://soft-dev.org/pubs/html/hughes_tratt__garbage_collect...
Hence all the Chapel, Swift, D, Linear Haskel, Ox, Idris2, Scala Capture Checking, Koka, and probably many others, efforts going on.
This is also noticeable among rustaceans, otherwise ergonomics for all the .clone() and related machinery, wouldn't be part of the 2026 roadmap.
Rust's ownership system is automated resource management. What you're asking for is dynamic lifetime determination, which Rust provides via types that opt out of the hierarchical single-ownership paradigm.
Or the experiements Niko Matsakis is doing with Dada.
Weirdly, I picked this screen name about a decade before Swift the language (and several years before Swift the singer) debuted
Either use unsafe and think about using raw pointers carefully, respecting the soundness rules, or truly redesign it using idiomatic Rust constructs. But don't hide complexity under the rug by using indexes instead of pointers, it's mostly the same thing.
I really enjoyed the write-up though, I learned a lot from it, not to discount that.
Of course, it does mean we should compare them 1-to-1, which means we should treat the indices code like we would treat unsafe code. The most important conclusion is to encapsulate it in a data structure without business logic.
In this example, you wrap Gc types in Option, I think that having the Gc type be nullable instead would be an interesting experiment. Having to introduce a lot of places that branch both puts more into the instruction cache, and adds more places that the branch predictor has to track. Besides, you can always add in missing checks if you know that you have a sparse subgraph somewhere. Total potential micro optimization, but it's a bit of fun :-).
I also like how this shows how freelists are a smidge less efficient in safe Rust, as compared to unsafe Rust. In this solution, we have to encode the tag for the discriminated union, this is unnecessary! The unoccupied slots forms its own separate list, and so you know that you will always have unoccupied slots as long as you trace along that path. I assume that that will add up to the maximum alignment of bytes, as arrays need to be well-aligned? So, probably 8 extra bytes per element (you're storing stuff that probably references other stuff).
So maybe eliminate type and concurrency unsafeties also then in the next decades or so.
Other than UB, using indexes instead of pointers can be exactly as error-prone and leads to the same kind of unexpected runtime crashes, memory corruption and security issues. It’s a false sense of security.
If you inspect the OP implementation in detail, you’ll notice that it is still plenty open to use-after-free bugs and lots of places where it can panic in an equivalent way to a segmentation fault.
It is quite unsafe while not being “unsafe”.
I think the simple fact that pointers are not guaranteed aligned/valid even if they are in range of a particular slice/collection etc. actually makes it very different
More generally, there are various situations where garbage collection is effectively a requirement: namely, when you've got a mutable graph where anything can point to anything else, where some nodes are root nodes and others are not, and so it's possible for cycles of unreachable nodes to occur and you don't want this to result in memory leaks.
You could in theory imagine a language like Rust where garbage collection was integrated more deeply on a syntactic level, such that you might sometimes use it for convenience and not just where it's a requirement (and Rust in fact worked like this for a while pre-1.0), but the way things currently work, integrating a garbage-collection library basically always makes your code more unwieldy and you only do it if you have to.
I just feel like not having GC is sort of a deliberate, core design choice for the language. Having strict ownership rules and being forced to think about references feels like a feature not a bug you know? Adding GC feels analogous to the "++" in C++ to me.
Not that I have anything against the efforts people are putting into it though - I'm genuinely curious about what it lets me do better/faster within rust.
GCs are useful in the same way that Python is useful even though it is slow. Simplicity of use has intrinsic value but that is a tradeoff against other objectives.
I have also worked on projects (web apps) where GC pauses became annoying to end-users. I wouldn't say this is the fault of GC perse, but writing code without a GC, seems to prevent some performance problems for me, and can also make performance problems stand out sooner in the dev cycle.
Of course, many GCs have indeed gotten better in their own right, but not using the GC at all remains the most potent optimization.
Not in my experience.
If anything it’s the opposite: issues demonstrated by cve-rs are _language bugs_ and are _fixable_ in principle. “Safe Rust should be memory-safe” is a well-defined, falsifiable contract that the compiler can be measured against. Meanwhile memory unsafety is a feature of the semantics of C++ and so it would be absurd to file a bug against gcc complaining that it compiled your faulty code.
I see that repo is two years old - are there flaws in Rust that aren’t edge cases that would make it not memory safe?
If the repo is what I think it is, the underlying bug is over a decade old [0]. From what I understand the devs have known how to solve it for pretty much this entire time, but their desired solution is blocked on some rather large refactorings elsewhere in the compiler.
> are there flaws in Rust that aren’t edge cases that would make it not memory safe?
cve-rs is probably the most well-known one, but it's definitely not the only one [1]. That being said, I think there's a not-unreasonable argument to be made that cve-rs counts as an edge case since (as far as I know) there's no publicly-known examples of code that has "organically" run into this.
On a more practical note, I think if you're worried about memory safety in Rust code type system holes are vanishingly unlikely to be the cause compared to more "normal" things like mistakes in-around `unsafe` blocks.
[0]: https://github.com/rust-lang/rust/issues/25860
[1]: https://github.com/rust-lang/rust/issues?q=state%3Aopen%20la...
In terms of practical, yeah a doubly linked list (or trees with bidirectional pointers to both parent and children, etc) is especially easier to implement in a GC environment than with Rust borrow checking alone. You can do it without a GC, but a GC can be a helpful intermediary.
Such a corrupted array should never cause a program crash, unless there is a second bug where unexpected values in the corrupted array can cause crashes or unless it is intended that any detected bug must abort the program, when it does not matter whether pointers or indexes were used.
Thus using indexes should always be safer than using pointers.
In modern CPUs, indexed addressing is as fast as addressing through a register that holds a pointer, which removes the main reason why pointers were preferred in the seventies of the last century, by the time of the C language design, when in most minicomputers and microcomputers using pointers was faster than using indexes.
For some use-cases, you will probably keep all dynamically allocated objects in this memory.
At that point you have circumvented most of Rust’s protections and ergonomics. The user will effectively be working with raw pointers into this memory. The implementation itself also doesn’t have the proper checks because it is manipulating indexes instead of pointers.
The post itself notes around the end how it is fairly easy to end up with a dangling pointer or one that points to a different object allocated to the same position after the old one was freed. It also has quite a few places where it can panic. And implementing its Trace trait wrong or not using Roots as intended can have complex consequences. And all that while feeling like this is the most safe GC because it has no unsafe.
I don’t disagree with anything you said, and I also quite like the work from OP. But I hope you can see that this is not fully aligned with Rust’s philosophy.
So, like bounds-checked array access? That's also a logic error. Common pointer manipulation (in the presence of discriminated union types), allocation/deallocation errors and data races (when involving array indexes, pointers or tagged data, not just simple valye types) can all lead to UB; bounds-checked array access on its own cannot.
My point is that you can start out without a GC and do a series of sane and effective optimizations that end you up with a GC. Just as you can start with a GC and optimize by moving more and more of your allocations to non-GC memory and end up without a GC. Which endpoint is faster depends on the workload.
You can say the same about any abstraction, such as having a generic memory allocator, using a standard library, actually about any code reuse. For example, many programs use printf, but don’t need all its features, so hand-rolling a custom version may lead to smaller, faster code (things get complicated here on systems with shared libraries)
The question always is whether lost performance (execution time, memory usage) is worth faster development.
There is nothing automatic out of it.
Write Rust code, compile error if done incorrectly, manually fix the data structure or algorithm root cause, loop.
For me its ideal use case, are places where any form of automated resource management is either not technically possible, or it is a waste of time trying to change mindsets.
Rust isn't a language for a special case, Rust is universal. Those other languages are for special cases, you're simply a special case developer.
Picking a different language for each special case is the primary problem Rust solves, it's the problem of being a jack of all trades and a master of none.
>> If anything all those attempts prove that for many scenarios, it is better having automated resource management + (affine, linear, dependent, effects) than the pure affine types approach taken by Rust.
Can you write the proof down for us please? I'm curious, how do you prove that "for many scenarios" A+B is better than B+A?
The OP is about doing A in safe Rust, which isn't even an option for many of those other languages which are, at the very least, bootstrapped from unsafe code.
The proof is the amount of research that is ongoing for plenty of people that don't consider Rust as it stands today, the ultimate answer in programming languages, including people like Niko Matsakis.
Or are you going to reply his opinion has no value, and should abandon the language ergonomic proposals from the improvements roadmap?
False. The language design safe by default, something that you can confirm super easily doing just the Rust tutorials and compare the same with C or C++.
Read the repo well:
cve-rs implements the following bugs in safe Rust:
Use after free
Buffer overflow
Segmentation fault
NOT REFUTE IT.> There are unsafe blocks all over the stdlib
Unsafe blocks is not the same that unsafe code. Are marked areas that are required to do escape automated checks, and there, you are at the level of a C/C++ programmer (where in that languages ALL THE CODE IS MARKED UNSAFE).
If you complaint against that, is the same as complaint against ALL THE CODE writer on C/C++.
---
One thing important to understand about Rust: Rust is a system language and SHOULD be able to implement everyting including, Buffer overflow, Use after free , Segmentation fault and such. You should be able to implement a terrible OS, malware, faulty drivers, etc, minimally because that is required to test safe programs!
(example: Deterministic Simulation Testing https://turso.tech/blog/introducing-limbo-a-complete-rewrite...).
But what Rust gives is that not assume that you want to do it for most programs.
Physics is unsafe. Something, somewhere needs to provide the safe core.
> And concurrency safety would need to get rid of their blocking IO, which they haven't even acknowledged.
Is your position that blocking IO can't be compatible with concurrency safety? That's a strange claim. Can you explain?
No, that's common knowledge. I fixed concurrency safety by forbidding blocking IO. Others also. Maybe there are other ways, but never heard of other ways.
This seems like a non-sequitur to me? The presence/absence of a GC is not dispositive with respect to determining "safety", especially when the GC itself involves unsafe code.
It’s like double entry accounting when you only have one pen and one writing hand. The system is broken if you ever write down only half of a transaction in one ledger: you always need to record both the flow in and flow out on both ledgers. Writing either one of them is an unsafe operation. But you can only write one thing at a time. So write them in pairs in an unsafe {} block, and reuse that block safely.
It seems like your complaint is that Rust is referred to as a safe language. Which is fine; it's more correct to use the phrase "in safe Rust" rather than assuming that "in Rust" fully implies "safe". That is true, but that's a crack in a sidewalk compared to the chasm of difference between Rust and C++. Why obsess over that crack?
Should we all refer to "Python without FFI or any extensions written in C or another unsafe language" instead of "Python", to avoid asserting that Python-as-it-is-used is a safe language?
[1] Assuming it's FFI to an unsafe language, and that's the main purpose of FFI.
That is not "back in C++ land."
I don't deny the existence of special cases or the value of looking for better ergonomics even at the expense of university. However, at the moment Rust is the best option for practitioners who don't have enough time to juggle multiple languages.
"C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user." Ellis and Stroustrup, The Annotated C++ Reference Manual.
- The presence of a GC doesn't guarantee memory safety since there are sources of memory unsafety that GCs don't/can't cover (i.e., escape hatches and/or FFI), not to mention the possibility of bugs in the GC implementation itself.
- The absence of a GC doesn't preclude memory safety since you can "just" refuse to compile anything which you can't prove to be memory-safe modulo escape hatches and/or axioms and/or FFI (and bugs, unfortunately). Formal verification toolchains for C (Frama-C, seL4's setup, etc.) and Ada/SPARK's stricter modes are good examples of this.
In the case of Rust:
- `unsafe` blocks (or at least their precursors) were added in 2011 [0].
- Rust's reference-counting GC was removed in 2014 [1].
That's why I think "ripped out their GC" is a bit of a non-sequitur for "got rid of the safeties". Rust wasn't entirely safe before the GC was removed because `unsafe` already existed. And even after the GC was removed, the entire point of Rust's infamous strictness is to reject programs for which safety can't be proved (modulo the same sources that existed before the GC was removed), so the removal of GC does not necessarily imply losing memory safety either.
Just not the cheaters: Rust, Java (until recently), and of course Javascript with its unsafe implementations.
Memory safety bug in a proper lisp? Unheard of, unless you break the GC or do wrong FFI calls.
In the mean time, in the messy world of writing software today, one does frequently enough come across the need for new safety primitives. Things that are provably correct but which the type system of your language does not support. In these instances, unsafe lets you lower down into systems code to build and safely wrap these new components.
But there exist safe systems programming languages. Safe systems were done in these languages. Just nobody cared, so they died or have no market share.
Mesa/Cedar: LOOPHOLE inside an UNSAFE module
Smalltalk 80: You can access the built-in VM instructions directly using the <primitive: N> syntax. E.g. basicAt:put: at <primitive: 61> avoids bounds checks and type checks.
Common Lisp: (safety 0)
Ada: Unchecked_Conversion, Unchecked_Deallocation
C#/M#: unsafe class MyClass<T> where T : unmanaged
Oberon: the SYSTEM module
Rust isn't any different, and unsafe exists in Rust for the same reason it exists in all these other languages. You use it to create new constructs the language authors didn't foresee. Concurrent Pascal is the singular exception: You can't do anything the language doesn't provide out of the box. Need something Brinch Hansen didn't think of? Sucks to be you.
If you want the rusty version of Concurrent Pascal though, you can have that. Put #![forbid(unsafe_code)] in your crate's lib.rs, or your Cargo.toml. Done.
There's a lot you can do in Ada without resorting to them, and even with using them it can be perfectly fine, such as (eg) "view conversion" of a register or memory-mapped location -- remember that a lot of APIs (and ABIs) have been kneecapped by catering to C's inabilities, so even if the idea is directly expressible in some higher level language it will be exposed at the lower level.
I'm not trying to fight some religious war here. You want to use Ada? Great! But don't pretend that the existence of the unsafe keyword somehow makes Rust unacceptably impure. The same escape hatches exist in every other "safe" language outside of the one toy example whose entire niche purpose was to avoid unsafe, and which no one uses now because the language is ossified and non-extensible.