Why Rust's ownership/borrowing is hard (softwaremaniacs.org)
I'd submit it doesn't create them... it reveals them. They've always been there. Almost every other language fails to shine the light on them, but that doesn't mean they aren't there. All GC'ed languages still have issues of ownership, especially if threaded, and all non-GC'ed languages have all the issues Rust has... it's just that the language doesn't help you.
Sure, in practice, if you keep your code neat and well designed, and hopefully not too big, it's hard to really trigger such bugs, but in a bigger codebase it's so easy to overlook them...
Code that takes a full structure when it only needs to operate on a part of the structure is badly designed. It's not conveying the full information about the data that it actually needs, which means that unexpected dependencies can crop up, implicit in the body of the function, as the code is modified later on. This is behind a lot of long-term maintenance messes; I remember a few multi-year projects at Google to break up "data whales" where a single class had become a dumping ground for all the information needed within a request.
Thing is, we all do it, because taking a reference to a general object and then pulling out the specific parts you need means that you don't have to change the function signature if the specific parts you need change. This saves a lot of work when you're iterating quickly and discovering new requirements. You're trading ease of modification now for difficulty of comprehension later, which is usually the economically wise choice for you but means that the people who come after you will have a mess to untangle.
This makes me think that Rust will be a very poor language for exploratory programming, but a very good one for programming-in-the-large, where you're building a massive system for requirements that are largely known.
So today this helper function borrows one member and tomorrow it might need to borrow two, and every time you're supposed to change the interface to explain exactly what's happening. This is really tedious for a private helper and can be seriously problematic for a public interface. You don't do it in garbage-collected languages (which move the issue from the static type system into the dynamic run time system) and you don't do it in unsafe static languages (which move the issue from the static type system into your brain, which gets things right some of the time). The cost of moving it into the type system is as real as the benefit (not necessarily as big or small - it's hard to quantify these things and you need context - but certainly just as real) and it'd be great to see both acknowledged instead of having one or the other denied.
In a GC'd language without an effects system, passing in a reference to the outer structure would still prevent the GC from freeing the outer structure (depending on the ABI, even if you null out the passed-in outer reference inside the function, the value passed on the stack might be immutable) and would also mean that you need to be careful about later changes causing non-thread-safe mutations to the outer structure. If your language isn't GC'd and doesn't have an effects system, then you need to manually keep track of the borrowing.
These aren't artifacts of Rust's type system; they're genuine side effects that are present but more subtle in other languages.
However, I'd still say that issues arising in sharing a composite structure in pieces have always existed. Rust's solution may or may not be correct, but the issue is not something it is creating. Adopting Rust does not mean adopting a brand new set of problems that never existed before. It means seeing them clearly for what is probably the first time.
In the example in the article, "is_origin(point)", the code for which is not shown, is clearly bogus. A function that's just a predicate should not consume its input. It should use read-only access by reference.
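A minimal sketch of what that would look like. The `Point` struct and its fields are assumptions here, since the article's actual definition isn't reproduced in this thread:

```rust
// Hypothetical `Point` type; the article's real definition is not shown here.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

// A predicate only needs to look at its input, so it takes a shared
// reference and the caller keeps ownership.
fn is_origin(p: &Point) -> bool {
    p.x == 0 && p.y == 0
}

fn main() {
    let point = Point { x: 0, y: 0 };
    if is_origin(&point) {
        // `point` was only borrowed, so it is still usable here.
        println!("{:?} is the origin", point);
    }
}
```

Had `is_origin` taken `Point` by value, the `println!` would be using a moved value and the compiler would reject it (unless `Point` were `Copy`).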
One big advantage of Rust is that, because the ownership checking is safe, you don't have to make copies of things just to simplify memory allocation control. In some C++ GUI libraries, strings are copied again and again to prevent memory allocation errors. Rust should be more efficient. It's going to be interesting to see if Servo puts a dent in browser memory consumption. It's insane that browsers now can need more than 1GB of RAM. There have to be multiple copies of the same data.
Most of a browser's memory consumption usually consists of JS and DOM objects. In effect, the pages you're visiting are actually doing the bulk of the allocations, not your browser.
Hopefully, but not necessarily. Any (decidable) type system rejects some well-behaved programs (i.e., it "uncovers" problems that are not actually there), and the borrow checker is no exception. You will write some correct Rust programs that the borrow-checker just can't verify. This means that you will need to explain yourself in more detail (through more work) to the type checker, even though there was no mistake in your program. I think this is a good thing (depending on your requirements), but it isn't free.
This is not true. Ownership is only relevant in the presence of mutability. In a GC'ed language where most or all data is immutable, one rarely needs to think about ownership. In Rust, one needs to think about it all the time.
Rust is certainly a leg up compared to trying to write multithreaded C or C++ code, but that doesn't mean its approach is free of drawbacks.
Empirically, even in Python - a mutable-by-default language - I find myself rarely having to think about ownership. That could be an artifact of the kind of Python programs I find myself writing, though. I'd be interested to hear other people's experiences on that front.
That said, programming languages (on their own) can't really do anything about other processes/computers interacting poorly with yours! The ultimate semantic of ownership is still there, though.
There's also the ST Monad, which provides safe mutability in Haskell by enforcing that the mutated state doesn't escape a region of the program. This is literally the same idea as the borrow checker.
"I find myself rarely having to think about ownership. That could be an artifact of the kind of Python programs..."
Python still can have action-at-a-distance, where things unexpectedly change as a result of executing code. The single-threaded analog to a race condition is less severe because it's at least deterministic, but it can still make programs difficult to understand.
If mutability isn't needed, then you would declare the self parameter as a simple (non-exclusive) borrow, and you can borrow from it or parts of it freely without interference from the compiler.
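As a rough illustration (the `Inventory` type and its fields are made up for this sketch), a `&self` method can hold several shared borrows of the struct and its parts at the same time:

```rust
// Hypothetical type used only to illustrate shared borrows.
struct Inventory {
    names: Vec<String>,
    counts: Vec<u32>,
}

impl Inventory {
    // `&self` is a shared (non-exclusive) borrow.
    fn summary(&self) -> String {
        // Several shared borrows of parts of `self`, plus one of the
        // whole, can all be alive together because none of them mutates.
        let names = &self.names;
        let counts = &self.counts;
        let whole = self;
        format!(
            "{} items, {} total, first = {:?}",
            names.len(),
            counts.iter().sum::<u32>(),
            whole.names.first()
        )
    }
}

fn main() {
    let inv = Inventory {
        names: vec![String::from("bolt"), String::from("nut")],
        counts: vec![3, 4],
    };
    println!("{}", inv.summary());
}
```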
I have six other things that will kill my app faster than a memory leak, but I have to design this shit first? No thank you. Rust is on my list of things to learn in 2016 and I'm hoping its borrowing semantics will feel like a solution to this problem without having to sign up for nondeterministic application pauses in the bargain.
Now when I code C++ my Rust knowledge is a double edged sword. On one hand, I have a much better idea on how to manage my data in C++. I had this discipline before learning Rust, but I didn't have explicit rules to it; it was just a ... nebulous bunch of ideas about how data works. Now it's explicit. On the other hand, I am absolutely terrified when writing C++ code (something I would do with ease in the past). Well, not exactly, but it's hard to accept somewhat-unsafe code (which is probably safe at a macro level -- i.e. safe when looked at in the context of its use) and while I can see that something is safe, I can also see how it could become unsafe. And I fret about it. Rust trains you to fret about it (more than you would in C++, that is), and simultaneously Rust handles it for you so you don't have to fret about it :) C++ doesn't handle it, but you still fret about it since Rust taught you to.
I guess it's a "Living is easy with eyes closed" thing :P
https://www.reddit.com/r/rust/comments/45gcmh/why_rusts_owne...
"Given all that, I wonder if it makes sense to prefer plain old functions most of the time. Is that right, or am I overlooking something?"
The response was yes. Avoid impl methods which take a mutable self.
EDIT: There still is a `move` keyword, but it is used to indicate that closures should take ownership of their environment vs. just borrow values from it, not to move individual values.
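A small sketch of the difference, with a made-up helper name (`sum_on_thread`): a plain closure borrows from its environment, while a `move` closure takes ownership of what it captures, which is what `std::thread::spawn` needs.

```rust
use std::thread;

// Hypothetical helper: sums a vector on another thread.
fn sum_on_thread(data: Vec<i32>) -> i32 {
    // `move` makes the closure take ownership of `data`, so the new
    // thread does not depend on this function's stack frame.
    let handle = thread::spawn(move || data.iter().sum::<i32>());
    handle.join().unwrap()
}

fn main() {
    let v = vec![1, 2, 3];
    {
        // Without `move`, the closure only borrows `v`.
        let borrowed_len = || v.len();
        println!("len = {}", borrowed_len());
    }
    // The borrow has ended, so `v` can be handed over by value.
    println!("sum = {}", sum_on_thread(v));
}
```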
pcwalton said that back in the 0.1 days this was actually implemented, and it was very confusing.
Edit: I'm referring to the `&` operator which creates a reference/pointer to the memory it precedes.
I'm not even a Rust programmer and only read three paragraphs about the borrow checker and I instantly saw that point is moved.
Of course, using Rust is even cooler.
If you're used to writing systems languages, you might want to learn it because it's the highest-level language I know that prays to the gods of zero cost abstraction. You still get all the control from C/C++ while gaining several convenient features. ADTs and pattern matching are personal favourites. It's widely known that Rust automates a lot of manual memory management without a GC, but it's not as widely known that the same language features also provide very very good assurances for concurrency: it forces you to acquire locks before you can touch synchronised data, it ensures good behaviour around multiple consumers or single producers for shared memory, etc.
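To make the locking point concrete, here is a minimal sketch (the function name `count_in_parallel` is just for illustration). The counter lives inside the `Mutex`, so the only way to reach it is through the guard returned by `lock()`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: each worker increments a shared counter.
fn count_in_parallel(workers: usize, per_worker: u32) -> u32 {
    // The data is owned by the Mutex; there is no way to touch the
    // integer without going through `lock()` first.
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();

    for _ in 0..workers {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_worker {
                // The guard gives access to the value and releases
                // the lock when it goes out of scope.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_in_parallel(4, 100));
}
```

Sharing a plain `u32` across the threads and mutating it without the `Mutex` would not compile in safe code, which is the assurance being described.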
If you're used to writing in the likes of javascript, python or ruby, Rust is a wonderful gateway drug to systems programming, and it's probably the most accessible alternative. Instead of seeing the compiler/borrow checker as bitching and moaning at _everything_, you should see it as holding your hand and helping you navigate some of the trickier bits in writing safe code. As a bonus, it's probably one of the sanest languages you can use for writing FFI code for your language of choice when you just need the extra performance.
If you're a functional programming fan, you'll soon get the sneaking suspicion that Rust is a wolf in sheep's clothing. It's an expression based language with immutability by default, it features Algebraic Data Types and Pattern Matching, uses Result<> (think Haskell's Either) for error signalling instead of exceptions. The closest thing to "object orientation" are traits, which are actually much closer to type classes.
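For anyone who hasn't seen these features together, a small sketch (the `Shape` type and `parse_shape` function are invented for illustration): an enum as an algebraic data type, `match` for pattern matching, and `Result` for errors instead of exceptions.

```rust
// Hypothetical ADT: a shape is exactly one of these variants.
#[derive(Debug, PartialEq)]
enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

// Errors are ordinary return values, much like Haskell's Either.
fn parse_shape(input: &str) -> Result<Shape, String> {
    let parts: Vec<&str> = input.split_whitespace().collect();
    match parts.as_slice() {
        ["circle", r] => {
            // `?` returns early with the error (the newer spelling of `try!`).
            let radius = r.parse::<f64>().map_err(|e| e.to_string())?;
            Ok(Shape::Circle { radius })
        }
        ["rect", w, h] => {
            let width = w.parse::<f64>().map_err(|e| e.to_string())?;
            let height = h.parse::<f64>().map_err(|e| e.to_string())?;
            Ok(Shape::Rect { width, height })
        }
        _ => Err(format!("unrecognised shape: {:?}", input)),
    }
}

// The compiler checks that every variant is handled.
fn area(shape: &Shape) -> f64 {
    match *shape {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}

fn main() {
    match parse_shape("rect 2 3") {
        Ok(shape) => println!("{:?} has area {}", shape, area(&shape)),
        Err(e) => println!("error: {}", e),
    }
}
```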
So what do you think of productivity in Rust vs python/ruby/js? Is it a lot slower to implement something in Rust? While it has a lot of high level features, it doesn't strike me as particularly concise compared to, say, haskell or python..
I'm currently building a CI tool (which will hopefully be open sourced once I get approval from our legal department) with a colleague of mine who mostly chose to use Rust because he needed an excuse to learn it...
As for why Rust in particular, over the years I've convinced myself that software quality is one of the big ails of, well, anything which concerns itself with the creation of SW. Which is more or less anything these days in technical professions. Strong static typing, memory safety, data race freedom etc. are of course no panacea, but are at least IMHO a step in the right direction.
FWIW, I also think Haskell is totally awesome, but with my background in mostly procedural/OO languages and with small kids at home, I feel I cannot at the moment afford the time to become really productive in a pure lazy functional language in a reasonable time frame. It's on my someday TODO list, though. Oh, and I'm somewhat of a speed freak. So hence Rust.
For what it's worth, Haskell can be compiled to code that's quite fast. Somewhat unfortunately, the difference between GHC's vanilla output and its speed-optimized output can be quite vast, which I think gives the impression that the language or a given program is necessarily slow.
In my case, just dabbling given my background as language geek.
I guess what I'm referring to is that I think reasoning about the borrow checker is easier when references are opt-in instead of being the default. That behavior is reinforced by the fact that the language's default is ownership. So I only end up needing to think hard about the lifetime of a variable when I've decided to (or been explicitly forced to) use references. I think it's a good way to reduce the cognitive burden of an already unfamiliar mechanism.
EDIT: This also gets at one of my favorite parts of Rust: so many choices have been made in designing the language and standard library that make it so easy to do things "the right way" (either through those things being the default, or making it hard to do stupid things, etc.).
Gecko has troubles with leaking the DOM because it CCs the DOM. We don't need to worry about that since Spidermonkey's GC manages the Servo DOM, too.
Is it? I am no Haskeller, but I think I have seen some presentations or read papers in which laziness by default was mentioned as something that Haskell got right.
Do you have some links to read up on what you say?
C++ is used by research algorithms used in HPC context or by the device drivers for the readers.
In this case the data sets are around 1GB, but I don't know what they are actually loading into memory.
I really don't agree with this comment (I program in Rust a lot). Borrowck is a godsend in many ways, but this is a weakness (that can be improved!). It prevents things like `self.mutate_my_foo(self.access_my_bar())`. There are workarounds for the problems this presents, but they shouldn't be necessary.
It is _great_ that borrowck helps you control aliasing and mutation of state. It is _frustrating_ that borrowck can't distinguish a borrow of `self.foo` from a borrow of `self`.
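A sketch of the `self.foo` versus `self` distinction, using a made-up `Widget` type with the method names from the comment above. Exactly which call shapes are accepted has varied between compiler versions, so treat the details as illustrative:

```rust
// Hypothetical type; only the method names come from the discussion.
struct Widget {
    foo: Vec<usize>,
    bar: String,
}

impl Widget {
    // To callers, the returned reference borrows all of `self`,
    // not just the `bar` field.
    fn access_my_bar(&self) -> &str {
        &self.bar
    }

    fn mutate_my_foo(&mut self, n: usize) {
        self.foo.push(n);
    }

    fn record_bar_len(&mut self) {
        // Rejected: `bar` keeps the whole of `self` borrowed while
        // `mutate_my_foo` asks for exclusive access to it.
        //
        //     let bar = self.access_my_bar();
        //     self.mutate_my_foo(1);
        //     println!("{}", bar);

        // Accepted: with direct field access the checker can see that
        // `bar` and `foo` are disjoint.
        let bar = &self.bar;
        self.foo.push(bar.len());
    }
}

fn main() {
    let mut w = Widget { foo: Vec::new(), bar: String::from("abc") };
    w.record_bar_len();
    w.mutate_my_foo(1);
    println!("{:?} {}", w.foo, w.access_my_bar());
}
```

The method signature is all the checker looks at for the call, which is why the accessor version borrows more than it strictly needs.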
"Law of Demeter" is enforced through privacy - as the `self` in the example above shows, these are all happening within the private scope of the 'object.'
1. You can play with yourself
2. You can play with your own toys (but you can't take them apart)
3. You can play with the toys that were given to you.
4. And you can play with toys you've made yourself.
http://c2.com/cgi/wiki/LawOfDemeter?LawOfDemeter
Put simply, it means that you shouldn't attempt to destructure or inspect the arguments that were passed to you. If you're passed a point and need to access point.x and point.y, then you're a method on the wrong class; you should be a method on Point instead. If you're passed a file but only need to access file.path, your parameter type is wrong: you should take a filepath instead and let your caller destructure for you. If you need to access foo.bar and foo.baz but foo has 20 data members, you should collect bar and baz on its own sub-structure and pass that in directly, or better yet, make your function a method on the sub structure. If you need to self.mutate_my_foo(self.access_my_bar()), you should call self.foo.mutate(self.access_my_bar()). And so on - the point is for each function to have the minimal knowledge necessary to complete its task, and any decisions unrelated to that task should be propagated up to higher levels of the program.
I won't deny that this is frustrating, and I thought I acknowledged that in the original comment. The Law of Demeter has been very controversial in OO circles, because it's so restrictive that pretty much nobody can actually adhere to it without creating so much work for themselves that their project ships late. In forcing your code to always use the minimal set of data necessary, you force yourself to change the code (including many potentially highly-used APIs) every time you add or remove a data dependency, which is usually impractical. The whole category of dependency injection frameworks was invented to automate much of this plumbing work.
But I find it fascinating that Rust's borrow-checker has basically forced it down on one side of the tradeoff. It has a bunch of implications for what Rust is good at and what Rust is not good at.
The issue here has to do with 1. borrowck's inability to infer the delimitation of mutable access to members of product types vs to the entire type, 2. borrowck's limited understanding of the order of evaluation.
I love Rust and want to write in Rust all the time, but you are overhyping the benefits of borrowck.
EDIT: A reason the law of Demeter is not great, in my opinion, is exactly a strength of borrowck - the issue isn't how much state a scope can read, but how much state a scope can write.
No, it is not; this is done all the time with methods and it improves encapsulation - you may not want your clients to be able to decompose your data structures. Do you really mark every member of your data structures as pub??
Sorry but this is a poor ad-hoc defense of an actual annoyance in the borrow-checker.
I think this was a sensible statement, especially in context; I strongly agree that "this is behind a lot of long-term maintenance messes". And lost performance.
For encapsulation in Rust, traits are used to abstract and separate concerns, but they don't force you to bundle your data into large structures.
And encapsulation isn't an end in itself. Privacy has its uses (maintaining invariants, minimizing the exposed surface area of a library, etc.) but I find often in OO codebases that encapsulation creates its own problems. There is no substitute for careful data-oriented design; no amount of `private` will prevent your teammates from working around or ripping apart your carefully shrink-wrapped objects.
There is certainly some awkwardness in the borrow checker, but also great value.
Conciseness isn't really an issue IMO. You can type fast, and your IDE can probably autocomplete. Note that the explicitness in Rust also makes it harder to make mistakes, so I feel that some extra code here and there isn't a bad price to pay.
As for conciseness, I do think it matters; it's not about how fast you type really, but how fast you can read what you wrote a week later. But conciseness is perhaps a bit of a blunt term, and maybe needs a bit of qualifiers. Now, take this with a large grain of salt due to my very limited knowledge of Rust. But, say, for a short program/script I guess python "wins" clearly since you can just let it bomb out with a stack trace if something goes wrong, whereas in Rust you have more explicit type conversions (.to_string() etc.), and you have to handle errors somehow, even if it's just with try!/unwrap() etc. which adds clutter. But for a larger program, maybe the difference isn't so big anymore, since Rust, unusually for a "close-to-the-metal" language, has a lot of higher-level functional-style features, and for python you probably want to handle errors in some way anyhow?
Eventually you'll learn to glance over unwrap() and try!() (well, unwrap() less so, since it's a bad idea to keep it around in your code) and the code you're reading becomes pretty straightforward to read.
My one annoyance in reading Rust code is that because of type inference (which is a super awesome feature otherwise and this annoyance is nowhere close to being enough to make me dislike it) you sometimes don't know the type of the thing being worked on, so getting an idea of what the code is doing is harder (of course, python has the same issue with dynamic types). I've recently started using YouCompleteMe, though, which has pretty decent "Jump To Definition" support which lets me quickly jump around the types and figure out what's happening in such cases.
It's a much younger ecosystem, so there's not always a library though. Then it clearly takes longer, because there's a lot more to do. We do have a pretty respectable number of libraries for our age though, I'm constantly surprised at what DOES exist.
My meaning was that you should favour a usage pattern that looks like you either move/copy things into the called method, or lend a reference to something you own (which is, presumably, not going to be held on to for very long), and then you're either given ownership of whatever return value you get, or get a reference whose lifetime depends on the arguments you passed in (but not the object itself). All of this ends up being quite clean, and you don't end up tying yourself into a borrowing knot.
You do end up in a weird place when your methods return references to fields of the owning object. When that happens, you're restricted in what you can do with the owning object until the reference goes out of scope. Rust mutexes are implemented precisely like that, which highlights what sort of behaviour you're getting from this usage pattern.
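A minimal sketch of that restriction, with an invented `Buffer` type. While the reference returned by `first()` is alive, the owner can't be mutated. A `MutexGuard` ties up its `Mutex` in the same way.

```rust
// Hypothetical type illustrating a method that returns a reference
// into the owning object.
struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    // The returned reference keeps `self` borrowed for as long as it lives.
    fn first(&self) -> Option<&u8> {
        self.data.first()
    }

    fn push(&mut self, byte: u8) {
        self.data.push(byte);
    }
}

// Copying the value out ends the borrow, so the mutation is allowed.
fn copy_first_then_push(buf: &mut Buffer, byte: u8) -> Option<u8> {
    let first = buf.first().copied();
    buf.push(byte);
    first
}

fn main() {
    let mut buf = Buffer { data: vec![7] };
    {
        let first = buf.first();
        // buf.push(8); // rejected here: `first` still borrows `buf`
        println!("first = {:?}", first);
    }
    // The reference is gone, so the owner is usable again.
    buf.push(8);
    println!("copied = {:?}", copy_first_then_push(&mut buf, 9));
}
```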
The former provides better encapsulation and more closely resembles the message-passing approach to OOP, whereas the latter pattern is not only not very ergonomic, it's quite indicative of poor encapsulation (because you're, by necessity, asking for internal state).
All it does is force you to use unnecessary temporaries, like `let baz = self.baz(); self.foo.bar(baz)`
Note that there are cases where such code is invalid even with the temporary, and they can be related to Demeter. Ish. Also to API contracts; the guarantee should be embedded in the signature (so changing the internals shouldn't cause its usage to stop compiling), which is unwieldy to do.