Implications of Rewriting a Browser Component in Rust(hacks.mozilla.org) |
But the idea that I don't have to think about certain classes of problems allows me to give these stupid things more focus, which is surprisingly refreshing.
The predictable nature of Rust was so refreshing for me that I ended up using it even for smaller reusable scripts where I would happily have used Python before but soon got annoyed with obvious errors that would only show up once you run a program.
For example, if you have a `print foo` in some obscure branch that rarely runs, that print will ruin your day under Python 3. If Python were a little more like Rust, you would get a hint or error on save (or at least on compile) telling you that the print should look like `print(foo)` for Python 3. You can be incredibly careful and Rust will still catch things now and then that would otherwise have slipped into production unless you had immense test coverage.
I like Rust for the experience I had with it. It definitely changed how I approach certain problems in a very good and productive way, even when I don't use it.
Fast forward to ~2 months ago, a work project dictated tight control over memory which, while possible in Go, had me looking at alternatives. I decided to give Rust another try. This time it wasn't just an evaluation, it needed to work, so I bought a Rust book and spent some after-hours time learning, etc.
This time, Rust has been an absolute joy. I have no understanding why last time was so painful, and this time it's been so amazing. Maybe it was the book[1]? Maybe it was just a 2nd round of learning based on my previous experience? Regardless, it's been great.
There's just so many mental overheads like what is concurrent safe, what is non-null, etc that are just great to not think about anymore. On top of that, the formatter and LSP are just great. It highlights in my text editor (Kakoune) what variable caused an error, where it gets moved incorrectly, etc. So much just works, it's great.
My only complaint these days is:
1. I find it odd that some things like slice reads can still panic by default. Yes, I can use `foo.get(1)` to avoid panics, but still - it's a bit odd to me.
2. I'm anxiously awaiting async/await. It's quite difficult to be patient.
[1]: Programming Rust: Fast, Safe Systems Development
Panicking is a perfectly good way to handle the situation where an invariant of the program is violated. That is, it is perfectly fine to index using [] in cases where there cannot possibly be an out-of-bound access unless the program has a logic bug. And if it does have a bug, there's usually not much that you can do except abort because you can't trust the program anymore.
On the other hand, if you know that your input might be OOB without violating a precondition, as a part of the normal execution of the program, and you can handle it gracefully, use `get` instead.
This distinction between "expected" and "unexpected" errors is an extremely valuable one, and one unnecessarily muddled by languages where all errors are signaled with exceptions. Rust gets it right.
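To make the distinction above concrete, here is a minimal sketch. The function names `lookup` and `middle` are made up for illustration; only `get` and `[]` indexing come from the discussion.

```rust
/// Out-of-range is an *expected* outcome here, so the caller gets an
/// `Option` back and decides what to do. (Hypothetical helper.)
fn lookup(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

/// A non-empty slice is an invariant of this (hypothetical) function.
/// If a caller passes an empty slice, that is a logic bug, so plain
/// indexing, which panics on out-of-bounds access, is appropriate.
fn middle(values: &[i32]) -> i32 {
    values[values.len() / 2]
}
```

`lookup(&[10, 20, 30], 7)` returns `None` instead of panicking, while `middle(&[])` would panic because the stated invariant was violated.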
It has been my experience that if I try to learn something hard and give up and then come back to it a few months later, I have a way easier time even with no apparent changes in process. I suspect some of it is giving your brain time to process fundamentals or something.
I wonder if this is similar to C++'s `[]` vs `at`. `at` does implicit bounds checking but, as an optimization, if you are already doing an explicit bounds check, you can elide the implicit check via `[]`.
From what I gather you are comparing static vs dynamic typing, and I agree. I do not use Python because I prefer a subset of errors caught at compile-time. However, it is silly to make it sound like this is somehow limited to Rust. You could just as well have used Go or OCaml and felt "refreshed" because obvious errors would have been caught at compile-time.
> The predictable nature of Rust
Again, given the context, it is static vs dynamic typing. This is not something limited to Rust. In case you were not talking about static typing and its pros, could you please tell me what kind of official formal proof tools exist for Rust that makes it predictable?
My experience is: I have never seen a language/environment where these rules were so well thought out, so well communicated and so well enforced. The package management works like a charm, the module system (since edition 2018) is the best I have seen, etc.
These are all subjective – I am a film/philosophy student who codes and I never received any formal education on any CS related topic. I was speaking about why Rust feels sound to me. Good chances Haskell will look sound to me too, very good chance I might check out more CS topics. No hard feelings please.
I feel like you aren't really asking a question, that your intent was really to lecture someone, but in case you really are:
* a borrow checker
* option types
So OCaml would catch a similar set of obvious errors, though Rust would also catch all data races at compile time, as the borrow checker does that as well as prevent memory mistakes.
It’s not a panacea—and to be blunt, I find Rust more work than it is worth for the sort of projects I typically work on. But any tools which can eliminate whole error categories are worth looking at for sure!
It is more a question of semantics.
So the scenario described is possible to escape simple tests, I suppose.
You can use any compiled language for that though, and even some uncompiled ones (PHP will refuse to run the file)
I'm a big fan of Rust, but the one sided 'Rust makes all the problems go away' articles don't provide any value.
It is great to see people rewriting in Rust where it makes sense.
I choose Rust because, taken as a whole, Rust changed the way I approached laying out my memory and how I composed my code. I think this more than anything leads to fewer issues than the equivalent C++. The article points out that the Rust and C++ solutions to any given problem are going to be completely different.
My only desire for Rust is to see compile times speed up and the C++ interop to improve.
Is this from a project you did? This is a weakness, for sure, but I'd be interested in hearing more about why it failed for you! We have some people working on this.
The potential party spoiler being third party code that's not practical to replace. In some cases even decades won't change its nature, as a few examples have shown.
(It marks all untrusted input as tainted and programmer must explicitly parse the data or mark them untainted to pass them further.)
And of course Rust is a great technology, etc.
In general, we don't want to make it easy to turn checks off. They get removed if the compiler can prove they're not needed; if they're there, they're almost always for good reason.
That said, this would be a user implementation and wouldn't be likely to be provided by the standard library.
We totally tried to write this twice then we switched to language x and everything is great. Feels like something a language zealot would say. I would scoff if someone at my company rewrote a core section in a different language. It is their language so maybe they just told them to do it that way ha.
And here I thought rust was all about zero cost abstractions.
Concretely, the article claims that in Rust the vulnerability doesn’t become a bigger problem because it simply crashes at run-time due to built-in bounds checking. True, but that is also the case with C++ if you were using the equivalent Vec type with mandatory bounds checking - which many projects do (and, critically, enforce).
Personally, I like what Rust brought to the compiler/language world. However, some people are definitely overstating the case. Most non-trivial memory-safety errors and vulnerabilities are related to runtime problems like the example shown. In these, no language can help in the general case - we are not solving the Halting Problem. Therefore, saying Rust is immune to memory-related problems is not true. It is true, however, that those bugs will not trigger anything worse than a crash if there are no unsafe blocks. The same holds for many other common languages out there (Java, C# and many others).
In the same way, I have seen people (and even the linked blog) claim Rust is free of race conditions or thread-safety issues (even if it introduced great ideas to write correct code).
Giving a false sense of security is the worst thing we can do.
It is not realistic either to ask everyone and every company to rewrite all their C/C++ code in Rust. Even if it were financially doable and a rewrite were to happen, in many cases it would simply be best to move to a language like C# anyway, not Rust; for productivity reasons. Where performance allows, of course.
In my opinion, the realistic and pragmatic solution is, instead, to strive to make all languages (in particular C and C++) embrace security-first approaches/types/mechanisms like Rust does. The compiler technology is already written - now retrofit as much as possible into C++ (even to the point of introducing a “safe” scope if needed) and allow companies to embrace it at minimal cost and progressively.
Also of note was that there was already an automated test for one of the high priority bugs that got reintroduced, but that particular test was turned off.
https://www.reddit.com/r/programming/comments/atyzz4/halley_...
For one thing, idiomatic C++ bounds checks by default. You need to use at(). If you don’t like typing at(), you can implement an array type that always bounds checks fairly easily. On that note, the vulnerable c++ code should be using accessors, not indexing to access the oddly packed and laid out array. Even the fixed version wouldn’t pass a code review from me. You could write equivalently bad code in any language that supports array types, and get similarly broken results.
For another thing, there’s no evidence that you couldn’t achieve the same improved data structures in C++ using its type system (which is turing complete...)
The “thread safe by default” property sounds interesting; I’d be interested in reading more about that.
Sounds like it doesn't check by default then. It checks if you remember to check using the more verbose bounds-checking method. Not unlike the issues with subscripting std::map.
You wouldn't get "similarly broken results." The results for doing this in C/C++ are far more serious, which was a point the article made.
If you do this in Rust/C#/Java/etc they will safely crash. If you do this in C/C++ it is undefined behavior, it may crash, but it also could allow remote code execution.
The Rust version of this bug is Security-Low (crash), the C++ version is Security-Critical (potential RCE).
This property is simply a corollary to Rust's ownership invariants enforced by the borrow checker. There can only be a single live mutable reference to any object, therefore two threads can never hold mutable references to the same object simultaneously. Similarly, all objects must have a lifetime at least as long as the objects which reference them. Therefore no thread can ever hold a stale reference (mutable or immutable) to an object. If code passes Rust's borrow checker it must necessarily be thread-safe.
The borrow checker doesn't need any special knowledge of threading, though AFAIU there are traits that permit the compiler to check that you're using the correct boxing type when passing objects to threads. Objects which implement these and other traits are responsible for maintaining ownership invariants using unsafe code.
TL;DR: Threads cannot share mutable references because no code can share mutable references in Rust. It follows that any Rust code is thread-safe, ignoring bugs within or induced by unsafe code.
This is also the normative approach to writing "thread-safe" code in Unix programming more generally. Most libraries that grew from the Unix culture are only written to be re-entrant--they never hold references to objects shared outside their encapsulation or execution scope. Therefore most such libraries claim thread-safety provided that callers maintain the same re-entrancy invariants for library-defined objects. Without needing to use mutexes such code is thread-safe, you just don't normally get compiler verification. Contrast that with Windows programming or, especially, Java, where the expectation is that objects are shareable and guarantee thread-safety internally. Depending on which programming culture you grew up in, Rust's thread-safety is either obvious ("oh, it's just enforcing re-entrant-safe APIs") or magical ("how does it insert locks in all the right places?").
Caveat lector: I've never written any Rust code.
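A rough sketch of what the ownership argument above looks like in practice. The function name `parallel_sum` is made up for illustration. Note that this demonstrates the compiler refusing unsynchronized shared mutation (data races); it does not show that every kind of race condition is ruled out.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Sums 1..=workers by letting each thread add its own id to a shared total.
/// (Hypothetical example function.)
fn parallel_sum(workers: u64) -> u64 {
    // A plain `&mut u64` cannot be handed to several threads; the shared
    // counter has to be wrapped in something that is `Send + Sync`.
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (1..=workers)
        .map(|id| {
            let total = Arc::clone(&total);
            // `thread::spawn` requires the closure to be `Send + 'static`,
            // which is how the type system gets involved.
            thread::spawn(move || {
                *total.lock().unwrap() += id;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}
```

Replacing the `Arc<Mutex<u64>>` with a bare `let mut total = 0u64;` captured by reference in each closure is rejected at compile time, which is the property being described.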
That's not "by default". std::array_view still hasn't landed so I can't even wrap ptr+size s provided by third party libraries or across standardized C ABIs without going beyond the standard library (really not by default now)... and I believe I have yet to see a single solitary use of at() in a production C++ codebase. Some "default".
For code where overflows are expected, C++ exceptions are way too heavyweight and some other kind of "attempt dereferencing" pattern is used (if only checking the index manually before invoking operator[]).
For code where overflows are unexpected, uncatchable assert-style checks are used.
If so, how much do they rely on the developers being ideal programmers, who know intimately the intricacies of C++ and how to avoid ending up in UB-land?
Edit: To be clear I'm not saying anyone isn't using it, this is just more of a comment about the impression I can get when I hear about X tech is so cool, but that's most of what I hear and at some point I want to see those same articles about real world use / experiences. I'm aware Mozilla and others are using it.
Huh? Besides Mozilla itself using it in the browser in several backends, there are tons of places where it's used in production (and several articles on HN on the topic).
Where do you see all those demos who say "I'm not using this for anything"?
I've seen articles on Mozilla using Rust, Dropbox using Rust, MS using Rust (ripgrep used in VSC but also something in Azure IIRC), Google using Rust (Google Fuchsia), Facebook using Rust (Mononoke, etc), and so on.
Been a lot of fun to work with. Having spent 10 years doing C++ professionally before, I can't see myself ever looking back.
Once product is public later this year, I may see if I can talk more publicly about it.
It works really well.
If we were to do it again, the core Rust lib would do much more and the platform native code would only be things that need to be like UI and notifications.
Java’s executors improve the world, but I can’t count the number of concurrency bugs that I had in Java that are not possible in safe Rust. This isn’t just my code, btw, but large teams where it’s hard to disseminate good practices when building threadsafe code. If it had been in Rust, those issues wouldn’t have happened.
Other things that I appreciate about Rust over Java is the error handling combined with RAII, doing away with nasty bugs around try’s lacking proper finally statements for closing file handles, etc.
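A small sketch of the error-handling-plus-RAII point. `Handle` is a made-up stand-in for a file handle (a real `std::fs::File` is closed on drop in the same way); the shared flag only exists so the cleanup is observable.

```rust
use std::cell::Cell;
use std::num::ParseIntError;
use std::rc::Rc;

/// Hypothetical stand-in for a file handle; records when it gets closed.
struct Handle {
    closed: Rc<Cell<bool>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns caused by `?`.
        self.closed.set(true);
    }
}

/// Parses `input` while holding a handle, then doubles the value.
fn parse_with_handle(input: &str, closed: Rc<Cell<bool>>) -> Result<i32, ParseIntError> {
    let _handle = Handle { closed };
    let value: i32 = input.trim().parse()?; // early return on bad input
    Ok(value * 2)
}
```

There is no `finally` to forget: whether `parse` succeeds or fails, `_handle` goes out of scope and `drop` runs.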
Java has its warts, not everything is just about memory safety.
Data race is one other thing, I do have data race issues but that is very rare. Some of the static analysis tools even catch these anti patterns.
Compared to highly optimized Java and C# I could often get a quite naive rust implementation to be 10x faster.
Naive rust means I didn't spend much time optimizing but I do use appropriate algorithms and to avoid unnecessary allocations.
https://developer.android.com/reference/android/util/FloatMa...
For a webapp this doesn't matter, but as we're moving towards more cloud functions it starts to make a lot of sense.
That needs to be ponderated by the fact any real life application will have to access the network at bootstrap to load configuration and therefore your bottleneck will most likely be I/O.
Though I doubt that Java usually wins against the Eigen C++ library. Eigen uses some template tricks to fold operations together (which can avoid intermediate storage and unlocks more types of restructuring optimizations) and uses SIMD extensively.
More nuanced ones can often be caught by using all-args constructors and checking the constructor values with Objects.requireNonNull() or similar. Using Spring? Don't use field or setter injection, always constructor injection, so the above applies as well. Making the state of your objects largely immutable means their state is more consistent, and what is null and when becomes a lot less surprising.
Lastly, write good tests. If you're exhausting the behavior of your system with tests, these things are much less likely to surprise you later.
NPEs are definitely a problem with Java, but they're a very avoidable one as well.
Edit: I don't understand the downvotes. The parent said they're a huge deal and I'm disagreeing because I think they're relatively simple to avoid?
If you want to combine the two concepts, I guess you could go for an example like this:
    if index >= vec.len() {
        panic!("out of bounds");
    }
    let value = unsafe { vec.get_unchecked(index) };
which can be rewritten using higher level abstractions as:
    let value = vec[index];
The second is more abstract, and that abstraction is zero cost: you do not pay more for using the abstraction than for the explicit code above it.
If you can't use iterators, and you need to omit bounds checks for perf, you can do so with unsafe blocks.
Rust requires a lot of runtime checks, but that's the price one has to pay for memory safety.
It requires a lot of runtime checks on the boundary between Rust and C/C++ code. You cannot trust C/C++ code, so you are forced to check everything. A C function returns an enum value? Check that this value is in the enum before converting it to the Rust enum. Check that there are no situations like
    enum MyEnum {
        val1, val2=5
    };
    enum MyEnum foo() {
        return 42;
    }
But after you've done all those checks you need to check almost nothing, because if you have an enum value in Rust, you know for sure that it is a valid value. And the compiler knows that the value is valid and optimizes accordingly. If you have a &str, you know that it is valid UTF-8; you need not check it on each access, it has been checked already. You got a valid pointer from C? Wrap it into NonNull, so neither you nor the compiler needs to check the validity of the pointer once more.
It is unclear which needs more checks -- safe code or unsafe code. I'm gravitating to a belief that unsafe code needs many more runtime checks.
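The "check once at the boundary" idea for the enum case might look roughly like this. The Rust-side names (`MyEnum::Val1`, `MyEnum::Val2`) simply mirror the C snippet and are otherwise an assumption; the raw `i32` stands in for whatever the C function returned.

```rust
use std::convert::TryFrom;

/// Rust-side mirror of the C enum above (val1 = 0, val2 = 5).
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(i32)]
enum MyEnum {
    Val1 = 0,
    Val2 = 5,
}

impl TryFrom<i32> for MyEnum {
    /// The rejected raw value is handed back to the caller.
    type Error = i32;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        // The only place the raw value is validated. After this, any
        // `MyEnum` in safe Rust code is known to be a valid variant.
        match raw {
            0 => Ok(MyEnum::Val1),
            5 => Ok(MyEnum::Val2),
            other => Err(other),
        }
    }
}
```

With this in place, the `return 42;` case from the C example surfaces as `Err(42)` at the boundary rather than as an invalid enum value deeper in the program.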
I remember when C# came out almost 20 years ago. People said "I can forget about managing memory so I can focus on the logic". Programs kept crashing, memory problems were still there.
The article goes with "...remove the burden of memory safety from our shoulders, allowing us to focus on logical correctness and soundness instead...". More or less the same, and admitting that said problems won't go away.
But here we are, it's 2019 and we're still using C/C++ as if nothing happened.
These days I'm wanting out of the Terminal - but wherever I go, I'll need to take Kakoune's style of modal navigation with me. I love it too much.
_(For reference, I'm hoping Xi editor implements the foundation of modal editing, and then I can make Kakoune style editing work in Xi Editor)_
> I never claimed that Rust was the only language that has these properties.
Yeah, that is my mistake. Sorry for misunderstanding you!
> I never received any formal education on any CS related topic
Same here actually! :P
> No hard feelings please.
Not at all!
I understand that you like coding in Rust, and that is completely okay with me, and even if it was not, that should not deter you from continuing what you are doing.
The hard thing is not understanding the concepts but learning what this implies for your coding style, especially when you come from a garbage collected language like Python.
Both concepts can at times give you hard-to-solve tasks, but it is usually because the problem you are trying to solve is hard and you cannot just naively hack away without thinking long and hard about it.
Other than that, I learned to like the borrow checker and lifetimes because you get an incredibly narrow scope of which variables can matter when, which makes debugging much easier.
That's not an expectation on either. For example, the single most common phrase you can see on MSDN in class docs is: "Public static members of this type are thread safe. Any instance members are not guaranteed to be thread safe."
There was a period when it was different for some things. In particular, both Java and .NET had thread-safe standard collection classes initially - e.g. Vector in Java, ArrayList in .NET. This has proven to be a bad trade-off in both cases - it's a massive perf hit for something that's not even all that useful even to threaded code, because in practice you often need to perform multiple operations on the collection atomically, and then you still need your own lock.
So they have since been obsoleted by new collections that do not attempt to do any thread synchronization; the old collections remain for backwards compatibility purposes. Modern idiomatic Java or C# code doesn't do any kind of synchronization or locking to protect the caller, unless that is specifically the purpose of this class or function to provide such things.
So I wouldn't say there's significant cultural differences between Windows, Unix and Java in that regard.
From my experience it is not hard to make it a habit (e.g. using `::at()` and declaring variables as `auto const x = ...;` is muscle memory at this point) but the code becomes very verbose and looks defensively programmed to a sometimes ridiculous extent.
So I understand your skepticism. To address the point of relying on developers being "ideal": just turn the compiler warning flags up to eleven, make all warnings errors, and run your tests two times - one time with sanitisers, and the other under Valgrind. This won't catch all errors, sure, but will still make you more confident in your code's reliability.
Issues arose at the interface between java and kotlin. Unless there are @Nullable @NonNull annotations (and they need to be truthful), the kotlin compiler cannot know the nullability of something coming from a java method.
It can be pretty pernicious: if you use some Java-written libraries like Moshi (JSON parsing), it can also lead to crashes. IIRC, if you declare a Moshi-generated property to be non-null but it is absent from the JSON, it will generate an object with null, creating a crash.
Still, null is now an anecdotal issue in Kotlin. The Android framework team is working on annotating all their APIs with the corresponding nullability annotations and more and more JVM libs are also working on handling it gracefully.
It was never a huge issue to begin with even in java, just pretty cumbersome to have to add annotations everywhere and have to add some policies like 'no null collections, only empty' in a codebase to have sane handling.
    pub fn decode_12be(buf: &[u8], width: usize, height: usize) -> Vec<u16> {
        decode_threaded(width, height, &(|out: &mut [u16], row| {
            let inb = &buf[(row*width*12/8)..];
            for (o, i) in out.chunks_exact_mut(2).zip(inb.chunks_exact(3)) {
                let g1: u16 = i[0] as u16;
                let g2: u16 = i[1] as u16;
                let g3: u16 = i[2] as u16;
                o[0] = (g1 << 4) | (g2 >> 4);
                o[1] = ((g2 & 0x0f) << 8) | g3;
            }
        }))
    }
The "decode_threaded()" call is a function call that passes in a closure with the inner loop to a generic function that is used to multithread a bunch of similar decoders, instead of repeating that code. And the for loop is actually describing what I want (process every 3 bytes into 2 output values) instead of having me manage some iteration variables and have off-by-one bugs. "chunks_exact" and "chunks_exact_mut" are recent additions that allow me to say that I only want to receive exactly 3 bytes and output exactly 2 values, so if the array is improperly sized the extra at the end just gets skipped. This not only matches the intention of this code (I only ever process 2 pixels from 3 bytes and nothing else will work) but also gives the compiler a better way to lift the bounds checks out of the inner loop and make the code significantly faster (by 2x in some decoders).
Now let's see "decode_threaded":
    pub fn decode_threaded<F>(width: usize, height: usize, closure: &F) -> Vec<u16>
        where F : Fn(&mut [u16], usize)+Sync {
        let mut out: Vec<u16> = alloc_image!(width, height);
        out.par_chunks_mut(width).enumerate().for_each(|(row, line)| {
            closure(line, row);
        });
        out
    }
Besides allocating the image it uses "par_chunks_mut" to make the code threaded. That's a function that's provided by the rayon crate, which manages a threadpool and its scheduling for me. So I'm even using code written by other people to build something that's apparently deeply nested even though it needs to be fast. And yet after all this indirection and syntax goodies the end result is efficient machine code that matches or is even better than the original C++ code, and not the dynamic-runtime-like behavior of Ruby/Python that you'd expect from such code.
That's what zero cost abstractions are. After memory and data race safety I think Rust shines because it gives me a lot of the ergonomics of Ruby with the speed of C.
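For anyone who wants to try the inner loop without rayon or the `alloc_image!` macro, here is a single-threaded sketch of the same 12-bit unpacking. The name `unpack_12be` and the way the output is sized are assumptions for illustration, not the original project's code.

```rust
/// Unpacks big-endian 12-bit samples: every 3 input bytes yield 2 output
/// values. Trailing bytes that do not form a full group of 3 are skipped,
/// which is the `chunks_exact` behavior described above.
fn unpack_12be(input: &[u8]) -> Vec<u16> {
    let mut out = vec![0u16; input.len() / 3 * 2];
    for (o, i) in out.chunks_exact_mut(2).zip(input.chunks_exact(3)) {
        let (g1, g2, g3) = (i[0] as u16, i[1] as u16, i[2] as u16);
        o[0] = (g1 << 4) | (g2 >> 4);
        o[1] = ((g2 & 0x0f) << 8) | g3;
    }
    out
}
```

For the input bytes `0xAB, 0xCD, 0xEF` this produces the two samples `0xABC` and `0xDEF`.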
    try:
        import collections
    except ImportError:
        # fallback impl of collections:
        import something_that_uses_print_statement
https://www.reddit.com/r/programming/comments/atyzz4/halley_... has some more context too
Similarly, I could complain that rust arrays are too slow, and produce an array implementation that uses unsafe under the covers.
https://github.com/dropbox/rust-brotli-decompressor/blob/mas...
Toggle the feature, remove the bounds checks unconditionally.
I think the problem is solved by paying less attention to the stuff you care about less, and/or seeking out more production(y) uses if desired by looking in places where it's already common, such as /r/rust, where the content ratio will naturally fall more favorably towards production uses, libraries and advanced discussions since a minimum level of experience (or at least interest) can be assumed.
"C++ implementations obey the zero-overhead principle: What you don't use, you don't pay for".
In Rust you pay for e.g. bounds checking or integer overflow handling or optionals. But I don't understand why everyone's getting so defensive about this, since it's the only way (static verification and proofs aside) to get the desired safety characteristics...
You can disable them in debug builds as well with wrapping integers. So this is again a practice of making the safer option the default, but won't incur runtime costs if it bothers you.
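For reference, a small sketch of the explicit integer-overflow options in the standard library. By default, plain `+` panics on overflow in debug builds and wraps in release builds; the methods below behave the same in both. The function name `overflow_demo` is made up.

```rust
/// Adds 1 to `x` three different ways and returns
/// (checked, wrapping, saturating) results. (Hypothetical helper.)
fn overflow_demo(x: u8) -> (Option<u8>, u8, u8) {
    let checked = x.checked_add(1); // None on overflow
    let wrapped = x.wrapping_add(1); // wraps around, never panics
    let saturated = x.saturating_add(1); // clamps at u8::MAX
    (checked, wrapped, saturated)
}
```

`overflow_demo(255)` gives `(None, 0, 255)`. There is also `std::num::Wrapping<T>` if you want ordinary operators with wrapping semantics.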
If there is a way to do it at compile time that will usually be used in rust.
You can catch a panic, and you can compile C++ with -fno-exceptions. at() is not closer to get() than to [].
It's strange to me because a lot of what I value in Rust is writing code that handles all outcomes safely and with confidence. Yet that all seems to be out the window if you access array values - as it may or may not work.
The safety of [] slice/array access is entirely up to how the dev wrote the code.. which largely feels the same as any other language.
Am I thinking about something incorrectly?
With code that indexes slices, the index computation is usually much more proximate to the actual indexing operation. If ensuring the validity of the index is about upholding an invariant internal to the module, it makes no sense to return an "sorry, I have a bug" error to the caller if that invariant fails to hold. What would the caller do about that?
More challenging would be to include stuff like allocation management (how to free a pointer when it's done), GC integration, function boxing, iterator/generator support. Vtables and cross-language inheritance is interesting but difficult.
One point of clarity: this would be an FFI ABI, not necessarily expecting that most datatypes would be laid out according to the ABI.
I have a feeling that a bunch of the tests in that particular category needed to be updated, so it wasn't simply just too long.
Weirdly enough these things should also be a hard problem in other non-garbage-collected languages, but you get away with a lot more when the compiler doesn’t force you.
From a purely practical level there is so much already that I had a hard time finding some data type that had no existing and tested solution for it. Looking at the code in the std library can be enlightening.
It's the same as with static vs dynamic typing - some things are just easier to model with dynamic, because you don't have to fight with the type checker to prove that your program really is sound when you know that it is. But, of course, you might believe it to be sound when it really isn't...
Rust's goals in life towards bugs are basically
- Isolate memory bugs to unsafe code, which you rarely write.
- Make as many classes of bugs as reasonably possible rarer by encoding as much as is reasonable in the type system and encouraging the programmer to think about all possible cases.
The first gets all the attention because it is the one you can make guarantees about, but the second is really just as important.
In Rust runtime panics are certainly possible (array out of bounds, out of memory, etc), but the ones analogous to Java's NullPointerException have to be explicitly opted into (unwrapping optionals) and do not happen implicitly.
Rust lets you handle optionals in nicer ways which lets you be sure you covered the "null" cases at compile time with no runtime panics possible, if you like.
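A minimal sketch of handling the "null" case without any `unwrap`. Both function names are made up for illustration.

```rust
/// Length of the first whitespace-separated word, if there is one.
fn first_word_len(text: &str) -> Option<usize> {
    text.split_whitespace().next().map(|word| word.len())
}

/// The `match` must cover both `Some` and `None`, otherwise it does
/// not compile, so the missing-value case cannot be forgotten.
fn describe(text: &str) -> String {
    match first_word_len(text) {
        Some(n) => format!("first word has {} bytes", n),
        None => String::from("no words"),
    }
}
```

Combinators such as `unwrap_or(0)` give the same guarantee in a shorter form; only an explicit `unwrap()` or `expect()` reintroduces a possible panic.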
/pedantic
One might expect that set to contain things like crashing (panic! and friends) and leaking memory (forget). It does not, and is not intended to.
"Families are LOCKED OUT of or INSIDE their homes as Yale 'smart' security app crashes leaving dozens stranded"
https://www.dailymail.co.uk/news/article-6268379/People-lock...
"Households up and down the UK were unable to lock or unlock their doors"
An unlocked door is a security problem too...
> A crash is a bug but not a security problem.
I think that all bugs, both the ones that produce crashes and the security bugs, should be treated equally. A bug is a bug, whether it has security implications or not.
To me, the article gives the impression that a system crash is not a security problem, because a Rust program will "terminate in a controlled fashion, preventing any illegal access". But one can, for example, fingerprint a system by forcing it to crash.
And of course, nobody expects that Rust will prevent bugs from happening, but at the same time I don't get the fixation on drawing a distinction between security bugs and bugs.
"security problems are just bugs" - Linus Torvalds. (http://lkml.iu.edu/hypermail/linux/kernel/1711.2/01701.html)
edit: Linus reference.
> some of which aren’t solved by other languages
You only mentioned null pointers, so I will go with that. In Ada, you can have access types (pointers) that are guaranteed to not be null, and the accessibility rules of Ada prevent dangling references to declared objects or data that no longer exists, so this particular issue is solved by a language other than Rust. Please feel free to give me other examples of errors or issues that you believe are not solved by languages other than Rust.
https://www.adaic.org/resources/add_content/standards/05rat/...
> But any tools which can eliminate whole error categories are worth looking at for sure!
I agree. That is why I think Ada/SPARK is awesome! :P
1/ Consider a file `f` (or a socket, etc.). Using the standard library, Rust will statically ensure that, once the file is closed, you cannot attempt to, say, read from it. This is nothing special to files, just an aspect of the borrow checker.
2/ Consider a communication protocol. You need to send a message `HLO`, expect a message `ACK`, then send something else, etc. It is pretty easy to design your objects such that the operations of sending the message, receiving the message, etc. will change the type of your protocol object, ensuring statically that you never send/expect a message that you're not supposed to send in the current state.
If you're curious, I wrote a blurb last year on the topic: https://yoric.github.io/post/rust-typestate/
3/ I quickly googled "Ada spark phantom types" and didn't find anything. Does Ada support phantom types?
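To make points 1/ and 2/ concrete, here is a minimal sketch of the typestate idea. The state names (`Start`, `AwaitingAck`, `Ready`) and the methods are made up for illustration; only the `HLO`/`ACK` exchange comes from the description above, and no real network I/O happens.

```rust
// Hypothetical protocol: send HLO, expect ACK, then send payloads.
// Each state is its own type, and each step consumes the previous state.

struct Start;
struct AwaitingAck;
struct Ready {
    sent: Vec<String>,
}

impl Start {
    // Sending HLO moves us into the "waiting for ACK" state.
    fn send_hlo(self) -> AwaitingAck {
        AwaitingAck
    }
}

impl AwaitingAck {
    // Only an ACK gets us to `Ready`; anything else hands the old state back.
    fn receive(self, msg: &str) -> Result<Ready, AwaitingAck> {
        if msg == "ACK" {
            Ok(Ready { sent: Vec::new() })
        } else {
            Err(self)
        }
    }
}

impl Ready {
    fn send(mut self, payload: &str) -> Ready {
        self.sent.push(payload.to_string());
        self
    }

    // Consumes the connection, so nothing can be called on it afterwards.
    fn close(self) -> usize {
        self.sent.len()
    }
}

fn main() {
    let conn = Start.send_hlo();
    let conn = match conn.receive("ACK") {
        Ok(ready) => ready,
        Err(_) => panic!("peer did not acknowledge"),
    };
    let conn = conn.send("data");
    let count = conn.close();
    println!("sent {} payload(s)", count);
    // conn.send("more");   // compile error: `conn` was moved by `close`
    // Start.send("data");  // compile error: `Start` has no `send` method
}
```

The two commented-out lines are the interesting part: both mistakes are rejected at compile time rather than discovered at run time.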
Your examples are possible with contracts.
I think you should look more into Ada. The only "runtime" is for the exception handling and bounds checking, both of which can be turned off if needed.
And I don't know where you got that "3 times" figure from? Do you have an example you are referring to?
You can do everything that has been mentioned without Ada's RTS, you can also disable all run-time checks. You can use static analysis tools (there are many, and available for free), for example, you can formally verify the correctness of the program, no run-time checks required.
Huh! I have a fairly strong vocabulary and thought this might have been a made-up word -- but apparently it means "to weigh down or give substance to", which aligns with your intended point here. TIL
English is my second language. I did a quick google search just to make sure it was correct but it's hard to gauge if a word that happens to be in the dictionary is also widely understood.
Even better is to use iterators and other abstractions that don't require bounds checking at all. Rust has lots of good tools to build efficient code.
But the real benefit of Rust is ownership tracking - and that one is zero overhead.
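A small sketch of what "iterators instead of indexing" can look like. The function names are invented for the example, and whether the optimizer already removes the checks in the indexed version depends on the compiler, so treat this as an illustration and not a benchmark.

```rust
// Indexed loop: every `xs[i]` is a checked access unless the optimizer
// can prove the index is in range.
fn sum_indexed(xs: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..xs.len() {
        total += xs[i];
    }
    total
}

// Iterator version: there is no index, so there is nothing to bounds-check.
fn sum_iter(xs: &[u64]) -> u64 {
    xs.iter().sum()
}

// Pairwise product over two slices; `zip` stops at the shorter one,
// so a length mismatch cannot cause an out-of-range access.
fn dot(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let xs = [1, 2, 3, 4];
    println!("{} {}", sum_indexed(&xs), sum_iter(&xs));
    println!("{}", dot(&[1, 2, 3], &[4, 5, 6]));
}
```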
If you use Rust 2018 now, non-lexical lifetimes are a bigger recent improvement: https://doc.rust-lang.org/nightly/edition-guide/rust-2018/ow...
For programmers who try Rust out, it can be hard because they stubbornly try to program Rust as if it were Python/Java/C/C++/Foo – but it isn't.
Rust as a language makes certain types of approaches (or anti-patterns) nearly impossible. In the beginning it happens quite often that after a day of struggle you will erase a Rube Goldberg machine that does something simple and replace it with the right line of code that does exactly the same while being way more extensible.
If you find yourself building more and more elaborate structures just to get something very simple done, the problem is very likely your approach (e.g. trying to implement an OOP structure instead of solving the problem) – at least that is what happened to me a lot.
Rust's borrowing and ownership concepts, as well as traits and generics make it extremely powerful, but these concepts lend themselves better to certain ways of structuring code than to others. Learning something is in my opinion always about allowing something to change your perspective. If you are not allowing the thing you are learning to change you, chances are that you are not learning, but judging.
When a C++ programmer starts with Python they will write incredibly “unpythonic” code; when they do the same thing in Rust, it just won’t compile.
IMO Rust is very straightforward and the std library is incredibly good. But you have to really understand certain core concepts and what they mean in terms of structuring your code.
Just like that C++ programmer needed to learn what idiomatic Python looks like and why it makes a lot more sense to write Python that way, you need to find the Rust way of doing things. But this is the same for every programming language out there.
One way is to just keep at it until it clicks. Another way is to not do that, but once you've got the words for something you start seeing the issue everywhere, and next time around it makes a lot more sense.
> a borrow checker
What exactly do you mean? How does it differ from any other language's type system?
> would also catch all data races at compile time
In Ada/SPARK, you can formally verify tasks, too. Please take a look at https://docs.adacore.com/spark2014-docs/html/ug/en/source/co... if you have some time!
> prevent memory mistakes.
Which mistakes are you referring to specifically? I need to know so I can have a meaningful response to it, but all I can say right now is that Ada/SPARK does the same.
> What exactly do you mean? How does it differ from any other language's type system?
Part of rust's type system is sub-structural: by default, Rust's types are affine, which means you can only use them once (at most, not exactly).
Now this is not super convenient to use and could have efficiency issues (e.g. any time you want to check a value in a structure you'd have to return the structure so the caller can get it back), so to complement this you can borrow (create a reference). The borrow checker is the bit which checks that borrows satisfy a bunch of safety rules, mostly that a borrow can't outlive its source, and that you can have a single mutable borrow or any number of immutable borrows concurrently.
The borrow checker provides for memory-safe pointers to or into a structure you're not the owner of, with no runtime cost.
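A minimal sketch of those rules in code. The function names are invented for the example; the commented-out lines are the ones the compiler would reject.

```rust
// Takes ownership: the caller's vector is moved in and cannot be used again.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

// Shared (immutable) borrow: the caller keeps ownership.
fn total(v: &[i32]) -> i32 {
    v.iter().sum()
}

// Exclusive (mutable) borrow: no other borrow may be live during the call.
fn push_twice(v: &mut Vec<i32>, x: i32) {
    v.push(x);
    v.push(x);
}

fn main() {
    let mut v = vec![1, 2, 3];

    // Any number of shared borrows may coexist.
    let a = &v;
    let b = &v;
    println!("{} {}", total(a), total(b));

    // The shared borrows are no longer used, so a mutable borrow is fine.
    push_twice(&mut v, 4);

    // let r = &v;
    // push_twice(&mut v, 5);   // rejected: `v` is still borrowed by `r`
    // println!("{:?}", r);

    let n = consume(v);
    println!("{}", n);
    // println!("{:?}", v);     // rejected: `v` was moved into `consume`
}
```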
If you're familiar with affine types, this is basically what Rust's borrow checker implements.
If you're not familiar, it's a bit complicated to summarize on Hacker News. You should try it out, because it lets you guarantee statically entire classes of properties that non-academic languages struggle with.
> In Ada/SPARK, you can formally verify tasks, too. Please take a look at https://docs.adacore.com/spark2014-docs/html/ug/en/source/co.... if you have some time!
I'm not an expert in Ada, but yes, that's pretty similar to some of the properties Rust will let you check out of the box.
edit: My memory of Ada was incorrect.
There is also "High Integrity Software: The SPARK Approach to Safety and Security", I never read it, but it is written by a well known author in the Ada community.
Not doing so means ending up with situations like the Linux kernel, Windows or console games, which might be ok, when code portability doesn't matter to start with.
You can similarly express ghost properties of your types, even though we don't have ghost fields in SPARK. For more on ghost code in SPARK, you can look at this presentation last year from my colleague Claire Dross: https://www.adacore.com/uploads/products/SSAS-Presentations/...
As a more extensive example of a useful library with this kind of contracts for proof, Joffrey Huguet added rich contracts of this kind to the Ada.Text_IO standard library just two weeks ago, as part of his current internship with us. This should be in the FSF trunk in the coming weeks. For example, here are some contracts he added:
    procedure Open
      (File : in out File_Type;
       Mode : File_Mode;
       Name : String;
       Form : String := "")
    with
      Pre    => not Is_Open (File),
      Post   =>
        Is_Open (File)
        and then Ada.Text_IO.Mode (File) = Mode
        and then (if Mode /= In_File
                  then (Line_Length (File) = 0
                        and then Page_Length (File) = 0)),
      Global => (In_Out => File_System);

    procedure Put (File : File_Type; Item : Character) with
      Pre    => Is_Open (File) and then Mode (File) /= In_File,
      Post   =>
        Line_Length (File)'Old = Line_Length (File)
        and Page_Length (File)'Old = Page_Length (File),
      Global => (In_Out => File_System);

    procedure Close (File : in out File_Type) with
      Pre    => Is_Open (File),
      Post   => not Is_Open (File),
      Global => (In_Out => File_System);
Yeah, in the phrase "idiomatic C++ bounds checks by default", the term "idiomatic" is a "weasel word" that disqualifies "by default".
Or, maybe, the programmer was aware that operator[] didn't perform bounds checking, but opted to use it for some reason? A good way to dissuade people from making unidiomatic choices is to make them more verbose. IMO calling the at function isn't particularly verbose, but if the member function that didn't check bounds were called something like "at_unchecked," perhaps people would be less inclined to use it.
Also, from the snippet in the blog post, note that you can't tell whether the Firefox code used std::vector, C-style arrays, or some non-STL container type. Projects may use their own container types, but your criticism only applies if the programmers were using the C++ standard library.
Everybody knows you're supposed to check pointers for being null, and yet time and time again developers fail.
As long as you rely on human nature and provide one API which is simple, convenient, obvious and dangerous and one which is complex, inconvenient, non-obvious and safe, you will just drive users towards the former.
> I would assume that the Firefox C++ programmers know this as well. However, maybe I'm wrong or have too much faith?
Just because they know when quizzed doesn't mean they'll always remember when actually doing. Even less so when subscripting is safe in pretty much every other language which provides array subscripting, and ::at… only exists in C++?
> IMO calling the at function isn't particularly verbose
No, but it's still more verbose and less intuitive than [], especially given the above (that tons of languages use [], and very few have an at method)
> A good way to dissuade people from making unidiomatic choices is to make them more verbose.
Indeed.
And that has been my experience with unwrap: they don't want people to use it, but the alternative is so verbose and clunky.
operator[]() does not require bounds checking by ISO C++, however most compilers do actually enable bounds checking in debug builds.
Visual C++ certainly does it for example.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n471...
Here is the Visual C++ documentation for bounds checking in debug builds.
https://docs.microsoft.com/en-us/cpp/standard-library/checke...
It's the exact opposite which is the point. Rust requires extra work and using the non-default and less convenient way for unsafety, C++ requires extra work and using the non-default method and less convenient way for safety.
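For comparison, a short sketch of what the Rust side of that looks like for slice access. The function names are invented; `get`, indexing and `get_unchecked` are the standard slice operations.

```rust
// Checked access that reports failure as a value.
fn checked(xs: &[i32], i: usize) -> Option<i32> {
    xs.get(i).copied()
}

// The default `[]` syntax: also checked, but panics when out of range.
fn indexed(xs: &[i32], i: usize) -> i32 {
    xs[i]
}

// Skipping the check is the verbose path: an `unsafe fn`, an `unsafe` block,
// and a method with "unchecked" in its name.
// Safety: the caller must guarantee `i < xs.len()`.
unsafe fn unchecked(xs: &[i32], i: usize) -> i32 {
    unsafe { *xs.get_unchecked(i) }
}

fn main() {
    let xs = [10, 20, 30];
    println!("{:?}", checked(&xs, 1));
    println!("{:?}", checked(&xs, 3));
    println!("{}", indexed(&xs, 2));
    // Index 0 is in range for a three-element array.
    println!("{}", unsafe { unchecked(&xs, 0) });
}
```

So the short, obvious spelling is the safe one, and the unchecked one has to be asked for by name.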
This is one of the reasons why, when I learn a new language, I deliberately research the idiomatic way to write it: it saves a lot of time otherwise spent trying to hammer a square peg into a round hole.
Also, some languages, purely because of the domain they are in, can do things in ways that others can't (trivially) do. So if you treat each language as "foo" with different syntax, you end up hobbling yourself to the lowest common denominator of "foo"'s features across every language you use.
Rust’s idea of safety is similar: it tries to reduce the impact of common risks that happen during programming. It won’t save you from logical errors — just like a seatbelt won’t save you when you decide to maneuver your vehicle into a flaming pit of lava. The only thing that would save you in that situation is not maneuvering into flaming lava pits.
That being said, Rust has a very straightforward implementation of unit tests and integration tests, which can help you deal with the remaining risks.
Of course you can always live on the edge, ignore seatbelts and do things like calling unwrap() on results that might not return Ok(value) at all times. But that will bite you sooner rather than later.
What Rust is really good at, IMO, is making risks clear at any given moment in time. The routes your program can take are very well represented, especially because of the ownership model. Once you have understood it, you can make very strong assumptions about which part of the software is manipulating which data. These strong assumptions also help mitigate risk. It is harder in Rust to do things just wrong than in most other languages, and it is still fast.
Making bugs grep'able is huge.
The big difference is that, in Java, you typically get your NPE because you didn't know/didn't check that your pointer could be `null` – in other words, the default behavior of the language is to NPE. In Rust, the default behavior of the language is to inform the developer that they need to check. They may decide to explicitly assert that the pointer is not null, causing a crash if it is, but that's a conscious choice.
For instance, in my Rust code, pretty much every occurrence of `unwrap()` or `expect()` contains a comment explaining which invariant guarantees that the call will succeed. I don't think I have ever seen any comments associated to a member access in Java.
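A small sketch of both habits described above: handling the "maybe absent" case because the type forces you to, and documenting the invariant next to an `expect()`. The function names are invented for the example.

```rust
// The return type says "there may be no word", so callers have to decide
// what to do about that before they can touch the value.
fn first_word(line: &str) -> Option<&str> {
    line.split_whitespace().next()
}

fn shout_first_word(line: &str) -> String {
    match first_word(line) {
        Some(word) => word.to_uppercase(),
        None => String::from("<empty>"),
    }
}

fn last_digit(n: u32) -> char {
    // Invariant: `n % 10` is always in 0..=9, which is a valid digit in
    // radix 10, so `from_digit` cannot return `None` here.
    std::char::from_digit(n % 10, 10).expect("n % 10 is always a decimal digit")
}

fn main() {
    println!("{}", shout_first_word("hello world"));
    println!("{}", shout_first_word("   "));
    println!("{}", last_digit(1234));
}
```

Searching a codebase for `unwrap(` and `expect(` then gives a list of every place where such an assertion was made.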
I had a great time writing Go web services, and I definitely agree that it was easy to pick up, both for me and other teammates. Figuring out the idiomatic way to do things wasn't always straightforward though, although getting up and running was easy.
Most issues in software fall into this category, though. The issue with Java’s null is that it’s not type safe. The language is generally a strongly typed language, except in the case of null. And until Value Types land, there is no way to express nullability to the compiler.
The arguments you make aren’t all that different than the undefined behavior arguments with C. “Just follow these simple rules and you’ll avoid all issues”, a compiler should do that for you IMO.
I can’t wait for Value Types in Java, at that point we can have real Option types.
No downvote, but GP post conflates "how code SHOULD be written" with "how existing code was written" in my opinion. Same with one of the earlier comments about "idiomatic C++ does bounds checking." I've seen plenty of NPEs in real, working Java.
If Mozilla isn't managing to write idiomatic C++, I don't hold out much hope for myself either.
> “Just follow these simple rules and you’ll avoid all issues”, a compiler should do that for you IMO.
This was one thing I really liked about Kotlin vs Java. It's nice that my IDE can do the work of generating getters/setters on a POJO for me, but it's not nice that the language requires me to fill my source code with that noise. To a first approximation, any task rote enough the computer could do it for me, it should not ask me to do.
See also: static languages that don't offer type inference.
That said, "write good tests" is impossible. You can write as many tests as you want, you can make sure those loops loop or don't, and those ifs if or don't, but you cannot write "good" tests. Tests will always miss some corner case, some input, some unexpected locale variable.
https://www.infoq.com/presentations/Null-References-The-Bill...
There is more to commercial compilers than winning the benchmarks game.
The benchmarks game is just an easily available source of examples.
Which, for instance, may show an Ada program with much the same measured time as a Rust program —
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
It would be great if you would contribute a program that fixes this trivial problem:
https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...
(Also I have to rant here a bit: Yes I know the GC you used in Java or Emacs in 1997 was terrible, but modern GCs are very good indeed)
It would simply iterate through every entry in the temp string stack and then all the entries in the symbol table to find the string with the lowest address above whatever was the previous lowest address. After that full sweep, it would move that string to its new home and adjust pointers to it, then set the new low bound to the string just identified. It would keep iterating until all strings had been visited.
As a result, it required (n^2)/2 sweeps of memory (n = # of strings). Having BASIC lock up for 30 seconds every once in a while was just how it went.
[ and in case someone is going to correct me, it might have swept from top of memory to bottom, I have forgotten the details ]
I think you could use Rc<Thing> in rust too if you want to solve it using rust, but then you have "swift with more boilerplate" as one friend said.
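For a feel of what the `Rc` route involves, here is a rough sketch of two nodes that point at each other. The `Node` type and helper functions are hypothetical and not taken from any real codebase; the backward link is a `Weak` so that the two nodes do not keep each other alive forever.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    value: i32,
    next: Option<Rc<RefCell<Node>>>,   // strong reference: keeps `next` alive
    prev: Option<Weak<RefCell<Node>>>, // weak reference: avoids a cycle
}

fn new_node(value: i32) -> Rc<RefCell<Node>> {
    Rc::new(RefCell::new(Node { value, next: None, prev: None }))
}

// Links `a -> b` and `b -> a`; the reference counts are adjusted at run time.
fn link(a: &Rc<RefCell<Node>>, b: &Rc<RefCell<Node>>) {
    a.borrow_mut().next = Some(Rc::clone(b));
    b.borrow_mut().prev = Some(Rc::downgrade(a));
}

// Follows the weak back-pointer, if the previous node still exists.
fn prev_value(n: &Rc<RefCell<Node>>) -> Option<i32> {
    let node = n.borrow();
    let prev = node.prev.as_ref()?.upgrade()?;
    let value = prev.borrow().value;
    Some(value)
}

fn main() {
    let a = new_node(1);
    let b = new_node(2);
    link(&a, &b);
    println!("{:?}", prev_value(&b));
    println!("{}", Rc::strong_count(&b));
}
```

The `Rc<RefCell<...>>`, `borrow_mut()` and `downgrade` noise is presumably the "more boilerplate" part of the quote.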
Another actually more serious problem is that typical ref counting implementations mix the allocated data with the references, so that when you inc/dec the reference you dirty a cache line containing the data. There exist implementations which improve on this by storing the references in a separate part of memory, but they are not how it's commonly implemented.
Disclaimers: I'm only lightly familiar with Rust, and linked-lists are perfectly reasonable solutions _in the proper context and paradigm_.
So the problem to be solved is that we want to a) have a collection of items which have a very strict ordering; b) be able to directly access the first and last item in that order; c) given a particular item, find the item which is immediately before/after it; d) when removing or adding an item, the relative ordering of the other items is undisturbed. Oh yeah, and e) we want it simple and efficient.
Doubly-linked lists are, of course, a very common solution used for these kinds of problems. They are perfectly suitable _if our paradigm is_ "I hereby assert that somehow, external to the compiler, I've verified that all list manipulation is being done correctly in all circumstances." If, however, our paradigm is "I want the compiler to automatically guarantee lots of these invariants" we run into some problems:
1) The data structure itself (each item has two pointers, and there are two global pointers for "head" and "tail") makes very few guarantees. Just one, actually: "each pointer will either point to an object, or will be null (or its moral equivalent)". There just isn't much compile-time meat for the compiler to chew on.
2) Linked-lists typically have (relatively) persistent scope, and exist outside of the lexical scope of whatever code block is immediately operating upon them. Again, it doesn't give the compiler much to go on.
3) Without managed memory (GC), there's no way for the compiler to guarantee that a pointed-to object still exists.
4) There's no built-in guarantee that the "head" and "tail" pointers actually point to the first/last object.
5) There's no guarantee of overall ordering (if a.next == b, then b.prev == a).
6) There's no guarantee of even a consistent view of the items in the collection (head == a, a.next == b, but b.prev == null/end-of-list).
7) There's no way for the compiler to guarantee, when viewing the collection as a whole, that modifications are atomic or thread-safe.
Yes, it's possible to take care of these issues in a non-Rust paradigm by making the structure of the list itself opaque, and only exposing insert/remove functions which enforce all the invariants. (I think Rust itself can do this with unsafe code.) However, you're still left with difficulties:
8) If list items are opaque containers, how do you guarantee that the payload object continues to exist?
9) How do you guarantee payload object ownership, atomicity, mutability, or other properties?
10) How do you guarantee that these needed invariants are held transitively?
Now the Rust paradigm, of course, isn't the only way to deal with these issues. But whether you're using Rust, managed-memory, C++ smart pointers, immutability guarantees, etc., you're going to need to do things that are unnatural in the other paradigms. Rust has the benefit of automatically enforcing lots of these invariants without having to do copying, worrying about shallow vs. deep copying, transitivity, etc.
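On the "opaque list with safe insert/remove functions" point: Rust's standard library ships a doubly-linked list, `std::collections::LinkedList`, which as far as I know is built that way (pointer juggling hidden inside, safe API outside). A minimal sketch of using it; the helper function names are invented for the example.

```rust
use std::collections::LinkedList;

// Builds the list a, b, c by inserting at both ends.
fn build_list() -> LinkedList<String> {
    let mut list = LinkedList::new();
    list.push_back("b".to_string());
    list.push_back("c".to_string());
    list.push_front("a".to_string());
    list
}

// Walks the list front to back; `.iter().rev()` would walk it back to front.
fn in_order(list: &LinkedList<String>) -> Vec<&str> {
    list.iter().map(String::as_str).collect()
}

fn main() {
    let mut list = build_list();
    println!("{:?}", in_order(&list));

    // Direct access to both ends.
    println!("{:?} {:?}", list.front(), list.back());

    // Removing an item hands ownership of the payload back to the caller,
    // and the relative order of the remaining items is unchanged.
    let first: Option<String> = list.pop_front();
    println!("{:?} {:?}", first, in_order(&list));
}
```

The list owns its payloads, so questions 8) and 9) above are answered by the ordinary ownership rules: a payload exists for as long as it is in the list, and whoever removes it becomes its owner.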
Also there are plenty of domains where Java or C++ aren't even taken into consideration, in spite of decades of effort.
Just to cite Microsoft's recent security recommendation out of Blue Hat conference, regarding their own software, "use a mix of C#, Rust and constrained C++".
Yes, chaps, that Result<T,E> type is all but isomorphic with checked exceptions, Java-style.
Amongst other issues it's possible to pipe one through a generic wrapper without that wrapper having to care about it.
e.g. let's say you have an input collection, you map() over it, and the map callback can fail.
In Rust or Haskell you… just do that. And the caller deals with a collection of results however it wants.
In Swift, you need map to be specifically annotated in `rethrow` so it can be transparent to failure (aka can't fail if its callback can't, but can if its callback can).
In Java, you're shit out of luck and jolly well fucked, your generic map can't be generic over generic exceptions, so either its callback can't fail or you need to wrap said callback to convert the checked exception into an unchecked one, and possibly back again outside the map.
So… yeah, they're "all but isomorphic" because they're both implementations of the concept of statically checked fallibility. It's just that java's checked exceptions[0] are a bad implementation of the concept.
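For the Rust case, "just do that" looks roughly like the sketch below. The function names are invented; `str::parse` and `collect` are standard library. The same `map` call works for both functions, and only the requested return type decides whether you get all-or-nothing or one result per item.

```rust
use std::num::ParseIntError;

// All-or-nothing: stops at the first failure and returns that error.
fn parse_all(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

// Keep every outcome and let the caller sort them out.
fn parse_each(inputs: &[&str]) -> Vec<Result<i32, ParseIntError>> {
    inputs.iter().map(|s| s.parse::<i32>()).collect()
}

fn main() {
    println!("{:?}", parse_all(&["1", "2", "3"]));
    println!("{:?}", parse_all(&["1", "x", "3"]).is_err());

    let each = parse_each(&["1", "x", "3"]);
    let good = each.iter().filter(|r| r.is_ok()).count();
    println!("{} of {} parsed", good, each.len());
}
```

Note that `map` itself knows nothing about errors here; the `Result` is just another value flowing through it.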
Put another way, a 2018 fiesta or yaris are "all but isomorphic with" a 1960 corvair or a pinto, but you couldn't pay me to take a road trip in a corvair or a pinto.
[0] java's because someone might come up with better ones, though the well's been pretty tainted at this point
However, there's one very important language design difference between this and Java-styled checked exceptions: using sum type gives you effect polymorphism for free. This means you can write (say) a map function which says it has precisely the same exception type of its argument function, using the same machinery you're using for generics everywhere else.
This is a big deal from the usability side, since AFAICT it was the lack of effect polymorphism that turned people off of Java-style checked exceptions. From a language implementation standpoint, it's also nice not to have to implement type inference twice, once for return values and once for exceptions. :)
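A sketch of what that effect polymorphism looks like when written out by hand. `try_map` and `half` are hypothetical names for illustration; the point is that the error type `E` is an ordinary generic parameter, so the function fails with exactly whatever its callback fails with.

```rust
// Generic over the item types T and U *and* over the error type E.
fn try_map<T, U, E, F>(items: Vec<T>, mut f: F) -> Result<Vec<U>, E>
where
    F: FnMut(T) -> Result<U, E>,
{
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        // `?` returns early with the callback's own error, unchanged.
        out.push(f(item)?);
    }
    Ok(out)
}

// A callback whose error type is `String`.
fn half(n: i32) -> Result<i32, String> {
    if n % 2 == 0 {
        Ok(n / 2)
    } else {
        Err(format!("{} is odd", n))
    }
}

fn main() {
    // Error type inferred as String.
    println!("{:?}", try_map(vec![2, 4], half));
    println!("{:?}", try_map(vec![2, 3], half));

    // Same function, different error type (std::num::ParseIntError).
    println!("{:?}", try_map(vec!["7", "8"], |s| s.parse::<u8>()));
}
```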
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://salsa.debian.org/benchmarksgame-team/benchmarksgame/...
And yeah, it is a valid and very serious issue. If we had more Ada programmers, we would not have to sit in trains or airplanes "powered" by C++, haha.
For params and return values you still can’t know for sure, and in all these cases it -could- be null. But yes, making all fields and variables that can be absent Optional would help as a coding practice, except of course where you need to reassign the value in the Optional, as there is no “set” on the type.
The network error that makes a mockery of pretending something is local. The dynamic link error that discloses a pluggable implementation. And all the steps between giving lie to the idea that errors should be caught and handled anywhere near their occurrence - when the only code that knows about the actual final composition is at the top of the stack, far far away, a perilous journey for a fragment of error info - unless such transport is automated, and not gated at customs, as so many checked exceptions boundaries end up being.
I'm in favour of error values for functions that return errors in normal circumstances. But they should be exceedingly easy to dispatch into an unwind mechanism, if it turns out you're not interested. Almost all of the time, the code in the middle doesn't care. Why should it pay the tariff for the transport?
Checked exceptions were intended for the latter, and unchecked exceptions for the former.
It absolutely is one of the reasons why they are a bad idea, and why reified results are so much more useable and useful.
> Failure is a function of implementation details
In the original intention, that's what unchecked exceptions were for, with checked exceptions for the reporting of errors rather than failure. That's why java didn't have only checked exceptions.
Checked exceptions were for unavoidable errors. Errors that are fundamental to the operation being attempted. They usually occur because the operation interacts with the world outside the program, which means the program is subject to violations of expectations. Network errors. File not found when you try to open it - you cannot test for file existence first without a race.
The reality is that the latter kind of errors are better off as overloaded call signatures: one call variant when you care about the error and want to catch it, another variant when you don't care about specifics of the error and want the whole stack to unwind when the expectation is violated.
Neither of these approaches requires checked exception signatures throughout the stack (or Result types for that matter - you can assume I also mean those, due to the isomorphism).
There's a reason .net modeled number parsing with Parse() and TryParse() instead of throwing NumberFormatException like parseInt(). It's because sometimes you care - input came from user and you need to handle it - and sometimes you don't - input came from configuration file and the stack needs to be torn down if you can't parse it.
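For what it's worth, the same "care / don't care" split can be sketched in Rust on top of a single fallible call. The function names and the port example are invented; `str::parse` and `expect` are standard library.

```rust
use std::num::ParseIntError;

// "TryParse" flavour: the input came from a user, so hand the error back
// and let the caller decide how to report it.
fn parse_user_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

// "Parse" flavour: the input came from a config file; if it is malformed
// there is nothing sensible to do, so panic and let the stack unwind.
fn parse_config_port(text: &str) -> u16 {
    text.trim()
        .parse::<u16>()
        .expect("config port must be an integer in 0..=65535")
}

fn main() {
    match parse_user_port("eighty") {
        Ok(port) => println!("using port {}", port),
        Err(e) => println!("please enter a number: {}", e),
    }
    println!("{}", parse_config_port("443"));
}
```

The difference from the .NET pair is that here the caller picks the flavour at the call site; the library only has to offer the `Result`-returning version.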
Picture a stack that looks like this:
0: <operation that may throw exception of the checked variety>
1: <code that may be interested in handling error>
2: <code that doesn't know about implementation details>
.... could be 10, 100, 200+ methods in this stack dump
N-1: <code that doesn't know about implementation details>
N: <request handler or event loop that catches all exceptions>
The great problem with checked exceptions is the methods for stack frames 2 to N-1. Either exceptions are handled at stack entry 1, or at stack entry N. The only job of all the code between is to pass exceptions unmolested back to N.
Those calls may be dynamically bound (whether via vtables or function pointers, objects or closures) and / or dynamically linked (so unavailable to a type system at compile time). In large production programs, control flow will be dynamically determined. It's a fact of life; if it's not for testing purposes, it'll be for deployment flexibility.
I think there's value in the IO monad; in marking functions as the kind of functions that may interact with the outside world. And checked exceptions can work this way. But not unless the error type has a polymorphic storage location, and unwinding the stack is syntactically weightless. I don't ever want to have to change the signature on dozens of methods just because there's a new implementation detail deep inside a dynamically linked abstraction.
And exception / error wrapping isn't the answer either - it's almost always a bad idea.
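As a point of comparison, the closest thing Rust offers to a "polymorphic storage location" for errors is probably `Box<dyn Error>` combined with the `?` operator. The sketch below mirrors the stack picture above with three invented functions; `StorageError` is a made-up error type. Whether one extra character per call counts as "syntactically weightless" is a matter of taste.

```rust
use std::error::Error;
use std::fmt;

// A made-up low-level error, standing in for an implementation detail.
#[derive(Debug)]
struct StorageError(String);

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for StorageError {}

// Frame 0: the operation that can actually fail.
fn read_setting(key: &str) -> Result<String, Box<dyn Error>> {
    if key == "port" {
        Ok("8080".to_string())
    } else {
        Err(Box::new(StorageError(format!("missing key: {}", key))))
    }
}

// Frames 2..N-1: code in the middle. Its signature names no concrete error
// type, so a new failure mode further down does not change it.
fn load_port(key: &str) -> Result<u16, Box<dyn Error>> {
    let raw = read_setting(key)?;
    let port = raw.parse::<u16>()?; // a ParseIntError is boxed automatically
    Ok(port)
}

// Frame N: the handler that finally looks at the error.
fn handle_request(key: &str) -> String {
    match load_port(key) {
        Ok(port) => format!("port {}", port),
        Err(e) => format!("error: {}", e),
    }
}

fn main() {
    println!("{}", handle_request("port"));
    println!("{}", handle_request("host"));
}
```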
md5 5b185f9a67a426baf78aa3bbb5baf8df out_rust
md5 5b185f9a67a426baf78aa3bbb5baf8df out_ada
On top of that, I got this:
Rust:
real 0m0.702s
user 0m0.693s
sys 0m0.007s
Ada:
real 0m0.708s
user 0m0.706s
sys 0m0.000s
Why don't you use this version instead? The output is correct and identical to Rust's, and the performance is significantly better: just as fast as C and Rust, as expected.
Thanks for letting us know about its existence!
Edit: I do not understand your reaction to me thanking you for bringing my (and possibly other people's) attention to the other implementation. It was genuine.
> You're reading-between-the-lines something that isn't there.
Okay, my mistake then. :)
On the benchmarks game website anything underlined is a URL.
On the faster/ada.html page there's a link "all other Ada 2012 GNAT programs & measurements".
On the performance/pidigits.html there's a link "Ada 2012 GNAT #2".
At what N?
> The output is correct
Nope. diff
3c3
< 6264338
\ No newline at end of file
---
> 6264338 :27
PROGRAM OUTPUT:
3141592653 :10
5897932384 :20
6264338
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
and there's a working Ada regex-redux #5 program:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...