Bugs Rust won't catch

680 points by lwhsiao 65 days ago | 372 comments

collinfunk 65 days ago |

Hi, I am one of the maintainers of GNU Coreutils. Thanks for the article, it covers some interesting topics. In the little Rust that I have used, I have felt that it is far too easy to write TOCTOU races using std::fs. I hope the standard library gets an API similar to openat eventually.

I just want to mention that I disagree with the section titled "Rule: Resolve Paths Before Comparing Them". Generally, it is better to make calls to fstat and compare the st_dev and st_ino. However, that was mentioned in the article. A side effect that seems less often considered is the performance impact. Here is an example in practice:

  $ mkdir -p $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')
  $ while cd $(yes a/ | head -n 1024 | tr -d '\n'); do :; done 2>/dev/null
  $ echo a > file
  $ time cp file copy

  real 0m0.010s
  user 0m0.002s
  sys 0m0.003s
  $ time uu_cp file copy

  real 0m12.857s
  user 0m0.064s
  sys 0m12.702s

I know people are very unlikely to do something like that in real life. However, GNU software tends to work very hard to avoid arbitrary limits [1].

Also, the larger point still stands, but the article says "The Rust rewrite has shipped zero of these [memory safety bugs], over a comparable window of activity." However, this is not true [2]. :)

[1] https://www.gnu.org/prep/standards/standards.html#Semantics

[2] https://github.com/advisories/GHSA-w9vv-q986-vj7x
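For readers who want the st_dev/st_ino comparison spelled out, here is a minimal Rust sketch. It assumes a Unix target and uses only std; `same_file` is a hypothetical helper name chosen for illustration, not a function from uutils or GNU coreutils.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns true when both paths currently refer to the same file,
/// judged by device and inode number instead of by comparing path strings.
/// (Hypothetical helper, for illustration only.)
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    // Open first, then query the open handles: File::metadata() asks about
    // the descriptor, so neither path is canonicalized or resolved twice.
    let file_a = File::open(a)?;
    let file_b = File::open(b)?;
    let (meta_a, meta_b) = (file_a.metadata()?, file_b.metadata()?);
    Ok(meta_a.dev() == meta_b.dev() && meta_a.ino() == meta_b.ino())
}
```

Because no absolute path is ever built, the cost does not grow with the depth of the current directory the way a canonicalize-then-compare approach can.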

pornel 64 days ago | |

Indeed, std::fs suffers from being a lowest common denominator. Rust had to have something at 1.0, and unfortunately it stayed like that.

Rust uutils would be a good place to design a more foolproof replacement for Rust's std::fs API.

dapperdrake 64 days ago | | |

Unix embodies this, as well.

When K&R created unix and C there was still the better option of moving changes that were better to have in the "kernel" into the kernel.

Now we have "standards" that even cause headaches between Linux and the BSDs.

Linux back-propagates stuff like mmap, io_uring, etc. to where it belongs. In this way it is like the original unix. And deservedly running on most servers out there.

dapperdrake 64 days ago | |

First of all, thank you for presenting a succinct take on this viewpoint from the other side of the fence from where I am at.

So how can I learn from this? (Asking very aggressively, especially for Internet writing, to make the contrast unmistakable. And contrast helps with perceiving differences and mistakes.) (You also don’t owe me any of your time or mental bandwidth, whatsoever.)

So here goes:

Question 1:

How come "speed", "performance", race conditions and st_ino keep getting brought up?

Speed (latency), physically writing things out to storage (sequentially, atomically (ACID), all of HDD NVME SSD ODD FDD tape, "haskell monad", event horizons, finite speed of light and information, whatever) as well as race conditions all seem to boil down to the same thing. For reliable systems like accounting the path seems to be ACID or the highway. And "unreliable" systems forget fast enough that computers don’t seem to really make a difference there.

Question 2:

Does throughput really matter more than latency in everyday application?

Question 3 (explanation first, this time):

The focus on inode numbers is at least understandable with regards to the history of C and unix-like operating systems and GNU coreutils.

What about this basic example? Just make a USB thumb drive "work" for storing files (ignoring nand flash decay and USB). Without getting tripped up in libc IO buffering, fflush, kernel buffering (Hurd if you prefer it over Linux or FreeBSD), more than one application running on a multi-core and/or time-sliced system (to really weed out single-core CPUs running only a single user-land binary with blocking IO).

ericbarrett 64 days ago | | |

Coreutils are not only used in interactive contexts. They are the primitives that make up the countless shell scripts which glue systems together. Any edge case will be encountered and the resulting poor performance will impact somebody, somewhere.

Here's a related example of what happens when you change a shell primitive's behavior - even interactively. Back in the 2000s, Linux distributions started adding color output to the ls command via a default "alias ls=/bin/ls --color=auto". You know: make directories blue, symlinks cyan, executables purple; that kind of thing. Somebody thought it would be a nice user experience upgrade.

I was working at a NAS (NFS remote box) vendor in tech support. We frequently got calls from folks who had just switched to Linux from Solaris, or had just moved their home directories from local disk to NFS. They would complain that listing a directory with a lot of files would hang. If it came back at all, it would be in minutes or hours! The fix? "unalias ls". Because calling "/bin/ls" would execute a single READDIR (the NFS RPC), which was 1 round-trip to the server and only a few network packets; but calling "/bin/ls --color=auto" would add a STAT call for every single file in the directory to figure out what color it should be - sequentially, one-by-one, confirming the success of each before the next iteration. If you had 30,000 files with a round-trip time of 1ms that's 30 seconds. If you had millions...well, either you waited for hours or you power-cycled the box. (This was eventually fixed with NFSv3's READDIRPLUS.)

Now I'm sure whomever changed that alias did not intend it, but they caused thousands of people thousands of hours of lost productivity. I was just one guy in one org's tech support group, and I saw at least a dozen such cases, not all of which were lucky enough to land in the queue of somebody who'd already seen the problem.

So I really appreciate GNU coreutils' commitment to sane behavior even at the edges. If you do systems work long enough, you will ride those edges, and a tool which stays steady in your hand - or script - is invaluable.

dijit 64 days ago | | |

> Does throughput really matter more than latency in everyday application?

In my experience latency and throughput are intrinsically linked unless you have the buffer-space to handle the throughput you want. Which you can't guarantee on all the systems where GNU Coreutils run.

awesome_dude 64 days ago | | |

> Question 2:

> Does throughput really matter more than latency in everyday application?

IME as a user, hell yes

Getting a video I don't mind if it buffers a moment, but once it starts I need all of that data moving to my player as quickly as possible

OTOH if there's no wait, but the data is restricted (the amount coming to my player is less than the player needs to fully render the images), the video is "unwatchable"

duped 64 days ago | | |

Just want to point out that race conditions are a correctness problem, not a performance problem.

dapperdrake 64 days ago | | |

Additional point:

The point of data storage is to be a singleton.

(Backups are desirable, anyhow.)

s20n 65 days ago | |

Sorry, complete noob here. Why didn't you just cd into $(yes a/ | head -n $((32 * 1024)) | tr -d '\n')? Why do you need to use the while loop for cd?

EDIT: got it. -bash: cd: a/a/a/....../a/a/: File name too long

collinfunk 64 days ago | | |

No need to apologize at all. Doing it in one cd invocation would fail since the file name is longer than PATH_MAX. In that case passing it to a system call would fail with errno set to ENAMETOOLONG.

You could probably make the loop more efficient, but it works well enough. Also, some shells don't allow you to enter directories that deep at all. It doesn't work on mksh, for example.
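The same limit can be observed from Rust. This is a small sketch assuming a Linux-like system where a single path argument longer than PATH_MAX is rejected by the kernel; `stat_deep_path` is a hypothetical name, and the directories do not need to exist for the length check to trigger.

```rust
use std::fs;
use std::io;

/// Tries to stat a relative path made of `depth` repetitions of "a/".
/// (Hypothetical helper, for illustration only.)
fn stat_deep_path(depth: usize) -> io::Result<fs::Metadata> {
    // 32 * 1024 repetitions give a 64 KiB path string, far past PATH_MAX,
    // so the call is expected to fail before any directory is looked up.
    let path = "a/".repeat(depth);
    fs::metadata(path)
}
```

Calling `stat_deep_path(32 * 1024)` should return an error whose `raw_os_error()` carries the platform's ENAMETOOLONG value, which is why the shell example has to descend in chunks with repeated cd calls.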

safercplusplus 64 days ago | |

I don't know if you're aware, but there is a demonstration of wget (a fellow "gnu utility", right?) being auto-translated to a memory-safe subset of C++ [1]. Because the translation essentially does a one-for-one substitution of potentially unsafe C elements with safe C++ counterparts that mirror the behavior, the translation should be much less susceptible to the introduction of new bugs and behaviors in the way a rewrite would be.

With a little cleaning-up of the original code, the code translation ends up being fully automatic and so can be used as a build step to produce (slightly slower) memory-safe executables from the original C source.

[1] https://duneroadrunner.github.io/scpp_articles/PoC_autotrans...

dapperdrake 64 days ago | | |

Filesystem access is mostly treated by users as serialized ACID transactions on "files in directories."

"Managing this resource centrally" is where unix syscalls came from. An OS kernel can be used like a specialized library for ACID transactions on hardware singletons.

People then got fancy with virtual memory, interrupts, signals, time-slicing, re-entrancy, thread-safety, and injectivity.

It doesn’t matter whether you call the "kernel library" from C, C++, Fortran, BASIC, Golang, bash, Rust, etc.

theteapot 64 days ago | |

Probably a dumb question, but is GNU Core utils interested in / planning on doing its own rust rewrite?

collinfunk 64 days ago | | |

At the current moment I would be against it. The language and library is changing too fast. Also, Rust has some other things that make it hard to use for coreutils. For example, Rust programs always call signal (SIGPIPE, SIG_IGN) or equivalent code before main(). There is no stable way to get the longstanding behavior of inheriting the signal action from the parent process [1]. This is quite annoying, but not unique to Rust [2].

[1] https://doc.rust-lang.org/beta/unstable-book/compiler-flags/...

[2] https://www.pixelbeat.org/programming/sigpipe_handling.html
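To illustrate what ignoring SIGPIPE means in practice: instead of the process being killed by the signal, a write to a closed pipe returns an error that the program has to handle itself. The sketch below is one way a tool might treat that error as a quiet stop; `write_lines` is a hypothetical helper, not code from uutils.

```rust
use std::io::{self, ErrorKind, Write};

/// Writes every line to `out`, treating a closed pipe as a normal stop.
/// Returns the number of lines written before the reader went away.
/// (Hypothetical helper, for illustration only.)
fn write_lines<W: Write>(out: &mut W, lines: &[&str]) -> io::Result<usize> {
    let mut written = 0;
    for line in lines {
        match writeln!(out, "{line}") {
            Ok(()) => written += 1,
            // With SIGPIPE ignored, the failed write surfaces as EPIPE,
            // which std reports as ErrorKind::BrokenPipe.
            Err(e) if e.kind() == ErrorKind::BrokenPipe => return Ok(written),
            Err(e) => return Err(e),
        }
    }
    Ok(written)
}
```

This changes only how the program reacts to the error. It does not restore the inherited signal disposition, which is the part the comment above says has no stable solution.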

wpollock 64 days ago | | |

Thomas Jefferson famously said that "A coreutils rewrite every now and again is a good thing". Or something like that.

When I was a beta tester for System Vr2 Unix, I collected as many bug reports as possible from Usenet (I used the name "the shell answer man". Looking back I conclude that arrogance is generally inversely proportional to age) and sent a patch for each one I could verify. Something like 100 patches.

So if this rust rewrite cleans up some issues, it's a good thing.

greatgib 64 days ago | | |

The rewrite in Rust is mostly vanity and marketing but not based on a real technical need...

So I don't see why they would want to do that.

gzread 64 days ago | |

I see even the coreutils maintainers find themselves needing -n (no newlines) and -c (count) options to "yes".

dapperdrake 64 days ago | | |

GNU coreutils is known for adding command line options.

One of the big philosophical differences from the BSDs.

For a human being, it sucks both ways.

joaohaas 64 days ago | |

>the article says "The Rust rewrite has shipped zero of these [memory safety bugs], over a comparable window of activity." However, this is not true

That bug got fixed before the Ubuntu release, and is from way before Canonical was even involved with the project.

rossvor 64 days ago | | |

In the given list of GNU CVEs in the original article, it included a buffer overrun in tail from 2021. So for a fair comparison 2021 is part of the "window of activity" (the year uu_od CVE was published).

cyberax 64 days ago | |

To be fair, Vec::set_len bug in Rust was in 2021. And even then it had to be annotated as `unsafe`. It was then deprecated and a linter check was added: https://github.com/rust-lang/rust-clippy/issues/7681

Dr_Emann 64 days ago | | |

To be even fair-er, it wasn't actually memory unsafety, it was "just" unsoundness, there was a type, that IF you gave it an io reader implementation that was weird, that implementation could see uninit data, or expose uninit data elsewhere, but the only readers actually used were well behaved readers.

orlp 64 days ago | | |

Vec::set_len is by no means deprecated. The lint you linked only covers a very specific unsound pattern using set_len.

wahern 65 days ago |

> What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing

They knew how to write Rust, but clearly weren't sufficiently experienced with Unix APIs, semantics, and pitfalls. Most of those mistakes are exceedingly amateur from the perspective of long-time GNU coreutils (or BSD or Solaris base) developers, issues that were identified and largely hashed out decades ago, notwithstanding the continued long tail of fixes--mostly just a trickle these days--to the old codebases.

hombre_fatal 64 days ago |

One thing that's hard about rewriting code is that the original code was transformed incrementally over time in response to real world issues only found in production.

The code gets silently encumbered with those lessons, and unless they are documented, there's a lot of hidden work that needs to be done before you actually reach parity.

TFA is a good list of this exact sort of thing.

Before you call people amateur for it, also consider it's one of the most softwarey things about writing software. It was bound to happen unless coreutils had really good technical docs and included tests for these cases that they ignored.

aykutseker 64 days ago | |

good example from the article: the chroot+nss CVE. the rule that nss is dynamic and dlopens libraries from inside the chroot isn't anywhere obvious. it's encoded in 25+ years of sysadmins finding it out. clean-room rewrites end up re-learning that, usually as new CVEs. and LLM ports of the same code inherit the problem: the function signature is what they read, but the scars are what they need.

cataflutter 64 days ago | | |

> the function signature is what they read, but the scars are what they need.

This feels like a golden quote. Don't know if you intended for it to rhyme, but well done :D

einpoklum 64 days ago | |

> The code gets silently encumbered with those lessons, and unless they are documented, there's a lot of hidden work that needs to be done before you actually reach parity.

It should be stressed that failure to document such lessons, or at least the bugs/vulnerabilities avoided, is poor practice. Of course one can't document the bugs/vulnerabilities one has avoided implicitly by writing decent code to begin with, but it is important to share these lessons with the future reader, even if that means "wasting" time and space on a bunch of documentation such as "In here we do foo instead of bar because when we did bar in conditions ABC then baz happens which is bad because XYZ."

TheDong 64 days ago | |

What's even harder is doing that while trying to avoid the GPL, so doing that without reading the original source code.

uutils would be so much better imo if it was GPL and took direct inspiration from the coreutils source code.

dbdr 64 days ago | | |

The GPL prevents you from reading the licensed code before writing related non-GPL code? Which section of the GPL says that?

lionkor 64 days ago |

I struggle to find anything on this post that wouldn't be caught by some kind of unit test or manual review, especially when comparing with the GNU source for the coreutils. The whole coreutils rewrite is a terrible idea[1] and clearly being done in the wrong way (without the knowledge gained from the previous software).

If you do a rewrite, you should fully understand and learn from the predecessor, otherwise you're bound to repeat all the mistakes. Embarrassing.

To be clear; I love Rust, I use it for various projects, and it's great. It doesn't save you from bad engineering.

[1]: https://www.joelonsoftware.com/2000/04/06/things-you-should-...

dwattttt 64 days ago | |

> I struggle to find anything on this post that wouldn't be caught by some kind of unit test or manual review, especially when comparing with the GNU source for the coreutils.

> If you do a rewrite, you should fully understand and learn from the predecessor, otherwise you're bound to repeat all the mistakes. Embarrassing.

Interestingly, the uutils project uses the GNU coreutils test suite.

EDITED to add: they also have a stated position of not allowing contributions based on reading the GPL'd source.

cwillu 64 days ago | |

I expect nothing less from the creators of unity, upstart, and snap.

a-dub 64 days ago | |

welcome new systems programmers: unix is broken and you must write ugly non-pedagogical workarounds and do empirical testing. this is what reliable software and good software engineering actually is... surprise!@#%

Joker_vD 64 days ago |

> The pattern is always the same. You do one syscall to check something about a path, then another syscall to act on the same path. Between those two calls, an attacker with write access to a parent directory can swap the path component for a symbolic link. The kernel re-resolves the path from scratch on the second call, and the privileged action lands on the attacker’s chosen target.

It's actually even worse than that somewhat, because the attacker with write access to a parent directory can mess with hard links as well... sure, it only messes with the regular files themselves but there is basically no mitigations. See e.g. [0] and other posts on the site.

[0] https://michael.orlitzky.com/articles/posix_hardlink_heartac...
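The check-then-act pattern quoted above can be sketched in Rust as follows. Both function names are hypothetical. The second version resolves the path only once and then asks questions about the open handle, which narrows the race but does not remove every problem: it does not protect against swapped parent directories or the hard link issue, and opening special files such as FIFOs can block.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Racy: the path is resolved twice, once for the check and once for the read.
/// (Hypothetical helper, for illustration only.)
fn read_if_regular_racy(path: &Path) -> io::Result<Option<Vec<u8>>> {
    if !fs::metadata(path)?.is_file() {
        return Ok(None);
    }
    // Window: the path component can be swapped for a symlink right here.
    Ok(Some(fs::read(path)?))
}

/// Narrower: resolve once, then check and read through the same open handle.
/// (Hypothetical helper, for illustration only.)
fn read_if_regular(path: &Path) -> io::Result<Option<Vec<u8>>> {
    let mut file = File::open(path)?;
    // This metadata belongs to the object that was actually opened.
    if !file.metadata()?.is_file() {
        return Ok(None);
    }
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;
    Ok(Some(buf))
}
```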

sysguest 64 days ago | |

hmm... maybe a 'write lock' on the directory? though this will become more hairy without timeouts/etc...

masklinn 64 days ago | | |

To the extent that locking exists in posix it is various degrees of useless and broken. And as far as I know while BSDs have extensions which make some use cases workable Linux is completely hopeless.

misja111 64 days ago |

The root cause of some of the bugs seems to be the opaque nature of some of the Unix API. E.g.

> The trap is that get_user_by_name ends up loading shared libraries from the new root filesystem to resolve the username. An attacker who can plant a file in the chroot gets to run code as uid 0.

To me such a get_user_by_name function is like a booby trap, an accident that is waiting to happen. You need to have user data, you have this get_user_by_name function, and then it goes and starts loading shared libraries. This smells like mixing of concerns to me. I'd say, either split getting the user data and loading any shared libraries in two separate functions, or somehow make it clear in the function name what it is doing.

tdiff 64 days ago |

Ok if there were some rust guys rewriting coreutils with no experience in linux, but how come Ubuntu accepted it into its mainline?

Joeboy 64 days ago | |

Because it's Ubuntu policy to replace some foundational part of the system with some janky unfinished experiment in every release.

I agree with you that that's more the story here than "OMG, somebody wrote Rust code with bugs in it".

12_throw_away 64 days ago | | |

Right? Canonical wanted (still wants?) to use a coreutils implementation where "rm ./" would print "invalid input" while silently deleting the directory anyway.

I don't really care that some very amateur enthusiasts wrote some bad code for fun, but how in the world did anyone who knows anything about linux take this seriously as a coreutils replacement?

foobar1274278 64 days ago | |

The original is GPL licensed, while the rewrite is MIT.

tdiff 64 days ago | | |

Was it actually so important to rush with the switch?

eb08a167 64 days ago |

I'm totally fine with people experimenting and making amateur attempts at what adult people do. After all, that's how we grow. What I'm actually curious about is how the decision-making chain at Ubuntu got so messed up that this made it into production.

eviks 64 days ago | |

Sometimes growing is only your height increasing

alkonaut 64 days ago |

> What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing

So does this mean that the original utils had no test harness, and that the rewrite didn't start by creating one either?

Sure there are many edge cases, but surely the OS and FS can just be abstracted away and you can verify that "rm .//" actually ends up doing what is expected (Such as not deleting the current directory)?

This doesn't seem like sloppy coding, nor a critique of the language, it's just the same old "Oh, this is systems programming, we don't do tests"?

Alternatively: if the original utils _did_ have tests, and there were this many holes in the tests, then maybe there is a massive lack in the original utils test suite?

einpoklum 64 days ago |

Note:

TOCTOU means "Time-of-check to time-of-use"

z3t4 64 days ago |

To be fair these are mostly gotchas with Linux and not Rust itself, but I guess the std in Rust could handle some of these issues, in that a std should not allow you to shoot yourself in the foot by default.

marcosscriven 64 days ago |

That’s a great article, and indeed a very good blog. Just spent ages reading lots of their other articles.

Of the bugs mentioned I think the most unforgivable one is the lossy UTF conversion. The mind boggles at that one!
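As a rough illustration of why a lossy conversion is dangerous for filenames, here is a Rust sketch for Unix targets. The helper names are hypothetical. The lossy version replaces the invalid byte with U+FFFD, so the resulting name no longer refers to the original file.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

/// Round-trips a raw filename through a lossy String conversion.
/// (Hypothetical helper, for illustration only.)
fn lossy_round_trip(raw: &[u8]) -> Vec<u8> {
    let path = Path::new(OsStr::from_bytes(raw));
    // Invalid UTF-8 sequences are replaced with U+FFFD here.
    path.to_string_lossy().into_owned().into_bytes()
}

/// Keeps the filename as bytes the whole way, so nothing is rewritten.
/// (Hypothetical helper, for illustration only.)
fn exact_round_trip(raw: &[u8]) -> Vec<u8> {
    Path::new(OsStr::from_bytes(raw)).as_os_str().as_bytes().to_vec()
}
```

For the name `weird\xffname`, the lossy version returns different bytes than it was given, while the exact version returns the input unchanged.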

fschuett 64 days ago |

Thanks for the list. I like these lists, so I can put them into a .md file, then launch "one agent per file" on my codebase and see if they can find anything similar to the mentioned CVEs.

Rust won't catch it, but now the agents will.

Edit: https://gist.github.com/fschutt/cc585703d52a9e1da8a06f9ef93c... for anyone who needs copying this

joaohaas 64 days ago | |

Most (if not all) of these issues do not matter at all outside the scope GNU utils run in.

For example, using filepaths instead of FDs does not matter in most cases in controlled server environments, or in processes that will never run with elevated privilege (most apps).

rstuart4133 64 days ago | | |

> Most (if not all) of these issues do not matter at all outside the scope GNU utils run in.

I suspect that attitude is how we got ourselves into this mess.

You have to assume you ultimately don't control what scope your software runs in. Obviously you do, 99.999% of the time. The other 0.001% is when someone has found another vulnerability that lets them run your program with elevated privileges in an environment you didn't expect, and then they can use it to exploit one of these bugs. Almost all exploits use a chain of vulnerabilities, each one seemingly mostly harmless - your "no one can ever exploit this weakness in my program because I control the environment" will be just one step in the chain.

That sounds far fetched. It is far fetched in the sense that it almost never happens. But nonetheless systems were and are exploited because of it. Once the solution was added in 2006 (openat() and friends), it should have never happened again. And indeed in the GNU utils it can't.

The people who built Rust's std::fs should have been aware of the problem and its solution, because std::fs was written in 2015. std::path was written at the same time, and that is where the change has to be made. It's not a big change either: std::path has to translate the path into an OS descriptor and use that instead of the path - but only if one is available. I suspect the real issue was they had the same attitude as you: they thought it affects such a small percentage of programs that it didn't really matter. That, and it's a little bit of extra work.

It was a pity they had that attitude, because the extra work would have avoided this mess.

oconnor663 64 days ago |

> The trap is that get_user_by_name ends up loading shared libraries from the new root filesystem to resolve the username.

That's kind of horrifying. Is there a reliable list somewhere of all the functions that do that? Is that list considered stable?

Joker_vD 64 days ago | |

Nope! But basically, expect anything that resolves usernames, or host names, to be done in the userspace by NSS.

    Sun engineers Thomas Maslen and Sanjay Dani were the first to design and implement
    the Name Service Switch. They fulfilled Solaris requirements with the nsswitch.conf
    file specification and the implementation choice to load database access modules as
    dynamically loaded libraries, which Sun was also the first to introduce.

    Sun engineers' original design of the configuration file and runtime loading of name
    service back-end libraries has withstood the test of time as operating systems have
    evolved and new name services are introduced. Over the years, programmers ported the
    NSS configuration file with nearly identical implementations to many other operating
    systems including FreeBSD, NetBSD, Linux, HP-UX, IRIX and AIX.[citation needed] More
    than two decades after the NSS was invented, GNU libc implements it almost identically.

It's by design, you see.

jsdfasds 64 days ago | | |

This is precisely why I don't link with glibc anymore.

nixpulvis 64 days ago |

> These are noisy in test code where panicking on bad data is exactly what you want. The cleanest way to scope them to non-test code is to put #![cfg_attr(test, allow(clippy::unwrap_used, clippy::expect_used, clippy::panic, clippy::indexing_slicing, clippy::arithmetic_side_effects))] at the top of each crate root, or to gate #[allow(...)] on the individual #[cfg(test)] modules.

Surely there's a better way.

kibwen 64 days ago | |

Clippy doesn't even run on unit tests by default. Honestly it doesn't seem very useful to have it do so for ordinary development, but maybe you'd want to run Clippy on your unit tests in CI just to be extra safe, in which case you could encode those allowed lints in the line of your CI config where you run `cargo clippy`, e.g. `cargo clippy -- -A unwrap_used -A expect_used -A panic -A indexing_slicing -A arithmetic_side_effects`, if you really didn't want to have them in the source for whatever reason.

estebank 64 days ago | | |

Delaying the run of clippy until CI would be annoying, because then you'd get a build failure for something that was preventable and could have been quickly addressed during development before pushing. Just feels like a pebble in your shoe.

9fwfj9r 64 days ago |

So it's basically failing on:

- necessary atomicity for filesystem operation
- annoying path & string encoding
- inertia for historical behaviors

kibwen 64 days ago | |

I'm comfortable saying that "annoying path & string encoding" is encompassed by "inertia for historical behaviors". :P

bluGill 64 days ago |

I have to partially disagree with applying Hyrum's law here. In the case of core utils, there's not just the common GNU version. There's also what POSIX says they should do and what the various BSDs do, plus some other implementations from various vendors that we mostly forget about. If in any case what this version of Core Utils does is different from what GNU does in a way that others are also different, it would be a good thing to break behavior because anyone's script already is wrong in ways that are going to matter in the real world and it may matter in the future anyway, so breaking them now is good. If your script depends on GNU's behavior, then you shouldn't be calling the standard version. You should be explicitly specifying the GNU version. That is, don't use CP. Use GNU-CP or whatever it is commonly installed at. Or you check for what version of CP you have.

aragilar 64 days ago | |

But if you seek to replace coreutils (as at least is the case with Canonical it seems), rather than just be another POSIX userland implementation (e.g. busybox), then I would suggest you do need to be bug-compatible? I can apt/dnf/apk install busybox and use that for my user rather than coreutils, but given that a significant amount of Linux infrastructure (including likely many personal scripts) is tied to coreutils, the bar is much higher. Given the numerous issues with quality Canonical has had, not just with Ubuntu but their other "commercial" tooling, I'm not sure any rewrite/port, written in rust or otherwise, with Canonical developing, managing, or even being associated with the project can meet the requisite bar.

bluGill 64 days ago | | |

As someone who prefers BSD I would make it my goal to become something reasonably popular on linux that isn't different just to force less reliance on the GNUisms in their core utils. Nothing wrong with the GNUisms on the command line, but there are a lot of GNU assumptions in scripts that should be portable.

immanuwell 64 days ago |

rust promised you memory safety and delivered - but turns out the filesystem doesn't care about your borrow checker, and these 44 cves are the receipt

streetfighter64 64 days ago |

> That means, even if the tools were (and probably still are) buggy, they never had a bug that could be exploited to read arbitrary memory.

Well, that begs the question, is it worse to read arbitrary memory (which would probably in most cases be prevented by various dynamic protections [0] anyway), or failing to prevent rm -rf /./ and killing every process in the system, etc.?

This is still a good case study of the value of the much-touted rust rewrites. Usually they are performed by people who are domain experts in rust, but (as seen here) lack basic domain knowledge of the tool's environment.

[0] https://en.wikipedia.org/wiki/Buffer_overflow_protection

throw7 64 days ago |

The "kill -1" is hilarious. I wouldn't use ubuntu for production for quite a while while things shake out or, probably, never (since i don't use ubuntu).

satvikpendem 64 days ago |

Unrelated but also in the category of bugs Rust won't catch (natively), there are crates that allow C++ style contracts, or more generally, dependent typing and can be used to catch issues at compile time rather than runtime. I use this one, anodized.

https://docs.rs/anodized/latest/anodized/

ozgrakkurt 63 days ago | |

What do you think about the mental load and ergonomics this brings into the code? Also compilation time increase?

satvikpendem 63 days ago | | |

There is some compile time increase but it brings a lot more guarantees to the code. There was a recent post by a Rust maintainer saying that he wanted to bring Rust closer to a theorem prover, so that over time as many things as possible can be caught at compile time rather than at run time, where they might be more disastrous.

jolt42 65 days ago |

I wonder if Rust becomes more popular with AI as Rust can help catch what AI misses, but then if that's the case then what about Haskell, or Lean, or?

EduardoBautista 64 days ago | |

I think a lower amount of training data for Haskell might be a reason.

tayo42 64 days ago | |

The way Haskell handles memory is weird and can be unpredictable.

hu3 64 days ago | |

For core system functionality maybe. But for most applications Rust's slow compiler iteration speed becomes a bottleneck when the likes of TypeScript (with Bun) and Go have sub second iteration times.

Plus AI is also good at catching, in other languages, errors that Rust tooling enforces. Like race conditions, use after free, buffer overflows, lifetimes, etc.

So maybe AI will become the ultimate "rust checker" for any language.

tnova 64 days ago | | |

In my experience developing different types of applications in Rust, the claims of a "slow compiler" are overstated. Sub second iteration times are definitely a thing in Rust as well, unless you're adding a new dependency for the first time or building fresh.

junon 64 days ago | | |

The productivity increase I get overall by not having to worry so much about if my rust code will work if it compiles tends to net faster iteration speeds for me. Compile times have never bothered me.

dlahoda 64 days ago |

Why did differential fuzzing not catch these bugs?

https://github.com/uutils/coreutils/tree/main/fuzz/uufuzz

ozgrakkurt 63 days ago | |

Looks like it doesn't really fuzz much.

https://github.com/uutils/coreutils/tree/main/fuzz/fuzz_targ...

Maybe these tests aren't even fuzz tests?

https://github.com/uutils/coreutils/blob/main/fuzz/fuzz_targ...

Even the tests that look ok are not that good in my opinion because there is no structure to it:

https://github.com/uutils/coreutils/blob/main/fuzz/fuzz_targ...

It should also try to generate mostly correct but slightly wrong things instead of just dumping random data into it.
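A minimal sketch of the "mostly correct but slightly wrong" idea, for illustration only: start from a known-good input and flip a single bit, so the result stays close to something the parser accepts. The names `next_u64` and `mutate_one_bit` and the seed input are hypothetical, and this is not how uufuzz itself works.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The state must be non-zero.
fn next_u64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Hypothetical mutator: copy a known-good input and flip exactly one bit,
/// producing an input that differs from the seed in a single byte.
fn mutate_one_bit(seed: &[u8], state: &mut u64) -> Vec<u8> {
    let mut out = seed.to_vec();
    if out.is_empty() {
        return out;
    }
    let r = next_u64(state);
    let index = (r % out.len() as u64) as usize;
    let bit = (r >> 32) % 8;
    out[index] ^= 1u8 << bit;
    out
}

fn main() {
    // Assumed seed: an argument string that a utility would normally accept.
    let seed = b"--lines=10 file.txt";
    let mut state = 0x9E37_79B9_7F4A_7C15_u64;
    for _ in 0..5 {
        let candidate = mutate_one_bit(seed, &mut state);
        println!("{}", String::from_utf8_lossy(&candidate));
    }
}
```

A real harness would more likely mutate at the level of options and file names rather than raw bits, but the principle of staying near valid input is the same.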

Seems to also not expect some fuzz tests to even pass in the CI:

https://github.com/uutils/coreutils/blob/a07879b8ab2bb8fe5e0...

bayindirh 64 days ago |

> This is the largest cluster of bugs in the audit. It’s also the reason cp, mv, and rm are still GNU in Ubuntu 26.04 LTS. :(

This is what grinds my gears. Why all the hate against GNU?

Honestly, this is why I don't learn Rust, and why I didn't bother to read the rest of the article.

kibwen 64 days ago | |

Rust does not hate GNU, and I'm not sure why anyone would have that misconception. It would be like saying that C hates GNU because the BSDs aren't GNU. The fact that there is less GNU-licensed Rust software than MIT-licensed Rust software is attributable to the simple fact that, in general, GNU has been ceding ground to MIT for more than 20 years.

Pay08 64 days ago | | |

Nor does the parent comment say that "Rust" hates GNU. A language can't hate anything for that matter.

abbeyj 64 days ago |

> The Python one-liner is there because most modern shells refuse to create a non-UTF-8 filename for you.

Both `echo -ne 'weird\xffname\0' > list0` and `printf 'weird\xffname\0' > list0` seem to work fine for me on Linux. Is this macOS-specific?

deathanatos 64 days ago | |

> Both `echo -ne 'weird\xffname\0' > list0` and `printf 'weird\xffname\0' > list0` seem to work fine for me on Linux. Is this macOS-specific?

Neither of those creates a non-UTF-8 filename. (Both files are named "list0", which is valid UTF-8.) They have non-UTF-8 content, but that's not weird.

But it's not too hard to get a non-UTF-8 filename:

  touch $'\xff'

Both zsh & bash support that syntax.

(You could also use command substitution with printf, but that's more steps than necessary. So, something closer to your example would be,

  touch "$(printf '\xff')"

You can't put a \0 in the filename, as there's no way to pass that string in C.)
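For the Rust side of this, a small sketch of how such a name looks to a program, assuming a Unix target. It only builds the name in memory and does not touch the filesystem; `is_non_utf8` and `fits_in_c_string` are hypothetical helper names.

```rust
use std::ffi::{CString, OsStr};
use std::os::unix::ffi::OsStrExt;

/// Returns true when the name cannot be represented as a Rust `&str`.
fn is_non_utf8(name: &OsStr) -> bool {
    name.to_str().is_none()
}

/// Returns true when the bytes can be passed to a C API as a NUL-terminated
/// string, i.e. when they contain no interior NUL byte.
fn fits_in_c_string(bytes: &[u8]) -> bool {
    CString::new(bytes).is_ok()
}

fn main() {
    // The same bytes as the shell example: 0xff is never valid in UTF-8.
    let name = OsStr::from_bytes(b"weird\xffname");
    println!("non-UTF-8: {}", is_non_utf8(name));
    // Lossy conversion replaces the bad byte with U+FFFD, so the result
    // no longer names the same file.
    println!("lossy: {}", name.to_string_lossy());
    println!("interior NUL accepted: {}", fits_in_c_string(b"weird\0name"));
}
```

Code that converts names with `to_str().unwrap()` or `to_string_lossy()` either panics on such a file or silently refers to a different name, which is why keeping names as `OsStr` or raw bytes matters here.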

timcobb 64 days ago |

The title of this article should be "Rust can't stop you from not giving a fuck" or "Rust can't give a fuck for you."

---

> What’s notable is that all of these bugs landed in a production Rust codebase, written by people who knew what they were doing

...

[List of bugs a diligent person would be mindful of, unix expert or not]

---

Only conclusion I can make is, unfortunately, the people writing these tools are not good software developers, certainly not sufficiently good for this line of work.

For comparison, I am neither a unix neckbeard nor a rust expert, but with the magic of LLMs I am using rust to write a music player. The amount of tokens I've sunk into watching for undesirable panics or dropped errors is pretty substantial. Why? Because I don't want my music player to suck! Simple as that. If you don't think about panics or errors, your software is going to be erratic, unpredictable and confusing.

Now, coreutils isn't my hobby music player, it's fundamental Internet infrastructure! I hate sounding like a Breitbart commenter but it is quite shocking to see the lack of basic thought going into writing what is meant to be critical infrastructure. Wow, honestly pathetic. Sorry to be so negative and for this word choice, but "shock" and "disappointment" are mild terms here for me.

Anyway, thanks to the author of this post! This is a red flag that should be distributed far and wide.

MallocVoidstar 64 days ago | |

> Pretty shocking to see the lack of basic thought going into writing what is meant to be critical infrastructure

uutils did not start off as "let's make critical infrastructure in Rust", it started off as "coreutils are small and have tests, so we're rewriting them in Rust for fun". As a result, a bunch of cleanup work has been needed.

timcobb 64 days ago | | |

Okay, thanks for the context, but aren't distributions eager to adopt these? Are current GNU coreutils a common vulnerability vector?

> For fun

My idea of fun is reviewing my code and making sure I'm handling errors correctly so that my software doesn't suck. Maybe the people who are doing this, for fun, should be more aligned with that mentality?

12_throw_away 64 days ago | |

So yeah, their implementation of chmod checked if a path was pointing to the root of the filesystem with 'if file == Path::new("/")'.

How the f** did this sub-amateur slop end up in a big-name linux distribution? We've de-professionalized software engineering to such a degree that people don't even know what baseline competent software looks like anymore
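To illustrate why the literal comparison falls short, here is a sketch, assuming a Unix target, that compares device and inode numbers instead (the `st_dev`/`st_ino` approach mentioned at the top of the thread). `is_root_dir` is a hypothetical helper, not the fix uutils shipped.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical helper: does `path` resolve to the same inode as "/"?
fn is_root_dir(path: &Path) -> io::Result<bool> {
    let root = fs::metadata("/")?;
    let candidate = fs::metadata(path)?;
    Ok(root.dev() == candidate.dev() && root.ino() == candidate.ino())
}

fn main() {
    // "/.." and "/../.." both resolve to the root directory, but they are
    // different paths as far as a component-wise comparison is concerned.
    for spelling in ["/", "/..", "/../..", "/dev"] {
        let path = Path::new(spelling);
        println!(
            "{}: literal match = {}, dev/ino match = {:?}",
            spelling,
            path == Path::new("/"),
            is_root_dir(path)
        );
    }
}
```

A symlink pointing at `/` defeats the literal check in the same way. The sketch still resolves the path separately from whatever operation follows, so on its own it does not close the TOCTOU window discussed elsewhere in the thread.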

antonvs 64 days ago | |

I love Rust, but I wonder if this is an example of the idea that its excellent type system can lull some people into a false sense of security. Particularly when interfacing to low-level code like kernel APIs, which are basically minefields inadvertently designed to trick the unwary, the Rust guarantees are undermined. The extent of this may not be immediately obvious to everyone.

timcobb 64 days ago | | |

This seems to be the case, yes. Before reading this post I was a lot more open minded about the "rewrite it in Rust" scene but now I'm just kind of in a horrorpit wondering whether I'll be stuck on macOS forever :(.

stackedinserter 64 days ago |

TIL that

> uutils read it as “send the default signal to PID -1”, which on Linux means every process you can see.

What's the use case for killing all processes you can see?

Arch-TK 64 days ago | |

Many cases, including as a last resort as part of shutdown, to try to trigger remaining services into a graceful exit (although these days cgroups help avoid ever being in such a situation).

geocar 64 days ago | |

kill -SIGWINCH -1 will redraw all your windows.

r2vcap 64 days ago |

Just use Fedora :)

bombcar 64 days ago | |

All the cool kids are using Gentoo or Nix ;)

nottorp 64 days ago |

> Rust’s standard library makes this easy to get wrong. The ergonomic APIs you reach for first (fs::metadata, File::create, fs::remove_file, fs::set_permissions) all take a path and re-resolve it every time, rather than taking a file descriptor and operating relative to that. That’s fine for a normal program, but if you’re writing a privileged tool that needs to be secure against local attackers, you have to be careful.

It's not fine even for a normal program, because operations on a large number of files will end up an order of magnitude slower. No matter what language you write your utility in.
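As a rough sketch of the handle-based alternative that std already offers on Unix: open the file once, then query and modify it through the open handle (`File::metadata` and `File::set_permissions`) so the path is not resolved again between the check and the change. `make_read_only` and the temp-file name are hypothetical.

```rust
use std::fs::{self, File, Permissions};
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Hypothetical helper: make `path` owner-read-only, but only if it is a
/// regular file. Returns Ok(false) when the opened object is something else.
fn make_read_only(path: &Path) -> io::Result<bool> {
    // The path is resolved once, here.
    let file = File::open(path)?;
    // Metadata is read from the open handle, so it describes the object
    // that was actually opened.
    let meta = file.metadata()?;
    if !meta.is_file() {
        return Ok(false);
    }
    // Permissions are changed through the same handle: no second lookup.
    file.set_permissions(Permissions::from_mode(0o400))?;
    Ok(true)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("handle-demo-{}", std::process::id()));
    fs::write(&path, b"hello")?;
    println!("changed: {}", make_read_only(&path)?);
    println!("mode: {:o}", fs::metadata(&path)?.permissions().mode() & 0o777);
    fs::remove_file(&path)
}
```

This only covers operations on a single already-opened file. `File::open` still follows symlinks, and std has no `openat`-style API for walking a directory tree relative to a descriptor, which is the gap the thread is pointing at.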

... reads the article to the end, marvels at all the problems resulting from not understanding how the OS works and missing 40 years of refinement ...

Is this in an Ubuntu LTS ?!?

sennalen 64 days ago |

Reversing max and min. That's one I've done a lot, and I don't think any compiler could save me from.
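A small sketch of that mistake: both functions below type-check, and only the first keeps the value inside the range. The function names are made up for the example.

```rust
/// Intended behaviour: keep `value` within `lo..=hi`.
fn clamp_manual(value: i32, lo: i32, hi: i32) -> i32 {
    value.max(lo).min(hi)
}

/// The same code with `max` and `min` swapped. It compiles without any
/// warning, but returns `hi` for every input whenever `lo <= hi`.
fn clamp_swapped(value: i32, lo: i32, hi: i32) -> i32 {
    value.min(lo).max(hi)
}

fn main() {
    for value in [-5, 5, 15] {
        println!(
            "value={:>3} manual={:>2} swapped={:>2} std={:>2}",
            value,
            clamp_manual(value, 0, 10),
            clamp_swapped(value, 0, 10),
            value.clamp(0, 10)
        );
    }
}
```

Using the standard `clamp` method names the intent directly and avoids spelling out the `max`/`min` pair at all.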

micheles 64 days ago |

> uutils now runs the upstream GNU coreutils test suite against itself in CI. That’s the right scale of defense for this class of bug.

That's the minimum, it is absurd that they did not start from that!

jeroenhd 64 days ago | |

I recall the last time there was a massive bug in the uutils project, it was because the coreutils tests didn't cover some crucial aspect people relied on. Running these tests is useful for compatibility and all, but it won't necessarily catch security issues.

ordu 64 days ago | |

I believe they did it all the time. Maybe it was not automated? But they boasted in news multiple times how many coreutils tests they are passing. I suspect that those tests are useless for security, they are more about compatibility or something like that.

aw1621107 64 days ago | |

Looks like they've been doing some kind of automated comparison against the GNU test suite since 2021 or so [0]?

[0]: https://github.com/uutils/coreutils-tracking/commits/main/?a...

mayhemducks 64 days ago |

I enjoyed reading this.

I LOL'd when I read "eternal ball of sadness".

Analemma_ 65 days ago |

I know nobody's perfect and I'm not asking for perfection, but these bugs are pretty alarming? It seems like these supposed coreutils replacements are being written by people who don't know anything about Unix, and also didn't even bother looking at the GNU tools they are trying to replace. Or at least didn't have any curiosity about why the GNU tools work the way they do. Otherwise they might've wondered about why things operate on bytes and file descriptors instead of strings and paths.

I hate to armchair general, but I clicked on this article expecting subtle race conditions or tricky ambiguous corners of the POSIX standard, and instead found that it seems to be amateur hour in uutils.

chiffaa 64 days ago | |

Few things to note

1. uutils as a project started back in 2013 as a way to learn Rust, by no means by knowledgeable developers or in a mature language

2. uutils didn't even have a consideration to become a replacement of GNU Coreutils until.... roughly 2021, I think? 2021 is when they started running compliance/compatibility tests, anyway

3. The choice of licensing (made in 2013) effectively forbids them from looking at the original source

lelanthran 65 days ago | |

> It seems like these supposed coreutils replacements are being written by people who don't know anything about Unix, and also didn't even bother looking at the GNU tools they were supposed to be replacing.

They're a group of people who want to replace pro-user software (GPL) with pro-business software (MIT).

I don't really want them to achieve their goal.

ronjakoi 64 days ago | |

They are deliberately not looking at coreutils code because the Rust versions are released as MIT and they don't want the project contaminated by GPL. I am not fond of this, personally.

penguin_booze 64 days ago |

The correct phrasing of the title: The bugs Rust won't catch.

PunchyHamster 64 days ago |

Seems like the typical pattern of

* Let's rewrite thing in X, it is better

* Let's not look at existing code, X is better so writing it from scratch will look nicer

* Whoops, existing code was written like this for a reason

* Whoops, we re-introduce decade+ old problems that original already fixed at some point

Bridged7756 64 days ago | |

I call it FOTM engineering. Let's throw everything out of the window so we can use X novel thing!

slopinthebag 64 days ago |

I find it interesting how people will criticise Rust for not preventing all bugs, when the alternative languages don't prevent those same bugs nor the bugs rust does catch. If you're comparing Rust to a perfect language that doesn't exist, you should probably also compare your alternative to that perfect language as well right?

I'd be interested in a comparison with the amount of bugs and CVE's in GNU coreutils at the start of its lifetime, and compare it with this rewrite. Same with the number of memory bugs that are impossible in (safe) Rust.

Don't just downvote me, tell me how I'm wrong.

melodyogonna 64 days ago |

TL;DR: Rust can't catch logic bugs

rvz 64 days ago |

This is what happens when many people hype a technology that solves one specific class of vulnerabilities but is not designed to prevent others, such as logic errors caused by human or AI mistakes.

Granted, the uutils authors are well experienced in Rust, but it is not enough for a large-scale rewrite like this and you can't assume that it's "secure" because of memory safety.

In this case, the post tells us that Unix itself has thousands of gotchas and that re-implementing coreutils in Rust is not a silver bullet. Even the bugs that Unix (and the POSIX standard) has are part of the specification, and they can later turn out to be vulnerabilities in practice.

swiftcoder 64 days ago | |

> the uutils authors are well experienced in Rust

I'm not sure that they were all that experienced in Rust when most of this code was written. uutils has been a bit of a "good first rust issue" playground for a lot of its existence

Which makes it pretty unsurprising that the authors also weren't all that well versed in the details of low-level POSIX API

IshKebab 64 days ago | |

It's not designed to completely eliminate other bug classes but it is designed to reduce the chance that they happen.

In this case the filesystem API was perhaps not as well designed as it could have been. That can potentially be fixed though.

Some of the other bugs would be hard to statically prevent though. But nobody ever claimed otherwise.

osmsucks 64 days ago |

I feel like one of the takeaways here is that Rust protects your code as long as what your code is doing stays predictably in-process. Touching the filesystem is always rife with runtime failures that your programming language just can't protect you from. (Or maybe it also suggests the `std::fs` API needs to be reworked to make some of these occurrences, if not impossible, at least harder.)

On a separate note: I have a private "coretools" reimplementation in Zig (not aiming to replace anything, just for fun), and I'm striving to keep it 100% Zig with no libc calls anywhere. Which may or may not turn out to be possible, we'll see. However, cross-checking uutils I noticed it does have a bunch of unsafe blocks that call into libc, e.g. https://github.com/uutils/coreutils/blob/77302dbc87bcc7caf87.... Thankfully they're pretty minimal, but every such block can reduce the safety provided by a Rust rewrite.

aw1621107 64 days ago | |

> and I'm striving to keep it 100% Zig with no libc calls anywhere. Which may or may not turn out to be possible, we'll see.

Probably will depend on what platform(s) you're targeting and/or your appetite for dealing with breakage. You can avoid libc on Linux due to its stable syscall interface, but that's not necessarily an option on other platforms. macOS, for instance, can and does break syscall compatibility and requires you to go through libSystem instead. Go got bit by this [0]. I want to say something similar applies to Windows as well.

This Unix StackExchange answer [1] says that quite a few other kernels don't promise syscall compatibility either, though you might be able to somewhat get away with it in practice for some of them.

[0]: https://github.com/golang/go/issues/17490

[1]: https://unix.stackexchange.com/a/760657

osmsucks 64 days ago | | |

Since it's a personal project, Linux compatibility is the only thing I care about right now. I'm testing it under WINE as well, just because I can, but I don't have access to Mac OS so I'm skipping that problem entirely for now

  // WalkDir calls fn with paths that use the separator character appropriate
  // for the operating system. This is unlike [io/fs.WalkDir], which always
  // uses slash separated paths.
  func WalkDir(root string, fn fs.WalkDirFunc) error {