That child process does the scary stuff - parsing. Parsing requires zero system calls. Talking to the parent requires only read and write, but not open, so the child can only read from and write to the file descriptors it already has.
And exit.
That's it. Seccomp v1 (strict mode) is trivial to apply, allows exactly 4 system calls (read, write, _exit and sigreturn), and makes the process virtually useless to an attacker. If you want to get fancy and allow for multithreading, you can use seccomp v2 (filter mode), create your thread pool before you drop privileges, and probably add futex and mmap to the allowlist.
You pay a latency cost but the security win is huge.
Running the code in a Wasm sandbox sounds a whole lot easier and less error prone. You do have to trust the Wasm engine, but nothing else. And you don't need in-depth knowledge of OS security mechanisms.
https://github.com/google/wuffs
This sort of bug can't happen in WUFFS because you can't express the idea "corrupt the heap memory" even if you desperately wanted to. The tell-tale sign of such languages is that they are not general purpose languages, because those are able to express a wide variety of stupid things you don't want to do.
Integer overflow can happen in Rust, but it's well-defined, not undefined. This helps.
Bounds checking is part of indexing, and so even if an index overflows, the check should happen, and panic.
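To make that concrete, here's a minimal sketch. The helper names and the 4x4 table are made up for illustration. The offset is computed in u16 and deliberately wraps; the lookup is still bounds-checked, so the worst outcome is a wrong cell or a None (or a panic if you index with [] instead of get), never a read outside the slice.

```rust
// Hypothetical helper: a cell offset computed in u16 that may wrap around.
fn cell_offset(row: u16, width: u16, col: u16) -> usize {
    // wrapping_* makes the overflow explicit. A plain `*` would panic in
    // debug builds and wrap in release builds by default.
    usize::from(row.wrapping_mul(width).wrapping_add(col))
}

// Checked lookup: `get` returns None instead of reading out of bounds.
fn read_cell(cells: &[u8], row: u16, width: u16, col: u16) -> Option<u8> {
    cells.get(cell_offset(row, width, col)).copied()
}

fn main() {
    let cells: Vec<u8> = (0..16).collect(); // a 4x4 table, cell i holds i
    println!("{:?}", read_cell(&cells, 3, 4, 2)); // in range
    println!("{:?}", read_cell(&cells, 10, 4, 0)); // offset 40: out of range
    println!("{:?}", read_cell(&cells, 16384, 4, 0)); // offset wrapped to 0
}
```

The last call is the interesting one: the overflow gives you the wrong cell, which is a logic bug, but the access stays inside the slice.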
"Impossible" is a strong word, but it would be significantly less likely in Rust. If you did the same thing as you did in C, with unsafe, then it could happen. But 99.9999% of the time there's not much reason to, as it's the more difficult and less ergonomic option.
Rust's lack of implicit numeric conversions pushes authors towards using usize (size_t) for everything. So in Rust you'd be more likely to have a denial of service due to supporting 2^64 columns. If you tried to carelessly use u16 for the number of columns, you'd more likely have an application level bug like incorrect page rendering, or in the worst case a panic (equivalent of an uncaught C++ exception, which may be a program-stopping bug, but not a vulnerability).
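A small sketch of the difference, assuming a hypothetical column count that arrives as usize (the function names are mine). The `as` cast truncates silently, which is where the "incorrect page rendering" kind of bug comes from; the checked conversion forces you to handle the out-of-range case.

```rust
use std::convert::TryFrom;

// Silent truncation: `as` keeps only the low 16 bits.
fn columns_truncating(n: usize) -> u16 {
    n as u16
}

// Checked conversion: out-of-range values become None.
fn columns_checked(n: usize) -> Option<u16> {
    u16::try_from(n).ok()
}

fn main() {
    println!("{}", columns_truncating(70_000)); // 70000 - 65536 = 4464
    println!("{:?}", columns_checked(70_000)); // None
    println!("{:?}", columns_checked(80)); // Some(80)
}
```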
Nothing I mentioned requires knowledge of OS security mechanisms beyond what I've described in my short comment.
And I do mean flimsy. Here's a fun example from a random copy of the AUTOSAR guidelines I found online labelled 17-03. AUTOSAR says if I have two 8-bit signed integers and I add them, that might overflow, which is bad. So, what if I simply check that they're both less than 100, no more overflow? "Correct", says the AUTOSAR guide; this is apparently OK.
Huh. Signed 8-bit integer. 99 + 99 = -58. This is probably not what the person who purchased your car thought the answer was, I hope whatever accident you just caused isn't fatal.
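You can check the arithmetic yourself. This is a minimal Rust sketch, not the guideline's own code: guarded_add is a made-up function in the spirit of the "both less than 100" check, with the two's-complement wraparound spelled out explicitly, and safe_add is one way to actually rule the overflow out.

```rust
// Hypothetical guard in the spirit of the check described above.
fn guarded_add(a: i8, b: i8) -> Option<i8> {
    if a < 100 && b < 100 {
        // i8 tops out at 127, so 99 + 99 = 198 still does not fit.
        Some(a.wrapping_add(b))
    } else {
        None
    }
}

// Let the addition itself report overflow instead of guessing a bound.
fn safe_add(a: i8, b: i8) -> Option<i8> {
    a.checked_add(b)
}

fn main() {
    println!("{:?}", guarded_add(99, 99)); // Some(-58)
    println!("{:?}", safe_add(99, 99)); // None
    println!("{:?}", safe_add(60, 60)); // Some(120)
}
```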
That's the price of being a General Purpose programming language. We don't know if you might want to scribble on your own heap, or delete all the files labelled "Important, DO NOT DELETE" or mail a copy of the password database to a throwaway account, and so you can do all those things. Those Linux files pointing into process address space aren't a mistake, I wrote code that needs them (and then I ported it to safe Rust months ago) but with great power comes great box office potential or something like that.
Now you might say, "I'm sure I won't get something so obvious wrong", but the trouble is that's what the people who wrote this GitHub code apparently thought too. Hence I say we should use specialised languages with a deliberately narrower scope where this category of mistake is impossible.
WUFFS as it stands would be pretty exhausting to write a Markdown parser in because WUFFS doesn't believe in strings, at all. But it's already a better fit for this problem than C++ because the worst case scenario can't happen.
It suffices to find a way to corrupt its internal state and, via this attack vector, influence its behaviour.
Which, yes, boils down to the common attacks on separate processes and IPC.
The security claims are entirely that gaining arbitrary execution inside the wasm sandbox does not give you arbitrary execution in the host.
The benefit of a wasm sandbox over a process sandbox is entirely in the overhead reduction - but that does come at the cost of wasm being generally slower than native compilation (oh tradeoffs we will never escape you)
True, the standard does mention what is and is not safe.
That can of course be enough to cause damage, but the attack surface is still much smaller and makes RCE a lot less useful. Especially if capabilities are used to strictly limit the syscall surface for the WASM side (with reference types / interface type resources).
WASM isn't a magical security panacea, but it does offer solutions.
Of course not using languages that are prone to these attacks in the first place is a better fix.
Many who talk about how great the security sandbox is have never looked into the security section of the standard.
https://webassembly.org/docs/security/
See memory safety and mitigations.
Edit: I didn’t research where the corruption comes from in this bug.
Edit again: it looks like the source file is actually C and not C++.
https://github.com/github/cmark-gfm/commit/ac80f7b56522ffa15...
source: i cut the releases ;)
But, since the operations in actual Rust are marked as safe, the compiler doesn't provide any checks here: we can cause UB in code without any unsafe { }. Moreover, checking if the path starts with /proc isn't enough to make the UB go away: procfs can be mounted on any dir, there can be bind mounts further obscuring the file resolution, etc.
This means that if you really care about memory safety (and correctness in general), the precise way you set up your environment is also critical, down to the smallest details. It's like your Dockerfile had a metaphorical unsafe { } block around it: in a system that doesn't mount /proc you just closed a whole host of bugs, and a buggy system that mounts procfs in other dirs may cause arbitrary havoc. (note that mounting procfs is a privileged operation)
There are low level languages that, unlike Rust, completely prevent memory safety errors, like ATS. In ATS you can deal with pointers and pointer arithmetic (like in C or Rust), but to follow a pointer you need to provide a mathematical proof that it is valid. This is enough if we consider the program in isolation, but programs are never run in isolation. A proper mathematical proof of memory safety needs to consider ALL software running in the system, globally: then everything is mathematically verified, and the build step can just reject an unsound system setup.
That way we could theoretically be more precise about our memory safety guarantees: opening and writing to a file is safe, but only if procfs isn't mounted. If procfs is mounted anywhere, then this may go wrong: we need to prove we aren't doing something bad. This means that in a system where sysadmins can just log in and mount random stuff, writing to files must be unsafe!
Of course that's not very practical. It would be cumbersome to prove you're not doing /proc shenanigans every time you messed with files. And arguably, any program that opens arbitrary filenames that came from untrusted input is buggy anyway. You should always validate filenames, especially to confine some input to some directory (when applicable), rejecting paths with ../ that escape it, for example. And any setup that mounts procfs outside of /proc is irreparably broken. We don't have a tool to automatically check for such issues, but if those two things are followed, we won't have UB here.
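For the filename validation part, a minimal sketch of what I mean. The confine helper and the /srv/uploads base directory are hypothetical, and the check is purely lexical: it doesn't resolve symlinks or look at mounts, so it's a first line of defense, not a proof.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical helper: joins an untrusted relative path onto `base`
// only if every component is a plain name.
fn confine(base: &Path, untrusted: &str) -> Option<PathBuf> {
    let mut out = base.to_path_buf();
    for component in Path::new(untrusted).components() {
        match component {
            Component::Normal(name) => out.push(name),
            Component::CurDir => {} // "." changes nothing
            // "..", a leading "/", or a Windows prefix could escape `base`.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => {
                return None;
            }
        }
    }
    Some(out)
}

fn main() {
    let base = Path::new("/srv/uploads");
    println!("{:?}", confine(base, "notes/todo.md")); // accepted
    println!("{:?}", confine(base, "../../etc/passwd")); // rejected
    println!("{:?}", confine(base, "/proc/self/mem")); // rejected
}
```

Rejecting every ".." outright is stricter than necessary (a/../b would be harmless), but it's much easier to get right than trying to normalize the path first.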
How to do better than that? We need better system-level APIs, in which operations that are "obviously" safe can really be 100% memory safe all the time.
There are some exceptions though. Rust puts a mutex around stdio access (to prevent interleaved, broken output when calling println!() from many threads). Rust also has a mutex for accessing the environment [0]
But in this case... see, this /proc thing is pretty niche. With some bizarre combinations of brokenness (either in your application, or in the system setup, or both) it can indeed lead to UB, which can be a serious security bug, remote code execution even, but it's very, very rare that some program needs to care about this in practice. Rust is a practical language and I think in this case the line it drew was quite sensible.
Sadly, this means that Rust's safety guarantees aren't absolute, but Rust doesn't even have a precise mathematical definition of UB yet anyway, so we can't even in principle start formalizing this enough for this to matter.
Rust is also in a tough spot here because, as I said, there is hardly a foolproof way to check at runtime whether you're accessing procfs: checking for /proc in the path is just a heuristic that doesn't actually close the UB loophole. You would need to inspect all mounts to check for procfs and bind mounts (and also check for symlinks, hardlinks, etc).
Maybe another route is to open the file as normal, but then do a stat and check the device and inode: if the device is a procfs device, and the inode is a bad one, return an error (if you want to open it anyway, you need to use an unsafe API). Or, if we can't check because of some system setup shenanigans, default to returning an error. This could be a useful crate for a sufficiently paranoid application, but might not make the cut for the Rust stdlib, even though it ostensibly closes a safety loophole. (or it just might; maybe this should be proposed)
[0] Unlike the stdio one, the environment mutex is actually critical for safety in Rust programs. But you can break this safety by calling C code that reads the environment in a non-threadsafe manner without passing through the Rust mutex. So, accessing the environment from many threads can easily lead to UB, even though the operation is marked as safe in Rust. This can still be sound from Rust's pov because calling C APIs is unsafe, so you "just" need to guarantee that no C code is accessing the env behind your back. Except that there are some bad APIs like getaddrinfo that may implicitly access the environment, and tons of libraries call that, so in practice many C libraries can't be given a safe Rust interface. See https://doc.rust-lang.org/std/env/fn.set_var.html and https://internals.rust-lang.org/t/synchronized-ffi-access-to... and https://github.com/rust-lang/rust/issues/27970
Note that for many well behaved programs, environment variables are read only at program startup (before creating any threads), saved to a config struct, and this struct is passed around as needed. This usage can be safe even without the mutex. So an alternative design would be to prevent calling some APIs once you have created threads. To do that you maybe could add some way of tracking at type level whether the program is single threaded or multi-threaded (perhaps with session types, or the typestate pattern: basically a type-level state machine, where spawning a thread makes you go to the multi-threaded state if you're not already there). Also, in the single-threaded state, Arc could be automatically converted to Rc as well, Mutex converted to RefCell, etc. It would be interesting to see a language designed around this.
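Here's roughly what the typestate version could look like in today's Rust. It's only a sketch: every name (Program, SingleThreaded, MultiThreaded, Config, load_config) is made up, and it only tracks threads spawned through Program, so nothing stops other code from calling std::thread::spawn directly. A language designed around the idea would have to enforce that part itself.

```rust
use std::marker::PhantomData;
use std::thread::{self, JoinHandle};

// Type-level states (hypothetical names).
struct SingleThreaded;
struct MultiThreaded;

// Hypothetical config struct, filled once at startup.
#[derive(Clone, Debug, PartialEq)]
struct Config {
    path: Option<String>,
}

struct Program<State> {
    workers: Vec<JoinHandle<usize>>,
    _state: PhantomData<State>,
}

impl Program<SingleThreaded> {
    fn start() -> Self {
        Program { workers: Vec::new(), _state: PhantomData }
    }

    // Reading the environment exists only in the single-threaded state.
    fn load_config(&self) -> Config {
        Config { path: std::env::var("PATH").ok() }
    }
}

impl<State> Program<State> {
    // Spawning consumes the program and returns it in the multi-threaded state.
    fn spawn<F>(self, work: F) -> Program<MultiThreaded>
    where
        F: FnOnce() -> usize + Send + 'static,
    {
        let mut workers = self.workers;
        workers.push(thread::spawn(work));
        Program { workers, _state: PhantomData }
    }
}

impl Program<MultiThreaded> {
    // Waits for every worker and collects the results in spawn order.
    fn join_all(self) -> Vec<usize> {
        self.workers
            .into_iter()
            .map(|w| w.join().expect("worker panicked"))
            .collect()
    }
}

fn main() {
    let program = Program::<SingleThreaded>::start();
    let config = program.load_config(); // fine: no threads yet
    let program = program.spawn(move || config.path.map_or(0, |p| p.len()));
    // program.load_config(); // does not compile: no such method in this state
    println!("{:?}", program.join_all());
}
```

The config struct is moved into the worker, so after the transition nobody needs to touch the environment again.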