C stdlib isn't threadsafe and even safe Rust didn't save us

C stdlib isn't threadsafe and even safe Rust didn't save us (edgedb.com)

327 points by msully4321 1 year ago | 362 comments

mmastrac 1 year ago |

The major takeaway from this is that Rust will be making environment setters unsafe in the next edition. With luck, this will filter down into crates that trigger these crashes (https://github.com/alexcrichton/openssl-probe/issues/30 filed upstream in the meantime).

usefulcat 1 year ago | |

But that won't actually fix the underlying problem, namely that getenv and setenv (or unsetenv, probably) cannot safely be called from different threads.

It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.

eqvinox 1 year ago | | |

I have a different perspective: the underlying problem is calling setenv(). As far as I'm concerned, the environment is a read-only input parameter set on process creation like argv. It's not a mechanism for exchanging information within a process, as used here with SSL_CERT_FILE.

And remember that the exec* family of calls has a version with an envp argument, which is what should be used if a child process is to be started with a different environment — build a completely new structure, don't touch the existing one. Same for posix_spawn.
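A minimal sketch of that approach, assuming a POSIX system with `/bin/sh` available. The helper name `spawn_with_extra_env` is hypothetical; it copies the current `environ` pointers into a fresh array, appends one extra entry, and hands the array to `posix_spawn`, so the parent never calls `setenv`. Reading `environ` here is still unsynchronized, so it assumes no other thread is modifying the environment at the same time.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `/bin/sh -c script` with the parent's environment
 * plus one extra NAME=value entry, without modifying the parent's environment.
 * Returns the child's exit status, or -1 on error. */
int spawn_with_extra_env(const char *script, const char *name, const char *value)
{
    size_t n = 0;
    while (environ[n] != NULL)
        n++;

    size_t name_len = strlen(name);
    size_t entry_len = name_len + 1 + strlen(value) + 1;
    char **envp = malloc((n + 2) * sizeof *envp);
    char *entry = malloc(entry_len);
    if (envp == NULL || entry == NULL) {
        free(envp);
        free(entry);
        return -1;
    }
    snprintf(entry, entry_len, "%s=%s", name, value);

    /* Copy the existing entries, dropping any older definition of NAME. */
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(environ[i], name, name_len) == 0 && environ[i][name_len] == '=')
            continue;
        envp[out++] = environ[i];
    }
    envp[out++] = entry;
    envp[out] = NULL;

    char *argv[] = { "sh", "-c", (char *)script, NULL };
    pid_t pid;
    int status = -1;
    int rc = posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, envp);
    if (rc == 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        status = WEXITSTATUS(status);
    else
        status = -1;

    free(entry);
    free(envp);
    return status;
}
```

A caller could pass `"SSL_CERT_FILE"` and a certificate path to give only the child that setting, while the parent's own environment stays as it was at process start.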

And, lastly, compatibility with ancient systems strikes again: the environment is also accessible through this:

   extern char **environ;

Which is, of course, best described as bullshit.

debugnik 1 year ago | | |

No amount of locking can make the getenv API thread-safe, because it returns a pointer which gets invalidated by setenv, but lacks a way to release ownership over it and unblock setenv safely (or to free a returned copy).

So setenv's existence makes getenv inherently unsafe unless you can ensure the entire application is at a safe point to use them.

Ferret7446 1 year ago | | |

Is that a problem? I feel like calling getenv and setenv from different threads is a design antipattern anyway. Any environment setting and loading should happen in the one and only main thread right after process init.

pshc 1 year ago | | |

The underlying problem is that setenv is mutable global state and should never have existed

kazinator 1 year ago | | |

The mutex would have to be held by the caller until it no longer needs the string returned from the environment, or makes a copy:

   stdenvlock();    // imaginary function added to ISO C or POSIX
   char *home = getenv("HOME");
   char *home_copy = strdup(home);
   stdenvunlock();  // only here can we unlock
   // home pointer is now indeterminate

Other solutions:

1. Put the above sequence into a function, and don't expose the mutex. Thread-safe code must use:

   char *home = dupenv("HOME"); // imaginary function; caller responsible for freeing.

2. Provide environment lookup into a buffer:

   getenvbuf("HOME", mybuf, sizeof mybuf);  // returns some value that helps to resize the buffer

All functions that retain pointers out of the classic getenv remain unsafe.

A mutex can be provided to those applications that want to manipulate the environ array directly, or use getenv and setenv, or any combinations of these.

The main problem is all the code out there using getenv.
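For illustration, the two imaginary functions above could be sketched in user code roughly like this. Everything here is an assumption rather than an existing libc API: `dupenv`, `getenvbuf`, `locked_setenv` and the private mutex are made-up names, and the lock only protects callers that go through these wrappers. Code that calls plain `getenv`/`setenv` or touches `environ` directly bypasses it entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Private lock; only effective if every environment access in the
 * program goes through the wrappers below (an assumption). */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns a malloc'd copy of the value, or NULL if unset. Caller frees. */
char *dupenv(const char *name)
{
    pthread_mutex_lock(&env_lock);
    const char *v = getenv(name);
    char *copy = (v != NULL) ? strdup(v) : NULL;
    pthread_mutex_unlock(&env_lock);
    return copy;
}

/* Copies the value into buf (truncating, always NUL-terminated when
 * bufsize > 0). Returns the full length of the value so the caller can
 * resize, or -1 if the variable is unset. */
long getenvbuf(const char *name, char *buf, size_t bufsize)
{
    long len = -1;
    pthread_mutex_lock(&env_lock);
    const char *v = getenv(name);
    if (v != NULL) {
        size_t n = strlen(v);
        len = (long)n;
        if (bufsize > 0) {
            size_t copy = n < bufsize - 1 ? n : bufsize - 1;
            memcpy(buf, v, copy);
            buf[copy] = '\0';
        }
    }
    pthread_mutex_unlock(&env_lock);
    return len;
}

/* setenv under the same lock, so it cannot race with the readers above. */
int locked_setenv(const char *name, const char *value)
{
    pthread_mutex_lock(&env_lock);
    int rc = setenv(name, value, 1);
    pthread_mutex_unlock(&env_lock);
    return rc;
}
```

Because both readers copy the value while the lock is held, no pointer into the environment escapes, which is the property the plain `getenv` interface cannot offer.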

liontwist 1 year ago | | |

Please no.

If your program wants to use the environment as an out-of-band global var for cross thread communication, you can make your own mutex.

ModernMech 1 year ago | | |

It's the same problem with global vars, but at a machine scope. The real solution here would be for the OS to have a better interface to read and write env vars, more like a file where you have to get rw permission (whether that's implemented as a mutex or what).

belter 1 year ago | | |

> It seems like the only reliable way to fix this is to change these functions so that they exclusively acquire a mutex.

A mutex can ensure thread safety but risks deadlocks if not used carefully and will hurt performance...

goeiedaggoeie 1 year ago | | |

setenv and getenv have never been thread safe, why the concern with it now?

loeg 1 year ago | | |

Is that the underlying problem, or is the underlying problem that libraries are using thread-unsafe setenv in threaded contexts when they could just do something else?

db48x 1 year ago | | |

But it would force Rust programs to add their own synchronization mechanism around them. As long as no two threads can call getenv/setenv at the same time then it’s fine.

thayne 1 year ago | | |

In particular, it doesn't help if you call a c function that indirectly modifies the environment with FFI.

zamalek 1 year ago | |

> Nowadays the best solution to this issue is "stop using this crate" with libraries like rustls.

Nice to see that the author of the library has a sensible take. Unfortunately the ecosystem does not: https://github.com/seanmonstar/reqwest/blob/master/Cargo.tom...

benatkin 1 year ago | |

People get trained to ignore the ____UNSAFE_payattention__nevermindthatthisappears50timesinthisfile___ blocks and prefixes

This also shows up in web frameworks where Vue has the v-html directive and react has dangerouslySetInnerHTML. Vue definitely has it better.

crooked-v 1 year ago | | |

In the React world, the only times I've seen dangerouslySetInnerHTML consistently used is for outputting string literal CSS content (and this one is increasingly rare as build tools need less handholding), string literal JSON content (for JSON+LD), and string literal premade scripts (i.e. pixel tags from the marketing content). That's not to say there's no danger surface there, but it's not broadly used as a tool outside of code that's either really bad or really exhaustively hand-tuned.

ChrisSD 1 year ago |

In the Rust std, `set_var` and `remove_var` will correctly require using an `unsafe {}` block in the next edition (2024). The documentation does now mention the safety issue but obviously it was a mistake to make these functions safe originally (albeit a mistake even higher level languages have made).

https://doc.rust-lang.org/stable/std/env/fn.set_var.html

There is a patch for glibc which makes `getenv` safe in more cases where the environment is modified but C still allows direct access to the environ so it can't be completely safe in the face of modification https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...

vlovich123 1 year ago |

Even if C stdlib maintainers are resistant to making setenv multi-thread safe, at a minimum there should be a new alternative thread-safe API defined, whether within POSIX or as a de facto standard that forces POSIX to adopt it over time. If the time spent explaining why nothing could be done had instead been spent fixing this problem, a new thread-safe API could have replaced the old setenv, which could have been deprecated and removed from many software projects.

I'm also not convinced by Musl's maintainer that it can't be fixed within Musl considering glibc is making changes to make this a non-issue.

usefulcat 1 year ago | |

The biggest problem is not the absence of a thread safe API, it's the existence of this:

    extern char **environ;

As long as environ is publicly accessible, there's no guarantee that setenv and getenv will be used at all, since they're not necessary.

If you're willing to get rid of environ, it's pretty trivial to make setenv and getenv thread safe. If not, then it's impossible, although one could still argue that making setenv and getenv thread safe is at least an improvement, even if it's not a complete solution (aka don't let the perfect be the enemy of the good).
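To make the point about `environ` concrete, here is a small sketch of a lookup that never calls `getenv`. The function name `lookup_via_environ` is made up for illustration. Any lock added inside libc's `getenv`/`setenv` would be invisible to code like this, which is why the public `environ` pointer blocks a complete fix.

```c
#include <stddef.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: find NAME by walking the NULL-terminated environ
 * array directly. Returns a pointer into the environment (not a copy),
 * or NULL if the variable is not present. */
const char *lookup_via_environ(const char *name)
{
    size_t len = strlen(name);
    if (environ == NULL || len == 0)
        return NULL;

    for (char **entry = environ; *entry != NULL; entry++) {
        /* Each entry has the form "NAME=value". */
        if (strncmp(*entry, name, len) == 0 && (*entry)[len] == '=')
            return *entry + len + 1;
    }
    return NULL;
}
```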

vlovich123 1 year ago | | |

> aka don't let the perfect be the enemy of the good

Exactly my point. Over time *environ would disappear, at least from the major software projects that everyone uses (assuming it's even in use in them in the first place).

panzi 1 year ago | |

Guess that would also require some locking for all the exec() functions that don't take the environment as a parameter or that search PATH for the executable.

davidt84 1 year ago | |

I'm not convinced by you that you know more than the experts who have determined there is no backwards-compatible way to fix this.

vlovich123 1 year ago | | |

I'll take existence proofs [1] over personal insults but YMMV. You also may want to be careful assuming the expertise of people on this forum. Some people here are quite technical.

[1] https://github.com/bminor/glibc/commit/7a61e7f557a97ab597d6f...

StillBored 1 year ago |

It's like a rite of passage to be hit by an environment-related bug on Linux, which is mysteriously less of a problem on other Unixes. Which is sorta funny given how pragmatic Linus and the kernel are about fixing POSIX bugs by making them not happen, while glibc is still lagging here decades after people tried to at least make the problem better. Sure, there is all the crap around TZ/etc, but simply providing getenv_r() and synchronizing it with setenv() and warning during compile/link on getenv() would have killed much of the problem. Never mind actually doing a COW-style system where the env pointer(s) are read-only. Instead the problem is pushed to the individual application, which is a huge mistake, because application writers are rarely aware of what their dependencies are doing. Which is the situation I found myself in many, many years ago. The closed-source library vendor, at the time, told us to stop using that toy Unix clone (Linux).

masklinn 1 year ago |

Previously on setenv being a terrible thing: https://www.evanjones.ca/setenv-is-not-thread-safe.html (discussion: https://news.ycombinator.com/item?id=38342642 first comment is even about it causing issues in Rust)

Animats 1 year ago | |

Yes. That's known.

Most of the rest of the problem here seems to be the development environment. They're testing on a remote machine in an Amazon data center and using Docker. This rig fails to report that a process has crashed. Then they don't have enough debug symbol info inside their container to get a backtrace. If they'd gotten a clean backtrace reported on the first failure, this would have been obvious.

Why is anyone using "setenv" anyway?

mmastrac 1 year ago | | |

Yup, it's mostly just the story and tools we used to get ourselves out of a mess that was made harder by some decisions made earlier -- the tests were running in a container with stripped symbols (we're going to ship symbols after this, no reason to over-optimize), our custom test runner failed to report process death (an oversight).

There's no reason setenv should have been called here. The `openssl-probe` library could simply return the paths to the system cert files and callers could plug those directly into the OpenSSL config.
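A rough sketch of that "return the path instead of setting a variable" shape, written in C rather than Rust to match the other snippets in this thread. This is not the `openssl-probe` API. The function name `find_cert_file` and the default candidate list are assumptions chosen for illustration, and the paths are common locations that may not exist on a given system.

```c
#include <stddef.h>
#include <unistd.h>

/* Illustrative candidate locations only; real systems vary. */
static const char *const default_cert_candidates[] = {
    "/etc/ssl/certs/ca-certificates.crt",
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/ssl/cert.pem",
    NULL
};

/* Hypothetical helper: return the first readable path from a
 * NULL-terminated candidate list (or from the defaults when the list
 * is NULL). Returns NULL if none is readable. Touches no global state. */
const char *find_cert_file(const char *const *candidates)
{
    if (candidates == NULL)
        candidates = default_cert_candidates;

    for (; *candidates != NULL; candidates++) {
        if (access(*candidates, R_OK) == 0)
            return *candidates;
    }
    return NULL;
}
```

The caller would then hand the returned path to the TLS library directly, for example through OpenSSL's `SSL_CTX_load_verify_locations(ctx, path, NULL)`, so nothing needs to be communicated through `SSL_CERT_FILE`.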

Oversights all around and hopefully this continues to improve.

masklinn 1 year ago | | |

> Why is anyone using "setenv" anyway?

Because it’s there and it looks like a good idea until it takes one of your fingers.

kelnos 1 year ago |

This reminded me of that whole "12-factor app" movement, which several of my former coworkers had really bought into. One of the "factors" is that apps should be configured by environment variables.

I always thought this was kinda foolish: your configuration method is a flat-namespace basket of stringly-typed values. The perils of getenv()/setenv()/environ are also, I think, a great argument against using env vars for configuration.

Sure, there aren't always great, well-supported options out there. I prefer using a configuration file (you can have templated config and a system that fills in different values for e.g. dev/stage/prod), and I'll usually use YAML, despite its faults and gotchas. There are probably better configuration file formats, but IMO YAML is still significantly better than using env vars.

rikthevik 1 year ago |

Great article about digging into a non-obvious bug. This one had it all! Intermittent bug, architecture-specific, hidden in a dependency, rust, the python GIL, gettext. Fantastic stuff.

These kinds of detailed troubleshooting reports are the closest thing you can get to having to do it yourself. Thanks to the authors. It's easy to say "don't use X duh" until a dependency relies on it, and how were you supposed to know?

nwellnhof 1 year ago |

> Our nightly CI machines run on Amazon AWS, which has the advantage of giving us a real, uncontainerized root user.

> We don’t have the necessary files outside of the container, and our containers are quite minimal and don’t allow us to easily install gdb.

Have people lost the ability to build and debug their code locally, without clouds and containers?

api 1 year ago | |

Yes. It’s shocking just how much cloud SaaS has distorted people's understanding of things. You need all kinds of layers of cloud complexity and deployment to do the most trivial stuff. We have 100% reversed the PC revolution and returned to the era of clunky expensive mainframe computing.

The reason is that cloud is where all the money is because cloud is DRM. Put software there and you can charge a subscription and nobody can evade it and you have perfect lock in forever. People usually can’t even get their data out. You can also do all kinds of realtime analytics conveniently to optimize your product.

Computing architecture is downstream of the business model. Mainframe died originally because there was no Internet and PCs were cheaper, but vendors also lost a lot of their lock in power. Now they have a way to bring a model that is much more profitable back. No more pesky freedom for users, who to be fair if given such freedom will often just refuse to pay, making quality software a non-viable business.

Tangent I know.

bluGill 1 year ago | | |

There is a lot to like about the cloud model as a user. I can access my data wherever I am, from whatever device I have, and I won't lose it to a disc crash.

There are faults to the cloud, but it solves real problems users have.

bluGill 1 year ago | |

This is a random crash only on ARM. I doubt they could get the crash to happen locally - most likely their developer machines were all x86, where it never crashed.

They should have handled crashes better - a problem they seem to recognize, but not the issue here, so not covered.

msully4321 1 year ago | |

> Have people lost the ability to build and debug their code locally, without clouds and containers?

No, of course not, but it didn't crash on our machines!

mardifoufs 1 year ago | |

How would you debug locally when you probably don't have a device that runs the arch that is causing an issue? It's much faster to just debug in the actual environment where the failure happens anyways.

forrestthewoods 1 year ago |

Mutable global state is evil. Friends don’t let friends use mutable global state.

I hate envvars. It’s “the Linux way”. I avoid them like the plague. A++ strong recommend.

libc is terrible. The world needs to move on.

hauntsaninja 1 year ago |

We had so many of these issues that we ended up LD_PRELOAD-ing a patched getenv / setenv / putenv

msully4321 1 year ago | |

With a fixed implementation that leaks environments (like the one that just landed in glibc)?

shikon7 1 year ago |

I wonder why it is so hard for Rust to implement its own safe stdlib independent of C.

datadeft 1 year ago |

Couldn't we have a better pattern for this?

    if (__environ == NULL || name[0] == '\0')
      return NULL;

cuno 1 year ago |

We ended up overriding and replacing with our own thread-safe version years ago when we also hit this.

janmatejka 1 year ago |

This reminds me of the time I was not able to get setproctitle to work in a certain code base. Eventually I narrowed the issue to this line:

  import numpy

setproctitle() worked before numpy import but not after because it couldn't find the memory address of **environ.

I'm hazy on the details but it led me to a somethingenv call (possibly getenv or setenv) in numpy initialization and it turned out that function changed the address of **environ and that was the reason for why setproctitle couldn't find it.

loeg 1 year ago |

env::set_var is marked unsafe now: https://doc.rust-lang.org/std/env/fn.set_var.html

And:

> This function is safe to call in a single-threaded program.

> This function is also always safe to call on Windows, in single-threaded and multi-threaded programs.

> In multi-threaded programs on other operating systems, the only safe option is to not use set_var or remove_var at all.

HarHarVeryFunny 1 year ago |

What is the rationale for libc not making setenv/getenv thread safe? It does seem rather odd given how environment variables are explicitly defined as shared between threads in the same process!

It doesn't seem it would take much to do it efficiently, even retaining the poor getenv() pointer-returning API (which could point to a thread local buffer). The coordination between getenv and setenv could be very lightweight - spinlock vs mutex.

jeroenhd 1 year ago | |

The spec says it's not supposed to be thread safe.

There's also no real backwards compatible way of fixing setenv(). getenv() returns a pointer that can be read at any time, and then there's the *environment parameter that can also be used to read env variables.

IMO the entire API should be deprecated for a thread safe one, but until someone comes with a standard setenv() alternative that's implemented by the libc runtimes, we'll be stuck with the shitty POSIX API, and every year we will read blog posts about get/setenv() crashing processes.

4gotunameagain 1 year ago | |

I think the argument was that the standard states that setenv is not thread safe, although from what I see it says that it does not have to be thread safe:

  The setenv( ) function need not be thread-safe. A function that is not required to be thread-safe is not required to be reentrant.

https://www.open-std.org/jtc1/sc22/open/n4217.pdf.

Page.. 1860 :')

HarHarVeryFunny 1 year ago | | |

Sure, but given that Linux defines the environment as state that's shared between threads, not having a thread-safe way of accessing it is hard to defend...

Is "the standard says it doesn't NEED to be thread safe" the argument that the Linux libc maintainers are using for not enhancing it to be thread safe, or is it based on some technical or backwards compatibility issues in doing so?

saagarjha 1 year ago | |

The rationale is that it was implemented before threads existed, and now can't be retrofitted with thread safety.

Meneth 1 year ago |

From the backtrace, it seems strerror_r is not thread-safe, since it calls __dcigettext which calls getenv.

A similar bug related to setlocale was found in 2007 and fixed in 2014. That bug did not take getenv/setenv into account. https://sourceware.org/bugzilla/show_bug.cgi?id=5443

jandrese 1 year ago |

Yet another person is burned by calling setenv() in a multi-threaded context. There really needs to be a big warning banner on the manpage for setenv() that warns about this because it seems like a far more common problem than you would expect.

umpalumpaaa 1 year ago | |

The man page says:

> POSIX.1 does not require setenv() or unsetenv() to be reentrant.

A non-reentrant function cannot be thread safe.

In general, for POSIX, libc and many other libraries: if the docs do not explicitly say "this function is thread safe", it is not.

wmf 1 year ago | | |

It's time to move beyond this attitude and make things safe by default. For example, Solaris has a safer version of setenv().

"It is ridiculous that this has been a known problem for so long. It has wasted thousands of hours of people's time, either debugging the problems, or debating what to do about it. We know how to fix the problem." https://www.evanjones.ca/setenv-is-not-thread-safe.html

jabl 1 year ago | | |

> A non-reentrant function cannot be thread safe.

Actually, a non-reentrant function can be thread-safe. A common example of such a function in libc being malloc().

01HNNWZ0MV43FF 1 year ago | |

Funny enough, the Rust wrapper `std::env::set_var` does have a big warning https://doc.rust-lang.org/std/env/fn.set_var.html

subarctic 1 year ago | | |

Looks like that Safety section was added in 1.76.0. It'll be an even bigger warning in the future since it's now going to be unsafe in Rust 2024

kazinator 1 year ago |

This is not just a thread issue!

You run into a problem if you keep using a string returned by getenv after calling another environment function: including possibly getenv itself!

However, it's easy to just strdup the result of getenv; that defends against the issue in a single-threaded program.

vrtx0 1 year ago |

Let me try to help:

1. If a process crashes and dumps, be sure to look at the system log for the cause (e.g. SIGSEGV, OOM, invalid instruction, etc.)

2. Be certain you’re looking at the right core dumps — I believe UID 1000 just means posix UserID (which is unrelated to a PID), though I don’t use containers.

3. Stay focused on the right level of abstraction — memory model details are great to know, but irrelevant here.

4. Variables do not correlate 1:1 with registers, except in C calling conventions. The assumption about x20 and a local variable is incorrect, unfortunately.

5. getenv() and setenv() do not work as implied in the post. When a process starts via execve(), the OS/libc constructs a new snapshot of the environment, which cannot be modified by an ancestral process. It’s a snapshot in time, unless updated by the process itself. When a process fork()s, the child gets a new copy of the parent’s environment; updates do not propagate.

getenv() is thread safe and reentrant. You don’t use an environment to pass shared data; setenv() is generally used when constructing the environment for a child process before a fork(). See man environ.

6. FWIW, ‘char** env’ is a null-terminated array of pointers, so dumping memory from *env (or env[0]) is only valid until you hit the first NULL. The size of the array is not stored in the array.
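The fork() behavior in point 5 can be checked with a short sketch like the one below. The function name is invented for illustration, and it assumes a single-threaded caller on a POSIX system (calling setenv in the child of a multi-threaded process would be a separate hazard). The child sets a variable in its own copy of the environment, and the parent then checks whether it can see the change.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a setenv() done in a forked child is
 * visible in the parent, 0 if it is not, and -1 on error. */
int child_env_change_reaches_parent(const char *name)
{
    unsetenv(name);            /* start from a known state in the parent */

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: modifies only its own copy of the environment. */
        setenv(name, "set-in-child", 1);
        _exit(getenv(name) != NULL ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: the variable should still be unset here. */
    return getenv(name) != NULL;
}
```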

I hope this helps! And apologies if this is redundant — I read so many comments; mostly variations of “the problem with getenv is x”, but gave up before reading all of the (currently) 168 comments.

roca 1 year ago |

Switching from OpenSSL to rustls solves even more problems than expected.

colonial 1 year ago |

TIL that my set_var("RUST_LOG"...) calls at startup are technically unsafe. Funny.

I should see if the env_logger crate has a better solution.

loeg 1 year ago | |

At startup it's probably fine! It's safe in a single-threaded environment.

kurante 1 year ago | | |

As long as they don't use `#[tokio::main]` or any other attribute that wraps main into an async function!

up2isomorphism 1 year ago |

Does POSIX say setenv is thread safe? If not, why complain about it?

einpoklum 1 year ago |

A function which sets global process state is not thread safe? Why, I'm shocked; shocked and chagrined.

But really, I don't understand why a sensitive security-related library would implicitly use an unsafe function like setenv().

bangaladore 1 year ago | |

> A function which sets global process state is not thread safe? Why, I'm shocked; shocked and chagrined.

This is an oversimplification. Windows has essentially the exact same API and it works just fine in multithreaded contexts.

The issue here is unix allows the underlying pointer to be accessed, bypassing any possible thread-safe APIs.

throwaway2037 1 year ago |

Click bait title? GLibC is very clear about what is and what is not thread-safe. I looked at the article: They fell victim to the classic getenv()/setenv() trap. This has been blogged about many times. If you look at the man page for setenv():

Ref: https://man7.org/linux/man-pages/man3/setenv.3.html

... it clearly says: "MT-Unsafe"

Also, there is a whole section about get/set env thread safety here (under "Other safety remarks -> env"):

https://man7.org/linux/man-pages/man7/attributes.7.html

wakawaka28 1 year ago |

Sounds like you just didn't know it's not threadsafe. This is common knowledge in the C and C++ world.

lopkeny12ko 1 year ago |

The whole point of Rust is memory safety, not thread safety...

masklinn 1 year ago | |

Rust literally bakes data race safety into the language. While it does not resolve general race conditions, thread safety issues which cause memory unsafety (which a UAF or dangling pointer would be) are very much within its remit.

gavinhoward 1 year ago |

It is weird that I got this right before Rust did.

Because I use structured concurrency, I can make it so every thread has its own environment stack. To add to an environment, I duplicate it, add the new variable, and push the new environment on the stack.

Then I can use code blocks to delimit where that stack should be popped. [1]

This is all perfectly safe, no `unsafe` required, and can even extend to other things like the current working directory. [2]
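As a loose illustration of the per-thread stack idea in C (this is not Yao's implementation, and all names here are hypothetical): instead of duplicating the whole environment on each push, this sketch pushes a single binding that shadows older ones, and keeps the stack in thread-local storage so no locking is needed. It deliberately never consults the process-wide environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* One binding on a thread's private environment stack. */
struct env_frame {
    struct env_frame *parent;
    char *name;
    char *value;
};

/* Each thread has its own stack top, so threads never share frames. */
static _Thread_local struct env_frame *env_top = NULL;

/* Push NAME=value for the current thread. Returns 0, or -1 on failure. */
int env_push(const char *name, const char *value)
{
    struct env_frame *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->name = strdup(name);
    f->value = strdup(value);
    if (f->name == NULL || f->value == NULL) {
        free(f->name);
        free(f->value);
        free(f);
        return -1;
    }
    f->parent = env_top;
    env_top = f;
    return 0;
}

/* Pop the most recent binding; call at the end of the scope that pushed it. */
void env_pop(void)
{
    struct env_frame *f = env_top;
    if (f == NULL)
        return;
    env_top = f->parent;
    free(f->name);
    free(f->value);
    free(f);
}

/* Innermost binding wins. The returned pointer stays valid until the
 * frame that owns it is popped by this same thread. */
const char *env_lookup(const char *name)
{
    for (const struct env_frame *f = env_top; f != NULL; f = f->parent) {
        if (strcmp(f->name, name) == 0)
            return f->value;
    }
    return NULL;
}
```

C has no scoped blocks that pop automatically, so the pairing of `env_push` and `env_pop` is left to the caller here, which is exactly the part a language with structured scopes can enforce.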

IMO, Rust got this wrong 10 years ago when Leakpocalypse broke. [3]

[1]: https://git.yzena.com/Yzena/Yc/src/branch/master/tests/yao/e...

[2]: https://gavinhoward.com/2024/09/rewriting-rust-a-response/#g...

[3]: https://gavinhoward.com/2024/05/what-rust-got-wrong-on-forma...

mmastrac 1 year ago | |

This isn't _really_ a Rust problem. Rust is a victim of POSIX.

If you have C FFI interop in Yao, there's still a chance you might have two C libraries cause a crash without your code even being involved.

gavinhoward 1 year ago | | |

Except if there is dynamic linking, I can use that to inject my own setenv and getenv, just like people inject jemalloc or other malloc alternatives.

Context: The setenv function is not thread-safe even in Rust.

Question: Why doesn't Rust implement a standard library without C?

Answer: It does, but core lacks std::env, because env vars are part of an O/S.

Question: Is an O/S really necessary for env vars?

Answer: Not conceptually, but without an O/S, env vars don't work as expected.