Go’s race detector has a mutex blind spot(doublefree.dev) |
Go’s race detector has a mutex blind spot(doublefree.dev) |
https://go.dev/blog/race-detector
> Because of its design, the race detector can detect race conditions only when they are actually triggered by running code, which means it’s important to run race-enabled binaries under realistic workloads.
This isn't a mutex blind spot. It is a side effect of goroutine/thread scheduling, which will obviously be based on workload and other factors. There are a bunch of other cases, that are not mutex related, that it won't see unless execution actually triggers them.
The annotation also help me discover patterns, like when most of the functions of a class have the same annotations, maybe it means that all the functions of the class should have the same annotations.
I really wish an equivalent to those annotations came to Go.
The right way to use the go race detector is:
1. Only turn it on in testing. It's too slow to run in prod to be worth it, so only in testing. If your testing does not cover a use-case, tough luck, you won't catch the race until it breaks prod.
2. Have a nightly job that runs unit and integ tests, built with -race, and without caching, and if any races show up there, save the trace and hunt for them. It only works probabilistically for almost all significant real-world code, so you have to keep running it periodically.
3. Accept that you'll have, for any decently sized go project, a chunk of mysterious data-races. The upstream go project has em, most of google's go code has em, you will to. Run your code under a process manager to restart it when it crashes. If your code runs on user's devices, gaslight your users into thinking their ram or processor might be faulty so you don't have to debug races.
4. Rewrite your code in rust, and get something better than the go race detector every time you compile.
The most important of those is 3. If you don't do anything else, do 3 (i.e. run your go code under systemd or k8s with 'restart=always').
Congrats, rustc forced you to wrap all your types in Arc<Mutex<_>>, and you no longer have data races. As a gift, you will get logical race conditions instead, that are even more difficult to detect, while being equally difficult to reproduce reliably in unit tests and patch.
Don’t get me wrong, Rust has done a ton for safety and pushed other languages to do better. I love probably 50% of Rust. But Rust doesn’t protect against logical races, lovelocks, deadlocks, and so on.
To write concurrent programs that have the same standards of testable, composable, expressive etc as we are expecting with sequential programs is really really difficult. Either we need new languages, frameworks or (best case) design- and architectural patterns that are easy to apply. As far as I’m concerned large scale general purpose concurrent software development is an unsolved problem.
But, also, Go has way worse semantics around various things, like mutexes, making it much more likely deadlocks happen. Like in go, you see all sorts of "mu.Lock(); f(); mu.Unlock()" type code, where if it's called inside an `http.Handler` and 'f' panics, the program's deadlocked forever. In go, panics are the expected way for an http middleware to abort the server ("panic(http.ErrAbortHandler)"). In rust, panics are expected to actually be fatal.
Rust's mutexes also gate "ownership" of the inner object, which make a lot of trivial deadlocks compiler errors, while go makes it absolutely trivial to forget a "mu.Unlock" in a specific codepath and call 'Lock' twice in a case rust's ownership rules would have caught.
In practice, for similarly sized codebases and similarly experienced engineers, I see only a tiny fraction of deadlocks in concurrent rust code when compared to concurrent go code, so like regardless that it's an "unsolved problem", it's clear that in reality, there's something that's at least sorta working.
I have heard positive things about the loom crate[1] for detecting races in general, but I have not used it much myself.
But in general I agree, writing correct (and readable) concurrent and/or parallel programs is hard. No language has "solved" the problem completely.
As for your statement that concurrency is generally hard - yes it is. But it is even harder with data races.
The Arc is only needed when you truly need to mutably share data.
Rust like Go has the full suite of different channels and what other patterns to share data.
Or you can just avoid shared mutable state, or use channels, or many of the other patterns for avoiding data races in Rust. The fun thing is that you can be sure that no matter what you do, as long as it's not unsafe, it will not cause a data race.
Also, don’t people know that a Mutex implies lower throughput depending on how long said Mutex is held?
Lock-free data structures/algorithms are attempt to address the drawbacks of Mutexes.
https://en.wikipedia.org/wiki/Lock_(computer_science)#Disadv...
I’m not sure how else you can explain perfectly idiomatic code (a loop, or a reused err variable, or a closure) causing a program to fall on its face simply by dropping in go’s namesake feature, whose entire purpose was supposed to be that you could simply drop it in.
To actually use `go` you have to do minor contortions like always remembering to copy your loop variables, make new error variables, and also not accidentally capture any external variables in a closure.
Go doesn’t actually help you with any of this, of course. You just have to remember to do it right every single time. None of these things are hard (usually), but the fact that you have to do them at all speaks volumes about the amount of forethought that went into it.
And of course doing all of those steps doesn’t save you if one of the things you tried to copy secretly contains a pointer inside of it, like absolutely everything in golang does. You didn’t know, and it wasn’t even a public member so it wasn’t in the docs. But there was a pointer somewhere deep inside the thing you copied so now you’ve got unguarded concurrent mutation of shared memory.
I disagree there. It is reasonable to run a few service instances with a race detector. I have a few services where _all_ instances are running with it just fine.
Not enough IMHO.
We run all tests on developer machines and CI with -race. Always.
It's probabilistic, so every developer 'make test' and every 'git push' is coverage.
I can only hope Go and Rust continue to improve until the next language generation comes along to surpass them. I honestly can't wait, things improved so much already.
if id == 1 {
counter++;
}
Found your problem. /sIn all honesty, if you “do work” using channels then all your goroutines are “thread safe” as the channel keeps things in order. Also, mutex is working as intended. As you see in your post, -race sees this, it’s good. Now have one goroutine read from a chan, get rid of the mutex, all other goroutines write to the chan, perfection.
Structured Concurrency is the same idea, but for concurrency. Instead of that code to create an appropriate number of threads, parcel out work, and so on, you just express high level goals like "Do these N pieces of work in any order" or "Do A and B, and once either is finished also do C and D" and just as the language handles the actual machine code jumps for your control flow, that would happen for concurrency too.
Nothing as close to the metal as Rust has that baked in today, but it is beginning to be a thing in languages like Swift and you can find libraries which take this approach.
† C's goto is de-fanged from the full blown go-to arbitrary jump in early languages, but it's still not structured control flow.
Rust's futures/streams are basically what you're asking for. You need a crate rather than just the bare language but I don't think that's a particularly important distinction.
You should have a look at what's going on in Scala-land, with scala-native¹ (and perhaps the Gears² library for direct style/capabilities)
I like this style, though it's been too new and niche to get a taste of it being used at scale.
¹: https://scala-native.org/ ²: https://github.com/lampepfl/gears
Is there a similar proof for structured concurrency - that it can express anything that unstructured concurrency can?
But go-routines!
Well, on .NET land we would be using Task Processing Library, or Dataflow built on top of it, with tasks being switched over the various carrier threads.
Or if feeling fancy, reach out to async workflows on F# with computation expressions, even before async/await came to be.
While on the Java side, we would be using java.util.concurrent, with future computations, having fun with Groovy GPars, or Scala frameworks like Akka.
In both platforms, we could even go the extra mile and write our own scheduling algorithms, how those lightweight threads would be mapped into carrier threads.
Naturally not having some of the boilerplate to handle all of that, or using multiple languages on the same project, makes it easier, hence why now we have all those goodies, with async/await or virtual threads on top.
Yes, shared memory is useful sometimes, but I don't think it should be the norm. But I've done parallel stuff in lots of languages, most recently Erlang and Rust... Message passing is so much nicer than having threads all mucking about in the same data if you don't need them to. You can write message passing parallel code in Rust, but it's not the norm, and you'll have to do a lot of the plumbing.
How might a language optimized for AI look different than a language optimized for humans?
Comparing writing a web service in Go and rust you would likely also utilize Tokio which has a wide variety of well designed sync primitives.
This is independent of using a mutex, a lock free algorithm or message passing (because at the end of the day a queue is a memory location).
The performance of high-contention code is a really tricky to reason about and depends on a lot of factors. Just replacing a mutex with a lock-free data structure will not magically speed up your code. Eliminating the contention completely is typically much better in general.
I don't see `mu.Lock(); f(); mu.Unlock()` anywhere really.
`mu.Lock(); defer mu.Unlock(); f();` is how everyone does it to prevent that possibility.
1. https://github.com/golang/tools/blob/f7d99c1a286d6ec8bd4516a...
2. https://github.com/golang/sync/blob/7fad2c9213e0821bd78435a9...
There are dozens and dozens throughout the stdlib and other popular go code.
The singleflight case is quite common, if you have:
mu.Lock()
if something() {
mu.Unlock()
moreWorkThatDoesntWantMutex()
return
}
mu.Unlock()
most gophers use manual lock/unlocks to be able to unlock early in an 'if' before a 'return', and that comes up often enough that it really does happen.I see manual lock/unlock all the time, and semi-regularly run into deadlocks caused by it. Maybe you don't use any third-party open source libraries, in which case, good for you congrats.
Now you either refactor into multiple functions, while ensuring all copies of possibly shared data when passing function arguments are correctly guarded or ”manually” unlock when you don’t need the mutex access anymore.
I personally haven’t worked with any of these perfect programmers, but I’m glad that you do.
At a minimum now you’ve also got to scan for nil dereferences. Maybe that simple code between the mutex is pulled out into a function in a later commit. Then the function body is changed in the next commit to do something less simple.
This is all depressingly reminiscent of listening to die-hard C enthusiasts insist that avoiding use-after-free bugs and out of bounds reads and other issues are easy as long as you write code perfectly all the time. It’s easy as long as you just always do the right thing.
It’s okay to admit your language of choice has weaknesses. You don’t have to stop using go just because you can acknowledge that data races can creep in if you’re not diligent enough.
The examples pointed to in the stdlib to complain about that usage pattern could have easily done just that -- but they were optimized to shrink the window of the locking time. My take is that the code paths were simple enough to verify that it was safe to do, and being in the stdlib had both. enough eyes to validate and a driver (being foundational code).
I recognize that Go is far from perfect but think some of the pearl clutching is manufactured (OMG! nil pointers! -- they technically can happen but in my many years of coding in Go it hasn't been a problem of note.)
And of course we should recognize that the locking issue being brought up was a simple one and the story gets worse when talking about complicated concurrency. Then the Go story of "just use the race detector!" is tempered by the fact that it is only best effort and no guarantees that it's found everything (like a bloom filter of certainty).
So year, Go has footguns and other inadequacies (FFI is a bummer). But it's good enough for a lot of things. Rust is on my list to learn, so my only complaint about it is the steep learning curve and I'm an old lazy dev.
In rust you would just throw a block around the mutex access changing the scoping and ensuring it is dropped before the slow function is called.
Call it a minimally intrusive manual unlock.
drop(foo); // Now foo doesn't exist, it was dropped, thus unlocking anything which was kept locked while foo exists
If you feel that the name drop isn't helpful you can write your own function which consumes the guard, it needn't actually "do" anything with it - the whole point is that we moved the guard into this function, so, if the function doesn't return it or store it somewhere it's gone. This is why Destructive Move is the correct semantic and C++ "move" was a mistake.Specifically for Go, I'd try to address the problem in CSP style, so as to avoid explicit locks unless absolutely necessary.
Now for the case you mention, one can actually achieve the same in Go, it just takes a bit of prior work to set up the infra.
type Foo struct {sync.Mutex; s string}
func doLocked[T sync.Locker](data T, fn func(data T)) {
data.Lock(); defer data.Unlock(); fn(data)
}
func main() {
foo := &Foo{s: "Hello"}
doLocked(foo, func(foo *Foo) {
/* ... */
})
/* do the slow stuff */
}It’s absolutely a go-specific problem from defer being function scoped. Which could be ignored if Unlock was idempotent but it’s not.
This alleviates all these problems of unlocks within if bodies at the cost of an indent (and maybe slight performance penalty).