Data Race Patterns in Go

307 points by statenjason 4 years ago | 186 comments

Sorry to say, but these hit close to home for me. A lot of the synchronization paradigms in Go are easy to misuse, but lead the author into thinking it's okay. The WaitGroup one is particularly poignant for me, since the race detector doesn't catch it.

I'll add one other data race goof: atomic.Value. Look at the implementation. Unlike pretty much every other language I've seen, atomic.Value isn't really atomic, since the concrete type can't ever change after being set. This stems from the fact that interfaces are two words rather than one, and they can't be (hardware) atomically set. To fix it, Go just documents "hey, don't do that", and then panics if you do.
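
A minimal sketch of the behaviour described above, for anyone who wants to see it. The helper names tryStore and demo are made up for illustration; only atomic.Value, Store, and Load come from the standard library.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tryStore reports whether v.Store(val) panicked.
func tryStore(v *atomic.Value, val interface{}) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	v.Store(val)
	return false
}

// demo stores a string, another string, and then an int into one Value.
func demo() (first, second, third bool, final string) {
	var v atomic.Value
	first = tryStore(&v, "hello") // the first Store fixes the concrete type
	second = tryStore(&v, "world") // same concrete type: accepted
	third = tryStore(&v, 42)       // different concrete type: Store panics
	final = v.Load().(string)      // the int never replaced the string
	return
}

func main() {
	fmt.Println(demo()) // false false true world
}
```

So the losing writer is not silently overwritten; it panics, and the previously stored value stays in place.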

Groxx 4 years ago | |

The lack of generics has forced all Go concurrency to be intrusive (i.e. implemented by the person using literally any concurrency), and yeah. It's horrifyingly error-prone in my experience. It means everyone needs to be an expert, and lol, everyone is not an expert.

Generics might save us from the simple, mechanical flaws. Expect to see `Locker<T>` and `Atomic<T>` types cropping up. And unbounded buffered thread-safe queues backing channels. Etc. I'm very, very much looking forward to it.

--- edited to rant more ---

I also really wonder where all these "go makes concurrency a first-class concept" claims come from, because I see it in quite a few places, and I feel like it's making some very strong implied claims that absolutely do not exist.

Go has channels and select. That's neat. But on the other hand it has threads... but no thread handles. It has implicit capturing of closures. It has ambiguous value vs pointer semantics. It (style- and ergonomic-wise) encourages field references, which have no way to enforce mutexes or atomics. It has had crippled lock APIs that effectively force use of channels for... I don't know, philosophical reasons?

Go is abnormally dangerous when it comes to concurrency IMO. The race detector does an amazing job helping you discover it, but it's very easy to not use it or not take full advantage of it (i.e. non-parallel tests), and few run their production services with the race detector enabled. Because if they did, it would crash all the time, because there are an absurd amount of races in nearly all of the popular libraries (and in common use of those libraries, because concurrency is not a first-class citizen and you can't tell when it's happening / when it shouldn't happen).

josefx 4 years ago | | |

> I also really wonder where all these "go makes concurrency a first-class concept" claims come from,

Given that some of the main architects behind Go had K&R C as background I wouldn't be surprised if "first-class" just meant that the language defines both a memory model and primitives for threading. C had neither until it basically adopted both from C++11.

hexxagone 4 years ago | | |

Go does not have threads but something like "tasks". The fact that no thread handle is exposed allows for transparently moving these tasks across threads if the scheduler decides so.

"go makes concurrency a first-class concept" I think it usually refers to goroutines being built in the language.

"Go is abnormally dangerous when it comes to concurrency IMO". Personally, it has not been my experience with Go concurrency. However I have hit some issues when trying to orchestrate tasks via channels and ended up resorting to atomics to do the job.

Thaxll 4 years ago | | |

"because there are an absurd amount of races in nearly all of the popular libraries"

This is FUD. I ran the race detector against a lot of popular libs and I never found issues like that.

But since you're claiming there are issues everywhere, do you have examples?

nindalf 4 years ago | |

On atomics in Go, the beta for Go 1.19 was released an hour ago (https://groups.google.com/g/golang-announce/c/SNruPJUSFz0?pl...).

> The sync/atomic package defines new atomic types Bool, Int32, Int64, Uint32, Uint64, Uintptr, and Pointer. These types hide the underlying values so that all accesses are forced to use the atomic APIs. Pointer also avoids the need to convert to unsafe.Pointer at call sites. Int64 and Uint64 are automatically aligned to 64-bit boundaries in structs and allocated data, even on 32-bit systems.

Go 1.19 is expected to release in August.
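
A rough sketch of what the new typed API looks like in use, assuming Go 1.19 or later. The function name countTo is mine; atomic.Int64 and its Add/Load methods are the ones described in the release notes quoted above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countTo starts n goroutines that each bump a shared counter once.
func countTo(n int) int64 {
	// The underlying int64 is hidden, so a plain hits++ does not compile;
	// every access has to go through the atomic methods.
	var hits atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			hits.Add(1)
		}()
	}
	wg.Wait()
	return hits.Load()
}

func main() {
	fmt.Println(countTo(100)) // 100
}
```

Compared with atomic.AddInt64(&x, 1) on a bare int64 field, the point is that nobody can forget and read or write the field directly.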

morelisp 4 years ago | |

> atomic.Value isn't really atomic, since the concrete type can't ever change after being set.

How does this mean it's non-atomic? As far as I know you can still never Load() a partial Store(). (Also, even if it was possible, this would never be a good idea...)

bumper_crop 4 years ago | | |

That's why I opened with "Look at the implementation". Go is unable to store the type and the pointer at the same time, so it warps what "atomic" means. Pretty much every other language has atomic mean "one of these will win, one will lose". Go says "one will win, one will panic and destroy the goroutine".

In fact, it's even worse than that. If the Store() caller goes to sleep between setting the type and storing the pointer, it causes every Goroutine that calls Load() to block. They can't make forward progress if the store caller hangs.

avgcorrection 4 years ago | |

This is Atomic*

* Just don’t be an idiot. Worse is better.

smasher164 4 years ago |

This definitely matches my experience using Go at my previous organization.

1. Closures and concurrency really don't mix well. The loop variable capture in particular is very pernicious. There's an open issue to change this behavior in the language: https://github.com/golang/go/issues/20733.

2. Yep. I've seen this problem in our codebase. I've grown to just be very deliberate with data that needs to be shared. Put it all in a struct that's passed around by its pointer.

3. This issue is caught fairly easily by the race detector. Using a sync.Map or a lock around a map is pretty easy to communicate with other Go devs.

4. This should be documented better, but the convention around structs that should not be passed around by value is to embed a noCopy field inside. https://github.com/golang/go/issues/8005#issuecomment-190753... This will get caught by go vet, since it'll treat it like a Locker.

5 & 6. Go makes it pretty easy to do ad-hoc concurrency as you see fit. This makes it possible for people to just create channels, waitgroups, and goroutines willy-nilly. It's really important to design upfront how you're gonna do an operation concurrently, especially because there aren't many guardrails. I'd suggest that many newcomers stick with errgroup.Group from x/sync/errgroup (which forces you to use its Go method, and can now set a cap on the # of goroutines), and use a *sync.Mutex inside a struct in 99% of cases.

7. Didn't encounter this that often, but sharing a bunch of state between (sub)tests should already be a red flag. Either there's something global that you initialized at the very beginning (like opening a connection), or that state should be scoped and passed down to that individual test, so it can't really infect everything around it.
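
To make points 2, 3, and 5 & 6 concrete, here is one way the "mutex inside a struct, passed around by pointer" approach might look. The type Counts and the function hammer are hypothetical names for illustration, and I've used a sync.Mutex value with pointer receivers rather than a *sync.Mutex field; either works as long as the struct itself is never copied.

```go
package main

import (
	"fmt"
	"sync"
)

// Counts keeps the mutex next to the map it protects. Callers only reach
// the map through methods, so they cannot skip the lock.
type Counts struct {
	mu sync.Mutex
	m  map[string]int
}

func NewCounts() *Counts { return &Counts{m: make(map[string]int)} }

// Inc is safe to call from many goroutines. The pointer receiver matters:
// a value receiver would copy the mutex along with the struct.
func (c *Counts) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// Get reads under the same lock as Inc.
func (c *Counts) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.m[key]
}

// hammer increments one key from n goroutines and returns the final count.
func hammer(n int) int {
	c := NewCounts()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc("hits")
		}()
	}
	wg.Wait()
	return c.Get("hits")
}

func main() {
	fmt.Println(hammer(1000)) // 1000
}
```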

jchw 4 years ago |

This is pretty cool. 50 million lines of code is quite a large corpus to work off of.

I'm surprised by some of them. For example, go vet nominally catches misuses of mutexes, so it's surprising that even a few of those slipped through. I wonder if those situations are a bit more complicated than the example.

Obviously, the ideal outcome is that static analysis can help eliminate as many issues as possible, by restricting the language, discouraging bad patterns, or giving the programmer more tools to detect bugs. gVisor, for example, has a really interesting tool called checklocks:

https://github.com/google/gvisor/tree/master/tools/checklock...

While it definitely has some caveats, ideas like these should help Go programs achieve a greater degree of robustness. Obviously, this class of error would be effectively prevented by borrow checking, but I suppose if you want programming language tradeoffs more tilted towards robustness, Rust already has a lot of that covered.

likeabbas 4 years ago | |

> I suppose if you want programming language tradeoffs more tilted towards robustness, Rust already has a lot of that covered

Does anyone not want robustness of their language to cover their mistakes?

setr 4 years ago | | |

For free? Of course not.

At cost? … depends on what you’re charging me, and how much I’m getting

ohazi 4 years ago |

A dig against Rust I sometimes hear is "Oh, data race freedom isn't such a big deal, if you really need it, a garbage collected language like Java will give you that guarantee."

So now I'm hearing that Go, a garbage collected language, doesn't guarantee data race freedom? I guess it's garbage collected but not "managed" by a runtime or something?

Why go to all that effort to get off of C++ just to stop 30% short? These are C-like concurrency bugs, and you still have to use C-like "if not nil" error handling.

Why do people keep adopting this language? Where's the appeal?

hintymad 4 years ago |

> and contains approximately 2,100 unique Go services (and growing).

A side topic: this is really not something to be proud of. There used to be more people than quantity of work in Uber and engineers fought for credits by building bogus decomposed services, and the sheer number of services seems to indicate it's still so.

jeremy_wiebe 4 years ago | |

That number really stood out to me too. I’d be very curious how they decide what becomes a separate service.

I’d also be curious if the 50m lines of code included generated code.

shp0ngle 4 years ago | | |

I did not work in Uber, but a similar company, and... it's a very political thing, usually.

Taking an existing service and making it into 2 new microservices is a "thing you did". Suddenly, you have "impact" and can claim the new service as "yours". Everyone wants to be a king of their little kingdom.

jordanbeiber 4 years ago | |

If they still use cadence/temporal[0] extensively this kind of blurs the concept of technical ”services”.

We’ve started to use it (temporal) a bit for general automations, and it’s pretty great. Monorepo with a lot of different activities (“microservices”) makes sense.

The activities are orchestrated in workflows (much like DDD “sagas”) and scheduled via temporal. This gives awesome introspection and observability.

[0] https://temporal.io/

DominoTree 4 years ago |

Seems like Rob Pike and co may have failed

"The key point here is our programmers... They’re not capable of understanding a brilliant language... So, the language that we give them has to be easy for them to understand"

laerus 4 years ago | |

Yep, if Google wasn't behind Go the language would have already been history like so many other half baked technologies. It won't be long until people will talk about Golang like they do about JavaScript.

icedchai 4 years ago | | |

They said the same thing about JavaScript in the 90's. If Netscape wasn't behind it... Over 25 years later, it's still going strong, even though the entire HTML/CSS/JS model of "app development" is and always was half baked.

crabbygrabby 4 years ago | | |

JavaScript is nutty but let's be honest, js is one of the most widely used programming languages. For better and for worse.

I don't think go lang will die out because it does get some things right. Unfortunately, there's still a bit of things going wrong.

philosopher1234 4 years ago | |

What’s the relevance of the quote? It makes you feel insulted?

eru 4 years ago | | |

The relevance is that the submitted article shows that you need to be rather smart to avoid Go's pitfalls.

aaronbwebber 4 years ago |

At least some of these would be caught by running your tests with race detection on? I haven't read the whole article yet but as soon as I read the loop variable one I was pretty sure I have written code with that exact bug and had it caught by tests...

https://go.dev/doc/articles/race_detector

Edit: at the _end_ of the post, they mention that this is the second of two blog posts talking about this, and in the first post they explain that they caught these by deploying the default race detector and why they haven't been running it as part of CI (tl;dr it's slower and more resource-expensive and they had a large backlog).

https://eng.uber.com/dynamic-data-race-detection-in-go-code/

Klasiaster 4 years ago |

My favorite example is the IP address type which is an alias for a slice of bytes (type IP []byte). Thus, it gets passed by reference instead of by value and you easily end up working on the same data even if you didn't plan to. This will just be a logical bug but there are data structures in Go which result in memory corruption and introduce the risk of (remote) code execution vulnerabilities.
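
A small sketch of the net.IP case, in case it helps. The function name mutateCopy is made up; net.IP, net.IPv4, and String are from the standard library.

```go
package main

import (
	"fmt"
	"net"
)

// mutateCopy assigns an IP to a second variable, writes through it, and
// then does the same through an explicit copy. It returns how the original
// and the explicit copy print afterwards.
func mutateCopy() (original, cloned string) {
	ip := net.IPv4(10, 0, 0, 1)

	alias := ip // copies the slice header only; same backing bytes
	alias[len(alias)-1] = 2

	// An explicit copy gets its own backing array.
	clone := make(net.IP, len(ip))
	copy(clone, ip)
	clone[len(clone)-1] = 3

	return ip.String(), clone.String()
}

func main() {
	orig, cloned := mutateCopy()
	fmt.Println(orig)   // 10.0.0.2 (changed through alias)
	fmt.Println(cloned) // 10.0.0.3 (ip was not touched by this write)
}
```

Nothing in the assignment alias := ip hints that the two names share storage, which is the trap.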

dwrodri 4 years ago |

Is it just me, or is Golang's concurrency a very double edged sword? My exposure to goroutines and channels was mind-blowing, but I still really struggle reading through Go code and understanding what's going on in memory. The way that Go "abstracts away" ownership feels more like it's hiding important details rather than unnecessary minutia.

Here's a simple question that's stumped me for some time: if multiple go routines are popping values out of a channel, does the channel need a mutex? Why do the "fan-out, fan-in?" examples in the "Pipelines and Cancellation" post on the Go blog not require mutex locks? Link here: https://go.dev/blog/pipelines

Stuff like that, along with the ambiguity of initializing stuff by value vs using make, the memory semantics of some of the primitives (slices, channels, etc). None of it was like "of course". If something is a reference, I'd rather the language tell me it's a reference. Maybe I'm still too new to the language.

bombela 4 years ago | |

I am pretty sure the channels are safe to use with multiple producers and multiple consumers (mpmc). But somehow I cannot easily find any official doc clarifying that.

Go doesn't do anything to help you with memory safety around concurrency. And the design of the language is also not helping you avoid logical bugs.

After using Rust, all other imperative languages feel like using an angle grinder with a wood saw blade and no guard. Sure you can do really good if you are careful. But things will go sideways remarkably quickly. And with the constant urgency of shipping for yesterday. It makes sense most programs look like the aftermath of The Boys show.

Groxx 4 years ago | | |

They are MPMC except for closing (which panics if anything writes after close), yes.
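
On the original question about several goroutines receiving from one channel: a minimal sketch, with the made-up function name fanOut, of one sender and several receivers sharing a channel with no mutex around it. Only the sender closes, which sidesteps the write-after-close panic.

```go
package main

import (
	"fmt"
	"sync"
)

// fanOut sends 1..n on one channel and lets `workers` goroutines receive
// from it concurrently. It returns the sum of everything received.
func fanOut(n, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sum := 0
			for j := range jobs { // each value goes to exactly one receiver
				sum += j
			}
			results <- sum
		}()
	}

	// Single sender, so it is safe for the sender to close.
	go func() {
		for i := 1; i <= n; i++ {
			jobs <- i
		}
		close(jobs)
	}()

	// Close results once every worker has reported its partial sum.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for s := range results {
		total += s
	}
	return total
}

func main() {
	fmt.Println(fanOut(100, 4)) // 5050
}
```

The channel operations themselves are synchronized; what still needs care is any other state the workers share.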

eurasiantiger 4 years ago |

”Our Go monorepo consists of about 50 million lines of code (and growing) and contains approximately 2,100 unique Go services (and growing).”

What the hell is that company doing?

Try to imagine an ERD or DFD of their day-to-day operations. 2,100 unique services…

dubswithus 4 years ago |

> We developed a system to detect data races at Uber using a dynamic data race detection technique. This system, over a period of six months, detected about 2,000 data races in our Go code base, of which our developers already fixed ~1,100 data races.

This isn't open source, correct?

aaronbwebber 4 years ago | |

Yes it is, it's part of the standard go toolchain as described in the first blog post in the series: https://eng.uber.com/dynamic-data-race-detection-in-go-code/

freyr 4 years ago |

> Uber has adopted Golang (Go for short)

Uber has adopted Go (Golang for long)

travisd 4 years ago |

Worth noting that some of these can be detected statically -- and some are detected by go vet (e.g., passing a sync.Mutex by value). I don't think it detects the wg.Add bug, but that seems relatively straightforward(†) to add a check for.

(†famous last words, I know)

kjksf 4 years ago | |

staticcheck has a check for wg.Add misuse (https://staticcheck.io/docs/checks, https://staticcheck.io/docs/checks#SA2000)
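
For reference, a sketch of the placement issue, assuming the bug in question is calling Add from inside the spawned goroutine. The function name runAll is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runAll calls Add before each go statement, so Wait cannot return before
// every goroutine has been counted.
func runAll(n int) int32 {
	var wg sync.WaitGroup
	var done int32
	for i := 0; i < n; i++ {
		wg.Add(1) // correct: in the launching goroutine, before `go`
		go func() {
			// Calling wg.Add(1) here instead is the bug: Wait below could
			// see a zero counter and return before this goroutine has run.
			defer wg.Done()
			atomic.AddInt32(&done, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&done)
}

func main() {
	fmt.Println(runAll(50)) // 50
}
```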

jasonhansel 4 years ago |

I think the root cause of a lot of these data races is that Go has no way of marking variables/fields/pointers/etc. as immutable or constant. This makes it easy to lose track of shared mutable state.

It's not just data races--it's also logical races, which are near-impossible to detect or prevent without something like transactional memory.

eru 4 years ago | |

Yes, missing immutability is a big part of the problem in Go.

Given that Go eschewed generics for the longest time, I can sort of see why they left out immutability markers:

To keep your sanity, you'd want some functions to take (and return!) both mutable and immutable data, as the situation requires. But some other functions should only take mutable data or only immutable data.

Thus ideally you'd need some kind of 'generic' (im-)mutability handling in your type system.

(Rust's borrow checker is basically one way to really deal with this (im-)mutability genericity.

For example, a function to do binary search on a sorted array doesn't change the array; thus it could take either a mutable or immutable version. But if the array changes (via another thread) while the function is running, then you might get into trouble.)

baalimago 4 years ago |

Passing sync.Mutex and sync.WaitGroup as value is especially irritating considering that the context.Context is meant to be passed as such

sharno 4 years ago |

Go picked the concurrency ideas of Erlang but then ignored the main safeguard that makes Erlang's concurrency fearless: Immutability.

And if you feel that Erlang's lack of type safety is an issue, then Gleam has you covered.

masklinn 4 years ago | |

> Go picked the concurrency ideas of Erlang but then ignored the main safeguard that makes Erlang's concurrency fearless: Immutability.

Not even immutability, isolation.

Though obviously immutability makes things less weird, the real gain in terms of concurrency is that you can’t touch any data other than your own (process’s), and for the most part erlang doesn’t “cheat” either: aside from binaries, terms are actually copied over when sent, each process having its own heap.

Sequential erlang could be a procedural language based around mutability and it wouldn’t much alter its reliability guarantees (assuming binaries remain immutable, or become COW).

eru 4 years ago | |

Erlang was fun to use!

But even with Erlang, concurrency is hard. Any single process's data is immutable, but if you split a process in twain, the resulting union can behave as if it had mutable state.

And let's not forget about ETS (term storage), which is basically a mutable hash table that you often have to use to get anything done.

In any case, I agree that Go did _not_ improve on Erlang.

thallavajhula 4 years ago |

>We developed a system to detect data races at Uber using a dynamic data race detection technique.

By system do you mean a process or a tool that detects these?

ryanschneider 4 years ago |

What’s up with the totally broken syntax highlighting in this post, at least on iOS? 2100 micro services and not one of them is a valid syntax highlighter for blog posts.

Edit: oh I see it highlights red and underlines every keyword. I find that incredibly distracting, so much so I assumed their highlighter was broken, but also just realized they are screenshots.

fmakunbound 4 years ago |

> contains approximately 2,100 unique Go services (and growing)

How is that possible and what do they do???

Tomis02 4 years ago | |

Sadly, people get rewarded more for what they built (even if it makes everyone's lives harder in the long run) than for exercising restraint and saying "no".

erik_seaberg 4 years ago | | |

There’s a perception that big tech pays longtime employees less than they’re willing to offer new candidates, making promotion or job hopping the best ways to earn the current market rate. And if you copy Google’s promo process, you get promo-driven development, because there aren’t enough projects that actually need that level of complexity.

JamesSwift 4 years ago |

Really good article, and gave me several ideas to track down some gremlins that have been bugging me in a go codebase recently.

rapiz 4 years ago |

Don't take me seriously: I heard people both saying Golang is a better C/Golang is a worse C

gigatexal 4 years ago |

In the closure example, does declaring a new variable and setting it to the loop variable (or to whatever is being passed in) mitigate the pass-by-reference issue?

yencabulator 4 years ago | |

Yes.

    for i := 0; i < 10; i++ {
        i := i
        ...
    }

https://go.dev/play/p/P7TunJCL7RS
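
A slightly fuller sketch of the same fix with goroutines involved. The function name collect is made up for illustration, and the mutex is only there because the goroutines share the output slice.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect launches n goroutines that each record their own loop index.
func collect(n int) []int {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []int
	)
	for i := 0; i < n; i++ {
		i := i // per-iteration copy; the closure below captures this one
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			out = append(out, i)
			mu.Unlock()
		}()
	}
	wg.Wait()
	sort.Ints(out) // goroutines finish in any order
	return out
}

func main() {
	fmt.Println(collect(5)) // [0 1 2 3 4]
}
```

Passing i as an argument to the closure, go func(i int) { ... }(i), has the same effect.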

gigatexal 4 years ago | | |

Thank you!

The example is brilliant. Thanks.

fellellor 4 years ago |

https://youtu.be/-bCkha6U70o?t=787

oconnor663 4 years ago |

Is there a mistake in Figure 2? It looks like myResults is captured, but the capture is never used?

smw1218 4 years ago |

I wrote a deep analysis/reaction to this post:

https://medium.com/@scott_white/concurrency-in-go-is-not-mag...

tl;dr Go doesn't magically solve data races and blaming the language itself isn't well supported by the examples/data.

stevefan1999 4 years ago |

Can you guys put these up as SonarQube rules?

metadat 4 years ago |

> 2. Slices are confusing types that create subtle and hard-to-diagnose data races

The "Slices" example is just nasty! Like, this is just damning for Go's promise of "_relatively_ easy and carefree concurrency".

Think about it for a second or two,

>> The reference to the slice was resized in the middle of an append operation from another async routine.

What exactly happens in these cases? How can I trust myself, as a fallible human being, to reason about such cases when I'm trying to efficiently roll up a list of results. :-/

Compared to every other remotely mainstream language, perhaps even C++, these are extremely subtle and sharp.. nigh, razor sharp edges. Yuck.

One big takeaway is this harsh realization: Golang guarantees are scant more than what is offered by pure, relatively naïve and unadulterated BASH shell programming. I still will use it, but with newfound fear.

As a multi-hundred-kloc-authoring-gopher: I love Go, and this article is killing me inside. Go appears extremely sloppy at the edges of the envelope and language boundaries, moreso than even I had ever realized prior to now.

Full-disclosure: I am disgusted by the company that is Uber, but I'm grateful to the talented folks who've cast a light on this cesspool region of Golang. Thank you!

p.s. inane aside: I never would've guessed that in 2022, Java would start looking more and more appealing in new ways. Until now I've been more or less "all-in" on Go for years.

hintymad 4 years ago | |

> I never would've guessed that in 2022, Java would start looking more and more appealing in new ways.

I don't quite understand the hatred (to the point of shouting "using Java? Over my dead body"), especially in startups, towards Java. I mean, it's a language, big deal. Java's ecosystem more than offsets whatever inefficiencies exist in the language itself, at least for building many of the internal CRUD services. Besides, people like Martin Thompson show us how to build low-latency applications with ease too. Libraries like JCTools beat the shit out of many new languages when it comes to concurrency for productivity, performance, and reliability. How many engineers in startups claim that they hate Elasticsearch because "Java sucks"? Yet how many can really build a platform as versatile as ES or a Lucene replacement with economic advantages? How many people in startups openly despise Spark or Flink and set out to build a replacement because "Java is slow and ugly"? Yeah, I've seen a few. And a payment company insists that Rust is the best language because "GC is inefficient and ugly", even though they are still in the phase of product iteration and all their services simply wrap around payment gateways? What's the point?

Disclaimer: I use Go in work. It's not like I have skin in the game for speaking about Java.

jasonhansel 4 years ago | | |

I actually think the Java ecosystem is part of why people dislike it. Java seems to attract a lot of extremely heavyweight frameworks (like Spring) that are too complex to fully understand and too heavyweight to make sense for most projects.

pkolaczk 4 years ago | | |

> how to build low-latency applications with ease too

That's a bit of a stretch. Surely, you can build low-latency apps, but I'd be very careful with the "with ease" bit. Low-latency Java often means zero heap allocations, aggressive object avoidance / reuse, heavy use of primitive types everywhere, so it is very much low-level like C, only with no tools that even plain old C offers, e.g. no true stack-allocated structs and no pointers. And forget about all the high-level zero-cost abstractions that C++ and Rust offer.

christophilus 4 years ago | | |

For me, Java and MySQL kind of died* when they became an Oracle thing. I just don’t want to go near anything that Oracle touches.

The other thing is that I tend to write little programs where simple deployment on a low-resource machine is desirable.

Go can handle that. Java kind of does the job with Graal now.

The JVM is incredible, though, and I love Clojure. I’m hoping that Loom + Graal helps to kickstart more competition in the “concurrent, parallel, simple to deploy” space.

* Died to me; obviously they’re both alive and well in the broad world.

tomcam 4 years ago | | |

I like how this was downvoted with no comments. Stay classy, HN

voidfunc 4 years ago | | |

Written a lot of Java, Python, and Go in my career... every single time I see someone take a hardline stance against Java it's always because they had one particular bad experience with it 15-20 years ago and couldn't bend it to their will like Python or Lisp. Or they fought Maven, or some other ancillary tool. Or they rail on the generics and yet the use-cases they come up with for true reified generics are generally niche.

Java's got problems. The biggest one is the framework laden ecosystem and that some of the frameworks are all or nothing. But the language and runtime are rock solid. I don't get the hate.

jchw 4 years ago | |

Slices may just be one of the best and worst parts of Go. They're cumbersome, their behavior sometimes feels 'inexplicable,' and even as an experienced developer you are likely to eventually fall into one of the traps where your 'obvious' code isn't so obvious.

That said... when programming in programming languages without a slice type, I always want to have one. And though it's confusing at times, the design does actually make sense; without a doubt, it's hard to think of how you would improve on the actual underlying design.

I really wish that Go's container types were persistent immutable or some-such. It wouldn't solve everything, but it feels to me like if they could've managed to do that, it would've been a lot easier to reason about.

masklinn 4 years ago | | |

> And though it's confusing at times, the design does actually make sense; without a doubt, it's hard to think of how you would improve on the actual underlying design.

Go slices are absolutely the worst type in Go, because out of laziness they serve as both slices and vectors rather than have a separate, independent, and opaque vector type.

This schizophrenia is the source of most if not all their traps and issues.

> I really wish that Go's container types were persistent immutable or some-such.

That would go against everything Go holds dear, since it's allergic to immutability and provides no support whatsoever for it (aside from simple information hiding).

oconnor663 4 years ago | | |

> it's hard to think of how you would improve on the actual underlying design.

I'm biased as a Rust fan in general, but I think Rust pretty much nails this. Rust distinguishes between a borrowing view of variable length (a slice, spelled &[T]) and an owned allocation of variable length (usually a Vec<T>). Go uses the same type for both, which makes the language smaller, but it leads to confusion about who's pointing to what when a slice is resized.

knorker 4 years ago | | |

Indeed. Probably the most common data structure ever used, list of stuff, Go managed to make subtle and full of surprises. A knife without a handle.

This makes the stated reason for the delay of generics hard to understand. They didn't wait to get list/vector/array/slice right.

jasonwatkinspdx 4 years ago | | |

> it's hard to think of how you would improve on the actual underlying design.

I think ranges are part of D's design they got right, and I think a similar abstraction would be in line with golang's general design ethos, GC design, etc, other than perhaps some folks might pattern match it as "this is like STL therefore bad burn it with fire etc" without actually thinking about it in detail.

kosherhurricane 4 years ago | |

>>> The reference to the slice was resized in the middle of an append operation from another async routine.

> What exactly happens in these cases?

Go's append looks like this:

    mySlice = append(mySlice, newItem)

To me, this makes it very clear that 1) the mySlice pointer can now point to someplace entirely different in memory, and 2) there may be a new allocation.

I write both Java and Go. For personal projects, I always choose go.

Groxx 4 years ago | | |

The append pattern also implies the opposite of reality, in that it also (usually!) mutates mySlice. Which is the source of one of the two(?) possible races in that piece of code.

throwaway894345 4 years ago | |

> What exactly happens in these cases? How can I trust myself, as a fallible human being, to reason about such cases when I'm trying to efficiently roll up a list of results. :-/

For me: minimize shared mutable data. If I really can’t get rid of some shared mutable data, I mutex it or use atomics or similar. This works very well—I almost never run into data races this way, but it is a discipline rather than a technical control, so you might have to deal with coworkers who lack this particular discipline.

metadat 4 years ago | | |

Absolutely, the disappointing part is that as code authors, we need to constantly remember about various (otherwise appealing and even encouraged by the language syntax and control constructs) footguns and "never approach such areas" of (totally valid) syntax.

Reminds me of programming in Javascript (it's an extreme example, but the similarity is there).

morelisp 4 years ago | |

Without disagreeing that it's an enormous footgun, one good way to avoid such slice issues is to use the uncommon `a[x:y:z]` form to ensure the slice can't grow. As we're starting to write a lot of generic slice functions with 1.18, we're using this form in almost all of them which may add elements.
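
A short sketch of the difference, for illustration only (appendBoth is a made-up name). With the plain two-index form the append lands in the original array; with the three-index form the capacity is capped, so append has to allocate a new array.

```go
package main

import "fmt"

// appendBoth takes a two-element prefix of base and appends 99 to it, once
// with a plain slice expression and once with the three-index form. It
// returns base[2] as observed after each append.
func appendBoth() (afterPlain, afterFull int) {
	base := []int{1, 2, 3, 4, 5}
	plain := base[0:2] // len 2, cap 5: append has room in base's array
	_ = append(plain, 99)
	afterPlain = base[2] // base[2] was overwritten with 99

	base = []int{1, 2, 3, 4, 5}
	full := base[0:2:2] // len 2, cap 2: append must allocate a new array
	_ = append(full, 99)
	afterFull = base[2] // base[2] is still 3

	return afterPlain, afterFull
}

func main() {
	fmt.Println(appendBoth()) // 99 3
}
```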

masklinn 4 years ago | | |

> one good way to avoid such slice issues is to use the uncommon `a[x:y:z]` form to ensure the slice can't grow.

Do you mean you always use `a[x:y:y]` in order to ensure there is no extra capacity and any append will have to copy the slice?

Is append guaranteed to create a new slice (and copy over the data) if the parameter is at capacity? Because if it could realloc internally then I don't think this trick is safe.

masklinn 4 years ago | |

> The "Slices" example is just nasty!

I've got to say I'm not entirely clear on what they talk about specifically.

Is it simply that the `results` inside the goroutine will be desync'd from `myResults` (and so the call to myAppend will interact oddly with additional manipulations of results), or is it that the copy can be made mid-update, and `result` itself could be incoherent?

Philip-J-Fry 4 years ago | | |

So a slice consists of a pointer to a backing array, a length and a capacity. If you don't use a pointer and pass this slice around you will copy it.

This is problematic because even though you copy it, you're still pointing at the same backing array.

Therefore, a backing array with data like [1,2,3,4,5] could be pointed at by 2 slice headers (slice metadata) looking like

    A: {len: 2, cap: 10} [1,2]
    B: {len: 5, cap: 10} [1,2,3,4,5]

So any append operations on slice A will mess up the data in that backing array.

Now, sometimes your append will resize the slice, in which case the data is copied and a slice with a new larger backing array is returned. If this was happening concurrently then you'd lose the data in racing appends.

If the append doesn't need to resize the slice, then you'll overwrite the data in the backing array. And so you'll corrupt the data in the slice.

Here's an example I threw together: https://go.dev/play/p/qRUKUwIf3vx
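
A single-goroutine sketch of the header-copy part, separate from the playground link above. The names appendInCallee and demo are made up; there is no race here, it only shows how two headers over one backing array drift apart.

```go
package main

import "fmt"

// appendInCallee receives a copy of the caller's slice header and appends.
func appendInCallee(s []int, v int) []int {
	return append(s, v)
}

// demo returns the caller's length, the callee's length, and whether a
// write through the callee's slice is visible through the caller's.
func demo() (callerLen, calleeLen int, shared bool) {
	caller := make([]int, 2, 10) // header: {len: 2, cap: 10}
	callee := appendInCallee(caller, 7)

	// There was spare capacity, so no reallocation happened and both
	// headers still point at the same backing array.
	callee[0] = 42
	return len(caller), len(callee), caller[0] == 42
}

func main() {
	fmt.Println(demo()) // 2 3 true
}
```

The caller's header still says len 2 even though the array now holds a third element, which is why the result of append has to be assigned back.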

Although the code in the post doesn't actually look like it has an issue. Their tooling just flagged it up as it potentially has an issue if the copy was actually used in the function. But the `safeAppend` function targets the correct slice each time.

aaronbee 4 years ago | | |

I believe they made a mistake with that example. It doesn't look unsafe to me because the myResults slice passed to the goroutine is not used. Or perhaps the racy part was left out of their snippet.

Below is what they might have meant. This code snippet is racy because an unsafe read of myResults is done to pass it to the goroutine and then that version of myResults is passed to safeAppend:

  func ProcessAll(uuids []string) {
    var myResults []string
    var mutex sync.Mutex
    // results mirrors the call site; the append itself goes through
    // myResults while holding the lock.
    safeAppend := func(results []string, res string) {
      mutex.Lock()
      myResults = append(myResults, res)
      mutex.Unlock()
    }

    for _, uuid := range uuids {
      go func(id string, results []string) {
        res := Foo(id)
        safeAppend(results, res)
      }(uuid, myResults) // <<< unsafe read of myResults
    }
  }

EDIT: Formatting and clarity

symfoniq 4 years ago | | |

I have the same question.

They talk about the "meta fields" of a slice. Is the problem that these "meta fields" (e.g. slice length and capacity) are passed by value, and that by copying them, they can get out of sync between coroutines?