GC Fun at Twitch

64 points by _raz 6 years ago | 43 comments

Allocating 10GB, per the original Twitch blog, to tune GC specifics does feel like a bit of a hack around missing API knobs - but it's elegant enough. The fact that it relies on the specifics of the underlying GC, when trying to tune the specifics of GC behavior, isn't much of a drawback, so much as it's just sane performance tuning.
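For readers who haven't seen the trick, here is a minimal sketch of a ballast. The helper name `newBallast` and the 64 MiB demo size are my own assumptions for illustration; the thread is discussing a 10 GiB allocation (`10 << 30`).

```go
package main

import (
	"fmt"
	"runtime"
)

// newBallast allocates a byte slice that is never written to.
// Its only purpose is to count toward the live heap that the GC pacer sees.
// (Hypothetical helper name, not taken from the Twitch post.)
func newBallast(size int) []byte {
	return make([]byte, size)
}

func main() {
	// Demo size only; the discussion above is about 10 << 30 (10 GiB).
	const demoSize = 64 << 20
	ballast := newBallast(demoSize)
	fmt.Printf("ballast of %d MiB allocated\n", len(ballast)>>20)

	// ... the real application would run here ...

	// Keep the slice reachable until the end of main so it is not collected.
	runtime.KeepAlive(ballast)
}
```

Since the slice is never touched, the expectation is that the OS does not back most of it with physical pages, but that depends on the platform.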

This proposed alternative of just toggling the GC on/off outright in a sleeping loop feels like a pretty big sledgehammer - and just as much of a hack. The 500ms sleeps are long enough to fit 5 GC cycles, going by the original Twitch blog's 10 GCs/second numbers, which would also concern me as a potentially unwanted latency spike. I'm also curious what happens when the GC is toggled back off mid-GC. It's more code, and feels brittle. That ReadMemStats sync point may be worse than the GC spam in the first place!
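To make the concern concrete, this is roughly the shape I understand such a loop to have. It is my own sketch, not the article's code: the names `gcPercentFor` and `toggleLoop`, the 1 GiB threshold, and the short demo interval are all assumptions.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// gcPercentFor picks the GC setting for the current heap size:
// -1 (automatic collection off) below the threshold,
// 100 (Go's default) at or above it.
func gcPercentFor(heapAlloc, threshold uint64) int {
	if heapAlloc >= threshold {
		return 100
	}
	return -1
}

// toggleLoop polls heap usage once per interval and toggles the collector.
// It returns when stop is closed.
func toggleLoop(threshold uint64, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var m runtime.MemStats
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			// ReadMemStats briefly stops the world; this is the sync point
			// mentioned above.
			runtime.ReadMemStats(&m)
			debug.SetGCPercent(gcPercentFor(m.HeapAlloc, threshold))
		}
	}
}

func main() {
	prev := debug.SetGCPercent(-1) // start with automatic GC off
	defer debug.SetGCPercent(prev) // restore the original setting on exit

	stop := make(chan struct{})
	done := make(chan struct{})
	go func() {
		// Hypothetical 1 GiB threshold; 10ms interval only to keep the demo short.
		toggleLoop(1<<30, 10*time.Millisecond, stop)
		close(done)
	}()

	time.Sleep(50 * time.Millisecond)
	close(stop)
	<-done
	fmt.Println("loop stopped")
}
```

Between two polls nothing bounds the heap, which is exactly why the sleep interval matters so much.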

ijcd 6 years ago |

We had internal discussions around the "hacky" nature of the solution. Both sides had proponents. The proof was in the numbers: some teams used it, others did not. In the end it was a few-line solution that solved the problem neatly and didn't rely on a (terrifying) dynamic solution such as the one proposed in this blog post. We assumed the "hack" would be temporary, as we expected the Go GC to quickly improve to the point where it was no longer necessary.

panpanna 6 years ago | |

Since you seemed to have analyzed this carefully, why couldn't object pools be used to reduce collectable garbage in the first place?
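By object pools I mean something along these lines, using `sync.Pool` from the standard library. The `render` function and the buffer use case are made up for illustration and say nothing about what Twitch's services actually allocate.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so that hot paths do not allocate
// a fresh bytes.Buffer on every call.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render is a hypothetical hot-path function that needs scratch space.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old contents
	defer bufPool.Put(buf) // return it for reuse once we are done

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so returning the buffer is safe
}

func main() {
	fmt.Println(render("gc"))
}
```

Note that the runtime is free to drop pooled objects during a collection, so this reduces garbage rather than eliminating it.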

ijcd 6 years ago | | |

That was done too, of course. There’s so much to get done in these big systems that it’s often most efficient to take the quick win and move on, especially when, as I mentioned, the world is expected to fix the problem for you for free.

dilyevsky 6 years ago | |

Ikr? “ballast is hacky! let me just build my own gc real quick”. FWIW I think Go 1.14 will have the required knob in the runtime package.

dwohnitmok 6 years ago |

It's interesting to see how other GCs try to handle this. In particular the JVM's (awesome) new Shenandoah low-latency GC has a ShenandoahAllocSpikeFactor option precisely to deal with this kind of situation, where you can specify what percentage of the heap you're willing to sacrifice in a spike in allocation rates before the GC starts running wild trying to collect garbage.

That's the trade-off of Go's knob-less approach to GC, I suppose.

apta 6 years ago | |

The JVM offers state of the art GCs, which allows for selecting the best tool for the job (throughput, latency, large heaps, etc.).

This is unlike the golang gc, which is tuned for latency at the expense of throughput, with no way of modifying its behavior without resorting to hacks like the one in the article.

dwohnitmok 6 years ago | | |

To Go's credit, it predates Shenandoah and ZGC, before which your only real option for low-latency GC on the JVM was Zing, which I don't think had too many people using it (I certainly have no personal experience with it). I can't say whether they were inspired by Go, but I do think that Go is responsible for bringing the desirability of low-latency GC, even at potentially high cost to throughput, to the forefront of the greater programming community's attention.

tus88 6 years ago | | |

Except when the best tool for the job is a language without GC at all.

Thaxll 6 years ago | | |

Yes, you can modify GC behaviour, and it takes a single env variable; the fact that Twitch didn't use it makes no sense.
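The variable I have in mind is `GOGC`, and the same setting is available at runtime through `debug.SetGCPercent`. A small sketch follows; the wrapper name `setGCPercent` and the value 400 are arbitrary choices for the example, not a recommendation.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// setGCPercent applies a new GC percentage and returns the previous one.
// It is a thin, hypothetical wrapper around debug.SetGCPercent.
func setGCPercent(percent int) int {
	return debug.SetGCPercent(percent)
}

func main() {
	// Same effect as starting the process with GOGC=400: let the heap grow
	// to about 5x the live data before the next collection.
	prev := setGCPercent(400)
	fmt.Printf("GC percent changed from %d to 400\n", prev)
}
```

The catch is that the setting is relative to the live heap, so a value that suits a small heap may be far too generous once the heap grows.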

_bxg1 6 years ago |

This is not my area of expertise, but from a software engineering perspective, the proposal "Replace a constant in a configuration file with a new piece of procedural code" smells like a huge new liability when it comes to maintenance. Of course it could be truly necessary, but the author made it sound like the "ballast" method was working just fine and simply felt hacky. Personally, I'd rather document and maintain a single value change that's "hacky" than 22 extra lines of Turing-complete code.

suresk 6 years ago | |

I think I can see both sides of this argument - the "ballast" method is hacky not just because of it being a sort of magic thing that might be tricky to remember later, but it is relying on undocumented behavior that is not part of the contract Go provides and could randomly break later.

The method presented in the article does seem better in that it is using well-known and documented parts of Go's runtime api, but I think it might be problematic for other reasons. Fiddling with GC behavior is always a little risky because it works fine until you hit some weird corner case and it blows up.

For example - What happens if that goroutine doesn't run for longer than you expect and you leave GC turned off while another goroutine is creating a ton of garbage? Might never be a problem, but it depends on allocation behavior and how much headroom you have.

So it feels more correct, but also seems like it requires a lot more tuning and testing to feel confident about it.

_bxg1 6 years ago | | |

> it is relying on undocumented behavior that is not part of the contract Go provides and could randomly break later

Sort of. A change in the undocumented behavior might cause you to lose your fine-tuning at some point in the future, but I wouldn't say it'll ever cause it to break. You're just telling Go how much memory you want to pre-allocate. It'll continue doing that; if that stops getting you the same GC benefits you wanted, then at worst you'll be back in the same boat you were originally.

Writing your own GC routine, on the other hand, gives you a ton of new opportunities for introducing very real breakage via your own code.

lilyball 6 years ago | |

Agreed, especially because this new code may have unintended consequences. For example, if the heap grows extremely fast in that 500ms sleep time then it can get dramatically larger than you'd like, when instead we want to run a GC right as it hits 20GB used.

twotwotwo 6 years ago |

There is internal work on a SetMaxHeap API: https://github.com/golang/go/issues/23044 (there's a review of related code at https://groups.google.com/forum/#!topic/golang-codereviews/b... ). It isn't perfect (notably, heap size and process size as seen by the OS are not identical) but seems like a step up from ballast or other workarounds.

In the issue thread Caleb Spare also proposed a minimum heap size so that you get GOGC-ish behavior once your app uses enough RAM, but don't have constant GCs with a tiny heap.

There's definitely a common issue where the GOGC heuristic doesn't take advantage of situations where it can collect less often but still remain in the "don't care" range of memory use. (CloudFlare talked about the same thing making benchmark results weird: https://blog.cloudflare.com/go-dont-collect-my-garbage/ )
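As a rough model of that heuristic (simplified, and ignoring details such as how newer Go versions also count stacks and globals), the next collection is triggered when the heap reaches about live heap × (1 + GOGC/100). The function name and the example sizes are my own.

```go
package main

import "fmt"

// nextGCTarget is a simplified model of the GOGC heuristic: the heap size
// at which the next collection starts, given the live heap after the last one.
func nextGCTarget(liveHeap uint64, gogc int) uint64 {
	return liveHeap + liveHeap*uint64(gogc)/100
}

func main() {
	const mib = 1 << 20

	// Tiny live heap: collections start again after only 512 MiB of growth.
	small := nextGCTarget(512*mib, 100)
	// Same app with a 10 GiB ballast counted as live heap.
	ballasted := nextGCTarget(10*1024*mib+512*mib, 100)

	fmt.Printf("without ballast: next GC at %d MiB\n", small/mib)
	fmt.Printf("with ballast:    next GC at %d MiB\n", ballasted/mib)
}
```

In this model the ballast simply shifts the trigger point far enough out that the small heap stops causing back-to-back collections.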

And there can definitely be situations where GC'ing a bit more would be worth it to keep a process under an important memory threshold to avoid swapping or OOM kills.

The designers famously don't want too many knobs, but some other ways to convey user priorities to the runtime could certainly save users from some awkward workarounds and fiddling with the existing knobs.

teej 6 years ago |

This has interesting parallels to the issues folks have with autoscaling in AWS. When people first start using autoscaling it can be frustrating finding the right heuristic to scale on, with the automated system over-shooting or under-shooting what the capacity needs are.

What works well is when you calculate your own capacity needs, then just set the autoscaler to change to that new capacity number. In other words, using your knowledge of how your system works, you'll make better decisions than just looking at secondary metrics like resource utilization.

I know I've done manually triggered GC in Ruby and Java but I don't know enough about Go to say if the article's suggestion is reasonable.

ec109685 6 years ago | |

What does it mean to calculate the capacity needs of the application? Are you saying to base it on something like the throughput your app can handle?

arcticbull 6 years ago |

This reminds me of why I hate garbage collectors and think we shouldn't keep investing in them. Instead, we should double down on languages that allow you to express liveness constraints in a way the compiler can understand and manage statically. I'm not saying we have the perfect one yet, though continuing to add knobs to a gooey ball of complexity is at best a game of whack-a-mole. Do something you haven't planned for and your whole app or service takes a dirt-nap and you need to call in a crack squad of your most senior engineers. Then what? Uh, maybe allocate 11GB? There's no predictability -- or even causality -- to these optimizations.

There are enough rockets on the rocket-powered horse that is GC to make it to the moon and back.

pjmlp 6 years ago | |

What we should do is learn that many GC-enabled languages also offer other means to manage resources, and increase adoption of such features, instead of throwing the baby out with the bathwater just because a couple of them use GC for everything.

arcticbull 6 years ago | | |

GC is a means, not an end, and we shouldn't be attached to it. We should focus on developing languages that allow the compiler to statically assess and infer lifetimes, so that we don't need a giant for loop over all of active memory. The value GC provides is that it gives the developer an escape rope from an insufficiently expressive language. If that solution involves some form of GC, so be it, but the goal should not be to preserve GC but rather to improve the efficiency of the final product without substantially impacting developer ergonomics.

MapleWalnut 6 years ago |

off topic: It's annoying how the Twitch blog linked in the article doesn't have an RSS feed. How do people read these blogs without one?

https://blog.twitch.tv/en/tags/engineering

cyrusaf 6 years ago | |

The blog post is also available on Medium: https://medium.com/twitch-news/go-memory-ballast-how-i-learn...

EdwardDiego 6 years ago |

I presume this is why Java has quite specific initial/min/max heap parameters, definable either as a set amount of RAM or as a percentage of available memory.

erik_landerholm 6 years ago |

If the ballast works and they can "afford" it within whatever parameters they are using to define "afford", I'm all for that method.

tom_mellior 6 years ago |

Interesting approach to the application defining a custom GC strategy. (I wonder why the author gave it this strange title, since the article is really about something that Twitch is not doing.)

I'll save this for the next time someone posts something along the lines of "you can't program X in a GC'd language because the GC is so unpredictable".

ncmncm 6 years ago |

> the ballast concept seems like a quite hacky solution to me.

"Quite a hacky solution" describes every single detail of every scrap of code connected in any way to GC. It is the whole point of the enterprise. If hacky solutions make you unhappy, your only route to happiness is to run very far away.

A lot gets done with very hacky solutions, and you will never need to throw a rock very far to hit somebody who swears by them. Those of us who don't haven't time to get that work done, so for most of the world's work, it's hacks or nothing.

zozbot234 6 years ago |

Why are people using GC in this day and age for anything other than processing on fully-general graphs (where the tracing and auto collecting is genuinely helpful)? Literally everything else can be dealt with by using more flexible memory management strategies, that do not need a pre-allocated 10GB heap, and will not hog cpu in wasteful and unpredictable ways when memory utilization rises above a set percentage.

marcrosoft 6 years ago |

It is distracting to read this article and see Go code that hasn't been run through gofmt.