An in-depth look at OCaml’s new “best-fit” garbage collector strategy

An in-depth look at OCaml’s new “best-fit” garbage collector strategy (ocamlpro.com)

176 points by testcross 6 years ago | 41 comments

"Remember that whatever works best for you, it’s still better than having to malloc and free by hand. Happy allocating!"

Nice, they are saying exactly the same as those pesky game developers.

https://www.youtube.com/watch?v=tK50z_gUpZI

pjmlp 6 years ago

Yeah, like Tim Sweeney.

"It's interesting that many games can afford a constant 10x interpretation overhead for scripts, but not a spikey 1% for garbage collection."

https://twitter.com/timsweeneyepic/status/880607734588211200

https://wiki.unrealengine.com/Garbage_Collection_Overview

Which was it again, the engine chosen by Nintendo, Microsoft and Google as first party to their 3D APIs?

https://developer.nintendo.com/tools

https://docs.microsoft.com/en-us/windows/mixed-reality/unity...

https://stadia.dev/blog/unity-production-ready-support-for-s...

https://developer.android.com/games/develop/build-in-unity

The anti-GC crowd in the games industry is no different from the ones who fought the adoption of C/Modula-2/Pascal over Assembly, and then fought the adoption of C++ and Objective-C over C.

Eventually they will suck it up when the major platform owners tell them it is time to move on.

marcinzm 6 years ago

>"It's interesting that many games can afford a constant 10x interpretation overhead for scripts, but not a spikey 1% for garbage collection."

Why is that surprising?

Games are basically about humans predicting things, and random spikes prevent that from happening in time-sensitive games. Beyond gameplay implications, I suspect there's also something about jerkiness in movement that bugs human senses.

otabdeveloper2 6 years ago

The problem with GC isn't the "spikey 1%", it's the fact that GC implies the "everything is a pointer to an object" programming model, which in turn fragments your memory and destroys your cache.

Performance-oriented code implies everything is on the stack and/or packed into large arrays, at which point you don't need a GC after all.

StreamBright 6 years ago

I guess this mentality leads to the current state of play, where all UI-related latencies are through the roof compared to the 80s. At work, I deal with systems that require 16G of heap as the minimum. Funnily, when things get rewritten in Rust, providing the exact same functionality and the same or better performance, the memory requirement goes down 10x (or more). It is up to us how much garbage our systems produce and how much CO2 is wasted on this. I guess many of us are OK with it, while some of us are not. https://blog.discordapp.com/why-discord-is-switching-from-go...

whateveracct 6 years ago

When you're doing gamedev & are bumping up against your FPS, the GC is just another form of memory management. Doing GC-per-frame tends to work pretty well with a generational GC (the generational hypothesis & frame-by-frame updates go hand-in-hand), but you usually have to take care with long-lived data. That's when you end up getting into more manual memory management combined with a GC. In a way, your GC'd high-level language RTS ends up being your scripting language.

That's how I've been thinking about it with Haskell at least (lots of GC knobs, manual performGC hook, compact regions for having long-lived data, good FFI, as high-level as any scripting language you could hope for)

k__ 6 years ago

What's their opinion on Rust?

Jhsto 6 years ago

I would imagine excited. Rust's affine type system is an application of logic theory. OCaml was originally a French academic production, and (from anecdotal experience) those academics tend to diss how impure most software (and memory management) is. While Rust does not have the purest theoretical foundations, it's still a breath of fresh air and will likely result in people paying more attention to the work of theoretical researchers.

gopiandcode 6 years ago

Before self-hosting, the Rust compiler was originally written in OCaml, so presumably there's an overlap in communities there.

StreamBright 6 years ago

Great question. I am really hoping that the non-GC world is taking off with Rust.

twic 6 years ago

I'm not aware of any other industrial-strength GC using this strategy. Are there any?

If not, is there something about OCaml that makes this strategy more suitable than it is for other languages?

If not, is this a case of this being the best strategy they have the resources to implement, rather than the best possible strategy?

dfox 6 years ago

I feel that it is mostly about nobody else explicitly calling the strategy best-fit. For example BIBOP-derived GCs (which for purposes of this discussion includes BDW GC) are inherently best(-ish)-fit and in fact traditional unix malloc is also mostly best-fit.

the8472 6 years ago

I think HotSpot's CMS old-gen allocator used a best-fit strategy, since its collector didn't compact. But CMS has been deprecated because newer, compacting low-pause collectors have taken over its use cases while being less fragile.

hinkley 6 years ago

If memory serves, the new one uses an extra object header that points from the old object to the new one during move operations, and any reads of the old object get forwarded to the new one.
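A toy model of that forwarding idea, as a rough sketch only. The names `Obj`, `resolve`, `read`, `write` and `move` are made up for illustration and do not correspond to any real JVM internals; a real collector does this on raw object headers with barriers emitted by the JIT.

```python
class Obj:
    """A heap object with an extra header word used for forwarding."""

    def __init__(self, fields):
        # The forwarding header points at the object itself until it is moved.
        self.forward = self
        self.fields = fields


def resolve(obj):
    """Follow forwarding headers until reaching the current copy."""
    while obj.forward is not obj:
        obj = obj.forward
    return obj


def read(obj, name):
    """Every read goes through the forwarding header first."""
    return resolve(obj).fields[name]


def write(obj, name, value):
    """Writes also go to the current copy, never to a stale one."""
    resolve(obj).fields[name] = value


def move(obj):
    """Copy the object and leave a forwarding pointer in the old copy."""
    current = resolve(obj)
    new = Obj(dict(current.fields))
    current.forward = new
    return new


old = Obj({"x": 1})
new = move(old)
write(old, "x", 2)  # lands in the new copy via the forwarding header
```

The cost hinted at in the comment is visible here: every access pays for one extra dependent load before it can touch the fields.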

I'm pretty sure that would have not performed well without the aggressive prediction logic in modern processors.

Java 1's object accesses always read through an indirect pointer, but that went away in the name of performance, either when HotSpot was introduced or in the next round of GC improvements.

rlp 6 years ago

It's not exactly an industrial-strength GC, but Nim uses TLSF to reduce fragmentation: http://www.gii.upv.es/tlsf/. I'm not sure how that compares to the strategy in the article, though.

jjoonathan 6 years ago

Hell is other people's algorithmic choices. My GC-fu isn't high level enough to comment on this one, but I just spent the last two days suffering in dependency hell because someone thought it would be a good idea to use a full-blown SAT solver for package management. Grr.

hyustan 6 years ago

Surprised not to see any talk about slab allocations.

Typical malloc implementations today use slabs:

A variety of allocation-classes is defined; for example, 1B, 2B, 4B, 8B, ...

Each allocation-class is essentially its own independent heap.

Slabs are really good:

Allocation is fast: a few cycles to determine the slab, then pick the first available cell, done.

Compaction is easy: all cells have the same size!

And I repeat, all cells within an allocation-class have the same size! This means things like pin cards, etc...

Compared to the pointer-chasing inherent to a splay-tree... I do wonder.
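To make the size-class idea concrete, here is a minimal sketch. It assumes a handful of made-up classes (8, 16, 32, 64 bytes) and a fixed number of cells per class; `SizeClassAllocator` and its methods are hypothetical names, and cells are tracked as indices rather than real memory.

```python
class SizeClassAllocator:
    """Each size class is its own independent pool of equal-sized cells."""

    def __init__(self, classes=(8, 16, 32, 64), cells_per_class=4):
        self.classes = classes
        # One free list per class, stored in reverse so pop() yields the
        # lowest-numbered cell first.
        self.free_cells = {
            c: list(range(cells_per_class - 1, -1, -1)) for c in classes
        }

    def size_class(self, request):
        """Pick the smallest class that can hold the request."""
        for c in self.classes:
            if request <= c:
                return c
        raise ValueError("request too large for any size class")

    def alloc(self, request):
        """Return (size_class, cell_index), or None if that class is full."""
        c = self.size_class(request)
        if not self.free_cells[c]:
            return None
        return c, self.free_cells[c].pop()

    def release(self, size_class, cell_index):
        """Return a cell to its class; any freed cell fits any later request."""
        self.free_cells[size_class].append(cell_index)


slabs = SizeClassAllocator()
a = slabs.alloc(5)   # rounded up to the 8-byte class
b = slabs.alloc(9)   # rounded up to the 16-byte class
slabs.release(*a)
```

Allocation here is a class lookup plus a list pop, with no searching among differently sized holes, which is the contrast being drawn with walking a tree of free blocks.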

fjfaase 6 years ago

What about the following strategy: Find the first space that is large enough. If it is smaller than double the required size, take it. (A little more space is allocated than would be strictly needed.) If it is larger than double the size, split it. This leaves a piece that is at least as big as the current size. Assuming that the allocations have some distribution, it is likely that another piece of memory with this size will be allocated in the future. In this way, the distribution of available spaces will remain about the same as the wanted spaces. (Of course, one should also first round up the size to some power of two and possibly implement a minimum size.)
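A rough sketch of that strategy, under a few assumptions that are not in the comment: the free list is a plain Python list of `(offset, size)` pairs in address order, the minimum size is 16, a block of exactly double the size is split, and freeing and coalescing are left out. `SplitFirstFit` and `round_up_pow2` are made-up names.

```python
def round_up_pow2(n, minimum=16):
    """Round a request up to a power of two, with a minimum size."""
    size = minimum
    while size < n:
        size *= 2
    return size


class SplitFirstFit:
    """First fit, splitting only when the block is at least double the need."""

    def __init__(self, heap_size):
        # Free blocks as (offset, size), kept in address order.
        self.free = [(0, heap_size)]

    def alloc(self, request):
        """Return (offset, size_granted), or None if nothing fits."""
        need = round_up_pow2(request)
        for i, (off, size) in enumerate(self.free):
            if size < need:
                continue  # too small, keep looking
            if size < 2 * need:
                # Less than double: hand out the whole block, slack included.
                del self.free[i]
                return off, size
            # At least double: split, leaving a remainder of size >= need.
            self.free[i] = (off + need, size - need)
            return off, need
        return None


heap = SplitFirstFit(100)
first = heap.alloc(10)    # need 16, block of 100 is split
second = heap.alloc(50)   # need 64, remaining 84 is taken whole
```

The second call shows the trade-off: 84 bytes are granted for a 64-byte need, so some internal slack is accepted in exchange for never leaving a sliver smaller than the request behind.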

judofyr 6 years ago

Sounds similar to a buddy allocator: https://en.wikipedia.org/wiki/Buddy_memory_allocation

adultSwim 6 years ago

Shout out to Damien Doligez, a phenomenal engineer

jeffdavis 6 years ago

This reminds me somewhat of the main postgresql allocator. It keeps segregated freelists for smaller allocations, and then larger allocations are handled by malloc.