Parallel garbage collection for SBCL [pdf] (applied-langua.ge)

159 points by slyrus 2 years ago | 45 comments

dang 2 years ago |

Recent and related (but we merged the comments hither):

Steel Bank Common Lisp 2.3.8 released: “a mark-region parallel GC is available” - https://news.ycombinator.com/item?id=37295611

hayley-patton 2 years ago | |

Please note that the paper was published before I implemented incremental compaction; the GC in SBCL 2.3.8 can compact.

dmpk2k 2 years ago | | |

Does this have much effect on Kandria’s latency results? Shirakumo tries to put a positive spin on things, but her results show that SBCL’s existing GC isn’t up to the task for real-time games (or many networked apps, for that matter).

As an aside, thanks for your efforts. Adding a new GC is a major effort, which improves things for all of us! :D

amno 2 years ago | | |

Thank you for the paper, and for the work!

Can I ask a newbish question: will this new GC be available on all OSes and CPU architectures, or only on some? I don't see anything in the paper about it being limited to a certain platform, so my hopes are high :).

rayiner 2 years ago |

Very cool! Here is the paper: https://zenodo.org/record/7816398. It uses the well known Immix heap layout/algorithm. https://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi...

The old gencgc was pretty cool for the single core era, and it sounds like it still holds up well. If I recall correctly, it was based on the Bartlett Mostly Copying paper, which is an elegant and practical GC design. https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-TN-12.pdf. I miss these old papers that described this stuff in a way you didn’t have to be a math major to understand. I think the first version of that paper had the C++ code as an appendix: https://www.hpl.hp.com/techreports/Compaq-DEC/WRL-88-2.pdf.

Clarity in your technical communications matters. The Immix papers are similarly well written and clear. I don’t think it’s a surprise that both GC designs have also been independently implemented over and over. The Chaitin-Briggs register allocator is another example where I’d attribute at least some of the success in widespread industrial implementation to Briggs’ excellent and approachable PhD thesis describing the algorithm: https://www.cs.utexas.edu/users/mckinley/380C/lecs/briggs-th...

sctb 2 years ago | |

For more Lisp+Immix action there's also Andy Wingo's article and talk about a new GC for Guile from earlier this year:

Article: https://wingolog.org/archives/2023/02/07/whippet-towards-a-n...

Talk: https://fosdem.org/2023/schedule/event/whippet

latenightcoding 2 years ago | | |

Good to know, it never sat right with me that Guile depended on BDW-GC for so long. That's what you use when you are implementing a toy language and don't have time to write your own GC.

mgsouth 2 years ago | |

I'm getting "connection reset by peer" for the Immix link. Wayback link instead: https://web.archive.org/web/20230313182036/https://users.cec...

amno 2 years ago | |

> Clarity in your technical communications matters.

Indeed; it does.

I wish more people watched that Steele talk, where he speaks about the importance of using accessible language. One-syllable words all the way would perhaps be a bit tedious to read, but he makes a very good point about clarity in connection with simplicity.

Thanks for all the links, rayiner.

dang 2 years ago | |

(As the paper was posted in a more accessible form a few hours ago (thanks slyrus!), I reupped that submission and merged the comments from https://news.ycombinator.com/item?id=37295611 hither.)

vindarel 2 years ago |

Other great achievements from last year [0]:

- SBCL is callable as a shared library (see sbcl-librarian)

- sb-simd

- prebuilt binaries for Android (termux, unofficial)

- better image compression using zstd

- I'll add https://github.com/sionescu/sbcl-goodies, binaries with "goodies" inside (OpenSSL, libfixposix)

yay!

[0]: https://lisp-journey.gitlab.io/blog/these-years-in-common-li...

bonus from 2021: true static binaries are coming https://www.timmons.dev/posts/static-executables-with-sbcl-v...

corinroyal 2 years ago | |

OMG! I read the SBCL release notes every month, but when you put together recent highlights like this, I'm blown away. Android binaries? True static executables? Yes, please!

This is impressive progress, especially for a language so many have written off as moribund. And that's just one implementation of the spec. ECL, ABCL, and Clasp are all progressing at a brisk pace too. Maybe the only good language is a dead language.

ducktective 2 years ago | | |

>True static executables

Correct me if I'm wrong, but this has not happened yet.

koito17 2 years ago | |

I cannot emphasize enough how much of a game changer SB-SIMD has been for writing performant numeric code in SBCL. Being able to experiment with AVX intrinsics in a REPL has been quite the experience for me.

fud101 2 years ago | | |

needs a tutorial

ducktective 2 years ago |

Nice, good to see activity in SBCL dev.

Does anyone know how difficult implementing a real-time GC would be for SBCL or ECL? I know of that paper by Rodney Brooks (L -- A Common Lisp for Embedded Systems).

aidenn0 2 years ago | |

You'll need to be more specific.

From a technical (i.e. useless) point of view, SBCL very nearly has a real-time GC already; with a few modifications it would qualify, since:

1. The amount of work in a GC operation is bounded by the heap size

2. SBCL has a fixed maximum heap size

Two things would need to be done:

1. Ensure the areas where GC is inhibited are bounded

2. Call GC operations on a timer, and when they are done, ensure there is sufficient free space; RTGC cannot exist without specific bounds on the mutator, so you could almost certainly invent bounds on the mutator that would make the gencgc qualify as real-time.

#1 would need to be done for any actually useful RTGC anyways.

A slightly less snarky answer is that a non-toy GC is a lot of work. Different choices in GC will affect code generation (e.g. read and/or write barriers, and forwarding pointers for incremental GC[1]).

There's a reason the gencgc has been around as long as it has, and it's because it's "good enough" for a lot of people and the work needed to replace it (with any non stop-the-world GC anyways) is rather large. Even TFA is neither incremental nor concurrent, just parallel.

[1]: Stop-the-world GCs may also use forwarding pointers, but codegen doesn't need to know about them because they are fully resolved before the mutator continues.
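The two conditions above (pause work bounded by heap size, and enough free space between timed collections) can be sketched as a quick feasibility check. This is only an illustration in Python, not anything from SBCL: the `GcBudget` fields, the rates, and the function names are all hypothetical, and the model ignores the copy reserve a copying collector like gencgc would need.

```python
from dataclasses import dataclass


@dataclass
class GcBudget:
    """Hypothetical bounds; none of these names come from SBCL."""
    heap_bytes: int           # fixed maximum heap size
    max_live_bytes: int       # assumed bound on live data (a mutator bound)
    alloc_bytes_per_s: float  # assumed bound on allocation rate (a mutator bound)
    gc_bytes_per_s: float     # assumed worst-case collector speed
    period_s: float           # mutator run time between timed collections


def worst_case_pause_s(b: GcBudget) -> float:
    # Work per collection is bounded by the heap size, so the pause is bounded too.
    return b.heap_bytes / b.gc_bytes_per_s


def has_enough_free_space(b: GcBudget) -> bool:
    # After a collection at most max_live_bytes remain; before the next one the
    # mutator can allocate at most alloc_bytes_per_s * period_s more.
    return b.max_live_bytes + b.alloc_bytes_per_s * b.period_s <= b.heap_bytes


def qualifies(b: GcBudget, deadline_s: float) -> bool:
    # "Real-time" in the narrow technical sense: a bounded pause that fits the
    # deadline, and the heap never fills up between collections.
    return worst_case_pause_s(b) <= deadline_s and has_enough_free_space(b)
```

With an 8 GiB heap, 2 GiB live, 1 GiB/s allocation, a 4 GiB/s collector and a 5 s period, the worst-case pause is 2 s. That "qualifies" for a 2 s deadline, which is exactly why the technical definition alone is not very useful.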

CyberDildonics 2 years ago |

A pattern I've noticed is that in languages with garbage collection, the garbage collection aspect is never finished. It is always a pain point and people are constantly waiting for promised future features and refinements.

Alifatisk 2 years ago |

What would having a parallel GC mean in the practical sense?

That there are no pauses at runtime?

hayley-patton 2 years ago | |

There are still pauses, but the pauses are faster by using multiple cores to collect, which is nice for both throughput- and latency-sensitive apps. No-pause ("on-the-fly") collectors exist, but sufficiently-short pause ("concurrent") collectors still do wonders with sub-millisecond latency.

simiones 2 years ago | | |

The new GC is not concurrent; it is still a stop-the-world collector. The improvement is that it can do the collection and compaction work in parallel on a multi-core machine.

As such, it will greatly increase throughput if given multiple cores, and will lead to lower pause times based on this improvement in throughput.

simiones 2 years ago | |

No, it simply means that it can better utilize multiple cores to do the GC work faster, so pauses will be proportionally shorter. With the current SBCL collector (or the old one, since this one is already released), even if you ran on a 16-core system, the GC would still only use a single core to reclaim memory (in practice, to move some or all of your live objects to a different address, as it was a copying collector).
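To make "parallel but still stop-the-world" concrete, here is a toy sketch of a mark phase shared between worker threads. It is Python for illustration only and is not how SBCL's collector is written: the heap is a hypothetical dict from object id to the ids it references, `parallel_mark` is a made-up name, and the mutator is assumed to be paused for the whole call. Python threads will not show a real speedup here; the point is the work-sharing structure.

```python
import queue
import threading


def parallel_mark(heap, roots, n_workers=4):
    """Return the set of ids reachable from roots.

    heap: dict mapping object id -> list of referenced object ids (toy model).
    Assumes the mutator is stopped, so the graph does not change while marking.
    """
    marked = set()
    lock = threading.Lock()
    work = queue.Queue()

    def try_mark(obj):
        # Only the first thread to reach an object gets to scan it.
        with lock:
            if obj in marked:
                return False
            marked.add(obj)
            return True

    def worker():
        while True:
            obj = work.get()
            if obj is None:          # sentinel: marking has finished
                work.task_done()
                return
            for child in heap[obj]:
                if try_mark(child):
                    work.put(child)  # queued before task_done, so join() is safe
            work.task_done()

    for root in roots:
        if try_mark(root):
            work.put(root)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    work.join()                      # every reachable object has been scanned
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return marked
```

Everything not in the returned set is garbage. The program only resumes after the call returns, so there is still a pause; it is just shared among the workers.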