Erlang/OTP: Garbage Collector

Erlang/OTP: Garbage Collector (medium.com)

151 points by vkatsuba 3 years ago | 77 comments

dmpk2k 3 years ago |

The post glosses over the most important part of Erlang's GC: it collects process heaps separately. This transforms a hard problem (collecting a global heap with low latency despite concurrent mutators) to a _much_ simpler problem, at the price of more copying. Compare Java's G1 with Erlang's GC; the former hurts my head.

For those problems that are amenable to Erlang's model, this is a fine solution. The only real improvement here would be making collection incremental.

_0w8t 3 years ago | |

Erlang also has reference counters for things like strings that are immutable and can be shared between threads (processes in Erlang).

Overall this is a good model. Use GC for small per green thread heaps. Then use reference counters for shared immutable structures that cannot form cycles and copy everything else.

bitwalker 3 years ago | | |

Erlang only uses reference counting for binaries larger than 64 bytes, everything else is allocated on the process heap (or in heap fragments) and copied. Just that is enough to have a beneficial effect though, since large binaries are relatively common in practice, and are frequently passed around from process-to-process.

weatherlight 3 years ago | |

I thought Erlang's Garbage collector was incremental by virtue of being per process. A system may have tens of thousands of processes, using a gigabyte of memory overall, but if GC occurs in a process with a 20K heap, then the collector only touches that 20K and collection time is imperceptible. With lots of small processes, you can think of this as a truly incremental collector.

It's not incremental per process, but I'm not sure it would even matter that much in practice.
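
For anyone who wants to watch a single heap being collected in isolation, here is a minimal sketch. The module name `heap_demo` and the list size are made up for illustration; `erlang:process_info/2` and `erlang:garbage_collect/1` are the standard BIFs. Only the spawned process is collected; no other process in the system is touched.

```erlang
-module(heap_demo).
-export([run/0]).

%% Spawns one process, lets it create garbage, then collects only that process.
run() ->
    Pid = spawn(fun() ->
        %% Build a large list and drop it, leaving garbage on this heap.
        _ = lists:seq(1, 100000),
        receive stop -> ok end
    end),
    timer:sleep(100),  % crude wait so the process has reached its receive
    {total_heap_size, Before} = erlang:process_info(Pid, total_heap_size),
    true = erlang:garbage_collect(Pid),  % collects Pid's heap only
    {total_heap_size, After} = erlang:process_info(Pid, total_heap_size),
    Pid ! stop,
    %% Both sizes are in words, not bytes.
    {Before, After}.
```

Calling `heap_demo:run()` from the shell returns the heap size before and after the collection; the second number should normally be much smaller.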

dmpk2k 3 years ago | | |

Yes, that is how it works, except (as you implicitly note) that large heaps in single processes can cause problems; allowing incremental collection per heap would flatten the latency profile further.

ramchip 3 years ago | | |

Large GC jobs get scheduled on dirty schedulers today (a background thread pool), since it's not OK to block a normal scheduler more than 1ms or so in Erlang. If they could be split into smaller chunks of work, perhaps it could be done on normal schedulers, making time allocation more fair.

dfox 3 years ago | |

Another point is that, because Erlang terms are immutable, there cannot be pointers from the old generation into the nursery, so the GC does not need write barriers.

amelius 3 years ago | |

Wouldn't Erlang be much more efficient if it simply compiled to the JVM?

_old_dude_ 3 years ago | | |

Almost 10 years ago, I tested Erjang [1] with a medium-sized application. Throughput was better than on BEAM, but latency was terrible.

[1] https://github.com/trifork/erjang/

lenkite 3 years ago | | |

The JVM standard does not support isolates, so it won't work. Java's father, Gosling, wanted to get isolation into the Java spec, but he failed.

The modern GraalVM does have isolates, but it's a VM-specific feature and not a standard Java feature.

jlouis 3 years ago | | |

It likely would. But efficiency is only one factor. Many Erlang applications are far more concerned with consistent latency than with throughput, so switching to the JVM would come at a real cost.

vkatsuba 3 years ago | | |

For more details, take a look at this interview with Francesco Cesarini: https://www.youtube.com/watch?v=-m31ag9z4VY. Part of it compares the JVM with the BEAM.

weatherlight 3 years ago | | |

Sure, on a single machine, perhaps. But once you have multiple machines, the JVM would have to do what the BEAM does today: copy messages between processes regardless of location. That's going to slow down throughput.

nesarkvechnep 3 years ago | | |

No.

vcryan 3 years ago | | |

Ha! Absolutely no

vkatsuba 3 years ago | |

This is a good point, thanks! I will extend the article, or it may be better to write a follow-up, since putting everything in one article would make it longer and harder to read.

throwawaymaths 3 years ago | |

> the price of more copying.

More copying if you pass values between processes. Honestly it would be really cool if you could mark off certain values that you know you're going to pass around and put them in a heap like the global binary heap.
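
The closest thing the runtime offers today is probably `persistent_term` (OTP 21.2 and later): terms stored there are read without being copied onto the caller's heap. It is not the general mechanism described above, since replacing or erasing a stored term triggers a scan of all processes, so it only suits data that rarely changes. A minimal sketch, where the module name `shared_config` and the key are hypothetical:

```erlang
-module(shared_config).
-export([store/1, lookup/0]).

%% Hypothetical key; any term can be used as a persistent_term key.
-define(KEY, {?MODULE, routes}).

%% Store a rarely-changing term once, e.g. at application start.
store(Routes) ->
    persistent_term:put(?KEY, Routes).

%% Any process can read the term without copying it to its own heap.
%% Raises badarg if store/1 has not been called yet.
lookup() ->
    persistent_term:get(?KEY).
```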

benmmurphy 3 years ago | |

There are lots of footguns for the user with this model. Because transferring data between processes involves copying, this can become a problem. Erlang tries to optimise the handling of large binaries by using a separate reference-counted heap. However, this introduces another set of issues where memory is 'leaked', either because a smaller binary holds a reference to a larger binary, or because processes that have not been GC'd have not decremented the ref count of large binaries in the heap that they no longer use.
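
The first of these issues (a small slice pinning a large binary) can be sketched with the `binary` module. The module name `subbin_demo` and the sizes are made up for illustration; `binary:copy/1,2` and `binary:referenced_byte_size/1` are standard library functions.

```erlang
-module(subbin_demo).
-export([run/0]).

run() ->
    %% 1 MiB binary built at runtime; far above the 64-byte limit, so it is
    %% stored off-heap and reference counted.
    Large = binary:copy(<<"x">>, 1024 * 1024),
    %% Matching out a slice yields a sub-binary that points into Large,
    %% which keeps all of Large alive for as long as Slice is referenced.
    <<Slice:1024/binary, _/binary>> = Large,
    %% binary:copy/1 makes an independent copy that no longer pins Large.
    Detached = binary:copy(Slice),
    {binary:referenced_byte_size(Slice),
     binary:referenced_byte_size(Detached)}.
```

The first element of the result should reflect the size of the whole parent binary and the second only the size of the slice. Copying a slice before storing it long-term (in process state or ETS, for example) is the usual way to avoid this kind of leak.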

throwawaymaths 3 years ago | | |

You literally listed the two biggest footguns and claimed there are "lots" of footguns. That really is it.

travisgriggs 3 years ago |

Scaling up an MQTT<->webhook relay that I wrote in Elixir to thousands of long-running connections, I found that I needed to manually trigger periodic GCs on my long-lived processes.

As binary strings work their way through the pipelines via messages, they leave binaries on the binary heap that don’t go away because the ref count never drops to zero. There are a number of GC parameters one can tune at the per-process level that might cause a long-lived process to collect more aggressively. But my long-lived processes have a natural “ratchet” point where it was easy to just throw in a collect. This solved all of my slow-growth memory problems.
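
A rough sketch of both approaches in plain Erlang, not the relay's actual code: the module name `relay_worker`, the message shapes, and the threshold of 1000 are all assumptions for illustration. `spawn_opt/4` with `fullsweep_after` is one of the per-process tuning knobs, and `erlang:garbage_collect/0` is the manual collect (callable from Elixir as `:erlang.garbage_collect()`).

```erlang
-module(relay_worker).
-export([start/0, loop/1]).

%% Hypothetical "ratchet" point: collect after this many handled payloads.
-define(GC_EVERY, 1000).

%% fullsweep_after limits how many generational collections may run
%% before a full sweep is forced for this process.
start() ->
    spawn_opt(?MODULE, loop, [0], [{fullsweep_after, 10}]).

loop(N) when N >= ?GC_EVERY ->
    %% Explicit collection of this process only; drops references to
    %% off-heap binaries that are no longer reachable from it.
    erlang:garbage_collect(),
    loop(0);
loop(N) ->
    receive
        {payload, Bin} when is_binary(Bin) ->
            _ = byte_size(Bin),  % placeholder for the real forwarding work
            loop(N + 1);
        stop ->
            ok
    end.
```

Whether the tuning option alone is enough depends on the workload; the explicit call is the blunt instrument.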

I’ve read elsewhere that Erlang’s GC often benefits from the fact that most Erlang processes are short-lived.

sacnoradhq 3 years ago |

ORCA (part of the Pony compiled language) includes a more performant GC than C4 or BEAM/HiPE. It reduces the need for global GC pauses almost to zero through per-actor heaps, zero-copy message passing, fine-grained concurrent sharing semantics, and lock-free data structures.

bitwalker 3 years ago | |

I mean, the BEAM doesn't have global GC pauses either, as each process has its own heap - but I would expect Pony can take things a step further as a result of its strong type system, which IIRC is why it can support zero-copy messaging.

sacnoradhq 3 years ago | | |

This is true. Erlang has a heap per PID. Azul's C4 and other JVM GCs are moving toward eliminating stop-the-world pauses, but they're still at the mercy of the JVM's model.

If one can avoid GC altogether with precise (de)allocation, à la Rust's non-reference-counted entities, that's great, but it often requires unnatural contortions. RC is still necessary in certain cases.

isaacsanders 3 years ago |

This is another article with more details: https://hamidreza-s.github.io/erlang%20garbage%20collection%...

vkatsuba 3 years ago |

If you want to expand the examples or improve the topic - just leave a comment about it.