The main difference is that it computes the tables at startup from the sorted Unicode intervals, so the construction code has to be fast. The same code is also used for user-defined character classes in regular expressions.
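A minimal sketch in C of building a lookup table at startup from sorted intervals. It assumes a hypothetical two-level layout: 256 code points per leaf bitmap, and a NULL top-level pointer standing for an all-zero leaf. JOE's real table shape may differ. The names `struct interval`, `table_build`, `table_lookup` and `table_free` are illustrative, not JOE's API. Because the input is assumed sorted and non-overlapping, leaf indices only ever increase, so the builder only has to remember the most recent leaf.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEAF_BITS 8                        /* hypothetical: 256 code points per leaf */
#define LEAF_SIZE (1u << LEAF_BITS)
#define LEAF_WORDS (LEAF_SIZE / 32)        /* uint32_t words per leaf */
#define TOP_SIZE (0x110000u >> LEAF_BITS)  /* top-level slots covering all of Unicode */

struct interval { uint32_t first, last; }; /* inclusive code point range */

struct table {
    uint32_t *top[TOP_SIZE];               /* NULL means an all-zero leaf */
};

static void table_free(struct table *t)
{
    if (!t)
        return;
    for (uint32_t i = 0; i < TOP_SIZE; i++)
        free(t->top[i]);
    free(t);
}

/* Assumes iv[] is sorted and non-overlapping. */
static struct table *table_build(const struct interval *iv, size_t n)
{
    struct table *t = calloc(1, sizeof *t);
    if (!t)
        return NULL;
    uint32_t *leaf = NULL;
    uint32_t leaf_idx = UINT32_MAX;        /* block index of 'leaf' */
    for (size_t i = 0; i < n; i++) {
        for (uint32_t c = iv[i].first; c <= iv[i].last && c < 0x110000u; c++) {
            if ((c >> LEAF_BITS) != leaf_idx) {
                /* Sorted input: a new block index always means a fresh leaf. */
                leaf_idx = c >> LEAF_BITS;
                leaf = calloc(LEAF_WORDS, sizeof *leaf);
                if (!leaf) {
                    table_free(t);
                    return NULL;
                }
                t->top[leaf_idx] = leaf;
            }
            leaf[(c & (LEAF_SIZE - 1)) >> 5] |= 1u << (c & 31);
        }
    }
    return t;
}

static int table_lookup(const struct table *t, uint32_t c)
{
    if (c >= 0x110000u)
        return 0;
    const uint32_t *leaf = t->top[c >> LEAF_BITS];
    return leaf ? (int)((leaf[(c & (LEAF_SIZE - 1)) >> 5] >> (c & 31)) & 1u) : 0;
}
```

Lookup costs one pointer load and one bit test. The per-code-point fill loop is the simplest correct version; a faster builder could set whole words at a time.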
Anyway, it builds them in two passes. In the first pass, it de-duplicates nodes, but only the previously constructed node is a candidate for de-duplication. This keeps memory usage low during construction. De-duplicated nodes can still be modified during construction, so they may be re-duplicated (a reference counter determines when this happens).
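A minimal C sketch of that first pass, under stated assumptions: leaves are fixed 256-bit bitmaps, the builder has a hypothetical fixed capacity, and the names `struct leaf`, `builder_push`, `builder_writable` and `builder_set_bit` are illustrative rather than JOE's. Each new leaf is compared only with the leaf pushed just before it; a match is freed and the previous leaf's reference count is bumped. Writing to a shared leaf first copies it (copy-on-write), which is the "re-duplication" the comment describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LEAF_WORDS 8   /* hypothetical: 256-bit leaves */
#define MAX_SLOTS 64   /* hypothetical fixed capacity for the sketch */

struct leaf {
    unsigned refcount;             /* number of slots sharing this leaf */
    uint32_t bits[LEAF_WORDS];
};

struct builder {
    struct leaf *slots[MAX_SLOTS];
    size_t n;
};

static struct leaf *leaf_new(void)
{
    struct leaf *l = calloc(1, sizeof *l);
    if (l)
        l->refcount = 1;
    return l;
}

/* Append a finished leaf. Only the previous leaf is a candidate, so a
 * duplicate is freed right away instead of being kept around. */
static void builder_push(struct builder *b, struct leaf *l)
{
    assert(b->n < MAX_SLOTS);
    if (b->n > 0) {
        struct leaf *prev = b->slots[b->n - 1];
        if (memcmp(prev->bits, l->bits, sizeof l->bits) == 0) {
            free(l);
            prev->refcount++;
            l = prev;
        }
    }
    b->slots[b->n++] = l;
}

/* Copy-on-write: a shared leaf is re-duplicated before it is modified. */
static struct leaf *builder_writable(struct builder *b, size_t i)
{
    struct leaf *l = b->slots[i];
    if (l->refcount > 1) {
        struct leaf *copy = malloc(sizeof *copy);
        if (!copy)
            return NULL;
        memcpy(copy->bits, l->bits, sizeof l->bits);
        copy->refcount = 1;
        l->refcount--;
        b->slots[i] = copy;
        l = copy;
    }
    return l;
}

/* Set bit 'bit' (0..255) in slot i; returns -1 on allocation failure. */
static int builder_set_bit(struct builder *b, size_t i, unsigned bit)
{
    struct leaf *l = builder_writable(b, i);
    if (!l)
        return -1;
    l->bits[bit >> 5] |= 1u << (bit & 31);
    return 0;
}
```

Because only the immediate predecessor is checked, two equal leaves separated by a different one stay distinct in this pass; catching those is left to a later global pass.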
In the second pass (after all data is loaded and no more changes are allowed), it globally de-duplicates the leaf nodes using a hash table. Many of the leaf nodes are duplicates, and not just the all-zero ones.
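A minimal C sketch of a global leaf de-duplication pass, assuming the leaves are frozen 256-bit bitmaps referenced from a flat array of non-NULL pointers. The FNV-1a hash and linear-probing table are choices made for this illustration, and `leaf_hash` and `dedup_leaves` are hypothetical names; the comment does not say which hash JOE uses. Every slot ends up pointing at one canonical copy of its contents.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LEAF_WORDS 8   /* hypothetical: 256-bit leaves */
#define LEAF_BYTES (LEAF_WORDS * sizeof(uint32_t))

/* FNV-1a over the leaf's bytes (hash choice is an assumption). */
static uint32_t leaf_hash(const uint32_t *leaf)
{
    uint32_t h = 2166136261u;
    const unsigned char *p = (const unsigned char *)leaf;
    for (size_t i = 0; i < LEAF_BYTES; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Point every slot at a canonical leaf with identical contents, using an
 * open-addressing table with linear probing. Returns the number of
 * distinct leaves, or SIZE_MAX if the table cannot be allocated.
 * Leaves that become unreferenced are not freed here. */
static size_t dedup_leaves(uint32_t **slots, size_t n)
{
    size_t cap = 16;
    while (cap < 2 * n)
        cap *= 2;  /* keep the load factor at or below 0.5 */
    uint32_t **table = calloc(cap, sizeof *table);
    if (!table)
        return SIZE_MAX;
    size_t unique = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = leaf_hash(slots[i]) & (cap - 1);
        while (table[j] && memcmp(table[j], slots[i], LEAF_BYTES) != 0)
            j = (j + 1) & (cap - 1);
        if (table[j]) {
            slots[i] = table[j];   /* duplicate: share the canonical leaf */
        } else {
            table[j] = slots[i];   /* first occurrence becomes canonical */
            unique++;
        }
    }
    free(table);
    return unique;
}
```

Running this only after construction is finished means no reference counting is needed: nothing can be modified any more, so sharing is always safe.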
JOE is one of my favourite software projects, one I can only look up to. I don't know anything about you, but thank you!
https://github.com/ryzom/ryzomcore/blob/core4/nel/src/misc/s...
[1] https://gitbox.apache.org/repos/asf?p=lucy.git;a=blob;f=core...
https://github.com/bellard/quickjs/blob/b5e62895c619d4ffc75c...