Why should a Trace-ID be 128 bits? (A Surprisingly Long Answer)

(newsletter.signoz.io)

13 points by elza_1111 58 days ago | 17 comments

ralferoo 57 days ago |

Surprised the author didn't even think about the logical conclusion of his closing paragraph: "128 bits is the ideal sweet spot, collision safety effectively forever, and it happens to match the size of a UUID, which means every database, every language, and every protocol already knows how to handle it."

UUIDs are already generated randomly for exactly the same reason. Rather than inventing something new, they should have just used a UUID.

benmmurphy 57 days ago | |

Generating 16 random bytes is simpler than generating a random UUID

ralferoo 57 days ago | | |

It basically makes no odds, unless you consider applying a constant AND and constant OR operator complicated - as UUID v4 is just 122 random bits and 6 bits fixed.

UUID v7 is a 48 bit timestamp, 74 random bits and 6 bits fixed. Sure, this is a little more complicated, but it's often worth it for many applications because it can be sorted, so keys will be approximately monotonically increasing.
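
For illustration, here is a minimal Python sketch of the "constant AND and constant OR" described above: it takes 16 random bytes (what a trace ID already is) and stamps the 6 fixed UUID v4 bits onto them. The function name `uuid4_from_random_bytes` is made up for this example; only the standard library is used.

```python
import os
import uuid

# Bit positions within the 128-bit integer (bit 0 is the least significant).
VERSION_MASK = 0xF << 76   # 4 version bits
VARIANT_MASK = 0x3 << 62   # 2 variant bits
VERSION_4 = 0x4 << 76      # version nibble = 4
VARIANT_RFC = 0x2 << 62    # variant bits = 10

def uuid4_from_random_bytes(raw: bytes) -> uuid.UUID:
    """Turn 16 random bytes into a UUID v4 by fixing 6 of the 128 bits."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    value = int.from_bytes(raw, "big")
    value &= ~(VERSION_MASK | VARIANT_MASK)  # constant AND: clear the 6 fixed bits
    value |= VERSION_4 | VARIANT_RFC         # constant OR: set version and variant
    return uuid.UUID(int=value)

if __name__ == "__main__":
    trace_id = os.urandom(16)                 # a plain 128-bit random ID
    print(trace_id.hex())
    print(uuid4_from_random_bytes(trace_id))  # same bytes, 122 of them still random
```

The remaining 122 bits are untouched, so the collision behaviour is nearly the same as for the raw 16 bytes.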

drdaeman 57 days ago | | |

And there’s a good reason for that, because UUIDs have additional properties. I don’t know if versioning, partial ordering, or stable references are useful for traces or not, but with UUIDs those could’ve been a possibility.

devin 57 days ago |

From a practical standpoint, isn't it usually the case that there are retention periods for traces given how numerous they can be?

I bring this up because this article starts with "I asked Claude", but it doesn't explore the length of time you're generating IDs over at all, which is an important aspect to consider when selecting size.

singron 57 days ago | |

Yes. The original Dapper used 64 bit trace ids and collisions were rarely a problem.

If you don't drop any spans from a trace, you can completely disambiguate a collision since the trace will have two distinct root spans. If you are missing spans, you might have a break in the parent-child links.
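
A rough sketch of that disambiguation, assuming spans are plain dicts with hypothetical `span_id` and `parent_id` keys (`parent_id` is `None` for a root) and that span IDs themselves do not collide: each span is assigned to the root it reaches by following parent links, and anything that cannot reach a root lands in a `None` bucket.

```python
from collections import defaultdict

def split_by_root(spans):
    """Group the spans sharing one trace ID by the root span they descend from.

    Returns {root_span_id: [span_id, ...]}; spans whose parent chain is broken
    (a dropped span) or cyclic are collected under the key None.
    """
    by_id = {s["span_id"]: s for s in spans}
    groups = defaultdict(list)
    for span in spans:
        cur = span
        seen = set()
        # Walk up the parent links until we hit a root, a missing parent, or a cycle.
        while (cur["parent_id"] is not None
               and cur["parent_id"] in by_id
               and cur["span_id"] not in seen):
            seen.add(cur["span_id"])
            cur = by_id[cur["parent_id"]]
        root = cur["span_id"] if cur["parent_id"] is None else None
        groups[root].append(span["span_id"])
    return dict(groups)

if __name__ == "__main__":
    spans = [
        {"span_id": "a", "parent_id": None},
        {"span_id": "b", "parent_id": "a"},
        {"span_id": "x", "parent_id": None},       # second root: a collision
        {"span_id": "y", "parent_id": "x"},
        {"span_id": "z", "parent_id": "missing"},  # parent span was dropped
    ]
    print(split_by_root(spans))
```

More than one non-`None` key in the result means two traces shared the ID; the `None` bucket is the ambiguous remainder.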

Even with infinite retention, your analysis will bucket by time somehow, so a collision might have no effect if the collision doesn't happen at a proximate time. If you are manually looking at traces, it will be very obvious there is a collision unless they happen at the same time.

Also, the birthday paradox only expresses the probability that there is a collision somewhere, but if you are filtering or looking at single spans, then the probability that you actually see a collision is greatly reduced.

I think for basically all systems, an additional 64-bits has insignificant additional cost, so you may as well prevent collisions, but I think it could be a reasonable tradeoff if it mattered.
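
To put rough numbers on this, here is a small Python sketch using the standard birthday approximation p ≈ 1 - exp(-n(n-1)/2N) for "any collision anywhere", next to the chance that one particular trace collides with any of the others. The workload (one billion traces per day, 30-day retention) is a made-up assumption, and IDs are assumed to be uniformly random.

```python
import math

def any_collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance that at least two of n random IDs collide."""
    space = 2 ** bits
    return -math.expm1(-n * (n - 1) / (2 * space))

def specific_collision_probability(n: int, bits: int) -> float:
    """Chance that one given ID matches any of the other n - 1 random IDs."""
    space = 2 ** bits
    return -math.expm1((n - 1) * math.log1p(-1 / space))

if __name__ == "__main__":
    traces_per_day = 10 ** 9   # hypothetical volume
    retention_days = 30        # hypothetical retention window
    n = traces_per_day * retention_days
    for bits in (64, 128):
        print(bits,
              any_collision_probability(n, bits),
              specific_collision_probability(n, bits))
```

Under these assumptions, a collision somewhere in the retained set is close to certain at 64 bits, while the chance that the one trace you are looking at is affected stays tiny. At 128 bits both numbers are negligible.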

devin 57 days ago | | |

*nod* Adding this to my growing list of "things experienced engineers would discuss which is conspicuously missing in this case".

The future is going to be filled with "best practices" trendslop decision-making.

_trampeltier 57 days ago |

Why not 256? "Because of bandwidth costs." An adblocker does save bandwidth costs, but shaving a handful of bytes off an ID does not.

gpderetta 57 days ago |

TL;DR Birthday Paradox.

qbane 57 days ago | |

tl;dr we reinvented UUID and it works well

drdaeman 57 days ago | | |

Certainly not true. UUIDs have structure to them, and variants. Trace IDs are just 128-bit numbers, with any further semantics (almost) completely non-standardized (some systems encode timestamps, etc). They slapped a “last 56 bits are random” flag (not in the ID itself but as external metadata, so not like UUID at all) later giving IDs just a bit of semantics, but it’s not a reinvention of UUIDs.
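
To make the "external metadata" point concrete, here is a hedged Python sketch that parses a W3C `traceparent` header of version `00`. It assumes, as I read the Trace Context Level 2 specification, that the random-trace-id flag is bit `0x02` of the trailing flags byte, next to the sampled flag at `0x01`. The names `TraceParent` and `parse_traceparent` are invented for this example.

```python
from typing import NamedTuple

FLAG_SAMPLED = 0x01
FLAG_RANDOM_TRACE_ID = 0x02  # assumed bit position, per Trace Context Level 2

class TraceParent(NamedTuple):
    version: int
    trace_id: bytes       # 16 opaque bytes, no version or variant bits inside
    parent_id: bytes      # 8 bytes
    sampled: bool
    random_trace_id: bool

def parse_traceparent(header: str) -> TraceParent:
    """Parse a version-00 traceparent value: version-traceid-parentid-flags."""
    parts = header.strip().split("-")
    if len(parts) != 4:
        raise ValueError("expected 4 dash-separated fields")
    version, trace_id, parent_id, flags = parts
    if (len(version), len(trace_id), len(parent_id), len(flags)) != (2, 32, 16, 2):
        raise ValueError("unexpected field length")
    bits = int(flags, 16)
    return TraceParent(
        version=int(version, 16),
        trace_id=bytes.fromhex(trace_id),
        parent_id=bytes.fromhex(parent_id),
        sampled=bool(bits & FLAG_SAMPLED),
        random_trace_id=bool(bits & FLAG_RANDOM_TRACE_ID),
    )

if __name__ == "__main__":
    tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-03")
    print(tp.trace_id.hex(), tp.sampled, tp.random_trace_id)
```

The randomness claim travels in the flags field beside the ID. The 128 bits themselves carry no marker, unlike a UUID's embedded version and variant bits.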