UUIDs are popular, but bad for performance (2019)(percona.com) |
UUIDs are popular, but bad for performance (2019)(percona.com) |
This will affect you only when you are frequently creating and storing new IDs, which leads to reshaping btrees.
If you have IDs and its number is under control and doesn't change a lot you're just fine.
Every IPv6 datagram has a pair of source and destination UUIDs. :)
It doesn't talk about NoSQL or sharding, where random IDs usually perform much better than sequential due to a lack of hot shards.
If you distribute your reads and writes at random across N machines you can get ~Nx performance vs one machine. If you make your writes sequential, you'll usually get ~1x performance because every insert goes to the same machine for this second/minute/day, then rolls to a new one for the next period. There are sharding schemes that can counter this, but they require insight into your data design before implementation.
For example, Google’s Cloud Firestore is bottlenecked to 500 writes/second if your key is monotonically increasing (like a timestamp or these timestamp-based UUIDs) causing you to “hotspot” the index: https://cloud.google.com/datastore/docs/best-practices#high_...
For example, previously Amazon S3 performance guidelines recommended randomizing prefix naming with hashed characters to optimize performance for frequent data retrievals. You no longer have to randomize prefix naming for performance, and can use sequential date-based naming for your prefixes.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimi...
With MySQL, you're writing to a single box. Because of that, you'll want to be on hot pages. With Firestore, you're writing to many boxes. If you hotspot with Firestore, you're only writing to one box. If you write completely randomly to Firestore, maybe each box is bottlenecked to 400 wires/second (rather than 500) because there's no locality on each box. But if Google isn't charging you based on locality, there's no penalty for you since it'll scale up to more boxes (invisibly behind the scenes).
If you're only writing to one box, might as well take advantage of locality. If you have the opportunity to write to many boxes, you don't want all of the load going to one box.
However, one could imagine a distributed system where you could try and get both. Create an ID that has the ShardId to route it to the correct box and then something sequential after the ShardId. If you had 1,000 shards and 25 boxes, each box would be responsible for 40 shards so you wouldn't be perfectly sequential, but you'd kinda end up with 40 hot pages rather than randomly writing all over the place. It would also give you ample room to scale up your cluster.
So there is room to apply this technique to a distributed system as well.
Oh, one thing I'd note is that MySQL's InnoDB uses index-oriented (clustered) tables as they note in the article (which is why this is important with MySQL). PostgreSQL doesn't use index-oriented tables so new rows are (I believe) just written sequentially by default regardless of ID. It will have to update the primary-key index, but because the PK will be a lot smaller than the whole rows, you probably don't have to worry as much about randomly writing there. It's easier to keep an index hot than to keep a whole table hot.
How come? Even distribution is one of requirements for crypto hash functions.
The second argument on secondary indexes’ size still holds water, if it matters for a given application.
The thing that he ended up recommending however was super interesting in that I had never seen it mentioned before but it was basically to use this instead http://www.crockford.com/base32.html
But, still provide strong value as a unique identifier which is what makes them popular.
I’ve used integers as primary keys, with UUIDs as alternate keys for external-to-the-data-store queries.
This is a typo, v3 isn't random. It is generated deterministically from inputs.
> The only “repeated” value is the version, “4”, at the beginning of the 3rd field. All the other 124 bits are random.
And this is close but not quite correct. UUID v4 has a couple of other fixed bits, there are only 121-122 random ones. There are patterns in the text representation other than constant numbers. :)
I'm reading this article and it says that UUID are compared byte by byte, and seems to be indicating they're stored as string. Is that actually the case? I would have assumed that SQL supported 128 bit ints, but this seems to imply it does not.
Another question: if a column is set to char(fixed size) do the various sequel engines really not optimise to do multi word comparisons? (e.g. 8byte at a time, then 4, 2, 1, as size requires)
This article is about MySQL, apparently it's really the case in MySQL?
It's not the case in every rdbms universally. Postgres has a uuid type that stores them how you would (rightfully) expect.
I have no idea why MySQL does it this way, it does seem odd.
Or worse, the right result for the wrong reason. I've actually seen a case where creating a new entity in the application populated X records in X child tables, each with a sequential ID, and as a result all of them had the same surrogate PK. They were 1:N relationships in principle, but the software wasn't feature complete yet so the actual records were all 1:1.
Years later one of those tables finally received some extra records, and it caused a really weird bug because a query had accidentally used the PK instead of the FK as a join key, but for years it had happily chugged along because the two columns were in sync.
In an environment where millions of events are processed every second, being able to uniquely identify them is a must, and temporal keys are not always an option.
Wait, what? I thought it takes 4 base-64 digits to represent 3 bytes of data. Not 3 base-64 digits to represent 2 bytes of data.
4 * 6 = 24 = 8 * 3
So, you're correct... the exact amount of bytes that it takes to represent "actual" bytes goes like this: Actual bytes | b64 bytes required | overhead
1 | 2 | 2x
2 | 3 | 1.5x
3 | 4 | 1.33x
4 | 6 | 1.5x
5 | 7 | 1.4x
6 | 8 | 1.33x
7 | 10 | 1.43x
8 | 11 | 1.37x
9 | 12 | 1.33x
EDIT: As you can see, this averages with an overhead of between 33% (best case scenario where the encoding requires no padding, happens every 3 rows above) and something like 37%, decreasing with the number of bytes being encoded and approaching the minimum, 33% (e.g. to encode 1024 bytes, you need 1366 b64 digits, an overhead of 1.333984375x).It's better than nothing, but one of the values of UUIDs for identifiers is that you can create new ones client-side while offline. These "sequential" UUIDs will fail standard UUID validation because of the byte swapping and, in my experience, when used offline-capable apps, will result in sparse clusters of sequential UUIDs that yield an unpredictable improvement over truly random UUIDs.
Where you don't have to check collisions with a db. He mentioned generating the pk's on remote client, but that doesn't capture the interesting bits.
You generate the newly created object with the guid.
You send it to the API/Microservices and it's generated, fire-and-forget style. And the remote client has an Id of the newly created object to do something with.
I've seen this kind of architecture before. It sounds nice but is loaded with consistency problems.
https://chrisrichardson.net/post/sagas/2019/08/04/developing...
You receive a message "{entity}Created" and it contains the Id of the full object.
Note also that as long as you have a single central database you don't need UUIDs. They are only needed if you have several processes creating objects without coordination.
Often 32 bit integers are anyway not long enough as a primary key, so you need at least 64 bit keys. UUID is just double the size then.
Even in mobile apps, which seem like the most common use case for needing them: Unless your app creates these objects while not connected, you don't need UUIDs.
If your backend database is where objects get created, you don't need UUIDs.
Why would you use 4 bits for a version number in something that's supposed to be unique? What is the benefit of following this specification despite such cost, versus creating 128 unique bits based on time / random generators / machine IDs yourself?
When you start with random UUIDs, and then decide you actually wanted per-host-namespaced ones halfway through, if you have allocated zero bits for the version ID you're up the creek.
You're trading off present-day efficiency for future-day flexibility. Whether that's wise for a particular case depends on that case.
does anyone know what data type Reddit uses for their post/comment id? It seems like a short alphanumeric yet unique identifier and much shorter than a uuid.
Also what does twitter use for their post/comment id? Seems like some sort of big int?
https://blog.twitter.com/engineering/en_us/a/2010/announcing...
"we settled on a composition of: timestamp, worker number and sequence number. Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper"
I might be misunderstanding something here but if your random seed is based on time, under high-concurrency, doesn't this risk collisions? I can't see any thread-safety guarantees in the documentation.[1]
[1] https://dev.mysql.com/doc/refman/8.0/en/mathematical-functio...
UUIDs are generally generated randomly. This results in terrible insert performance into B+tree-based indexes, and terrible (= no) lookup locality with basically any database. In a large table, successive entries ends up in separate disk pages.
Even with time-based UUIDs, the time fields are ordered backward, which produces the same issue.
One way to fix this (beside the methods outlined in the article) is to create time-based UUIDs with fields sorted the other way around. Or reverse the time fields during indexing (slower, but still not as slow as an unnecessary disk read!).
I don't know for certain, but I suspect DynamoDB and most other databases that can trace their origins to the bigtable whitepaper have similar behavior.
That isn't always going to matter (or can be a very good thing). For a KV store like dynamodb a uuid is a great key because it'll distribute nicely across your shards.
It also depends a lot on your query patterns. If you have a uuid primary key but you're partitioning by some other value you may end up significantly reducing the number of pages you have to look through.
When you're IO bound, I'm not sure it actually is slower in practice.
MariaDB now has a native UUID column type as of MariaDB 10.7. This is brand new -- 10.7 had its RC release in Nov, not GA yet but very soon I'd imagine.
At least btrees perform much worse with random insertions. I don't know how much impact it has on LSM
If you look at the fastest approach the author tried in the second "UUID insertion rate" graph, it actually stores a modified UUID in a 36-character string. There is a missed optimization to store the modified UUID as 16 bytes, rather than 36, but imposing some order on the keys is the dominant improvement.
You can also use base85 and go to 20, but you get into some funky chars there.
Seems to have the same idea of "human-read-write without visually identical characters" but with an expanded set for shorter string length.
The only issue I've had with UUIDs is when they don't sort in increasing chronological order. RDBMSs don't appreciate high insert loads into random points in the index. Take care of that, however, and they're a treat.
> This alphabet, 0123456789ABCDEFGHJKMNPQRSTVWXYZ, is Douglas Crockford's Base32, chosen for human readability and being able to call it out over a phone if required.
Also 2 and Z can be similar in writing.
However it is nice to not see 0 and O, 1,I,l in the same string.
- It's case-insensitive allowing data encoded with it to survive going through most random line-of-business applications which may have errant UPPER() or lower() calls somewhere in their depths, as well as making it easy for humans to talk about out-loud (no need to specify case makes it easy)
- Making all the "pillar" shaped characters (I, i, 1, l) equivalent and all the "donut" shaped characters (o, O, 0) equivalent means big swaths of human typing mistakes are avoided. After so much exposure to Crockford Base32, I now loath having to decipher "is it a capitol 'I'? Or a lowercase 'l'?"
Overall, it's a great way to encode any user-facing ID; just make sure you understand that just using this encoding won't stop users from realizing that Crockford Base32 encoded IDs are sequential. If you want sequential IDs to appear non-obviously sequential after encoding you'll need to use additional techniques. Assuming Crockford Base32 obscures sequential IDs is the one case where I saw someone do something they shouldn't as it directly relates to Crockford Base32.
Base32 is a representation format which provides a textual representation of numbers, not a storage format.
I mean, anyone is free to dump Base32, or even base 2, into a string and just run with that, but that would be highly inefficient.
I just added it to the "Other projects" section of my base converter. https://convert.zamicol.com/
I loved it, think it all made a ton of sense. Lots of code samples, no weird technology choices (normally they would do it all via protobufs and gRPC but they keep the same principles and just use HTTP instead and Typescript for code samples)
Section 6.3.3 gets to the point about base32
[1] https://markmail.org/thread/3jzqjy6cavxcrpbq [2] https://markmail.org/download.xqy?id=3jzqjy6cavxcrpbq&number...
I don't know if you save enough problems by using them as alternative keys in Postgres for it to be faster, my guess is that just using them as primary key would be faster than a serial primary key and an UUID alternative one. Still, UUIDs are much more useful as client-facing data than as a pure database value, so I would also do that (and standardize my PK format), probably paying a performance penalty.
Great for (primary keys or anything) in a distributed/sharded database (e.g. CockroachDB), when data access is mostly by keys.
Nobody thinks this, do they?
That aside, Postgres stores UUIDs as a 128bit int, not as a string, so it almost certainly uses multi-word comparisons via memcmp: https://stackoverflow.com/a/29882952/703382
That's super weird. I can't think of a major RDBS that doesn't have a native date/time data type, unless you count SQLite.
Addendum: I also realise this is anecdotal, but someone on Stackoverflow mentions a significant speed up from changing `text` to `uuid` in Postgres.[1] But this also fits with what I've been told on #postgresql on libera.chat. That being said, integers would still outperform uuid.
[0] https://www.postgresql.org/docs/14/datatype-uuid.html [1] https://stackoverflow.com/questions/29880083/postgresql-uuid...
I've only read it partially (it's a very interesting read nonetheless), however, this is a key point (italic mine):
> Let’s assume a table of 1B rows having UUID values as primary key and five secondary indexes. If you read the previous paragraph, you know the primary key values are stored six times for each row. That means a total of 6B char(36) values representing 216 GB.
It assumes that MySQL users store UUIDs as CHAR(36), which is very wasteful, since an UUID actually requires 16 bytes (128 bits).
Now, one can store UUIDs a binary blobs in MySQL, however, they are not human-readable, so one tends to store them as CHAR(36) instead, wasting 20 bytes per entry (in total, requiring 2.25 times the strict necessary).
By supporting UUID as native data type, the storage can use the strict necessary amount of bytes, but still maintain readability, because the RDBMS will convert the data to a human-readable form.
Additionally, MySQL's clustered indexes are subject to write amplification, which makes things worse.
I haven't read the rest of the article though, which likely includes other consideration about the spatiality problems due to randomness. Things gets even more complex, due to the relationship with the data structures (I haven't fully read the article).
one is an idiot
Additionally using the less cryptographically secure uuid v1 can be a performance optimization since it has implicit time based sorting.
Except the way the fields are laid out basically defeats the point: UUIDv1 lays a 60 bits timestamp starting from the lower 32 bits, so it only sorts within a 7 minutes (2*32 * 100ns) bucket.
Hence the proposal for UUIDv6, which lays the exact same timestamp in reverse order (starting from the "top" 32b), making it naturally sortable.
I don't remember the parameters of the test, however.
InnoDB clusters on the PK by default, so when you're inserting a UUID you're not only inserting in the middle of the index (on average) you're also inserting in the middle of the table.
And I don't know how much the on-disk storage has been optimised for this sort of things, but if the answer is "not" and sparse pages are not really a thing, you might need to rewrite half the table in order to do so.
Because MySQL (specifically innodb) will cluster by the PK by default, while Posgres will not.
Meaning in MySQL, by default, the table will be stored in PK order, so when you insert an UUID at a random location any record which sorts after it has to be moved to make room, for nothing since UUID ordering is not really relevant (at least for the current standard UUIDs).
In pg, clustering is not a property (at least for now) it's an explicit point operation, which is not performed by default. So when you insert a row whether UUID or INTEGER-keyed, it's just tacked on wherever there's free space. The UUID still has to be inserted at the right place in the index' btree which can lead to a lot of cascading changes, and it's more expensive to compute, store, and load than a simple integer (especially 32b), but it doesn't induce an arbitrarily high amount of shuffling in the storage layer of the table itself.
But if you put many files with the exact same prefix - even sequential dates - then you hit the threshold the doc says, about 5000 ops/sec.
P.S.: In a few projects I've built prefixes using timestamps, but not at the very beginning, and worried that they weren't getting sharded out. The change I linked to fixes that problem.
(Or do you mean the original article, rather than the linked Crockford article?)
Lesson learned, thanks!
Alternatively, it's possible that they created pseudo-UUIDv1 by hand putting data in UUIDv6.
Not sure if the term gets used much now a days, but ‘cargo cult’ programming is definitely a thing still. Never stopped.
I should clarify - the problem with Firestore is the index tablets. The indexes inherently need to be inorder or you can't perform ordered queries.
Partitioning is very explicit in DynamoDB, for better or for worse. Harder to shoot yourself in the foot, but also limits what you can do.
I wasn't discussing using random UUIDs.
A slightly more sophisticated pattern is to have the initial creation command return an ID of the creation event. Then the client can poll that ID and just check that the event has been processed successfully. This requires the client to trust the server a little, but it means it can just poll a small, temporary, probably in-memory "recent events" table instead of hammering the main API.
A more efficient design is to YOLO, submit the second command while the first is still being processed, and trust the server to handle the commands in the right order. This is fine for backend services, unfortunately human users get annoyed at their frontend when it says "sorry you have to do the purchase finalization again because an earlier 'add to cart' command failed and I only noticed it now".
I use UUIDs but I don’t know why they would magically make my Postgres a distributed system. I like them because the client can generate them offline.
Distributed relational databases aren't naïve in this context. MySQL is.
I have a fairly recent PhD in algorithms and I haven't heard naïve used this way. When I hear naïve, it usually just means "does the immediately obvious thing".
For what you're trying to say, the term I'm familiar with is "oblivious", e.g. "oblivious routing" or "oblivious local search", occasionally with modifiers such as "cache-oblivious".
Nobody is assuming simply adding a UUID transforms a system into a distributed one.
The parent comment is about exposing UUIDs probably in order to not expose the sort order of the database.
SQL Server has a sequential uuid type which avoids exactly this problem.
https://datatracker.ietf.org/doc/html/draft-peabody-dispatch...
Disclaimer, I have no data to back up this solves the performance problems described. It's just likely to solve that "writing in the middle of the table" part.
Integers are a superior way to record sequential data.
You only care about them being sequential enough that your database engine will write them down quickly and efficiently (for engines like InnoDB or MSSQL which have such behaviour). But as a developer you typically want them to be random so that you can generate them in a decentralised manner, prevent guessing or accidental joins, safely merge records from multiple tables, etc.
Sequential UUIDs usually preserve enough randomness for the latter (as long as you don't do stuff like generating a trillion IDs exactly at midnight), while providing enough sequentiality for the former.
you refer to the uuid generated by sql server?
IIRC this is often a mix of hardware network address and a portion that is either random or based on a high-precision time, so it is still unlikely that you'll see collisions between machines.
MS SQL Server's NEWSEQUENTIALID function returns something akin to a v4 UUID (fully random, aside from variant indicator bits, I'm not sure if the variant bits are respected in NEWSEQUENTIALIDs output or if it just returns a 128-bit number) but after the first is generated the rest follow in sequence until the sequence is reset (by a reboot). Assuming the variant indicators are present, there is are 122 random bits. Even if your system is up long enough to use 2^58 (2.8810^17 if you want that in decimal) IDs generated this way, you still effectively have 64-bits of randomness even if the variant bits are present. For most systems the chance of collision with sequential UUIDs is, while larger than with other types, still so small as to be inconsequential. These are "atoms in several galaxies" level numbers.
You don't want to use sequential UUIDs in place of v4 UUIDs where security matters, or course, as it is easy to work the next in sequence.
> If you need something sequential then just use a much more simple number*
Sequential isn't their only property. UUIDs, including those that increment from the start point, are intended not to collide with those generated elsewhere.
Sequential UUIDs are a compromise - giving away a small amount of collision protection in order to gain what could be a significant efficiency boost in some circumstances (DB indexes being the main one).
Each GUID generated by using NEWSEQUENTIALID is unique on that computer. GUIDs generated by using NEWSEQUENTIALID are unique across multiple computers only if the source computer has a network card.
UUIDs are generally not guaranteed to be universally unique. The name is misleading marketing.
If you generate them randomly, then the probability of collision is small enough not to matter, but the more you introduce deterministic elements, like generating them sequentially, the more your probability of collision increases.
It'd be much more orderly for each node to have a sequential ID, the ID of the node that created it, a timestamp when it was created and a real UUID v4 for if someone wants to prevent collisions. Same data, but the primary key on an individual node is a lot smaller (half the size) and the metadata is available for use if someone wants it. There is natural room to build up namespaces and such. It is better to keep these observations separate instead of munging them all into the record ID.
I only disagree with:
> It'd be much more orderly for each node to have a sequential ID
I've outlined some reasons above why a random ID has advantages over a sequential one.
I run Postgres in prod so I can use purely random UUIDv4, but for those running mysql or mssql, UUIDv6 and the like are a useful compromise between sequentiality and randomness. The fact that they happen to use timestamps as the source of sequentiality is only an implementation detail.
There's also many ways to "work around it", so that it doesn't seem inconsistent to the user.
I'm definitely not arguing that it's how most systems should be designed though.
POTS = Plain old telephony service is restricted to a narrow frequency range of 300–3,300 Hz, called the voiceband, which is much less than the human hearing range of 20–20,000 Hz [from https://en.wikipedia.org/wiki/Plain_old_telephone_service ]
Fun fact: dzs counts as a single letter in Hungarian (e.g. in alphabetical ordering).
The feedback flow is async from the creation flow.
If you get created through the feedback flow it's created.
Hard to achieve if everyone starts from 00000000-0000-0000-0000-000000000000.