Ask HN: Why don't foreign keys “scale”?

6 points by djlewald 3 years ago | 2 comments

I've heard this repeated quite a bit, particularly around the inception of Dynamo and NoSQL in general, but I don't think I've ever heard the "why" side.

perrygeo 3 years ago

Foreign key values must be checked at INSERT time (and on any UPDATE of the key) to ensure the referenced row exists in the other table. This enforcement of referential integrity takes time and slows down the ingest rate.
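
To make the insert-time check concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The authors/posts tables are hypothetical, chosen only for illustration, and other engines will differ in the details and in how much the lookup costs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on (it is off by default).
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every post must point at an existing author.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'alice')")

# Accepted: the engine finds authors.id = 1 before storing the row.
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'hello')")

# Rejected: the same lookup finds no authors.id = 99.
try:
    conn.execute("INSERT INTO posts (author_id, title) VALUES (99, 'orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

post_count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(rejected, post_count)
```

The extra work per insert is that lookup against the parent table, which is the cost being described here.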

On the SELECT side, data that has been pre-denormalized into a directly consumable format (documents) will be faster than constructing the same data from a join.

Of course, giving up referential integrity and normalization adds a significant burden on the developers. You need to balance that against any marginal speed gains. Since your application must now enforce valid references without any help from the DB, you need to account for both the runtime and the development effort it takes to roll your own referential integrity and maintain your own normalization strategy in the face of updates, etc.
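
As a rough illustration of what "rolling your own" can look like, here is a sketch with SQLite and Python's sqlite3 module. The schema and the helpers insert_post and delete_author are hypothetical names made up for this example, not anyone's real API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY clause: the database will accept any author_id.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL, title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'alice')")
conn.commit()


def insert_post(conn, author_id, title):
    """Hypothetical helper: hand-rolled existence check before the insert."""
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT 1 FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown author_id {author_id}")
        conn.execute(
            "INSERT INTO posts (author_id, title) VALUES (?, ?)", (author_id, title)
        )


def delete_author(conn, author_id):
    """Hypothetical helper: the delete path needs its own check too."""
    with conn:
        n = conn.execute(
            "SELECT COUNT(*) FROM posts WHERE author_id = ?", (author_id,)
        ).fetchone()[0]
        if n:
            raise ValueError(f"author {author_id} still has {n} post(s)")
        conn.execute("DELETE FROM authors WHERE id = ?", (author_id,))


insert_post(conn, 1, "hello")

try:
    insert_post(conn, 99, "orphan")
    orphan_blocked = False
except ValueError:
    orphan_blocked = True

try:
    delete_author(conn, 1)
    delete_blocked = False
except ValueError:
    delete_blocked = True

# Any code path that skips the helpers still gets through unchecked.
conn.execute("INSERT INTO posts (author_id, title) VALUES (99, 'bypass')")
orphans = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE author_id NOT IN (SELECT id FROM authors)"
).fetchone()[0]
print(orphan_blocked, delete_blocked, orphans)
```

Note that the helpers run the same kind of lookup the database would have run, so the runtime cost has moved rather than disappeared, and a single write that bypasses them leaves an orphan behind.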

DemocracyFTW2 3 years ago

So to sum up, foreign keys (and other DB-level constraints such as CHECK constraints) do scale. When you have little data, their upfront cost may appear high and application-level validation trivial, but when you have lots of data spread over lots of tables, checking all the possible invariants from the application becomes a huge burden.

IMHO it is the lesser evil to denormalize data from a normalized source, be it into separate tables or a separate DB / schema (as long as a single source of truth is maintained), than to put non-normalized data front and center. It is always simpler to assemble normalized data into denormalized documents than to do the reverse (parsing documents, picking apart unstructured and poorly structured field values).
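
A small sketch of the "normalized source, denormalized output" direction, again with SQLite and Python's sqlite3 module. The tables and the function build_author_documents are hypothetical and only meant to show the shape of the idea.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalized source of truth.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")]
)
conn.executemany(
    "INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)",
    [(1, 1, "hello"), (2, 1, "again"), (3, 2, "hi")],
)


def build_author_documents(conn):
    """Hypothetical helper: one join, folded into one document per author."""
    docs = {}
    rows = conn.execute(
        "SELECT a.id, a.name, p.id, p.title"
        " FROM authors a LEFT JOIN posts p ON p.author_id = a.id"
        " ORDER BY a.id, p.id"
    )
    for author_id, name, post_id, title in rows:
        doc = docs.setdefault(author_id, {"id": author_id, "name": name, "posts": []})
        if post_id is not None:  # authors without posts keep an empty list
            doc["posts"].append({"id": post_id, "title": title})
    return list(docs.values())


documents = build_author_documents(conn)
# These strings could be pushed to a document store or cache that stays ancillary.
serialized = [json.dumps(d) for d in documents]
print(serialized)
```

Going this way is mechanical because every field already has a known type and home; the reverse trip would mean parsing each document and guessing at its structure.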

This is why I believe document and graph DBs are fine when they are ancillary to a relational DB.

Foreign keys do indeed incur a cost, and to that extent they are subject to scaling headaches, but it does not seem possible to avoid that cost if what you want is data integrity.