Document Database Transaction Models (fauna.com)

72 points by evanweaver 4 years ago | 42 comments

Pamar 4 years ago |

> Document databases are very convenient for modern application development, because modern application frameworks and languages are also based on semi-structured objects instead of tabular data.

Citation needed.

mjburgess 4 years ago | |

All object-oriented programs have a data model which is "semi-structured" (i.e., non-tabular).

Almost no software application data models are tabular.

jhgb 4 years ago | |

Wait, you need a citation to show that commonly used languages have structs/records/classes with pointers/OOPs?

OK, in that case, I'm citing the Java spec, C++ spec, C# spec, JavaScript spec, Smalltalk spec, Common Lisp spec...

Pamar 4 years ago | | |

All the languages you listed worked fine to develop complex applications using "tabular data".

I find it ... debatable that building yet one more FB or IG challenger is significantly more important than maintaining or creating new banking, ERP, or inventory management solutions. For that stuff, "tabular data" doesn't seem to be a bad match, or at the very least, using JSON or XML documents will not provide much of an improvement (all IMHO, of course...).

gregwebs 4 years ago | |

I think this is much of the reason document DBs are popular. If you have an array in your struct, just save it.

Having used MongoDB in the past, for me it was a better day 1 experience. I didn’t need SQL, I just stored my data. The problem is that on day 2 there’s a different access pattern for your data, but now it’s stored in a way that’s highly optimized only for that first application.
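To make the day 1 versus day 2 contrast concrete, here is a minimal sketch using a plain dictionary of JSON strings as a stand-in for a document store. The `orders` store, the helper names, and the sample data are all hypothetical and not tied to MongoDB or any other product.

```python
import json

# Hypothetical document store: document id -> JSON string.
orders = {}


def save_order(order_id, order):
    """Day 1: the struct (including its embedded array) is saved as-is."""
    orders[order_id] = json.dumps(order)


def load_order(order_id):
    """Day 1 access pattern: fetch one whole order by id."""
    return json.loads(orders[order_id])


save_order("o1", {"customer": "ann",
                  "items": [{"sku": "widget", "qty": 2},
                            {"sku": "gear", "qty": 1}]})
save_order("o2", {"customer": "bob",
                  "items": [{"sku": "gear", "qty": 5}]})


def quantity_sold(sku):
    """Day 2 access pattern: a per-product total.

    The items live inside each order document, so without a secondary
    index every document has to be decoded and scanned.
    """
    total = 0
    for raw in orders.values():
        for item in json.loads(raw)["items"]:
            if item["sku"] == sku:
                total += item["qty"]
    return total


print(load_order("o1")["customer"])
print(quantity_sold("gear"))
```

The first lookup touches one document; the second touches all of them, which is the cost of a layout shaped around the original application.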

evanweaver 4 years ago | | |

If only there was some database that let you store flexibly structured documents but keep the data normalized. Perhaps you could even construct views and indexes to accelerate different access patterns.

jhgb 4 years ago |

> In the distant past of the 2010’s, document databases didn’t offer transaction support, instead implementing various forms of eventual consistency. Vendors and open-source maintainers promoted the idea that transactionality was an unnecessary, complexifying feature that damaged scalability and availability—and many claimed that adding it to their systems was impossible.

I find it interesting that the proposed explanation for the lack of transactions in document databases is the CAP theorem, which covers distributed systems, NOT document databases. Clearly such a thing as a document database with transactions is not impossible if, for example, GemStone/S can support transactions just fine.

evanweaver 4 years ago | |

GemStone/GemFire use a transactional protocol akin to Tuxedo. Open a bunch of locks, write a bunch of updates, release the locks. As per the docs (https://gemfire82.docs.pivotal.io/docs-gemfire/latest/develo...) this does not offer isolation or even atomicity, so it doesn't give you the C in CAP at all.

These are exactly the kind of "transactions" you get when you try to implement everything at the application level rather than the database level. Couchbase transactions (in the article) are the same. And it's not that different from Vitess cross-shard transactions either, which are not isolated (https://vitess.io/docs/reference/features/two-phase-commit/). Tandem SQL used the same scheme as well I believe.

Prior to Spanner, there were no production databases that offered ACID transactions across distributed, disjoint shards.

jhgb 4 years ago | | |

I'm sorry, what does GemFire have to do with GemStone/S? That seems like completely different software from a different vendor.

> Open a bunch of locks, write a bunch of updates, release the locks.

That's how transactional databases using two-phase locking generally work, isn't it?
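For reference, a minimal single-process sketch of that lock, write, release shape, in the conservative variant of two-phase locking where every lock is taken up front. `LockingStore`, `transact`, and `transfer` are hypothetical names, and an in-memory toy like this says nothing about how any of the products mentioned above behave across shards.

```python
import threading


class LockingStore:
    """Toy key-value store with one lock per key (illustration only)."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.locks = {key: threading.Lock() for key in self.data}

    def transact(self, keys, update):
        # Sorting gives every transaction the same lock order,
        # which avoids deadlock between overlapping transactions.
        ordered = sorted(set(keys))
        # Growing phase: acquire every lock before touching any data.
        for key in ordered:
            self.locks[key].acquire()
        try:
            snapshot = {key: self.data[key] for key in ordered}
            # update() may raise; nothing has been written at that point.
            self.data.update(update(snapshot))
        finally:
            # Shrinking phase: release only after all writes are applied.
            for key in reversed(ordered):
                self.locks[key].release()


def transfer(store, src, dst, amount):
    def update(values):
        if values[src] < amount:
            raise ValueError("insufficient funds")
        return {src: values[src] - amount, dst: values[dst] + amount}

    store.transact([src, dst], update)


store = LockingStore({"alice": 100, "bob": 50})
workers = [threading.Thread(target=transfer, args=(store, "alice", "bob", 1))
           for _ in range(20)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(store.data)
```

Because both locks are held for the whole read-modify-write, concurrent transfers cannot interleave and the total balance is preserved.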

vvern 4 years ago |

> Additionally, out of band coordination, most likely via human beings, is required to make sure that all potential readers of the transactional writes are also transaction-aware.

Subtle burn on Mongo and its client-coordinated session causality model.

sargun 4 years ago |

Given that most application developers started with something like PostgreSQL or MySQL, which have pessimistic locking, so that transactions rarely end in read/write conflicts (aborts), how have people (and potentially programming languages) adapted to optimistic concurrency control?

Also, when you say:

> On the other hand, server-side transactions use a more typical pessimistic relational lock. They open a transaction, do some work, and then commit it.

What do you mean? What are server-side transactions?

evanweaver 4 years ago | |

We haven't seen much difference in practice between frequent aborts and frequent timeouts. Both are better than deadlocks.

I meant transactions issued from a server ("cloud") client to the database, as opposed to a mobile client.

jhgb 4 years ago | |

> how have people (and potentially programming languages) adapted to optimistic concurrency control

If they started with PostgreSQL, then they're already adapted to optimistic concurrency control, right?
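For anyone who hasn't seen the pattern, adapting to optimistic concurrency control usually comes down to a retry loop around the transaction body. A minimal sketch, assuming a hypothetical in-memory `VersionedStore` with a compare-and-set write; none of these names come from PostgreSQL or any real client library.

```python
class ConflictError(Exception):
    """Raised when the document changed since it was read."""


class VersionedStore:
    """Toy store that keeps a version counter next to each value."""

    def __init__(self):
        self._docs = {}  # key -> (version, value)

    def read(self, key):
        return self._docs.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        current_version, _ = self._docs.get(key, (0, None))
        if current_version != expected_version:
            raise ConflictError(key)
        self._docs[key] = (current_version + 1, value)


def run_with_retries(store, key, update, max_attempts=5):
    """Read, compute, and write; start over if the write conflicts."""
    for attempt in range(max_attempts):
        version, value = store.read(key)
        try:
            store.compare_and_set(key, version, update(value))
            return attempt + 1  # number of attempts it took
        except ConflictError:
            continue
    raise ConflictError(f"gave up on {key!r} after {max_attempts} attempts")


store = VersionedStore()
store.compare_and_set("counter", 0, 10)

interfered = []


def increment(value):
    # Simulate a concurrent writer sneaking in during the first attempt.
    if not interfered:
        interfered.append(True)
        version, current = store.read("counter")
        store.compare_and_set("counter", version, current + 100)
    return value + 1


attempts = run_with_retries(store, "counter", increment)
print(attempts, store.read("counter"))
```

The first attempt aborts because the version moved underneath it; the second attempt sees the other writer's value and succeeds. This is also why the update function should be safe to run more than once.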

sargun 4 years ago |

Having played with the Firebase Python client, outstanding writes within a transaction are "kind of" exposed because you fetch the document and make alterations to it before saving it (within a "transaction"). That object, when you make alterations to it, bubbles up the changes. AFAICT, this is a bit of a client-side hack, but it's ergonomically wonderful.

evanweaver 4 years ago | |

I guess most ORMs are like that. Is the object shared by reference across the entire runtime or do you end up with divergent objects?

sargun 4 years ago | | |

It's only valid in the context of that function invocation. The docs say not to write impure functions or do anything with concurrency -- if you start two simultaneous transactions, the client isn't "smart" about it unfortunately.

evanweaver 4 years ago |

I’m around to answer questions and discuss.

Has anybody used Couchbase transactions or sharded Mongo transactions who can corroborate our analysis?

jimsimmons 4 years ago | |

I like the presentation of ideas. Can you expand on use cases of Firebase’s nested document model? Seems powerful as a file system but not sure how that will play with the complexities of distributed applications.

evanweaver 4 years ago | | |

Firebase was originally designed more as a realtime communication mechanism than an operational database. The idea was that clients would subscribe to different nodes in a data hierarchy to receive realtime notifications from other clients that were publishing to those nodes. Depending on what was in the client view, sometimes you wanted to subscribe to a leaf, sometimes to a subtree, sometimes to everything.
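A rough sketch of that subscription idea, assuming a hypothetical in-process `Tree` class with slash-separated paths. This is an illustration of leaf, subtree, and root subscriptions, not Firebase's actual API or wire protocol.

```python
from collections import defaultdict


class Tree:
    """Toy publish/subscribe hub keyed by paths in a hierarchy."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # path tuple -> callbacks

    @staticmethod
    def _split(path):
        return tuple(part for part in path.split("/") if part)

    def subscribe(self, path, callback):
        """Subscribe to a node; an empty path means the whole tree."""
        self._subscribers[self._split(path)].append(callback)

    def publish(self, path, value):
        """Notify subscribers of the node itself and of every ancestor."""
        parts = self._split(path)
        for depth in range(len(parts) + 1):
            for callback in self._subscribers.get(parts[:depth], []):
                callback(path, value)


tree = Tree()
leaf_events, subtree_events, all_events = [], [], []
tree.subscribe("rooms/a/messages/1", lambda p, v: leaf_events.append((p, v)))
tree.subscribe("rooms/a", lambda p, v: subtree_events.append((p, v)))
tree.subscribe("", lambda p, v: all_events.append((p, v)))

tree.publish("rooms/a/messages/1", "hello")
tree.publish("rooms/a/messages/2", "world")
tree.publish("rooms/b/title", "other room")

print(len(leaf_events), len(subtree_events), len(all_events))
```

The leaf subscriber sees one event, the subtree subscriber sees both messages in room `a`, and the root subscriber sees everything.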

As these things tend to go, when there is a place to store arbitrary data, all kinds of things get shoved into it, so the mixed model in Firestore is a compromise between the original tree-of-nodes data model and a more conventional document data model.

My assumption is the Firestore-to-Spanner mapping creates subcollections as shared tables with foreign keys to the parent documents, but I don't actually know. However, that would match the mandatory 1-to-many-to-1-to-many data layout, and makes more sense than shoving all the dependent data into the document itself or creating multiple millions of SQL tables for millions of documents.
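To illustrate the layout I'm guessing at (and only guessing), here is a sketch in SQLite with one shared table per subcollection name and a foreign key back to the parent document. The table and column names are made up for the example and do not reflect Firestore's or Spanner's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE users (
    id   TEXT PRIMARY KEY,
    body TEXT NOT NULL              -- the document itself, e.g. as JSON
);
-- One shared table for every user's "posts" subcollection,
-- instead of one table per parent document.
CREATE TABLE user_posts (
    parent_id TEXT NOT NULL REFERENCES users(id),
    id        TEXT NOT NULL,
    body      TEXT NOT NULL,
    PRIMARY KEY (parent_id, id)     -- keeps a parent's children adjacent
);
""")

conn.execute("INSERT INTO users VALUES (?, ?)", ("ann", '{"name": "Ann"}'))
conn.execute("INSERT INTO users VALUES (?, ?)", ("bob", '{"name": "Bob"}'))
conn.executemany(
    "INSERT INTO user_posts VALUES (?, ?, ?)",
    [("ann", "p1", '{"title": "first"}'),
     ("ann", "p2", '{"title": "second"}'),
     ("bob", "p1", '{"title": "hi"}')],
)


def posts_for(user_id):
    """List the ids in one parent's subcollection."""
    rows = conn.execute(
        "SELECT id FROM user_posts WHERE parent_id = ? ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]


print(posts_for("ann"))
```

Millions of parent documents then share two tables, and a subcollection read is a range scan on the `(parent_id, id)` key.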