Cognitect: Relevance merges with Metadata Partners (Datomic) (cognitect.com)
From their FAQ:
"Datomic is not a good fit if you need unlimited write scalability, or have data with a high update churn rate (e.g. counters)."
Don't you get most of that through... caching? Also, it seems to assume that the dataset will fit into RAM.
* Storage, transactions, and querying are separated into different processes/machines.
* Data is immutable. Storage is pluggable and has implementations on top of Dynamo/Riak.
* Transaction semantics and ordering are controlled by a single process for consistency. This is the write scaling caveat. It's less of a restriction than it sounds (if you're thinking SQLite2 like I did) because writes and queries don't compete for resources; the single process only does the sequencing.
* Queries on the db are performed in-client and can interoperate with client code and state. When you write a query, Datomic pulls the data from storage to the local machine and performs the query there.
* Queries are written in a logic programming language called Datalog. Even if you aren't interested in the rest, I'd recommend spending an hour working through http://learndatalogtoday.org/ just for the exposure to logic programming.
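To give a feel for the "queries run in-client, in a logic language" points above, here is a toy sketch in Python. It is not Datomic's API or query engine: the `query` and `match` helpers, the `?`-prefixed variable convention, and the `person/name` attributes are all made up for illustration. It only shows the general idea of joining patterns against a local set of entity/attribute/value facts.

```python
# Toy illustration only: NOT Datomic's API. Facts are (entity, attribute, value)
# tuples held in local memory; variables are strings that start with "?".

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, binding):
    """Unify one pattern with one fact; return an extended binding or None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in new and new[p] != f:
                return None  # variable already bound to a different value
            new[p] = f
        elif p != f:
            return None      # constant in the pattern does not match the fact
    return new

def query(find, where, facts):
    """Join every `where` pattern against `facts`, then project `find` variables."""
    bindings = [{}]
    for pattern in where:
        bindings = [b2 for b in bindings for fact in facts
                    if (b2 := match(pattern, fact, b)) is not None]
    return {tuple(b[v] for v in find) for b in bindings}

# Hypothetical data, already "pulled" to the client.
facts = [
    (1, "person/name", "Alice"),
    (1, "person/age", 34),
    (2, "person/name", "Bob"),
    (2, "person/age", 27),
]

# "Find the name of every entity whose age is 34."
result = query(
    ["?name"],
    [("?e", "person/age", 34), ("?e", "person/name", "?name")],
    facts,
)
```

Since the facts are ordinary local values, the query result is an ordinary Python set that the rest of the client code can use directly, which is roughly the "interoperate with client code and state" point.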
In traditional ACID databases (SQL), all queries (read and write) mostly only scale UP (beefier db machine), not OUT (lots of db machines is very hard). Datomic is an ACID database where writes still scale UP, but reads can scale OUT.
A consequence of separating reads from writes is that Datomic reads scale practically arbitrarily, for both query load and dataset size. Writes do not.
This is a lot like Git, where you have to push to a central place which orders and rejects commits, but you can make useful reads from your local machine without touching the network. Datomic is a lot like Git + realtime secret sauce.
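A rough sketch of that "central ordering, local reads" shape, under heavy assumptions: the `Transactor` class and `facts_as_of` helper below are hypothetical names, and a real system would persist the log to storage rather than keep it in one process. The point is only that the writer serializes sequencing, while a reader holding a snapshot has an immutable value it can read without coordination.

```python
import threading

class Transactor:
    """Hypothetical single writer that assigns a total order to transactions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._log = ()  # immutable tuple of (tx_id, facts) entries

    def transact(self, facts):
        # Only the sequencing step is serialized; readers never take this lock.
        with self._lock:
            tx_id = len(self._log) + 1
            self._log = self._log + ((tx_id, tuple(facts)),)
            return tx_id

    def snapshot(self):
        # Returns an immutable value; later writes build a new tuple instead
        # of mutating this one.
        return self._log

def facts_as_of(log, tx_id):
    """Read-side helper: all facts asserted at or before `tx_id`."""
    return [fact for t, facts in log if t <= tx_id for fact in facts]

transactor = Transactor()
t1 = transactor.transact([(1, "person/name", "Alice")])
db_v1 = transactor.snapshot()          # a reader takes a snapshot
t2 = transactor.transact([(2, "person/name", "Bob")])
db_v2 = transactor.snapshot()

old_view = facts_as_of(db_v1, t2)      # db_v1 is unaffected by the later write
new_view = facts_as_of(db_v2, t2)
```

Like a Git clone, `db_v1` stays exactly what it was when the reader got it, no matter what is written afterwards.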
That's only half the value though - Datomic also doesn't have an object-relational impedance mismatch. This means Datomic doesn't need ORMs; Datomic's programming model is simpler than SQL for a competitive set of features. So you code faster with fewer bugs.
And no, the dataset does not need to fit in RAM.
(Honest question, not a cheap attempt at dismissal :-) )
As for why they chose this, you'd have to ask them to be sure.
But two reasonable assumptions are: 1. they wanted the storage layer to be "dumb", in particular so that they could use existing services like Dynamo. 2. they wanted reading processes to be totally independent. Readers can talk directly to the dumb storage layer without any centralized resource coordinator to execute queries. That means horizontal scalability in the strict sense.
It means that reads from a hot cache do not touch the network. Reads are very fast and scale "out". You can write code that does a lot of reads without caring much about performance. (SQL reads only scale "up" and you care very much about their performance.)
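A minimal sketch of why immutability makes that hot cache simple, assuming data is stored in immutable chunks (called "segments" here for illustration; `SegmentCache` and the dict standing in for remote storage are hypothetical, not Datomic's implementation). Because a stored segment never changes, a cached copy never has to be invalidated, so repeated reads stay local.

```python
class SegmentCache:
    """Hypothetical reader-side cache of immutable storage segments."""

    def __init__(self, fetch_from_storage):
        self._fetch = fetch_from_storage  # stands in for a network call
        self._segments = {}
        self.remote_reads = 0

    def get(self, segment_id):
        if segment_id not in self._segments:
            # Cold read: go to the "dumb" storage layer exactly once.
            self.remote_reads += 1
            self._segments[segment_id] = self._fetch(segment_id)
        # Hot read: served from local memory. No invalidation logic is needed
        # because a stored segment is assumed never to change.
        return self._segments[segment_id]

# A plain dict plays the role of remote storage in this sketch.
storage = {
    "seg-1": ((1, "person/name", "Alice"), (1, "person/age", 34)),
    "seg-2": ((2, "person/name", "Bob"),),
}

cache = SegmentCache(storage.__getitem__)
for _ in range(1000):
    segment = cache.get("seg-1")

remote_reads_after_loop = cache.remote_reads
```

Each reader process can hold its own cache like this, which is one way to picture reads scaling "out" while storage stays dumb.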
Datomic is like Git (distributed reads, central writes); Postgres is like CVS/SVN (centralized reads and writes). This is made possible by immutable history.