Datomic is Free(blog.datomic.com) |
My view is that Datomic is a novel upstart in the persistence space. Most of their competition - Postgres, Mongo, Cassandra - is open-source, so they're just shooting themselves in the foot. The "pay us extra for convenient hosting and consulting" model isn't threatened by open-source in the slightest.
The only thing I can think of is that they're trying to compete with Oracle/Db2/SQL server, but I can't imagine an enterprise eyeing any of those solutions ever giving Datomic a chance.
I always wonder if this sort of move portends an exit of some of the core technical team, who would very much like to fork the codebase and move on, but in this case with only the binaries being opened up, it feels more as though they want some more people to try Datomic out. Databases such as Neo4J do this as well - free to run, but you'll probably want to pay for support.
Actually, you can get the best of both worlds. Plenish is a library that allows you to sync the content in Datomic to Postgres. https://github.com/lambdaisland/plenish
> This section only applies to Datomic 990-9202 and lower. Newer versions of Datomic Cloud will be free of licensing related costs, and you will only pay for the hardware that you use to run the system.
2007 - the Clojure programming language is announced by Rich Hickey and gains quite a bit of traction over the next 5 or 6 years. It never becomes a "top 5" language, but it could arguably still be considered a mainstream language today. It's been endorsed as "the best general purpose programming language out there" by "Uncle" Bob Martin[1] (author of Clean Code) and Gene Kim[2] (author of The Phoenix Project, the seminal DevOps book). The fact that Rich spent two years working on it without pay and without the commercial backing many other languages enjoy is a real testament to his commitment and his vision. A Clojure-related Emacs package[3] quotes Rich when it starts a REPL: "Design is about pulling things apart."
2012 - the Datomic database is announced by Rich Hickey's company. The database is praised for its ingenuity and its "time travel" features. It was designed to be deployed anywhere in the beginning, but, over time, it became difficult to deploy outside of AWS environments and even the AWS deployment path was quite cumbersome--the Datomic marketing page used to feature a maze-like diagram of all the AWS-specific features needed to make the thing work (it would be nice to find a link to that picture); I'd think most companies would have trouble digesting that and integrating it into their technology "stack".
2020 - Nubank (a Brazilian fintech backed by at least one US venture firm and a large production user of Datomic) acquires Rich Hickey's company. It appears Datomic never gained much use outside of a handful of companies. Making it free of charge (2023) may be the cost-effective thing to do in such a situation if it costs more to handle billing and payments than is brought in. The reason they're not releasing the source code could be a legal one, or simply the fact that open sourcing a large piece of software takes a lot of effort, something a for-profit financial services company like Nubank doesn't prioritize (rightly so).
1: https://blog.cleancoder.com/uncle-bob/2019/08/22/WhyClojure.... 2: https://itrevolution.com/articles/love-letter-to-clojure-par... 3: https://github.com/clojure-emacs/cider/blob/master/cider-uti...
so, less than useful if you want to study and modify datomic; you may have the legal "right to repair" but not the practical possibility
I will also argue that 90% of those don't need this. Just seeing the term "web scale" makes me shy away.
What have you thought about or read so far?
Hope it goes open-source as well later on.
Thanks guys!
What are you using SQLite for? If it's analytics, perhaps DuckDB WASM might be an option?
As the thread you link mentions, the phrase “security by obscurity” historically means (more or less) “security primarily by obscurity”. But sometimes this point gets lost. The thread you mention is interesting.
Wikipedia:
> Security through obscurity (or security by obscurity) is the reliance in security engineering on design or implementation secrecy as the main method of providing security to a system or component.
Summary:
Layers of security (which can include a wide range of techniques, including obfuscation, etc): useful, because delaying attacks and/or making them less likely is useful.
Obscurity as a main method: theatre, because it often leads to self-deception about the true risks involved
Writing in a single thread removes a whole host of problems in understanding (and implementing) how data changes over time. (And a busy MVCC sql db spends 75% of its time doing coordination, not actual writes, so a single thread applying a queue of transactions in sequence can be faster than your gut feeling might tell you.)
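To make the single-writer idea concrete, here is a minimal Python sketch, not Datomic's actual transactor. The names `SingleWriterStore`, `transact` and `increment` are made up for illustration. Many caller threads submit transaction functions to a queue, and one worker thread applies them strictly in order, so the data itself never needs a lock.

```python
import queue
import threading


class SingleWriterStore:
    """Toy key-value store where one thread applies all writes in order."""

    def __init__(self):
        self._data = {}
        self._log = []                      # names of applied transactions, in order
        self._inbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:                # shutdown sentinel
                break
            tx_fn, done = item
            tx_fn(self._data)               # no locks: only this thread mutates
            self._log.append(tx_fn.__name__)
            done.set()

    def transact(self, tx_fn):
        """Submit a transaction function and block until it has been applied."""
        done = threading.Event()
        self._inbox.put((tx_fn, done))
        done.wait()

    def close(self):
        self._inbox.put(None)
        self._worker.join()


def increment(data):
    # A read-modify-write that would race if run concurrently without a lock.
    data["counter"] = data.get("counter", 0) + 1


store = SingleWriterStore()
callers = [threading.Thread(target=store.transact, args=(increment,)) for _ in range(50)]
for t in callers:
    t.start()
for t in callers:
    t.join()
store.close()
final_count = store._data["counter"]
```

Every increment lands because the writes are serialized by the queue rather than by coordination between writers.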
Transactions as first-class entities of the system means you can easily add meta-data for every change in the system that explains who made the change and why, so you'll never again have to wonder "hmm, why does that column have that value, and how did it happen". Once you get used to this, doing UPDATE in SQL feels pretty weird, as the default mode of operation of your _business data_ is to delete data, with no trace of who did it or why!
Having the value of the entire database at a point in time available to your business logic as a (lazy) immutable value you can run queries on opens up completely new ways of writing code, and lets your database follow "functional core, imperative shell". Someone needs to have the working set of your database in memory, why shouldn't it be your app server and business logic?
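A rough Python sketch of these two ideas together: transactions that carry metadata, and the database as an immutable value at a point in time. This is an illustrative toy, not Datomic's API. `Datom`, `Log`, `as_of` and `current_value` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datom:
    entity: str
    attribute: str
    value: object
    tx: int          # id of the transaction that asserted this fact


class Log:
    """Append-only log of facts; nothing is updated or deleted in place."""

    def __init__(self):
        self.datoms = []
        self.tx_meta = {}     # tx id -> metadata about the change
        self.next_tx = 1

    def transact(self, facts, who, why):
        tx = self.next_tx
        self.next_tx += 1
        self.tx_meta[tx] = {"who": who, "why": why}
        for entity, attribute, value in facts:
            self.datoms.append(Datom(entity, attribute, value, tx))
        return tx

    def as_of(self, tx):
        """Return an immutable snapshot of all facts up to and including tx."""
        return tuple(d for d in self.datoms if d.tx <= tx)


def current_value(snapshot, entity, attribute):
    """Latest asserted value for (entity, attribute) in a snapshot, with its tx."""
    matches = [d for d in snapshot if d.entity == entity and d.attribute == attribute]
    return (matches[-1].value, matches[-1].tx) if matches else (None, None)


log = Log()
t1 = log.transact([("order-1", "status", "open")], who="alice", why="order placed")
t2 = log.transact([("order-1", "status", "shipped")], who="bob", why="left warehouse")

before = log.as_of(t1)      # the database as it was after the first transaction
after = log.as_of(t2)
status_before, _ = current_value(before, "order-1", "status")
status_after, changed_in = current_value(after, "order-1", "status")
explanation = log.tx_meta[changed_in]   # who changed the status, and why
```

Business logic can take `before` or `after` as a plain argument and query it like any other value, which is the "functional core, imperative shell" shape described above.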
Looking forward to seeing what this does for the adoption of Datomic!
This one confused me. The obvious reason why you don't want the whole working set of your database in the app server's memory is because you have lots of app servers, whereas you only have one database[1]. This suggests that you put the working set of the database in the database, so that you still only need the one copy, not in the app servers where you'd need N copies of it.
The rest of your post makes sense to me but the thing about keeping the database's working set in your app server's memory does not. That's something we specifically work to avoid.
[1] Still talking about "non-webscale" office usage here, that's the world I live in as well. One big central database server, lots of apps and app servers strewn about.
So in a traditional DB you might have a DBA set up a reporting database so the operational one is not affected. Using Datomic, the reporting service gets a Datomic peer that has a copy of the DB in memory, without any extra DBA work and without affecting any web services. This also works nicely with batch jobs or in any situation where you don't want different services to affect each other's performance.
It's true that a lot more memory gets used, but memory is relatively cheap; usually the biggest cost when hosting in the cloud is the vCPUs. And in a Clojure/Datomic web application you usually don't need to put cache services like Redis in front of your DB.
The assumption here is that the usual bottleneck for most information systems and business applications is reading and querying data.
But if you have a handful of app servers, it's much more reasonable. The relatively low scale back-office systems I tend to work with typically have 2, max 3. Also, spinning up an extra instance that does some data crunching does not affect the performance of the app servers, as they don't have to coordinate.
There's also the performance and practicality benefits you get from not having to do round-trips in order to query. You can now actually do 100 queries in a loop, instead of having to formulate everything as a single query.
And if you have many different apps that operate on the same DB, it becomes a benefit as well. The app server will only have the _actual_ working set it queries on in memory, not the sum of the working set across all of the app servers.
If this becomes a problem, you can always architect your way around it as well, by having two beefy API app servers that your 10s or 100s of other app servers talk to.
Having the applications keep a cached version of the db means that when one of them runs a complex or resource intensive query, it's not affecting everyone else.
So is any cloud-managed db offering, and at that scale we're talking very small costs anyway.
Why datomic instead?
I don’t think I agree with this as stated. It is too squishy and subjective to say “perfect”.
More broadly, the above is not and should not be a cognitive “anchor point” for reasonable use cases for Datomic. Making that kind of claim requires a lot more analysis and persuasion.
This is Ions in the Cloud version, or for on-prem version the in-process peer library.
It doesn't feel like the people behind Datomic actually want to have users outside of the Clojure world, which will be rather limiting to adoption.
You could have one or more append-only tables that store events/transactions/whatever you want to call them, and then materialized-views (or whatever) which gather that history into a "current state" of "entities", as needed
If eventual-consistency is acceptable, it seems like you could aggressively cache and/or distribute reads. Maybe you could even do clever stuff like recomputing state only from the last event you had, instead of from scratch every time
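A minimal Python sketch of that design, assuming an in-memory list stands in for the append-only table and a dict stands in for the materialized view. `AccountView` and `refresh` are hypothetical names. The view remembers how many events it has already folded in, so each refresh applies only the new ones instead of recomputing from scratch.

```python
class AccountView:
    """Current-state view folded from an append-only event list."""

    def __init__(self):
        self.balances = {}
        self.last_applied = 0      # number of events already folded in

    def refresh(self, events):
        """Apply only the events appended since the previous refresh."""
        new_events = events[self.last_applied:]
        for account, delta in new_events:
            self.balances[account] = self.balances.get(account, 0) + delta
        self.last_applied = len(events)
        return len(new_events)


events = []                        # append-only: (account, amount delta)
view = AccountView()

events.append(("acct-a", 100))
events.append(("acct-b", 50))
first_batch = view.refresh(events)     # folds in both events

events.append(("acct-a", -30))
second_batch = view.refresh(events)    # folds in only the newest event
```

Between refreshes the view is stale, which is where the eventual-consistency trade-off shows up.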
How bad of an idea is this?
How do they scale it for Nubank? (millions of users)
However, if you want to paginate data that you need to sort first, and the data isn't sorted the way you want in the index, you have to read all of the data first, and then sort it. But this is also what a database server would need to do :)
So, "events" in Datomic are structured and Datomic uses them to give you query powers, they're not opaque blobs of data.
It's a good idea to version your schema changes using something like liquibase into git, that gets rid of at least some of those pains. Liquibase works on a wide variety of databases, even graphs like Neo4j
I got the same feeling in Erlang many times, once write operations start getting parallel you worry about atomic operations, and making an Erlang process centralize writes through its message queue always feels natural and easy to reason about.
Releasing only binaries, while I understand people being grumpy about it, seems like an interesting way of keeping their options open going forwards. Since it was always closed source, it now being 'closed source but free' is still a net win.
The Datomic/Cognitect/NuBank relationship is an interesting symbiotic dynamic and while I'm sure we can all think of ways it might go horribly wrong in future I rather hope it doesn't.
Open sourcing the database helps on that.
Based on experience with Prolog, I always thought using Datalog in a database like Datomic would mean being able to model your data model using stored queries as a very expressive way of creating "classes". And that by modeling your data model using nested such queries, you alleviate the need for an ORM, and all the boilerplate and duplication of defining classes both in SQL and as objects in OO code ... since you already modelled your data model in the database.
Does Datomic live up to that vision?
Datomic also supports rules, including recursive rules. I wrote a library to do OWL-style inference about classes of entities using rules. You can see an example here (https://github.com/cognitect-labs/onto/blob/master/src/onto/...). This is a rule that infers all the classes that apply to an entity from an attribute that assigns it one class.
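For readers who have not used Datalog rules, here is a rough Python sketch of what a recursive rule of this kind computes. It is not the linked library's code and not Datalog syntax. `infer_classes` and the example data are assumptions made up for illustration. Starting from the one class assigned to each entity, it keeps following subclass links until nothing new can be derived.

```python
def infer_classes(direct_class, subclass_of):
    """Compute every class that applies to each entity.

    direct_class: dict of entity -> the one class assigned to it
    subclass_of:  set of (child_class, parent_class) pairs
    """
    inferred = {(entity, cls) for entity, cls in direct_class.items()}
    changed = True
    while changed:                      # iterate to a fixpoint, like a recursive rule
        changed = False
        for entity, cls in list(inferred):
            for child, parent in subclass_of:
                if child == cls and (entity, parent) not in inferred:
                    inferred.add((entity, parent))
                    changed = True
    result = {}
    for entity, cls in inferred:
        result.setdefault(entity, set()).add(cls)
    return result


classes = infer_classes(
    {"rex": "Dog"},
    {("Dog", "Mammal"), ("Mammal", "Animal")},
)
```

In a Datalog system the loop is implicit: you state the base case and the recursive case, and the query engine finds the fixpoint.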
I would also say that building an "entity type definition" system as entities in Datomic is almost the first thing every dev tries after the tutorial. It works... but you almost never _need_ it later.
I was more thinking of the means to define your data "classes" (or whatever it is called in this context) though, rather than how it is passed around.
Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to back up. Ran into one scenario where apparently we weren't quick enough to migrate to the latest version, AWS had dropped support for $runtime in Lambda, and it became impossible to upgrade the CloudFormation template. Had to write application code to export/reimport prod data from one cluster to another; there was no other migration path (and yes, we were talking to their enterprise support).
We migrated to Postgres and are now using a 10th of the compute resources. Our p99 response times went from 1.3-1.5s to under 300ms once all the read traffic was cut over.
Mother Postgres can do no wrong.
Still, Datomic seems like a cool idea.
I guess they don't claim to be open source, they're claiming to be free, which is - in itself - awesome.
Last time I checked, you couldn't push binaries to maven central, without also releasing the source. That may have changed.
EDIT: I was wrong. They actually released binaries under the Apache licence, not the source code. Which is, mildly said, deceptive. I don't even have an idea what that actually means.
They don't say anything about the source code being published. That's why (to me) this is so interesting. I've never seen binaries released without source code before.
But Maven Central has strict rules around what can be published there. I just double checked and it's a requirement to publish the source as well as the binaries:
https://central.sonatype.org/publish/requirements/#supply-ja...
https://vvvvalvalval.github.io/posts/2018-11-12-datomic-even...
This is why I'd rather use XTDB [1], a database similar to Datomic in spirit, but with bitemporality baked in.
Datomic is an operational database management system - designed for transactional, domain-specific data. It is not designed to be a data warehouse, nor a high-churn high-throughput system (such as a time-series database or log store).
It is a good fit for systems that store valuable information of record, require developer and operational flexibility, need history and audit capabilities, and require read scalability.
(via https://docs.datomic.com/pro/getting-started/brief-overview....)
> Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
So... no?
(I say that, but "Datomic binaries" presumably refers to compiled JVM class files; and JVM bytecode is notoriously easy to decompile back to legible source code, with almost all identifiers intact. Would Apache-licensing a binary, imply that you have the right to decompile it, publish an Apache-licensed source-code repo of said decompilation, and then run your own FOSS project off of that?)
I watched a lot of that and used Clojure fulltime for five years. Wonder what he's up to these days.
Also the ability to annotate transactions is awesome.
So many goodies.
Here's a good summary:
https://medium.com/@val.vvalval/what-datomic-brings-to-busin...
I think they went way too fast to commercial, and needed to go a freemium model to actually get market share.
I had a few projects it would have been cool on, but I just did postgres instead and won in the long run.
There's a reasonably interesting writeup of the tech details that helps show off some of Datomic's value: https://www.zsolt.blog/2021/01/Roam-Data-Structure-Query.htm... https://news.ycombinator.com/item?id=29295532
I hope more companies consider this unusual arrangement at least as an alternative to other approaches. Permissively licensed binaries can come in handy, though the arrangement certainly comes with its risks. For example, Microsoft released the binaries for its WebView2 SDK under the BSD license; this is nice of course, but the side-effect is that we can (and did) reverse engineer the loader binary back to source code. I suspect that's unlikely to happen for any substantially large commercial product, and I am not a lawyer so I can't be sure this isn't still legally dubious, but it's still worth considering: the protections of a EULA are completely gone here, if you just distribute binaries under a vanilla permissive open source license.
> Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
That doesn't answer the question at all. I assume the answer is no, because otherwise they would just say yes, and have a link to the source code somewhere. But that is such a weird, and possibly duplicitous way to answer.
This is cool as well. It's a CloudFormation template based product you can deploy from AWS Marketplace.
In this case Datomic maintains development control over their product and "source of truth" is still themselves, and the implicit assumption is that you enthusiastically use their product for free with no strings attached because you respect them as the source of truth.
Freeware has been a thing for a mere four decades now.
[1] https://github.com/terminusdb/terminusdb [2] https://github.com/vaticle/typedb
> Is it Open Source?
Datomic binaries are provided under the Apache 2 license which grants all the same rights to a work delivered in object form.
Datomic will continue to be developed at Nubank, where it is a critical piece of our infrastructure.
If you want historical audit trails, make them intentional and subject to the same rules and patterns as your regular data.
There were some cool ideas in Datomic Cloud, like IONs and its integrated deployment CLI. But the dev workflow with Datomic Pro in the REPL, potentially connected to your live or staging database is much more interactive and fun than waiting for CodeDeploy. I guess there is a reason Datomic Pro is the featured product on datomic.com again. It appears that Cognitect took a big bet with Datomic Cloud and it didn't take off. Soon after the NuBank acquisition happened. That being said, Datomic Cloud was not a bad idea, it just turned out that Datomic Pro/onPrem is much easier to use. Also of all their APIs, the "Peer API" of Pro is just the best IME, especially with `d/entity` vs. "pull" etc.
Datomic's killer feature is time travel.
Did you simply not use that feature once you moved off Datomic (and if so, why'd you pick Datomic in the first place)? Or are you using Postgres with some extension to add it in?
Our data model is not large and we had a very complete test suite already, so it was easy to produce another implementation backed by postgres, RAM, etc.
We use https://django-simple-history.readthedocs.io/en/latest/ (with some custom tooling for diff generation) for audit logs and resettability, and while you can't move an entire set of tables back in time simultaneously, it's usually sufficient for understanding data history.
> Mother Postgres can do no wrong.
I'll say that Postgres is usually the answer for the vast majority of use-cases. Even when you think you need something else to do something different, it's probably still a good enough solution. I've seen teams pitching other systems just because they wanted to push a bunch of JSON. Guess what, PG can handle that fine and even run SQL queries against that. PG can access other database systems with its foreign data wrappers (https://wiki.postgresql.org/wiki/Foreign_data_wrappers).
The main difficulty is that horizontally scaling it is not trivial (although not impossible, and that can be improved with third party companies).
Don't misunderstand me. There are plenty of times when something else is the right choice. I'm just saying, when I have a say in the matter, folks need to clear that bar -- "tell me why tool xyz is going to be so much better than postgres for this use case that it justifies the overhead of adding another piece of software infrastructure."
Like, you want to add a document database? Obviously Mongo, Elasticsearch, etc are "best of breed." But Postgres is pretty capable and this team is already good at it. Are we ever going to have so many documents that e.g. Elasticsearch's mostly-effortless horizontal scaling even comes into play? If you don't ever see yourself scaling past 1,000 documents then adding a new piece of infra is a total joke. I see that kind of thing all the time. I can't tell if developers truly do not understand scale, or if they simply do not give a f--- and simply want to play with shiny new toys and enrich their resumes.
I mean, I've literally had devops guys telling me we need a Redis cluster even though we were only storing a few kilobytes of data, that was read dozens of times daily with zero plans to scale. That could have been a f'in Postgres table. Devops guy defended that choice hard even when pressed by mgmt to reduce AWS spend. WTF?
You should give TerminusDB a go (https://terminusdb.com/), it's really OSS, the cloud version is cheap, fast, there are not tons of baroque settings, and it's easy to backup using clone.
TerminusDB is a graph database with a git-like model with push/pull/clone semantics as well as a datalog.
Simple, eloquent, damn true.
There is some SQL temporal support but it's not great and varies a lot. Also since it's not native to the storage it has a lot of complexity issues under the rug making it not great.
Many financial systems use Event Sourcing (OOP + ORM). I had to suffer this at a previous employer.
See https://vvvvalvalval.github.io/posts/2018-11-12-datomic-even...
- mutable data vs immutable data
- tables, row based vs triple store, attribute based (EAV/RDF)
- table schemas vs attribute schemas
- relational connections vs graph connections
- SQL vs datalog
- nested queries vs flat queries and rules
- remote access vs client side index
etc.
It allows you to write queries in a pull style; it can be trigger based, datalog or raw index access. It's immutable by default and allows historical queries. It allows metadata on the transactions themselves.
A lot of the time the user builds much of that himself or relies on frameworks to do it.
Edit: Oh, there are streaming tickets for $20.
How did that work out for you? Usually following a hype cycle, there is a negative hype cycle i.e. Mongo is webscale, then Mongo is a buggy mess.
Clojure seemed to just fade away. Did it turn out well or are there interesting pitfalls that make it not as great as advertised?
The best things about Clojure are things you don't really appreciate until you've already done the work to learn them.
For example, I never would have known how amazing it was to evaluate code inside the editor until I did the work of learning Emacs + evil-mode + nrepl/cider + whatever so that I could spin up my http server in-process and then modify code without restarting everything. Even today I'm doing `nodemon index.ts` like a goofball.
I stopped using Clojure simply when I met someone who wanted to build some big projects with me and, despite appreciating that Clojure was probably amazing, they simply couldn't be bothered to learn it. Fair enough. It was when Javascript was just getting `yield` coroutines (before async/await) which finally made Javascript bearable for me enough to switch to it for server work.
Clojure just has increasingly compelling languages and ecosystems to compete with, yet it has a huge ramp up, being a lisp, that make it hard for people to choose it.
Just consider how, to even write Clojure comfortably, you really need something like Paredit. Else, what exactly are you going to do if you want to nest or unnest a (form) deeper in the ast structure? Manually rebalance parens? Cut and paste it? Only Paredit lets you easily move a (form) around the ast. And it's amazing but yet another tool you have to learn just to truly evaluate Clojure.
That said, I've been (and currently am) a Clojure engineer for the past 5 years and loving it. Quite a lot of jobs out there, more and more each time I look, healthy ecosystem and friendly community. It doesn't hurt that it's the highest-paid programming language as well.
> How did that work out for you?
For the personal projects, it's been incredibly useful. The language fits the way I think, and being built on the JVM, it has both a performant runtime and lots of access to a wide ecosystem of libraries.
The Clojure-specific library ecosystem has been accused of being stagnant. I tend to take a more charitable view. The libraries I've used have tended to be smaller in scope and more well defined in terms of feature set. This makes it easier for them to converge on a feature-complete state, and many of them have done just that. If you don't mind assembling multiple smaller libraries into the useful whole you need, this can provide a stable platform on which to build and develop expertise and higher level libraries.
For larger scale commercial work, it's a harder sell. As you've pointed out, Clojure is not hugely popular, so it's fundamentally a minority language. This can make VC's touchy about funding. This is true to the extent I'm aware of at least one organization that started moving away from Clojure for that reason.
There's also the shape of the learning curve. It can be hard to get started with Clojure because of the issues around the syntax and associated tooling. The more piecemeal aspect of the library ecosystem can then make it harder to hit the early successes a larger framework-oriented approach can give you out of the box. You can get there, but it at least takes more initial effort. The same is true for all the abstractive power of Clojure (and other Lisps). Abstractions are nice, but they take time to develop and the payoff is on a considerable lag. The useful rule about waiting to abstract until after you see 2 or 3 instances of a pattern means you need to at least have spent enough time to see those 2 or 3 instances (and maybe a few more) before you really start to see the payoff in your own code.
The net of all this is that it's a language that may make it more difficult to get funding, will be initially somewhat confusing to most developers, and the payoff may well be deferred to the point you don't see it before you give up (either out of frustration or due to external factors). All in all, a considerable set of headwinds.
So what does that mean? It's probably better for projects on a longer time horizon that have a team willing and able to put in extended effort to effectively use the language. (And if the team is not self-funded, good to have a funder with some ability to accept the risk of a non-conventional solution). Not saying these projects don't exist, just that they're not common enough to build a 'popular/mass-market' ecosystem on.
Were they hoping that people wouldn't notice?
The products even have github repos! There's just no source in them.
It's open source cosplay.
update: So, I asked chatGPT and this is what I got:
----
ChatGPT> Do you know how Datomic and XTDB differ?
Yes, Datomic and XTDB are both database systems, but they differ in a few ways:
1. Architecture: Datomic follows a client-server architecture, where the client sends requests to the server for data manipulation. On the other hand, XTDB is designed as a distributed database system, where data is automatically partitioned across multiple nodes for high availability and scalability.
2. Query Language: Datomic uses the Datalog query language, which is similar to Prolog, for querying the database. XTDB, on the other hand, provides a modern and expressive query language based on Clojure.
3. Data Model: Datomic stores data in the form of entities, attributes, and values. XTDB uses a document-based data model, where each document is a self-contained unit of data.
4. Distributed Transactions: XTDB supports distributed transactions while Datomic does not. This means that XTDB can guarantee the consistency of data across multiple nodes in a distributed system, while Datomic cannot.
5. Open Source: XTDB is open-source, while Datomic is a commercial product, which requires a license to use.
Overall, both Datomic and XTDB have their own unique features and advantages, and the choice between the two depends on the specific requirements and use case of the application.
----
I dunno, it sounds like XTDB is a clear winner here, no?
> You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that (...)
Applying this only to their binaries directly contradicts what the license says.
A copyright license is a copyright license: in theory, all a copyright license does is give you additional rights to use something. Using a license like Apache 2 for binaries is somewhat unconventional, but it's totally possible. It (obviously) does not give you access to the source code, and I think this could never work with the GPL and other copyleft licenses because they use wording that implies you need to distribute the source code, which you don't have.
The copyright owner, of course, has ownership, so their obligations don't really change by virtue of giving someone a copyright license. As far as I know, they could give someone a license to use something that is completely invalid and could never actually be used, and they can definitely do things like stop distributing under one license and switch to another. They own the source code, and they own the binaries (I believe the binaries would be considered a sort of derivative work in copyright terms, but again, not a lawyer.) So when they distribute a binary under a given license, it's unrelated to any particular distribution of source code. The only time this gets complex is when the ownership of an asset is split among many disparate parties, at which point everyone is pretty much beholden to the copyright licenses; like open source projects without CLAs. But if they own the source code entirely, they could, for example, distribute some source code under GPL, but then distribute modified binaries under a commercial license with a EULA, and not redistribute the modified source code, since it's their code to license to others, not a license they are subjected to themselves.
It's certainly weird for the binary license to be Apache, rather than some proprietary EULA, though.
> You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form
This is actually an interesting question. But I can’t see how a binary only distribution would be in the spirit of the Apache license.
I can feel the internal open vs. closed source argument from here.
To some, the answer is “open source” no matter the question. Hello wagging tail, meet dog.
Using it was pretty nice at the scale of a small startup with a motivated team, but scaling it up organizationally-speaking was a challenge due to Datalog's relative idiosyncrasy and poor tooling around the database itself. This was compounded by the parallel challenge of keeping a Clojure codebase from going spaghetti-shaped, which happens in that language when teams scale without a lot of "convention and discipline"--it may be easier to manage otherwise. All of that said, this was years ago so maybe things have changed.
At this point I'd choose either PostgreSQL or SQLite for any project I'm getting started with, as they are both rock-solid, full-featured projects with great tooling and widespread adoption. If things need to scale a basic PostgreSQL setup can usually handle a lot until you need to move to e.g. RDS or whatever, and I'm probably biased but I think SQL is not really that much worse than Datalog for common use-cases. Datalog is nice though, don't get me wrong.
EDIT: one point I forgot to make: the killer feature of being an immutable data store that lets you go back in time is in fact super cool, and it's probably exactly what some organizations need, but it is also costly, and I suspect the number of organizations who really need that functionality is pretty small. The place I was at certainly didn't, which is probably part of the reason for the friction I experienced.
[1] https://docs.datomic.com/pro/api/io-stats.html [2] https://docs.datomic.com/pro/api/query-stats.html
- It is possible to make queries against the database PLUS additional data not yet added, that is, "what if" queries
- Having a stable database-as-value is really useful for paginating results; you don't have to worry about new values being inserted into your results during execution, the way you do with traditional databases, no matter how long (minutes, hours, even days) you take to traverse the data
- Reified transactions make it possible to store extra data with each transaction, trivially, such as who made the update and why
- Immutability is amazing for caching at all layers
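To make the bullets above concrete, here is a toy sketch in Python. It is not Datomic's API: the `Db` class, `with_facts`, `as_of`, and the `("tx", t)` entity ids are all invented for illustration. It only shows why a database-as-value gives stable snapshots, "what if" queries, and per-transaction metadata almost for free.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Iterable, Optional, Tuple

# One fact: (entity, attribute, value, tx). Illustrative shape only.
Datom = Tuple[Hashable, str, Any, int]


@dataclass(frozen=True)
class Db:
    """An immutable database value: every datom up to and including basis_t."""
    datoms: Tuple[Datom, ...] = ()
    basis_t: int = 0

    def with_facts(self,
                   facts: Iterable[Tuple[Hashable, str, Any]],
                   tx_meta: Optional[Dict[str, Any]] = None) -> "Db":
        """Return a NEW Db that includes the extra facts; self is untouched.

        The transaction is itself an entity, ("tx", t), so metadata such as
        who made the change is stored as ordinary facts about it.
        """
        t = self.basis_t + 1
        new = [(e, a, v, t) for e, a, v in facts]
        new += [(("tx", t), k, v, t) for k, v in (tx_meta or {}).items()]
        return Db(self.datoms + tuple(new), t)

    def as_of(self, t: int) -> "Db":
        """Return the database as it looked at transaction t."""
        kept = tuple(d for d in self.datoms if d[3] <= t)
        return Db(kept, min(t, self.basis_t))

    def entity(self, e: Hashable) -> Dict[str, Any]:
        """Current attribute map of an entity; later facts win."""
        result: Dict[str, Any] = {}
        for ent, attr, value, _t in self.datoms:
            if ent == e:
                result[attr] = value
        return result


db1 = Db().with_facts([(1, "name", "Ada")], {"author": "alice", "reason": "signup"})
db2 = db1.with_facts([(1, "name", "Ada L.")], {"author": "bob"})
what_if = db2.with_facts([(1, "name", "Hypothetical")])  # db2 itself is unchanged
```

Because `db1` and `db2` are plain values, anything holding one of them (a paginating cursor, a cache) keeps seeing the same data however long it holds on.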
Development experience is extremely nice using Clojure. I've used it for two other projects and it has been very reliable. My latest project didn't really need any of its features compared to a traditional RDBMS but I opted for it anyway so I don't have to write SQL.
> “Datomic added to DynamoDB was the only option that didn’t force us to sacrifice features or add the expense of developer time. Without it, we would have had to push back a lot more, as the features would have been too difficult.”
(https://www.datomic.com/the-brigades-story.html)
like, what? effectively useless information.
some of the other testimonials mention keeping revision history, which is neat, but why Datomic vs. others? it's pretty easy to keep revision history with other databases too.
"No, the source is not available, and the product will continue to be developed by us, internally. However, binaries are provided..."
Doesn't this mean, that, as soon as I (somehow) get hold of the source code, I can distribute it as I want?
As do most distributed databases. Even when you don't store your entire database (or working set) in memory, you'll likely still have to add quite a bit of memory to be used as I/O cache.
"If, for some reason (for example, license issue or it's a Scala project), you can not provide -sources.jar or -javadoc.jar , please make fake -sources.jar or -javadoc.jar with simple README inside to pass the checking. We do not want to disable the rules because some people tend to skip it if they have an option and we want to keep the quality of the user experience as high as possible."
https://shaunlebron.github.io/parinfer/
The size of the community is not as important as the value of what is available. It's just not widely known, or maybe there's still much potential yet unrealized.
This is an oldie but a goodie on how interactive an experience you can have when using Clojure on the front end and backend of a web app:
This project is the most recent and most promising iteration of someone making a very visual interactive tool for introspecting on data structures in the REPL environment
I am stealing that phrase. It's perfect.
1. Architecture - XTDB is designed to work with Kafka, but it also works with typical JDBC databases as a datastore. Datomic can use SQL databases, DynamoDB or Cassandra if you want distribution. I personally would think long and hard before I introduced a distributed database to my organisation "because it scales".
Part of the value proposition of Datomic is it easily scales read workloads horizontally and by isolating read and writes into separate processes it improves write performance significantly. The metric usually thrown around is 75% of CPU cycles in a traditional RDBMS are concurrency coordination, which is avoided by the Datomic model. That number is quite old now so I don't know if it's still accurate as of 2023.
2. Query language - both use Datalog and support the Datomic `pull` syntax. XTDB also supports SQL.
3. Datomic's EAVT quadruplets are a compelling feature because they are so generic and can be used/re-used in many contexts. A document database would have to fit your use case pretty directly.
4. Datomic has a single transactor process. Do you need distributed transactions? Does Datomic need distributed transactions? You'd have to find someone from say, Nubank, and ask them for war stories. :-)
5. Datomic is now free-as-in-beer.
In my unqualified opinion XTDB is appropriate to choose in the following situations:
- You need to model "valid time" as part of your domain.
- Do you want a document database and are happy with everything that entails?
- You need access to the source code of your database.
- Do you have existing analysts who know SQL but don't know or can't learn Datalog?
Nubank is releasing the binary permissively. You might want more, but this is not a breach of trust.
Datomic has been around for more than 10 years, so there is ample data to base expectations.
Guess it is just one of the tradeoffs: while some features Datomic has out of the box are hard to replicate in RDBMSes, things like pagination, which are often taken for granted, are a bit of work to do in Datomic. So it is something to keep in mind when considering the switch.
[0] https://forum.datomic.com/t/idiomatic-pagination-using-lates...
Datomic's covering indexes are heavily based on their built-in ordering, and don't really have much flexibility in different ways to sort and walk data.
Personally, I'm a fan of hybrid database approaches. In the world of serverless, I really enjoy the combo of DynamoDB and Elasticsearch, for example, where Dynamo handles everything that's performance critical, and Elasticsearch handles everything where dynamic queries (and ordering, and aggregation, and ....) is required. I've never done this with Datomic, but I'd imagine mirroring the "current" value of entities without historical data is relatively easy to set up.
The CPU version is released as binary code under the MIT license.
All that said... can't Datomic also do traditional query execution on the server? I thought it had support for scale-out compute for that. AIUI, you have the option to run as either a full peer or just an RPC client on a case-by-case basis? I thought you wouldn't need to resort to writing your own API intermediary, you could just connect to Datomic via RPC, right?
The introduction of intelligent application-level partitioning [1] and routing schemes can help one balance cost and performance.
[1] https://blog.datomic.com/2023/04/implicit-partitions.html
This is being repeated so often. And yet — the above is true, IF (and that's a big if for some of us) you are OK with having your database on a single machine.
If you want a distributed database with strict serializability, where some nodes can go down and you still get correct answers, Postgres is not it.
But I've also been burned by people reflexively reaching for $SHINY_NEW_TOY by default, when really there is no need. Architects and senior-level devs are the worst offenders. They throw a bunch of needlessly buzzword-compliant infra at a problem and then move on. They have the time and freedom to learn $SHINY_NEW_TOY well enough to MVP a product, but then the project is passed on to people who don't have that luxury.
I feel like there's a progression that often happens:
1. Early engineers: stick to Postgres or another RDBMS because it's all they know
2. Mid-stage engineers with "senior" in their title for the first time: reach for $SHINY_NEW_TOY
3. Late-stage engineers: stick to Postgres because it's something the whole team already knows and they recognize the true long-term cost of throwing multiple new bits of software infra into the mix
What a weird choice to make.
It is in the spirit and the letter of the license to do binary only distribution.
Why do you think so many companies are pushing the Apache License 2.0 over the GPL?
The Apache license 2 is pretty clear that binary only distribution is allowed, but I think it’s also clear that the assumption is that source is available in some form. Otherwise, why would you care about derivative works?
As is, it would be possible to decompile the JVM code into something resembling source code and then distribute that with or without modification. Which just seems odd to me.
Turns out if you really focus on composability above other concerns you get Haskell.
Did he? I seem to remember a quip in one of the presentations about German not being comprehensible to him being his own problem, because he never learned German.
The Clojure tools.build process has this exact problem. It's too low-level, so everyone has to write their own build scripts on top of it to do the same thing as everyone else. Now there are 1000s of bespoke build scripts and 3-4 different front-ends, all effectively doing the same thing.
"In my line of work, sister, sometimes a second chance is a chance to make the same mistake twice" - State and Main
you can easily create datamodel to have this in SQL dbs: create table transaction_history(..., execution_time timestamp);
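A minimal sketch of that idea using Python's built-in `sqlite3`. The `account` and `balance` columns and the two helper functions are made up for illustration; only the table name and `execution_time` come from the comment above. Rows are only ever appended, and a point-in-time read picks the latest row at or before the requested timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transaction_history (
        id             INTEGER PRIMARY KEY AUTOINCREMENT,
        account        TEXT    NOT NULL,   -- hypothetical column
        balance        INTEGER NOT NULL,   -- hypothetical column
        execution_time TEXT    NOT NULL    -- ISO-8601, so string order == time order
    )
    """
)


def record(account, balance, execution_time):
    """Append a new version; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO transaction_history (account, balance, execution_time) "
        "VALUES (?, ?, ?)",
        (account, balance, execution_time),
    )


def balance_as_of(account, timestamp):
    """Return the balance that was current at `timestamp`, or None."""
    row = conn.execute(
        "SELECT balance FROM transaction_history "
        "WHERE account = ? AND execution_time <= ? "
        "ORDER BY execution_time DESC, id DESC LIMIT 1",
        (account, timestamp),
    ).fetchone()
    return row[0] if row else None
```

This covers the "query a point in time" part for one table. It does not give you a consistent snapshot across tables, retention policy, or indexes tuned for history, which is where the extra work tends to go.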
> And, in practice, it's not that much different for the average user if only the binaries are Apache licensed. When was the last time you needed to open up the Postgres source code and modify something?
Sure, if you're playing a game it probably doesn't make a difference. If I'm building my IT infrastructure on a product, it makes a huge difference whether I get an open-source-licensed "binary" or access to the source:
- the package they distribute contains no less than 960 different jars. Most of those are the standard apache-project-everything-and-the-kitchen-sink-style dependencies. Say I'd like to update log4j because it contains a catastrophic vulnerability that Datomic decides not to fix. (not that that sort of thing ever happens)
- or say Datomic decides to abandon the product altogether or goes out of business
- or say I'm not happy with their quality of service contract around their DB they support and would like to work with a different company
Many businesses use Microsoft SQL Server or Oracle and don't need access to the source. I'm not saying open source isn't nice, but it is absolutely not a requirement for IT infrastructure.
I'd imagine people rely on many cloud services that are in fact, not open source.
For the vast majority of use cases, a FOSS DBMS and a free-as-in-beer DBMS are indistinguishable. If you're in a category where they're not, then don't use Datomic, but this is still far more than a publicity stunt.
If I use a free-binary-but-no-source product, I’m much more likely to get stuck.
(Of course, as a regretful MySQL user, I am pretty stuck, but largely because MySQL is, in many respects, a terrible product. It does, quite reliably, get security updates and the kind of general maintenance that keeps it working no worse than it ever did.)
My point is that the option to modify the source results in software being available and community maintained in a way that binary-only isn't. Even if I change the source myself just twice a decade.
The first feature on benefits (and the only reason I've ever heard Datomic brought up and/or considered it myself for production workflows) is using that stuff in application workflows: https://docs.datomic.com/pro/time/filters.html#history
Could be you're saying it in fact doesn't work well performance-wise, that'd (surprise me but) certainly explain why it's not more popular -- but I think it's clear it wants you to use this as an application feature.
Datomic is great but, as another commenter said, it is good for "small-ish backoffice systems that never has to be web scale". You can almost certainly rely on querying history for internal applications. I think their primary market was for companies to use it internally but they never made this clear.
He lucked out when a unicorn went all in on it. Word around Cognitect was, Datomic was barely breaking even.
I wonder if anyone has made a collection of reference examples implemented this way (and in general think that a substantial compendium good examples of DB schema and thinking behind them could be worthwhile).
It is available as PDF on his publications page:
Each time the company was more comfortable with mainstream dbs, so we ended up going with something like you’re talking about, built on top of a db. A couple of the projects were because a mainstream db wouldn’t scale.
The systems definitely worked, but it was also a lot of implementation complexity on an otherwise simple business prop: “store this data as facts”
I especially don't trust free shit if I don't have access and control of the future of the code. Even then it's not a sure thing.
Now I'm even more curious if you could skip Datomic and just do something like this directly with a relational DB in production
So YOU can focus on building your own specific business logic, instead of re-implementing the immutable DB wheel.
And 9996 of them are used for stuff like the internal HR DB or other minor projects.
But this is all besides the point; Datomic is now free (as in beer) with a great license (Apache 2.0). You can use this amazing tool for free, and you have as much need to look at the source to do so as you might need to look at PostgreSQL's source.
Some of us have been hoping for this day since Datomic was first announced, but even as an insider (I have been working at NuBank NA for less than a year) I was stunned at the speed with which this decision was made and implemented.
I mean, presumably if you run a Java decompiler over it you'll get perfectly-legible Java source code. Just Java source code that makes a lot of calls to methods defined in the Clojure stdlib.
I'm guessing it would look a lot like what an Objective-C or Swift program looks like when you throw it into a C decompiler: a lot of boilerplate for imperatively building stack-local data structure temporaries to be passed into runtime functions, but otherwise the same code you'd expect.
> But this is all besides the point; Datomic is now free (as in beer) with a great license (Apache 2.0). You can use this amazing tool for free, and you have as much need to look at the source to do so as you might need to look at PostgreSQL's source.
Personally, I don't want to use Datomic as a tool; I want to use Datomic as a bag of parts. I want to pilfer the major components and libraries within the Datomic codebase, reusing them inside other things. (You know how CockroachDB and Clickhouse both implemented PostgreSQL-syntax-compatible binary wire protocol in part by taking the parser.y file directly from Postgres's codebase? I want to do that kind of thing, with Datomic. And probably using some pieces of Neo4j and/or Apache BEAM at the same time.) I also want to study the data structures and algorithms used to implement Datomic, to better port the concepts used in Datomic to other databases.
If Datomic was a true FOSS project, doing all of that would be simple/easy. With just a binary, though, I can't do any of that.
Some of the things that Clojure generates are valid bytecode for which there is no Java source code equivalent.
Reusing the pieces:
In general, Clojure works so well because it is all of a piece, with many decisions and subsystems working together. Datomic's source is the same way, you can't really consume just part of the elephant, even if you had the source code. Many things that Datomic does simply don't make sense at all out of context.
FoundationDB does this as well.
They actually have recent commits, and a release last year.
I don’t think this is a huge challenge (anymore) for Postgres or whatever traditional database.
If you're doing this, then you need to stop :)
I realize you're not claiming those columns are useful from an application perspective. Just curious to know if I'm wrong and they are useful.
Because as I understand it, the selling point of Datomic is their audit trail functionality and that is admittedly a bit onerous to implement in an RDBMS. Even though I feel like every project needs/requires that eventually.
The garbage collection / VACUUM part of an MVCC system is the harder part, saving all versions and querying a point in time is the easy one.
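A toy sketch of that asymmetry, in Python. `VersionedStore` and its methods are invented for illustration and are not modelled on any particular database. Keeping versions and reading as-of a point in time takes a few lines. The vacuum step is also short here, but only because the caller hands it `oldest_live_t`; working out that value safely while readers come and go is the hard part in a real system.

```python
import bisect


class VersionedStore:
    """Per-key version chains with point-in-time reads and a naive vacuum."""

    def __init__(self):
        self._versions = {}  # key -> list of (t, value), ascending by t
        self._t = 0          # logical clock, bumped on every write

    def put(self, key, value):
        """Append a new version and return its logical timestamp."""
        self._t += 1
        self._versions.setdefault(key, []).append((self._t, value))
        return self._t

    def get_as_of(self, key, t):
        """Return the value visible at time t, or None if there was none."""
        chain = self._versions.get(key, [])
        timestamps = [version_t for version_t, _ in chain]
        i = bisect.bisect_right(timestamps, t)
        return chain[i - 1][1] if i else None

    def vacuum(self, oldest_live_t):
        """Drop versions no snapshot at or after oldest_live_t can see.

        For each key, keep the newest version with t <= oldest_live_t
        (a reader at oldest_live_t still needs it) and everything newer.
        Returns the number of versions removed.
        """
        removed = 0
        for key, chain in list(self._versions.items()):
            timestamps = [version_t for version_t, _ in chain]
            i = bisect.bisect_right(timestamps, oldest_live_t)
            keep_from = max(i - 1, 0)
            removed += keep_from
            self._versions[key] = chain[keep_from:]
        return removed
```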
https://docs.oracle.com/en/database/oracle/oracle-database/2...
You can configure how long the old data is kept:
https://docs.oracle.com/en/database/oracle/oracle-database/2...
Worked examples:
If you or anyone else remembers, I'd love to watch
The full video: [Time 0:49:06] https://www.youtube.com/watch?v=VSdnJDO-xdg
Transcript here: https://github.com/matthiasn/talk-transcripts/blob/master/Hi...
I don’t agree with this. iirc Rack ultimately uses an array to represent HTTP responses. It has three members: the status code, the headers, and the response body.
If you’re shipping a new change, is it easier to mistake response[0] or response.headers?
This is a trivial example, but the general class (ha) of trade-off is amplified with more complex objects.
I love clojure and lisp but the blindness behind a statement like “no real gain” has always kneecapped adoption.
False dichotomy. There are many options other than arrays. Clojure in particular is fond of hashmaps. You can have your response.headers without OOP.
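To illustrate the point with the Rack-style response from upthread, here it is sketched in Python rather than Ruby or Clojure. The `content_type` helper is made up for the example. The same three fields are shown once as a positional triple and once as a plain map with named keys.

```python
# Positional, Rack-style: the reader has to remember which index is which.
positional = (200, {"Content-Type": "text/plain"}, ["hello"])

# The same data as a plain map: named access, still no class involved.
named = {
    "status": 200,
    "headers": {"Content-Type": "text/plain"},
    "body": ["hello"],
}


def content_type(response):
    """Read a header by name from a map-shaped response."""
    return response["headers"].get("Content-Type")
```

`named["headers"]` is about as hard to get wrong as `response.headers`, and the value is still plain data that generic map functions can work on.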
OOP is probably the best way to model such, well, objects, allowing them a private, encapsulated state, and making it only modifiable, or even viewable, through a well-defined public interface that enforces all its invariants.
When you call a connection or connection pool object, you’re querying its current state. This is absolutely data.
Alan Perlis.
Most of those had escrow agreements for central closed source components with vendors in case the vendor went out of business. (obviously only for things perceived as critical and from companies with some perceived risk of failure).
And god knows how many times have I experienced companies biting themselves because they bought into a product that turned out not to deliver what was promised after the contracts were signed.
Anything can be anywhere if we ignore latency and throughput.
Really depends on your metrics. Very little of it builds long-term trust that didn't already exist. Much of it abuses long-term trust.
See the discussion on https://news.ycombinator.com/item?id=35590734
I can't imagine a clearer example of circular reasoning.
I can lift exactly as much weight as I decide to put on the barbell.
Marketing performance is an incredibly confoundable variable.
For example, I have heard one commercial at least 5 times in the last week: advertising open positions at McDonald's. Part of that very commercial stated that, "1 in 8 people have worked at a McDonald's." I am literally one of those people! How could anyone possibly measure the effectiveness of that commercial? Is it even meant to be effective at all?
The overwhelming majority of advertising I see is from "household name brands". The notion I have heard is that the goal is not to introduce themselves to new customers, or even drive more traffic to their brand: it's to keep their status as a "household name". Do they do this because it is effective, or because they are simply big enough to afford it?
> Client (notice not Proxy) caches uncommitted writes to support read-uncommitted-writes in the same transaction. This type of read-repair is only feasible for a simple k/v data model. Anything slightly more complicated, e.g. a graph data model, would introduce a significant amount of complexity. Caching is done on the client, so read queries can bypass the entire transaction system. Reads can be served either locally from client cache or from storage nodes.
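A rough sketch of the read-your-own-writes idea in that quote, in Python. It is not FoundationDB's client API: the `Transaction` class and its methods are invented for illustration, and conflict detection is left out entirely. Writes are buffered locally, reads check the buffer before falling back to committed data, and nothing reaches the shared store until commit.

```python
_DELETED = object()  # marks a key cleared inside the transaction


class Transaction:
    """Buffers writes on the client so reads in the same transaction see them."""

    def __init__(self, store):
        self._store = store   # committed key/value data (a plain dict here)
        self._writes = {}     # uncommitted writes, local to this transaction

    def get(self, key, default=None):
        """Check the local write buffer first, then the committed store."""
        if key in self._writes:
            value = self._writes[key]
            return default if value is _DELETED else value
        return self._store.get(key, default)

    def set(self, key, value):
        self._writes[key] = value

    def clear(self, key):
        self._writes[key] = _DELETED

    def commit(self):
        """Apply buffered writes to the store (no conflict checks in this sketch)."""
        for key, value in self._writes.items():
            if value is _DELETED:
                self._store.pop(key, None)
            else:
                self._store[key] = value
        self._writes.clear()
```

With a flat key/value model the merge in `get` is a single dictionary lookup. With a richer model, such as a graph, the buffered writes would have to be merged into every traversal, which is the complexity the quote is pointing at.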
Additionally, this laziness means your business logic can dynamically choose which data to pull in based on results of queries, and you end up with running fewer queries as you'll never need to pull in extra data in one huge query in case business logic needs it.
I've also taken a look at generated clojure bytecode. It looked like the codegen is pretty straightforward with minimal optimization. It looked like it wouldn't be too hard to reverse with maybe a little bit of backtracking (essentially a parsing problem, I believe). You'd then need a separate step to redo the macros.
It sounded like it might be a fun little project (just to see if it can be done and try my hand at decompiling), but I would have wanted to decompile datomic to make it interesting and the license precluded that.
> You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and **in Source or Object form**, provided that (...)
Would you recommend using it, or would it be better to go with a safer option?
I am worried about it being unmaintained. I do have some issues that are more smells than anything else — like things becoming slower after an uptime of around three weeks (I now reboot my systems every 14 days). I could also do with improved performance.
I'm disappointed that the Winds of Internet Fashion haven't been kind to RethinkDB. It was always a much better proposition than, say, MongoDB, but got less publicity and sort of became marginalized. I guess correct and working are not that important.
I'm slowly rebuilding my application to use FoundationDB. This lets me implement changefeeds, is a correct distributed database with fantastic guarantees (you get strict serializability in a fully distributed db!), and lets me avoid the unneeded serialization to/from JSON as a transport format.
I remember back when it came out it was a big deal that it could easily scale master-master nodes and the object format was a big thing because of Mongo back then.
That was before k8s was a thing, and most of the failover for other databases wasn't a thing just yet. I'm too scared to use it because they have a community but they're obviously nowhere near as active as the Postgres and other communities.
A map is... Whatever you put in it.
If you need that validation, you just validate the map? You can use established methods like Malli or Clojure Spec for this. If you need to use a record with a fixed schema, just use a record instead of map. In Clojure, you can use most of the map functions for records too.
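For a sense of what "just validate the map" looks like, here is a hand-rolled sketch in Python. It does not use or imitate the Malli or Clojure Spec APIs: the `validate` function and `person_schema` are invented for illustration. The schema is itself plain data, and the map stays open, so extra keys are allowed.

```python
def validate(schema, m):
    """Return a list of problems; an empty list means the map conforms.

    `schema` maps each required key to a predicate on its value.
    Keys not mentioned in the schema are ignored (open maps).
    """
    errors = []
    for key, check in schema.items():
        if key not in m:
            errors.append(f"missing key: {key}")
        elif not check(m[key]):
            errors.append(f"invalid value for {key}: {m[key]!r}")
    return errors


# Hypothetical schema for the person map used elsewhere in the thread.
person_schema = {
    "first": lambda v: isinstance(v, str),
    "last": lambda v: isinstance(v, str),
    "age": lambda v: isinstance(v, int) and v >= 0,
}
```

Validation then happens where you choose to call it, typically at the edges of the system, and is not enforced on every construction the way a class constructor would be.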
In the Clojure culture (so to speak) maps may also have a schema, as used by various schema checking tools, which are richer than runtime type checks. (Not the same as database schema)
Nit: I would not say that a JVM object is a schema, because there’s more to it. Rich is well known for saying that the idea of an object complects two things: (1) a record-like (struct) data structure with the (2) code to manipulate it.
Sometimes it’s even more complected because in some languages classes can make assumptions about state across all objects and threading.
I really like Clojure, but I really don’t know what some of its fans think (also true of other lisps), like there is a healthy pollination of ideas between languages, lisps are not God’s language.
ML record:
{first = "John", last = "Doe", age = 150, balance = 0.12}
Clojure hash-map:
{:first "John", :last "Doe", :age 150, :balance 0.12}
Destructuring a record in an ML function:
fun full_name{first:string,last:string,age:int,balance:real}:string =
  first ^ " " ^ last
(It’s unclear from the example whether or not all of the destructured values are required in the function signature. I hope they are not, but I left them in since I don’t know. The caret positioning raises further questions.)
Destructuring a map in a Clojure function:
(defn full-name [{:keys [first last]}]
  (str first " " last))
I don’t know if I’m missing something that ML offers with its records aside from more strict typing, which you can also have in Clojure when it’s useful. In both cases, it looks like it’s applied at the function declaration and not the record declaration.
https://www.cs.cornell.edu/courses/cs312/2008sp/recitations/...