CockroachDB 2.0 Performance Makes Significant Strides(cockroachlabs.com) |
CockroachDB 2.0 Performance Makes Significant Strides(cockroachlabs.com) |
Cockroach performance seems to scale linearly, but single-connection performance, especially for small transactions, seems rather dismal. Some casual stress testing against a 3-node cluster on Kubernetes showed that small transactions modifying a single row could take as much as 7-8 seconds, where Postgres would take just a few milliseconds.
The documentation recommends that you batch as many updates as possible, but obviously that doesn't work for low-latency applications like web frontends that need to be able to do small, fine-grained modifications.
- Replication factor increased to 5x (rather than the 3x default)
- 8 indexes on the table being modified which also needed to be updated
- Nodes spread across North America, incurring higher RTT latency between nodes
- Relatively high contention on the data triggering client-side retries
- HDD's as the storage medium (RockDB is optimized for SSDs)That's surprising. I wasn't expecting CockroachDB to be really fast, given the constraints they work within. But that sounds more like a bug or config error. Unless perhaps you mean a really high number of processes trying to update the same row at the same time? Like a global counter or something?
...
[1] github.com/heroiclabs/nakama
Even if I like CockroachDB's pg sql more, it would be helpful to have the comparison/benchmark to show something more.
This database has the potential to dethrone Spanner in a major way.
"Note: We have not filed for official certification of our TPC-C results. However, we will post full reproduction steps in a forthcoming whitepaper."
The Oracle on SPARC cluster (at the top, 2010) performs 30.2M qualified tx/min vs the 16K tx/min in this blog post. The Oracle cluster also costs $30M, which is clearly higher than the Cockroach cluster's cost.
That said, the TPC-C benchmark is new to me. Happy to update this comment if I'm misreading the numbers.
(Edited to incorporate the reply below.)
We're focusing today on our improvements over CockroachDB 1.1, using a small-ish cluster. We'll be showing some more scalability with larger clusters in the coming weeks. If you've found CockroachDB performance slow in the past, you will be pleasantly surprised with this release!
A n1-highcpu-16 GCE VM costs $289.84/month. Local SSDs are added at 375GB per drive, and they cost $30/month at $0.08 per GB. I highly doubt you could fit the ~1250 warehouses (what got you the peak TPM-C) on 375GB local SSD, but I have to make assumptions here! So, now you're paying $319.84 per instance per month, or $949.52 for 3 of these instances.
At 16,150 TPC-C, you're paying roughly $0.06 per TPC-C, or, looking at it the other way, you're getting 16.83 TPC-C per dollar spent each month. Is that good? I don't know!
Now, the really interesting question is, is that TPC-C/$ on CRDB 2.0 actually better than TPC-C/$ on CRDB 1.1? The answer lies in how many local SSDs you have to provision to reach that peak throughput. Peak is at ~1300 warehouses on CRDB 2.0, and ~800 warehouses on CRDB 1.1.
Does anyone with more knowledge here know how much storage you need per warehouse in the TPC-C test?
edit: Looking into it even further, I agree with the co-author's response here that TPC-C is still an appropriate metric. TPC-E is different and newer but still not as widely used.
We chose TPC-C because it's far more understood than TPC-E in 2018. We wanted to provide understandable benchmarks that can be put into context with other databases. Other databases report TPC-C numbers, so we choose to do so as well.
Enterprise pricing generally basically scales with the size of your company/budget and how much trouble they think you'll be worth as a customer.
As a rule of thumb, it starts at just above 1000 USD per unit, and goes up from there.
Many contracts are bespoke orders especially when you're dealing with a small company, so you can't have transparency since there isn't a single product.
Transactional writes are likely the slowest thing since they need to talk to all replicas.
Would be great to see how it compares against postgres in similar scenarios.
A lot of congrats and excitement, questions about who uses it in a production environment, very specific use-case questions, and of course the name.
Weird how predictable the response to one company/tech always is.
However....
If I worked at CockroachDB, and I saw the negative feedback around the name, I'd take it to heart. At the end of the day, the name is marketing for the hard work of their engineers, and marketing for the engineers that want to use this DB (remember, they need to sell it to their managers who may not be technical).
This issue can show up in unexpected ways. For example, for cloud providers like Compose (IBM company), would they be comfortable with putting "CockroachDB" on the front page? They might if it's good enough, but it's at least a consideration (i.e. another meeting, another stakeholder to convince).
Or how about an enterprise company that's going through due diligence, and when their client asks them about their tech stack do they say "CockroachDB" or do they obfuscate the name by saying "It's a high-performance distributed database". That's a crucial moment to market CockroachDB, and it could get lost. As sad as it is, saying that you're using MySQL "because Oracle" is a point of leverage for some sales people.
Is the name worth it? Asking honestly.
https://blogs.msdn.microsoft.com/e7/2009/05/02/a-little-bit-...
TiDB has a weird kind of variation on "read committed" where you get phantom reads (though they're not called that in the documentation, which is actually ambiguous on this point). This is a problem for apps that expect consistency.
TiDB supports READ COMMITTED isolation which is not the same as MySQL, but it is just designed for some special cases for TiDB itself and it is not recommended for external users.
TiDB has been widely adopted by many users (https://github.com/pingcap/docs/blob/master/adopters.md) in production because it support the best features of both RDBMS and NoSQL. It is quickly evolving and iterating based on users’ requirements which are prioritized and listed on the Roadmap.
I'd really love some kind of distributed-for-performance database using the exact optimizer and query planner of SQLite plus std plugins (FTS5, JSON, transitive_closure, spatial). Something like a mix between Bloomberg's comdb2 (which uses a modified SQLite frontend) and rqlite (distributed-for-safety).
Note: You can save most of the shortcomings of CDB on the SQL client side today, but don't underestimate the time it takes to implement CDB-specific workarounds...
Why do you think the identical optimizer and query planner would work in a distributed environment, with no changes from the single server implementations?
My guess is that the benchmark setup would cost about 1m dollars to install (3 racks of commodity servers). The software is free. Naturally, Oracle aren't pushing this, when they charge 10s of millions for Oracle rack :)
A traditional RDBMS does not have to worry about split brain decisions, but it can hardly do multi-master in the intuitive way.
MySQL Cluster (NDB) did such benchmarks a few years ago where we ran 2.2M transactions per minute using DBT2. This meant executing millions of SQL queries per second towards the cluster and most of the CPU load is in the set of MySQL servers.
Currently I am using the load phase of this benchmark to test loading massive amounts of data into the new 7.6 version of MySQL Cluster using machines with 1 TB of memory.
If Comdb2 7 ever reaches stable, and BLP invests some time into deployment, ops and monitoring tools/docs, it'll be a strong competitor.
Have you explored implementing a CRDT based solution like WOOT instead?
CRDTs could be a solution, but from what I gather they require too much context information to be viable for a text editing application. Our app currently uses something similar to OT.
> issues a write every time someone types into a text field
With more than a handful of people, this is getting into conflict territory pretty rapidly, especially if the document is structured as a single row (hopefully it's more granular than that). Time for some back of the envelope maths:Assuming that an average person types at around 200 words per minute (number pulled from https://www.livechatinc.com/typing-speed-test/#/), that's a character every 300ms on average. With 10 people editing the document, that's a character every 30ms on average, which can easily lead to conflicts if they're all trying to update the same resource.
And while I'd like to say that MySQL Cluster is nonetheless exhibiting very impressive performance, I can't really say that; they are using expensive networking and hardware that most people don't have available, microoptimizing both the client and server sides, and using a low-level API specifically designed for doing well in these sorts of benchmarks, but they still lag far behind the per-core performance of state of the art key / value stores in similar circumstances. For example, MICA can do 80 million key/value requests per second of a comparable size to the ones they listed on a single commodity server, with only regular NiCs and 10GiB ethernet (and in fact can saturate the Ethernet link). Granted, MySQL Cluster is a full-fledged database and MICA just does key/value, but I can pretty much guarantee you that on real requests MySQL Cluster's performance collapses, and in multi-master mode it's known to be inconsistent anyway.
If you really need hundreds of millions of k/v requests per second, you'll pay a lot less buying three servers and hiring all the researchers who wrote the MICA paper to manage your key / value store than you will buying MySQL Cluster :P Or, if you want a real database, you can play the same game with the many, many database systems that do much better than Oracle's cluster on TPC-C; the same person who wrote the MICA paper released one last year about Cicada, which can handle over 2 million serializable TPC-C transactions per second on a single 28-core server. Or you can try Calvin, which can do around 500,000 georeplicated TPC-C transactions per second on a commodity EC2 100-node cluster (2.5 compute units each back in 2014) and can operate on top of any existing key-value store. The database world has advanced a lot in the past ten years, and people who really need performance have no shortfall of options that aren't Oracle.
I'd never heard of MICA, will read up on it. Calvin, though, is extremely limited. They tried to build CalvinFS on top of Calvin, but it didn't work. Subtree ops didn't scale (at all), as the DB wasn't flexible enough. Features and maturity sometimes matter, and NDB has both.
And the top results are usually crazy high number of cores clusters. The Sun example was over 1700 cores.
The open source databases didn't play that game, so TPC-C became irrelevant.
Too bad there isn't a good way to directly compare the healthy survivors.
I think what's interesting with TPC-C is that you can sort the results based on performance or price/performance. On the price/performance metric, SPARC looks expensive. Dell has a $20K SQL Anywhere cluster that can do 113K tx/min.
I wonder if anyone tried to run these benchmarks on the cloud and how one would calculate total cost of ownership there now.
Yeah, but 1700 cores worth. That's still a lot of $300 boxes. Like qty 53 Sparc T3-2's for example. Which seem to be $1200 to $2k on eBay. And unsupported, end of life, etc.
I'd compare CockroachDB's number to some more recent result with a similar number of cores. (If you can find one)
However there's a difference between architecture and deployment, and having everything in a single package makes operations much easier. CockroachDB also uses a KV storage layer (using RocksDB) with SQL on top.
For the business users, we support ansible and K8s deployment, both can help them run the TiDB easily and quickly in production. The deployment is not a problem so far.
IMO, for the distributed system, if you need to operate many instances (10, 100 or even more), what you pay attention to most is not whether the instance is one single binary or not, but is how to operate all these easily. At that time, there is no much difference.
One of the strengths of event sourcing is that you can fix things after the fact. Say you got a wrong event like you suggested that conflicts. You don't check it at the time before allowing it in log. You notice the wrong event, delete it, and replay the log.
You can always replay the log to get you current point of time.
Updating a DB every 30ms should be trivial. Heck, you should be able to grab an exclusive row lock, double check your state, and write your change without even coming close - 100% conflict or deadlock free, regardless of the number of writers, simply by using a DB as it is designed to be used. In this case, by using the biggest reasonable hammer available: make everything sequential at the DB level. You can absolutely build other systems on a relational DB that don't have that limitation.
Citus is great if you want the Postgres interface but is still using standard rowstore tables. CockroachDB is similar with rowstore performance but with added distributed consensus overhead. They are both much better for OLTP and sharding. CockroachDB also provides easy high-availability and replication.
kdb+ is much better for numeric/financial analysis apps, especially when used with the integrated query language and interpreter environment.
But for what reasons are you using Citus as well? Would like to know if I am missing something or hear another perspective.
Can you explain your use case? Thanks
Also the original post only mentioned MemSQL, Citus, and CockroachDB. With those, MemSQL is the fastest.