ScyllaDB: Drop-in replacement for Cassandra that claims to be 10x faster (scylladb.com)

114 points by haint 10 years ago | 93 comments

Very nice.

Broadly speaking, this is the correct style of architecture for a database engine on modern hardware. It is vastly more efficient in terms of throughput than the more traditional architectures common in open source. It also lends itself to elegant, compact implementations. I've been using similar architectures for several years now.

While I have not benchmarked their particular implementation, my first-hand experience is that these types of implementations are always at least 10x faster on the same hardware than a nominally equivalent open source database engine, so the performance claim is completely believable. One of my longstanding criticisms of open source data infrastructure has always been the very poor operational efficiency at a basic architectural level; many closed source companies have made a good business arbitraging the gross efficiency differences.

acconsta 10 years ago | |

Agreed, but which architectural features are you referring to?

jandrewrogers 10 years ago | | |

Over the last decade, the distributed system nature of modern server hardware internals has become painfully evident in how software architectures scale on a single machine. The traditional approaches -- multithreading, locking, lock-free structures, etc -- are all forms of coordination and agreement in a distributed system, with the attendant scalability problems if not used very carefully.

At some point several years ago, a few people noticed that if you attack the problem of scalable distribution within a single server the same way you would in large distributed systems (e.g. shared-nothing architectures), you could realize huge performance increases on a single machine. The caveat is that the software architectures look unorthodox.

The general model looks like this:

- one process per core, each locked to a single core

- use locked local RAM only (effectively limiting NUMA)

- direct dedicated network queue (bypass kernel)

- direct storage I/O (bypass kernel)

If you do it right, you minimize the amount of silicon that is shared between processes, which has surprisingly large performance benefits. Linux has facilities that make this relatively straightforward too.
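A minimal sketch of the first bullet only (one process locked to one core), using the Linux CPU affinity calls that Python exposes as `os.sched_setaffinity` and `os.sched_getaffinity`. The helper names `pin_current_process` and `plan_workers` are made up for illustration; memory locking and kernel-bypass network or storage I/O are not shown, and nothing here is taken from Scylla or Seastar.

```python
import os


def pin_current_process(core_id):
    """Restrict the calling process to a single core and return the resulting CPU set.

    Relies on os.sched_setaffinity, which is only available on Linux.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    os.sched_setaffinity(0, {core_id})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


def plan_workers(available_cores):
    """Hypothetical launcher plan: exactly one worker per core."""
    return {f"worker-{i}": core for i, core in enumerate(sorted(available_cores))}


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    print(plan_workers(original))
    print(pin_current_process(min(original)))
    os.sched_setaffinity(0, original)  # restore the original CPU set
```

In a real engine a launcher would start one process per entry in the plan, and each process would pin itself once at startup and never migrate.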

As a consequence, adjacent cores on the same CPU have only marginally more interaction with each other than cores on different machines entirely. Treating a single server as a distributed cluster of 1-core machines, and writing the software in such a way that the operating system behavior reflects that model to the extent possible, is a great architecture for extreme performance but you rarely see it outside of closed source software.

As a corollary, garbage-collected languages do not work for this at all.

z92 10 years ago | | |

Basically ditched Java in favor of C++, and used a C++ framework called Seastar.

"The Scylla design, right, is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC."

http://www.scylladb.com/technology/architecture/
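To make "one engine per core, each with its own memory" concrete, here is a toy, single-threaded sketch of shared-nothing routing: every key has exactly one owning shard, and only that shard touches the data. The `Shard` and `Node` classes and the CRC32-based routing are assumptions for illustration; this is not Seastar's API, and the thread does not say how Scylla actually maps keys to cores.

```python
import zlib


class Shard:
    """One engine: owns a private slice of the data, shared with no other shard."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.data = {}  # touched only by this shard, so no locks are needed

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


class Node:
    """A server modelled as a small cluster of single-core shards."""

    def __init__(self, n_shards):
        self.shards = [Shard(i) for i in range(n_shards)]

    def owner(self, key):
        # Stable hash: the same key always routes to the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        self.owner(key).put(key, value)

    def get(self, key):
        return self.owner(key).get(key)


if __name__ == "__main__":
    node = Node(n_shards=4)
    node.put("user:42", "alice")
    print(node.owner("user:42").shard_id, node.get("user:42"))
```

In a real system each shard would be a pinned process or thread, and a request arriving on the wrong core would be forwarded to the owner as a message rather than by reaching into its memory.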

superpaul 10 years ago | |

I agree as well, at least on the networking aspect. Based on their diagram they are using Intel NICs, and Intel provides DPDK to bypass kernel space and access the hardware from the application itself.

Now my question is: how portable will Scylla be in terms of NIC vendors?

vitalyd 10 years ago | | |

I believe DPDK supports non-Intel NICs as well. It would also be interesting to compare Scylla with kernel networking vs Cassandra.

acconsta 10 years ago |

It's exciting to finally see this. Cassandra's strengths were in its distributed architecture (no master, tunable consistency, etc.). The database engine itself has always been a bit of a mess (https://issues.apache.org/jira/browse/CASSANDRA-8099).

nattaylor 10 years ago |

>The Scylla design, right, is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. We can easily reach 1 million CQL operations on a single commodity server. In addition, Scylla targets consistent low latency, under 1 ms, for inserts, deletes, and reads.

Interesting. From: http://www.scylladb.com/technology/architecture/

whalesalad 10 years ago | |

So on virtualized hardware, namely AWS, I'm sure the benchmarks won't be so magnificent. Needing a dedicated NIC per core is a big deal unless you're at a pretty large scale.

jandrewrogers 10 years ago | | |

A modern Ethernet chipset has a large number of independent hardware queues. These can be assigned to VMs for direct access to the NIC, bypassing the hypervisor. AWS, since you used that example, offers instances with this type of direct bypass.

Just to pull an example from memory, the ubiquitous Intel 82599 10GbE NIC silicon has up to 128 TX and RX queues in hardware. IIRC, these are bundled in pairs for direct access in virtualized environments, so in principle you could have 64 virtual cores each with their own dedicated physical hardware queue. This is almost certainly what they were talking about. That is the whole point of this feature in Ethernet silicon; it gives cores (virtual or physical) dedicated network hardware off a single NIC.

vitalyd 10 years ago | | |

Not dedicated NIC per core, but multi-queue NIC having its queues serviced by dedicated cores.
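A rough sketch of the idea in the comments above: the NIC steers each flow to one hardware queue, and each queue is serviced by one dedicated core. Real NICs typically do the steering in hardware with a Toeplitz hash and an indirection table; the CRC32 used here is a plain stand-in, and the function names are invented for illustration. The 128-queue figure is the one quoted from memory upthread.

```python
import zlib

HW_QUEUES = 128               # figure quoted upthread for the Intel 82599
QUEUE_PAIRS = HW_QUEUES // 2  # bundled in pairs, per the same comment


def queue_for_flow(src_ip, src_port, dst_ip, dst_port, n_queues):
    """Map a connection's 4-tuple to a queue; every packet of a flow lands on the same queue."""
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode("ascii")
    return zlib.crc32(flow) % n_queues


def assign_queues_to_cores(n_queues, cores):
    """Give every queue exactly one servicing core (round-robin when cores are scarce)."""
    return {q: cores[q % len(cores)] for q in range(n_queues)}


if __name__ == "__main__":
    queue_to_core = assign_queues_to_cores(QUEUE_PAIRS, list(range(8)))
    q = queue_for_flow("10.0.0.1", 40000, "10.0.0.2", 9042, QUEUE_PAIRS)
    print(f"flow -> queue {q} -> core {queue_to_core[q]}")
```

Because the mapping is deterministic, a given connection is always handled by the same core, so no per-connection state has to be shared between cores.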

acconsta 10 years ago | | |

How big do you have to be to lease hardware?

jhugg 10 years ago |

"A Cassandra compatible NoSQL column store, at 1MM transactions/sec per server."

Personal pet-peeve of mine. Using "TPS" or "Transactions/sec" to measure something that is in no way transactional. Maybe ops/sec, reads/sec, updates/sec, or something...

JoeAltmaier 10 years ago | |

Add my pet peeve: not listing latency stats. Big Tables does millions of ops/sec but it can take 5(!) seconds to complete one. That's the stat that matters to customers.
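To illustrate why an ops/sec figure alone says little, here is a small sketch that summarizes the same run both ways. The sample is synthetic and purely illustrative (980 operations at 1 ms, 20 at 5 seconds, echoing the figure above); the `summarize` helper is hypothetical and is not a measurement of any real system.

```python
import statistics


def summarize(latencies_ms, wall_clock_s):
    """Report throughput alongside median, 99th-percentile, and worst-case latency."""
    ordered = sorted(latencies_ms)
    # n=100 yields 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "ops_per_sec": len(ordered) / wall_clock_s,
        "p50_ms": statistics.median(ordered),
        "p99_ms": cuts[98],
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Synthetic run: operations overlap in time, so 1000 of them finish within 5 s of wall clock.
    sample = [1.0] * 980 + [5000.0] * 20
    for name, value in summarize(sample, wall_clock_s=5.0).items():
        print(f"{name}: {value}")
```

The throughput and the median both look healthy here; only the tail percentiles show that some requests waited seconds.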

lucindo 10 years ago | | |

http://www.scylladb.com/technology/cassandra-vs-scylla-laten...

mappu 10 years ago |

Numbers look great, but so do /dev/null's. What guarantees does it make?

Has it been through Jepsen yet?

dorlaor 10 years ago | |

It's planned. However, I don't believe we'll pass it today. We're targeting GA for Jan and we'll give it our best shot.

acconsta 10 years ago | | |

Yeah, I was wondering about that. It looks like you guys have done some brilliant work with the storage engine, but reimplementing all the distributed logic is another (possibly bigger) project.

acconsta 10 years ago | |

Honestly, Cassandra's Jepsen didn't set a high bar:

https://aphyr.com/posts/294-call-me-maybe-cassandra/

cbsmith 10 years ago | | |

Except that problem has been largely addressed now.

mbfg 10 years ago |

Wait, did I read that right? The test was with one (1) server? What's the point of that? Smells like a cooked-up test.

glommer 10 years ago | |

The point of that is to show how efficient a node can be, because that is what is replaced.

All the external-facing things for Scylla are the same as Cassandra's. That includes all the ring stuff and all network protocols.

So you should expect similar cluster behavior.

mbfg 10 years ago | | |

>> So you should expect similar cluster behavior.

I would expect nothing.

If their numbers were astounding with a 10, 100, or 1000 node cluster, they would have published numbers with such set-ups. I call shenanigans on a report that is purposely out of line with the expected use case.

kcw39217 10 years ago | | |

Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.......

There is nothing commodity about a server with 128GB RAM.

When you introduce other nodes, you get chatter and network traffic....

domlebo70 10 years ago |

Literally zero mention that in the event of a network partition they will just drop messages on the floor. This is fine as a cache, perhaps replacing Redis... but as a Cassandra replacement this is pretty scary

vitalyd 10 years ago | |

Where did you get this from? I hope that's not a conclusion from the benchmark doing single server load testing.

nnx 10 years ago | |

Source?

prohor 10 years ago |

The license is Affero GPL, which means you need to open-source your code even if you use it for a service. "Traditional" GPL was effective only when redistributing. That means you would need to go for a commercial license whenever you build a service on it. That is in fact a fair approach for a business model when there is a company behind an open source project, especially since this time there is no lock-in. You could always go back to Cassandra.

mappu 10 years ago | |

The virality doesn't cross the database interface layer.

Modifications to the database software must be shared, yes, but your client application is outside the reach of the AGPL and can remain proprietary.

philipov 10 years ago |

Wow, that autoadvancing website is a deal breaker.

dmarti 10 years ago | |

ScyllaDB web person here. If I made it so that you could block one script and get a home page without the horizontally scrolling thingy (but have all the other JS stuff work including syntax highlighting and graphs), would you come back? ( dmarti@scylladb.com )

eip 10 years ago | |

NoScript FTW

kcw39217 10 years ago |

Which JVM did they use? What were the flags passed to the JVM?

dschiptsov 10 years ago |

Finally, back to sanity of great old-school products, like Informix, by dropping Java (the whole scam) for C++14 and by paying attention to details of an underlying OS (again).

Same trend, by the way, is in Android development.

_Codemonkeyism 10 years ago | |

A 10x speedup from replacing Java with C++ (same algorithms, same architecture) is not possible (~2x at most).

One of the latest benchmarks I've seen is "Comparison of Programming Languages in Economics" [1]. For code without any IO, just number crunching, it shows a 1.91x to 2.69x speedup for C++ compared to Java. So any code involving IO is going to gain even less.

By replacing bad Java code with excellent, machine-aligned C++, a 10x speedup is possible.

[1] https://github.com/jesusfv/Comparison-Programming-Languages-...
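The reasoning about IO in the comment above is essentially Amdahl's law: if only the compute portion of a request gets faster, the overall gain is capped by the portion that does not. A small sketch follows; the 2x language speedup is the rough figure from the comment, while the 50/50 and 10/90 splits between compute and IO are invented purely for illustration.

```python
def overall_speedup(compute_fraction, compute_speedup):
    """Amdahl's law: overall speedup when only the compute fraction of the runtime gets faster."""
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    return 1.0 / ((1.0 - compute_fraction) + compute_fraction / compute_speedup)


if __name__ == "__main__":
    for fraction in (1.0, 0.5, 0.1):
        gain = overall_speedup(fraction, 2.0)
        print(f"compute share {fraction:.0%}: overall speedup {gain:.2f}x")
```

By the same formula, a 10x overall gain cannot come from a 2x language-level speedup alone, whatever the compute share is; the rest has to come from changing what the time is spent on.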

ryeguy 10 years ago | | |

You are placing way too much weight on microbenchmarks. You simply can't use them to make a sweeping statement like you just did. Writing code that is identical from language to language is not idiomatic and is not representative of how you would write each in a large-scale project such as Cassandra.

Java has a ton of overhead that C++ doesn't. Each object has metadata which results in more "cold data" in the cache. Each object is a heap allocation (unless you're lucky enough to hit the escape analysis optimization), which again leads to less cache locality because things are distributed around memory. Then there's the garbage collector. Then bounds checking.

glommer 10 years ago | | |

It doesn't come from the choice of language. It comes from the choice of architecture. C++ is a tiny piece of the puzzle. It would have been hell to implement such an architecture in Java, but this is as far as the language matters.

cbsmith 10 years ago | | |

It's particularly flawed given:

a) IO is such a large portion of the problem

b) Hypertable isn't just way, way faster.

_Codemonkeyism 10 years ago | | |

[Edit: The LMAX guys showed how much more performance is possible by aligning code with the CPU/hardware (in this case for Java)

http://mechanical-sympathy.blogspot.de/ ]

finalight 10 years ago |

Will this be the next Docker in the NoSQL database?

superpaul 10 years ago | |

I don't see any connection; how could this be the next Docker in the NoSQL world? In other words, I didn't get what you mean at all.

ketralnis 10 years ago | |

What does this mean?

mappu 10 years ago | | |

It's nonsensical buzzwords.

I guess the poster's underlying question is "will this database become hyped as the Next Big Thing"