RedisRaft

144 points by anhldbk 3 years ago | 50 comments

jwr 3 years ago |

A gentle reminder that FoundationDB exists and has this nailed down really well. They are just bad at marketing, so it's not in fashion. But do check it out if you want a distributed database with strict serializable semantics, that works.

eternalban 3 years ago | |

>bad at marketing

I am guessing maybe it's Apple's corporate secrecy that is the issue. Apple likely has a massive deployment of this tech.

ironchef 3 years ago | | |

There are some huge deployments out there. Snowflake (the database) has a big dependency on it for how they do metadata (for example)

geenat 3 years ago | |

Strongly consistent FoundationDB = Likely similar write performance to CockroachDB or TiDB when you avoid secondary indexes.

Secondary indexes in "distributed strongly consistent" systems are what ruin performance, because each index is +1 write to another "index table" (... +1 raft quorum check!).

I don't think FoundationDB has "secondary indexes" to begin with, so one may never run into the +1 write per index issue.. it's just a layer on top of TiKV (equivalent to RocksDB in CockroachDB).

jwr 3 years ago | | |

I am not sure if I am quite getting your point here, but if we're talking about indexes, I've just implemented indexing using FoundationDB and writing to the index happens in the same transaction as the main data write (not really sure why it would ever be otherwise). Definitely not "+1 raft quorum check".

Calling FoundationDB "just a layer on top of TiKV" is… well… :-)

eatonphil 3 years ago | | |

I can't speak to FoundationDB specifically but in general, why would secondary indexes require a second quorum check?

Can't secondary indexes just be implemented in your state machine?

I.e. your state machine handles the insert and also writes a secondary index at the same time. Every state machine could do this identically off the log.
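
A minimal sketch of that idea, assuming a toy in-memory state machine (the class name, the `email` index and the log entry format are all made up for illustration, not taken from RedisRaft or any real system). Each replica applies the same committed log in order and updates the primary table and the secondary index in the same step, so no extra quorum round is needed for the index:

```python
class UserStateMachine:
    """Hypothetical replicated state machine: a primary table plus an email index."""

    def __init__(self):
        self.users = {}        # primary data: user_id -> record
        self.by_email = {}     # secondary index: email -> set of user_ids
        self.last_applied = 0  # index of the last log entry applied

    def _unindex(self, user_id):
        # Remove the stale index entry for the record being replaced or deleted.
        old = self.users.get(user_id)
        if old is None:
            return
        ids = self.by_email[old["email"]]
        ids.discard(user_id)
        if not ids:
            del self.by_email[old["email"]]

    def apply(self, index, entry):
        """Apply one committed log entry deterministically."""
        if index != self.last_applied + 1:
            raise ValueError("log entries must be applied in order")
        op, user_id, record = entry
        if op not in ("put", "delete"):
            raise ValueError(f"unknown op: {op}")
        self._unindex(user_id)
        if op == "put":
            # Primary write and index write happen in the same apply step.
            self.users[user_id] = record
            self.by_email.setdefault(record["email"], set()).add(user_id)
        else:
            self.users.pop(user_id, None)
        self.last_applied = index


# One committed log, replayed identically on three replicas.
log = [
    ("put", 1, {"email": "a@example.com"}),
    ("put", 2, {"email": "b@example.com"}),
    ("put", 1, {"email": "c@example.com"}),
    ("delete", 2, None),
]
replicas = [UserStateMachine() for _ in range(3)]
for replica in replicas:
    for i, entry in enumerate(log, start=1):
        replica.apply(i, entry)
```

This only works when everything fits in one replicated log. Systems that shard data across several consensus groups may place an index entry in a different group than the row, which is probably where the "+1 quorum" cost comes from.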

ruuda 3 years ago |

I would love to see a Jepsen test of this when it's ready. The Redis Cluster evaluation [1] was a great read.

[1]: https://aphyr.com/posts/283-jepsen-redis

jabradoodle 3 years ago | |

FYI jepsen already evaluated the development build of Redis Raft and the write up was a good one https://jepsen.io/analyses/redis-raft-1b3fbf6

Will indeed be interesting to see the analysis once it becomes stable.

danw1979 3 years ago | |

“What would Kyle have to say about this?” was the first thought that came to my mind.

TheDong 3 years ago |

Linking to the introduction bypasses the prominent note in the readme:

> RedisRaft is still being developed and is not yet ready for any real production use. Please do not use it for any mission critical purpose at this time.

noobdev9000 3 years ago | |

This is essentially a complete fabrication. If it's on a public repo, ready or not, someone will use it for prod.

tehbeard 3 years ago | | |

It's not a fabrication; it's a statement that if you use this in prod and it catches fire, that's on you, buddy.

This is like saying laws around seatbelt wearing are a fabrication because some people ignore them.

sethammons 3 years ago | | |

Fabrication means lie. What are they lying about?

solatic 3 years ago |

Why choose this over etcd? Especially if it's a limitation / non-goal to support all Redis commands, or to respond with Redis-like quick performance? Why not go with the battle-hardened (it's the backing datastore in Kubernetes), proven option?

kbumsik 3 years ago | |

I am not sure either. But this might overcome etcd's soft storage limit of 8GB? [1]

[1] https://github.com/etcd-io/etcd/issues/9771

361994752 3 years ago |

I've been watching this project for a long time. It was supposed to be released with Redis 7[1]. But I guess this is not true anymore. And there is no public roadmap saying when it will be production ready.

[1] https://www.zdnet.com/article/redis-labs-unveils-redis-datab...

bullen 3 years ago |

I made my own distributed JSON over HTTP database back in 2016.

It has been running in an intercontinental production environment with 100% read uptime since 2017.

It's 2000 lines of code: http://root.rupy.se (this test env. has 3 nodes: fem, six and sju)

wyldfire 3 years ago | |

2000 lines of Java. Do performance tests show worst-case latency impact of GC pauses?

bullen 3 years ago | | |

Worst is 0.7 seconds for save and 0.1 for load it seems.

Average is 4.8 millisec and 0.5 millisec respectively.

But those are just the JVM doing its thing.

The numbers I like are these: 190.2 158 331

200ms global save average, 158ms min and 331ms max from Europe to east+west US and Asia. Without fault, so consistent, much of that can be attributed to AWS improving so much over the years.

As to why the load/save are slower it's because the complete global roundtrip stats I only have for registers which are rare now so they don't hit the GC I'm guessing.

The thing I'm most proud of is async-to-async meaning the system will saturate all cores (without io-wait) on all machines without problems... it just keeps solving the problem at 100% efficiency, no memory leaks and 5 years uptime without any crashes.

Slowing down is the worst case, and if that is a problem just upgrade instance type, no uptime then though.

CyberDildonics 3 years ago | |

I did that once by hosting json files on a webserver in a different country.

remram 3 years ago |

> A cluster may lose up to (N/2)-1 nodes

What a weird notation. When N=3, a cluster may lose up to 1 node, I don't know how that matches this formula.
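
For reference, a quick sketch of the usual majority-quorum arithmetic next to the quoted formula (function names are my own). With a majority quorum of floor(N/2)+1, the cluster tolerates N minus that many failures:

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1


def max_failures(n: int) -> int:
    """Nodes that can be lost while a majority remains; equals (n - 1) // 2."""
    return n - quorum(n)


def quoted_formula(n: int) -> float:
    """The formula as quoted above, read literally: (N/2) - 1."""
    return n / 2 - 1


for n in (3, 4, 5):
    print(n, quorum(n), max_failures(n), quoted_formula(n))
```

Read literally, the quoted formula only agrees with the majority rule for even N; for N=3 it gives 0.5 rather than 1.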

compsciphd 3 years ago | |

tweaked the language a bit. thanks for pointing it out.

decide1000 3 years ago |

I am looking at KeyDB and considering using it as a replacement for Redis. Besides some speed improvements it has good looking replication and cluster solutions. https://docs.keydb.dev/docs/cluster-spec

Simpliplant 3 years ago | |

We thought the same and deployed KeyDB to production as a replacement for a big Redis deployment (200+ GB memory) and we ran into many unpleasant issues with it - very high replication latency, instability, random crashes, memory leaks, etc. So I'd advise you to do thorough testing before you use it in production.

decide1000 3 years ago | | |

We start tests in the coming week. Current Redis memory use is about 70 GB. Thanks a lot for your comment. I hope to create a stable KeyDB environment, as it would solve some of the problems we have with Redis replication. The issues you describe sound scary.

qeternity 3 years ago | |

It may have improved, but KeyDB has a number of issues for common Redis use cases, e.g. if you're using Redis as a task queue (typically BRPOP) you'll encounter a race condition in which each KeyDB instance will make a new task available on all nodes for listening workers, resulting in duplication of tasks.

ukuina 3 years ago | |

I attempted to use KeyDB precisely for its replication and clustering, but was forced to switch to Redis HA. Too many issues getting it to work in a stable way.

Cardinal7167 3 years ago |

But raft isn’t strongly consistent, it has known liveness issues.

https://decentralizedthoughts.github.io/2020-12-12-raft-live...

eatonphil 3 years ago | |

What does strongly consistent have to do with liveness? If there's a connection it seems pretty indirect.

he0001 3 years ago | |

The article you linked to says if you have PreVote and CheckQuorum it then doesn’t have liveness issues.

kbumsik 3 years ago | |

That is about availability in the CAP theorem, not consistency though.

withinboredom 3 years ago | |

Raft is a pretty decent -- not great -- consensus algorithm (IMHO) but it is used because it is easy to understand. If I had to trust one, I would probably go with Multi-Paxos, if you could successfully implement it.

mperham 3 years ago |

AWS’ new MemoryDB also seems to be a strongly consistent Redis cluster service. Anyone know how they compare?

https://aws.amazon.com/memorydb/features/

kbumsik 3 years ago | |

MemoryDB has single-node (primary node) strong consistency.

MemoryDB seems to have a very similar architecture to that of AWS Aurora, which separates a storage layer and compute nodes, and consistency is implemented not by communicating between compute nodes but by offering a consistent distributed storage layer. This architecture usually doesn't have multi-node strong consistency by itself and can have replicas.

This means that in MemoryDB only the primary node is strongly consistent but the replica nodes are not.

Instead, in my experience, those kinds of AWS offerings have fewer operational headaches because the storage remains safe even if the primary node fails and you don't need to worry about managing distributed nodes.

Edit: add pros

phamilton 3 years ago | |

My understanding of MemoryDB is that it basically replaces the AOF with a distributed log (it might be Kafka/kinesis, but it could just be backed by the same data layer as aurora). The biggest win there is that acknowledged writes are not lost if the writer node dies. A reader can replay the log and get fully caught up as it is promoted during a failover.

This comes at a cost though and writes are slower than traditional redis.
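
A toy sketch of that log-then-replay idea, to make the failover argument concrete. This is not MemoryDB's actual design or API; `DurableLog` is a stand-in for whatever durable log service sits underneath, and all names here are hypothetical:

```python
class DurableLog:
    """Stand-in for a durable, shared log (assumed to survive node failures)."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)
        return len(self._entries)  # offset just past the new entry

    def read_from(self, offset):
        return self._entries[offset:]


class Node:
    """In-memory key-value node that rebuilds its state from the shared log."""

    def __init__(self, log):
        self.log = log
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def write(self, key, value):
        # Acknowledge only after the entry is in the durable log.
        self.log.append((key, value))
        self.catch_up()
        return "OK"

    def catch_up(self):
        # Replay every entry this node has not applied yet.
        for key, value in self.log.read_from(self.applied):
            self.data[key] = value
            self.applied += 1


log = DurableLog()
primary = Node(log)
replica = Node(log)

primary.write("a", 1)
primary.write("b", 2)
replica.catch_up()      # replica is current up to here
primary.write("a", 3)   # acknowledged, replica has not seen it yet

stale_view = dict(replica.data)
# The primary "dies"; the replica replays the log before being promoted.
replica.catch_up()
```

The extra append to the log before acknowledging is also where the write latency cost mentioned above would come from in a design like this.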

geenat 3 years ago |

Game changer if you can turn raft checks on/off on a per-query basis, like scylladb / cassandra.

esafak 3 years ago | |

Please elaborate.

manigandham 3 years ago | | |

Cassandra (and ScyllaDB, which has the same data model) allows for a customizable consistency level on a per-query basis. You can send one write with only one node confirming while sending another requiring full cluster acknowledgement.

More details: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml...
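
Roughly, the per-query level decides how many replicas must acknowledge before the coordinator reports success. A self-contained sketch of that arithmetic for the three most common levels (this models the rule only; it is not driver code, and the function names are made up). Acknowledgements are counted over the replicas that own the key, i.e. the replication factor, rather than every node in the cluster:

```python
from enum import Enum


class Consistency(Enum):
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def acks_required(level: Consistency, replication_factor: int) -> int:
    """Replica acknowledgements needed for a request at the given level."""
    if level is Consistency.ONE:
        return 1
    if level is Consistency.QUORUM:
        return replication_factor // 2 + 1
    if level is Consistency.ALL:
        return replication_factor
    raise ValueError(f"unsupported level: {level}")


def write_succeeds(level: Consistency, replication_factor: int,
                   replicas_responding: int) -> bool:
    """True if enough replicas responded for the chosen level."""
    return replicas_responding >= acks_required(level, replication_factor)


# Same keyspace (replication factor 3), two replicas reachable:
fast_write = write_succeeds(Consistency.ONE, 3, replicas_responding=2)
strict_write = write_succeeds(Consistency.ALL, 3, replicas_responding=2)
```

So with one replica down, the `ONE` write goes through while the `ALL` write fails, which is the trade-off being chosen per query.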

slondr 3 years ago |

Does this have any benefit over Mnesia?