Redis streams as a pure data structure (antirez.com)
> Before Streams we needed to create a sorted set scored by time: the sorted set element would be the ID of the match, living in a different key as a Hash value.
I think the sorted set would be a much better choice, because then you could still insert items in the past, like when that admin remembers there was a tennis match last week he never recorded. Same goes for modifying past values, or deleting values. These operations are trivial using a sorted set & hash, not so using streams.
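To make the point concrete, here is a minimal in-memory sketch of the sorted set plus hash pattern. It does not talk to Redis: the class `MatchIndex` and its method names are made up for illustration, and the comments only indicate which Redis commands (ZADD, HSET, ZREM, DEL, ZRANGEBYSCORE) each step stands in for. With a stream, by contrast, XADD rejects an explicit ID that is not greater than the last one, so the late insert below has no direct equivalent.

```python
import bisect


class MatchIndex:
    """In-memory stand-in for a sorted set (time -> match ID) plus one hash per match."""

    def __init__(self):
        self._by_time = []   # sorted list of (timestamp, match_id), like a sorted set
        self._matches = {}   # match_id -> (timestamp, fields), like one hash per match

    def record(self, match_id, timestamp, **fields):
        # Roughly: HSET match:<id> <fields...> followed by ZADD matches <timestamp> <id>.
        # Recording an existing ID again replaces the old entry.
        if match_id in self._matches:
            self.delete(match_id)
        self._matches[match_id] = (timestamp, dict(fields))
        bisect.insort(self._by_time, (timestamp, match_id))

    def delete(self, match_id):
        # Roughly: ZREM matches <id> followed by DEL match:<id>.
        timestamp, _ = self._matches.pop(match_id)
        self._by_time.remove((timestamp, match_id))

    def between(self, start, end):
        # Roughly: ZRANGEBYSCORE matches <start> <end>, then one hash lookup per ID.
        return [
            {"id": match_id, **self._matches[match_id][1]}
            for timestamp, match_id in self._by_time
            if start <= timestamp <= end
        ]
```

Recording "m2" at time 200, "m3" at time 300 and then the forgotten "m1" at time 100 still yields m1, m2, m3 from `between(0, 1000)`, and calling `record` again for an existing ID overwrites the old values.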
I'm excited for streams and I'm glad Antirez is taking time to blog and evangelize, but this article didn't convince me there's a compelling use-case for streams aside from the Kafka-like use-case.
We use sorted sets heavily as queues, so this would be a necessary feature before we could consider giving streams a go, which would indeed be interesting for the memory savings (we sometimes have millions of items in our queues for a short time). For example, on error conditions you sometimes want to put an item back at the start of the queue instead of at the end, because the order of processing matters. Priority is another example.
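A rough sketch of what "queue on a sorted set" means here, as an in-memory model and not real Redis calls. `ScoredQueue`, `push_front` and the tie-breaking counter are assumptions made for the example; in Redis the corresponding commands would be ZADD to enqueue and ZPOPMIN to dequeue, and ties between equal scores are ordered by member, not by insertion.

```python
import heapq
import itertools


class ScoredQueue:
    """In-memory model of a queue kept in a sorted set: lowest score is served first."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # assumption: break score ties by insertion order

    def push(self, item, score):
        # Roughly: ZADD queue <score> <item>. A lower score means higher priority.
        heapq.heappush(self._heap, (score, next(self._tie), item))

    def pop(self):
        # Roughly: ZPOPMIN queue. Raises IndexError when the queue is empty.
        score, _, item = heapq.heappop(self._heap)
        return item, score

    def push_front(self, item):
        # Put a failed item back ahead of everything currently queued
        # by giving it a score just below the current head.
        head_score = self._heap[0][0] if self._heap else 0
        self.push(item, head_score - 1)
```

An append-only stream has no counterpart to `push_front` or to pushing with an arbitrary score, which is the gap described above.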
Does anyone have good patterns for joining across entries from two or more Redis streams? This is one of the most interesting aspects of Kafka/Flink/Spark/Storm/etc. Would be useful to be able to develop with streaming joins in Redis playgrounds.
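One possible starting point, as a sketch only: read a range from each stream (for example with XRANGE) and join the entries client-side on a shared field within a time window, using the millisecond part of the stream IDs. The function names, the `(id, fields)` tuple shape and the sample field names are assumptions for illustration, and the code works on plain Python lists, not on a live Redis connection.

```python
def parse_ms(entry_id):
    """Return the millisecond part of a stream ID such as '1518951480106-0'."""
    return int(entry_id.split("-")[0])


def window_join(left, right, key, window_ms):
    """Pair entries from two streams that share fields[key] and lie within window_ms.

    left and right are lists of (entry_id, fields) tuples, where fields is a dict.
    """
    # Index the right-hand stream by join key so each left entry is a dict lookup.
    by_key = {}
    for entry_id, fields in right:
        by_key.setdefault(fields[key], []).append((parse_ms(entry_id), fields))

    pairs = []
    for entry_id, fields in left:
        left_ms = parse_ms(entry_id)
        for right_ms, right_fields in by_key.get(fields[key], []):
            if abs(left_ms - right_ms) <= window_ms:
                pairs.append((fields, right_fields))
    return pairs
```

This is a batch join over entries already fetched; a continuous version would need to keep the index bounded by evicting entries older than the window.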
Let's say tennis games are recorded on a piece of paper and entered into the computer later. What is different?
Aside: would an embeddable redis be a useful thing for apps and other isolated devices?
1. Embedded systems are often used in environments where you need very resilient software. To crash the DB because there is a bug in your app is usually a bad idea.
2. As a variation of "1", it's good to have different modules as different processes, and Redis works as a glue (message bus) in that case. So again, all should talk to Redis via a unix socket or similar.
3. Latency is usually very acceptable even for the most demanding applications: when it is not, a common pattern to solve such a problem is to write to a buffer from within the embedded process, which a different thread then moves to Redis. Anyway, if you have Redis latencies of any kind, you don't want to block your embedded app's main thread.
4. Redis persistence is not compatible with that approach.
5. Many tried such projects (embedded Redis forks or reimplementations) and nobody cared. There must be a reason.
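The buffer pattern from point 3 could look roughly like the sketch below. The class name `BufferedWriter` and the `sink` callable are hypothetical: `sink` stands in for whatever actually writes to Redis, so the example runs without a server.

```python
import queue
import threading


class BufferedWriter:
    """Main thread writes into a bounded buffer; a worker thread drains it to `sink`."""

    def __init__(self, sink, maxsize=1000):
        self._queue = queue.Queue(maxsize=maxsize)
        self._sink = sink  # hypothetical callable that performs the real Redis write
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def write(self, item):
        """Never blocks the caller. Returns False if the buffer is full."""
        try:
            self._queue.put_nowait(item)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel pushed by close()
                break
            try:
                self._sink(item)
            except Exception:
                # Assumption for the sketch: drop on error. A real app would log or retry.
                pass

    def close(self):
        """Flush everything already buffered, then stop the worker thread."""
        self._queue.put(None)
        self._thread.join()
```

Any Redis latency is then absorbed by the worker thread, and the main thread only has to decide what to do when `write` returns False.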
Hydrating/deserializing data from Sqlite into types/objects and doing whatever goodness those need, then using Redis to make "updating the database" super fast (in memory after all) and let Redis write it back to Sqlite as there is IO/time/lull in traffic.
Kinda like how Epic Cache does its transaction journal flushing every X minutes?
I understand that Elasticsearch is a common place to put logs, also because I assume that searching through logs is a common use case, but I wonder whether Redis has particular benefits for this use case. The data structure seems particularly tailored to it (but not so much to searching I guess).
For logs without full indexing, Loki (https://github.com/grafana/loki) is a recent entry into the space, and it is probably a good option to look at. It indexes metadata (labels), so it allows searching by labels but not full text. It is also supposed to be horizontally scalable, which is probably something you want in a log storage solution.
"A key difference I observed was that if a Kafka consumer crashes, a rebalance is triggered by Kafka after which the remaining consumers seamlessly start consuming the messages from the last committed offset of the failed consumer.
Whereas with Redis streams I had to write code in my application to periodically poll and claim unacked messages pending for more than some threshold time."
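The polling logic described in that quote might be sketched as follows. In Redis, XPENDING (extended form) reports each pending entry's ID, owning consumer, idle time in milliseconds and delivery count, and XCLAIM transfers ownership. The `Pending` record, the function name and the `max_deliveries` cutoff are assumptions for the example, which only decides what to claim and makes no Redis calls.

```python
from dataclasses import dataclass


@dataclass
class Pending:
    """One row of pending-entry information, in the shape XPENDING reports it."""
    entry_id: str
    consumer: str
    idle_ms: int
    deliveries: int


def select_stale(pending, me, min_idle_ms, max_deliveries=5):
    """Split stale entries owned by other consumers into (to_claim, dead_letter) ID lists."""
    to_claim, dead_letter = [], []
    for p in pending:
        if p.consumer == me or p.idle_ms < min_idle_ms:
            continue  # still ours, or its owner may simply be slow
        if p.deliveries >= max_deliveries:
            dead_letter.append(p.entry_id)  # delivered too often: likely a poison message
        else:
            to_claim.append(p.entry_id)     # candidate for XCLAIM, then normal processing
    return to_claim, dead_letter
```

A consumer would run this on a timer, claim the first list, and acknowledge each entry with XACK once processed.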
So far I haven't used it outside of hobby projects for webGL games and such, but it's worked brilliantly, and no Kafka required for hobby async-streaming infrastructure!
Hopefully it's useful to someone out there! https://github.com/erulabs/redis-streams-aggregator
Per-message acknowledgement is an advancement. Redis requires manual lookups for unacknowledged items but you can also use Apache Pulsar for a more scalable distributed disk-based system which itself is a solid evolution over Kafka's design.
Also note that "exactly-once" semantics are actually impossible. Messaging systems are either "at-least-once" or "at-most-once". Kafka has some attempts at using transactions to solve this but that's only when using Kafka streams and only ensures read progress, not the processing result.
1) Your application must coordinate and make sure that everything up to that offset is indeed processed successfully. 2) Your application must stop if it encounters an error (because it can't commit an offset greater than that item) or handle it separately by logging to another topic, database, etc.
Other systems like Redis, Pulsar, Google PubSub provide per-message acknowledgement to allow items to be individually processed without blocking other forward progress.
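A toy illustration of the difference, under a simplified convention that is an assumption of this sketch: "committed offset N" means every message up to and including N has been processed. The function name is made up for the example.

```python
def committable_offset(last_committed, done):
    """Return the highest offset N such that every offset in (last_committed, N] is in done."""
    n = last_committed
    while n + 1 in done:
        n += 1
    return n


# Offsets 5..9 were delivered and 7 failed.
done = {5, 6, 8, 9}

# Offset-based commit: progress stops just before the first gap.
stuck_at = committable_offset(4, done)

# Per-message acknowledgement: only the failed item remains pending.
still_pending = set(range(5, 10)) - done
```

Here `stuck_at` is 6 even though 8 and 9 are finished, while `still_pending` contains only 7.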
It gives you a lot of interesting concepts like "lightweight databases" or pushing query results into streams.
Here is the GitHub repo: https://github.com/RedBeardLab/rediSQL
Full disclaimer, I am the author.
> rediSQL
why different spelling
For #1, any application which has an in-order requirement would suffer from this problem. I worked with event processing systems so we never really had to worry about this, since each event was independent. However, there were instances where we would need to track state for certain objects getting processed to make sure all of their child objects were also processed. For this we would use an external store with a short TTL since the lifetime of the object during processing would only be a few minutes.
All-in-all it just comes down to what your app’s requirements are. I don’t think Kafka is meant to replace every pub sub service out there, but definitely has some great use cases.
Redis feels like that. It’s a simple data structure server. Now if we could have those data structures sync with flat files with the same Redis API, a lot of applications would become much simpler.
I’m not sure how big of an undertaking it is though.
I’m willing to bet that a fast, general data structures database syncable to flat files would open up many possibilities.
I use sqlite for this purpose, essentially as an in-memory cache of data populated from disk and incoming server packets. Having redis as an option to replace mysql (or at least to compare memory use and speed) would be great.
I looked for an embedded Redis fork and came up blank, do you have links? I found Vedis, but I would rather have something built off of the Redis code than a re-implementation.
The current use of sqlite is to allow our scripted code (lua and actionscript) to make queries of the exposed data without having to write C++ code for every possible query and data object type (and implement new ones on demand).
Redis might not be the correct thing for this exact use case (some of the queries are more complex than a simple key or range look-up) but I may be prepared to take those limitations in exchange for a substantial speed and/or memory use improvement.
I would be surprised if "actual Redis" was ever the right answer to "sqlite is too much".
But I do wonder if there are some lessons to take from Redis api and wrap something like lmdb/bdb etc.
I'm not familiar enough with Redis to know when/if this would make sense over just using sqlite, though.
I've used Redis very successfully on the backend, so maybe I'm just trying to find some reason, any reason, to play with it in player facing code!
It may be useful to you as well: RediSLQ.com
Or on GitHub: https://github.com/RedBeardLab/rediSQL
Full disclaimer: I am the author
It is pretty cool to be able to share live interconnected objects between processes with full transactional safety.
For some of those, using something like Redis as the existing service might be interesting; for others it will be a no-go.
I worked on an automotive infotainment system in the past, and throwing Redis on an embedded Linux system there would have been fine if it had fulfilled a particular task well. I think I even proposed it once for something.
My payload are lists of int64's. I need to do set operations on those lists before sending the result over the wire. If you advise against embedding redis, can I instead embed my logic in redis? As a filter of sorts?
Which is topical, because watching for updates is a core feature of Redis streams (and Redis already had pub/sub channels before that). For that use case, SQLite is too little, even if your needs are otherwise quite basic.
Unfortunately, this difference in capabilities seems to be partly a result of limitations in the underlying OS APIs. SQLite uses POSIX advisory locks to lock ranges of the database file, but I don’t think there’s any similar API that provides an event or semaphore associated with a given file, instead of a lock. There are plenty of messaging APIs that aren’t associated with an arbitrary file – there are semaphores, message queues, and shared memory objects, in fact two sets of APIs for each of those (SysV and POSIX), plus signals, etc. But those all have their own namespaces, and if the two processes trying to synchronize with each other are in different containers, they might not share those namespaces. There are Unix sockets – those are a decent option, but they require one process to set itself up as the server, which is a bit weird in the SQLite model where all the processes are on an equal footing, and any may quit at any time. They also don’t work over NFS (whereas locking does, at least sometimes). You can try to mmap a regular file and then treat it as shared memory, but that’s not guaranteed to work in all cases, and again doesn’t work over NFS. I suppose you could try to abuse a lock as a semaphore, but that has its limitations…
But it’s not like many people use SQLite over NFS anyway. Whatever the approach, I’d love to see a “SQLite for notifications”. It would probably be a pretty simple library, but with the needed bells and whistles like bindings to higher level languages. If a library like this exists, I’d be very interested to hear, because a while back I searched for one in vain.
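For readers who have not used them, this is roughly what byte-range advisory locking looks like from Python on a Unix system. It is a sketch only: the helper names are made up, it needs `fcntl` and `os.fork` (so it will not run on Windows), and it shows locking, not notification, which is exactly the missing piece discussed above.

```python
import fcntl
import os
import tempfile


def try_lock(fd, start, length):
    """Try to take an exclusive advisory lock on bytes [start, start + length) without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:  # errno is EACCES or EAGAIN when another process holds the range
        return False


def child_can_lock(fd, start, length):
    """Fork a child process and report whether it managed to lock the range."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with 0 on success, 1 on failure.
        os._exit(0 if try_lock(fd, start, length) else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


def demo():
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        parent_has_lock = try_lock(fd, 0, 10)
        same_range = child_can_lock(fd, 0, 10)     # overlaps the parent's lock
        other_range = child_can_lock(fd, 100, 10)  # disjoint range, no conflict
        return parent_has_lock, same_range, other_range
```

The locks are per process, so the child is refused the range the parent holds but can lock a disjoint range; nothing in this API lets a process sleep until another one signals a change.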
> Carefully tested for correcteness and tuned for performance – never loose a bit.
correcteness -> correctness (slightly ironic :-) ) loose -> lose
> RediSQL inheritanve all the SQLite knobs and capabilities
inheritanve -> inherits
> RediSQL is written in Rust which provides more guarantess agains common bugs
agains -> against
> Only a very minor part of RediSQL is not releases as open source
releases -> released
Just fyi :-) Looks like a really interesting tool :-)
The correct link is the following now: https://payhip.com/RediSQL
Passo got acquired and shut down...