Redis vs. Kafka vs. RabbitMQ (blog.devgenius.io)
This same blog already had a copied submission yesterday, so it seems it regularly does this: https://news.ycombinator.com/item?id=30061113
Also, regarding the comment about Redis not being persistent: Redis has lots of persistence options that let you choose trade-offs between durability and performance, with the strictest setting using an append-only log of every write command.
Decent 1000 mile view of these solutions, but some more depth would have been nice.
https://www.cloudamqp.com/blog/reasons-you-should-switch-to-...
However, many people using Rabbit are using it with Celery, which doesn't work with quorum queues, so people learn to expect performance issues.
This approach has been working well for us without the need of Kafka with our serverless apps. I was curious if anyone is doing something similar.
[0] https://nats.io/ [1] https://github.com/nats-io/nats-server [2] https://nats.io/download/#nats-clients
[0] https://docs.nats.io/nats-concepts/jetstream#exactly-once-me...
It's what I use when I don't know what I should use.
(In fact these days I might even be tempted to start with a simple PostgreSQL based queue and only swap to Redis later if it becomes clear that's what's needed).
I guess a better approach might be to carefully analyse requirements up front but if those requirements aren't known at the time you start the project it's useful just to get you going.
The most important features are normally not message liveliness or throughput (complex routing can definitely be a defining feature however). This is because both are fairly easy to solve for - especially in absence of other constraints.
Much more important are durability, ordering, partitioning, acknowledgement models, fencing/isolation/failure of both brokers and consumers, etc.
These are all very nuanced things but ultimately determine which systems can be used for which applications.
A lot of people will rush to recommend Kafka, but it's actually a rather narrow solution. Its distributed log model is definitely the right way to persist and replicate messages, but its fetch and consumer group APIs are essentially hot garbage for anything except strict streaming or other ordered processing cases.
This is the major sharp edge of Kafka that people don't understand and end up pigeonholed into patching themselves: strict cumulative acknowledgement. This leads to head-of-line blocking, and the only solutions involve tracking acknowledgements yourself, either by not using consumer groups at all or by layering some inefficient solution on top of them that only advances the offset when it is safe to do so and properly skips already-processed messages when recovering/rebalancing.
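To make the "tracking acknowledgements yourself" part concrete, here is a minimal sketch in plain Python of the bookkeeping involved for a single partition. It is not Kafka client code: `OffsetTracker` and its methods are hypothetical names, and the only Kafka-specific assumption is that the committed offset means "the next offset to read", so everything below it must already be processed.

```python
class OffsetTracker:
    """Hypothetical helper: tracks out-of-order acks for ONE partition."""

    def __init__(self, start_offset=0):
        # Next offset that is safe to commit; all offsets below it are done.
        self.committable = start_offset
        # Offsets finished out of order, still blocked by an earlier one.
        self._acked_ahead = set()

    def ack(self, offset):
        """Record a processed offset and return the committable offset."""
        if offset >= self.committable:
            self._acked_ahead.add(offset)
        # Advance only across a contiguous run of acked offsets.
        while self.committable in self._acked_ahead:
            self._acked_ahead.remove(self.committable)
            self.committable += 1
        return self.committable

    def is_processed(self, offset):
        """True if a re-read message was already handled and can be skipped."""
        return offset < self.committable or offset in self._acked_ahead


tracker = OffsetTracker(start_offset=0)
tracker.ack(1)  # finished early, but offset 0 is still in flight
tracker.ack(2)  # committable offset is still 0: head-of-line blocking
tracker.ack(0)  # the gap closes and the committable offset jumps to 3
```

The catch is that `_acked_ahead` lives only in memory. After a crash or rebalance the consumer restarts from the committed offset and reprocesses anything that was acked ahead of it, unless that set is persisted somewhere, which is where the inefficiency comes from.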
An alternative this article misses is Apache Pulsar which is much better suited for the role of "general purpose messaging system" that can just as easily function as a worker queue where ordering isn't important and supports various models of ordered consumption depending on your requirements.
I was also going to suggest LogDevice but it appears it's been abandoned/archived sadly.
Regardless, ignore fluff articles like this. Understand the caveats of the Kafka API before going all-in. If your problem fits, it's a very simple and cost-effective solution, so it's worth it if the constraints don't bother you and you aren't annoyed by Confluent's stewardship.
Otherwise I would prefer Pulsar; it's the more flexible option that you are unlikely to grow out of. Even as you get big, it's natively multi-tenant and geo-replicated, etc.
[0] https://blog.crunchydata.com/blog/message-queuing-using-nati...
https://www.psycopg.org/docs/advanced.html#asynchronous-noti...
We wrote a celery-like library for Node[0] and RMQ has been rock solid.
4. Redelivery
Are messages redelivered if there's a failure in the consumer? This is critical to many systems using older messaging platforms like MQ.
Are there any stories of this saving someone from a potential disaster? My experience has been that this only causes bugs, such as resending hundreds of thousands of out-of-date emails.
Retention policies and compaction exist where you don't actually want to keep the data, but the capability is one of the distinguishing features.
You can make an event log look like a database or a queue or a cache - but if you do that you should definitely consider whether you are using the right tool for the job.
It also means every message in Kafka is "addressable" via topic/partition/offset which lets you refer to "foreign" messages etc.
We use Kafka as a queue because we understand it very well, but it has a lot of limitations compared to "purpose-built" queuing services.
In that case, sure, theoretically with JMS you can spawn as many parallel consumers as you want, while with Kafka you're limited to the number of partitions you've configured your topic with, as each partition can be consumed by only one consumer in a consumer group.
But you get so much more out of kafka with consumer groups and partitions when you do complex message processing that would be a lot harder with traditional queues (if you partition based on the same key).
Yes. Then depending on the data it can be difficult-to-impossible to scale past that. Scaling down, of course you can always start fewer consumers, but unless you have many more partitions than consumers or partitions as a multiple of consumers, the load will be unbalanced.
> But you get so much more out of kafka with consumer groups and partitions when you do complex message processing
Well, what's "complex message processing"? If you mean the stream topology is complex or the operations benefit from key sharding I agree. If you mean that some atomic operation is complex, no, it's irrelevant or even bad e.g. the complexity means you need a DLQ/OOO retries, or load per item is so unpredictable you want to work-steal.
If you can live with not strictly ordered messages around scale time you just rescale. If you can live with some latency you stop producing, wait until lag is zero, scale, and then start producing again.
Plus, if you pick a partition count with nice divisors like 6, 12, 20, 24, 40 or 60, you can have balanced consumption with different numbers of consumers.
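A quick way to check the divisor argument is to compute how many partitions each consumer would end up with. The sketch below assumes a single topic, an assignor that spreads partitions as evenly as possible, and equal load per partition; the function names are made up for illustration.

```python
from typing import List


def partitions_per_consumer(num_partitions: int, num_consumers: int) -> List[int]:
    """Partition count per consumer under an as-even-as-possible assignment."""
    if num_partitions <= 0 or num_consumers <= 0:
        raise ValueError("both counts must be positive")
    # Consumers beyond the partition count sit idle.
    active = min(num_partitions, num_consumers)
    base, extra = divmod(num_partitions, active)
    counts = [base + 1 if i < extra else base for i in range(active)]
    return counts + [0] * (num_consumers - active)


def busiest_to_mean(num_partitions: int, num_consumers: int) -> float:
    """Load on the busiest consumer relative to a perfectly even split."""
    busiest = max(partitions_per_consumer(num_partitions, num_consumers))
    return busiest * num_consumers / num_partitions


for consumers in (2, 3, 4, 5, 6):
    print(consumers, partitions_per_consumer(12, consumers), busiest_to_mean(12, consumers))
```

With 12 partitions, any consumer count that divides 12 gives a ratio of 1.0. With 5 consumers the split is 3, 3, 2, 2, 2, so the busiest consumers carry 25% more than the mean.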