Redis stream vs Kafka, A clash of two kings

Note: Non-members can read the full story in this link.
Kafka is known for addressing large-scale data processing problems and is widely deployed in the infrastructure of many well-known companies. As early as 2015, LinkedIn had 60 clusters with a total of 1100 Brokers, processing 13 million messages per second.
But it turns out that scale isn't the only thing Kafka excels at. Its advocated programming paradigm -- partitioned, ordered, event processing -- is a good solution for many problems you may encounter. For example, if events represent rows to be indexed into a search database, then the last modification is the last index, which is important, otherwise, searches will indefinitely return stale data. Similarly, if events represent user behavior, processing the second event ("user account upgrade") may depend on the first ("user account creation"). This paradigm differs from traditional job queue systems, where events are popped from the queue by many workers simultaneously, which is simple and scalable, but it breaks any ordering guarantee. Suppose you want ordered processing, but perhaps you don't want to use Kafka because of its reputation as a difficult-to-operate or expensive heavyweight system. How does Redis with its "stream" data structure, released with version 5.0, compare? Does it solve the same problems?
Kafka Architecture
Let's first take a look at the basic architecture of Kafka. The fundamental data structure is the topic. It's a time-ordered record sequence, appended-only. The benefits of using this data structure are well described in Jay Kreps' classic blog post, The Log.

Topics are partitioned to allow them to scale: each topic can be hosted on separate Kafka instances. Records in each partition are assigned consecutive IDs called offsets, which uniquely identify each record in the partition. Consumers process records sequentially, keeping track of the last offset they've seen. Since records are persisted in a topic, multiple consumers can independently process records.

In practice, you may distribute your processing across many machines. To achieve this, Kafka provides an abstraction called a "consumer group," which is a group of cooperating processes consuming data from a topic. Partitions of a topic are assigned to members within the group. Then, when members join or leave the group, partitions must be reassigned to ensure each member gets a fair share of the partitions. This is called the rebalancing algorithm.

Note that a partition is processed by only one member of the consumer group. (But a member may be responsible for multiple partitions) This ensures strictly ordered processing. This toolkit is very useful. You can easily scale your processing by adding more workers, while Kafka handles distributed coordination problems.
Redis Stream Data Structure
How does Redis' "stream" data structure compare? Redis streams conceptually equate to a partition of the Kafka topic described above, but with some minor differences. It's a persistent, ordered event storage (similar to Kafka). It has a configurable maximum length (as opposed to retention period in Kafka). Events store keys and values, akin to Redis Hash (as opposed to single key-value in Kafka). The most significant difference is that consumer groups in Redis are entirely different from those in Kafka. In Redis, a consumer group is a set of processes that read from the same stream. Redis ensures that events are only delivered to one consumer within the group. For example, in the diagram below, Consumer 1 won't process '9', it will skip it because Consumer 2 has already seen it. Consumer 1 will get the next event not seen by any other group member.

The role of groups in Redis is to parallelize the processing of a single stream. This resembles a traditional job queue structure. Thus, it loses the ordering guarantee essential as a core of stream processing, which is unfortunate.
Stream Processing as a Client Library
So, if Redis only provides topics with job queue semantics effectively, how can we build a stream processing engine on top of Redis? Well, if you want Kafka's features, you need to build them yourself. That means implementing.
- Event partitioning. You need to create N streams and treat each stream as a partition. Then, upon sending, you need to decide which partition should receive it, probably based on the event's hash value or one of its fields.
- A worker partition assignment system. To scale and support multiple workers, you need to create an algorithm to distribute partitions among them, ensuring each worker has an exclusive subset, akin to Kafka's "rebalancing" system.
- Ordered processing with acknowledgment. Each worker needs to iterate through each partition, tracking its offset. Though Redis consumer groups have job queue semantics, they can help here. The trick is to have each group use one consumer, and then create a group for each partition. Then each partition will be processed sequentially, and you can leverage built-in consumer group state tracking. Redis can track not only offsets but also acknowledgments for each event, which is powerful.
This is the absolute minimum requirement. If you want your solution to be robust, you might also consider error handling: in addition to crashing your workers, perhaps you'd want a mechanism to forward errors to a "dead letter" stream and continue processing.
The good news is -- if you're a Python enthusiast -- these problems have been addressed and more in a newly released library called Runnel. If you want to benefit from Kafka-like semantics on Redis, you're welcome to check it out. Below is how it looks, essentially the same as one of the Kafka diagrams above.

Workers coordinate their ownership of partitions via locks implemented in Redis. They communicate with each other through a special "control" stream. For more information, including a detailed breakdown of the architecture and rebalancing algorithm, please refer to the Runnel documentation.
summary
Is Redis a good choice for large-scale event processing? There's a fundamental trade-off: because everything is in memory, you get unparalleled processing speed, but it's not suitable for storing unlimited amounts of data. With Kafka, you might be willing to retain all your events indefinitely, but with Redis, you're definitely storing a fixed window of the most recent events -- just enough for your processors to have a comfortable buffer in case they slow down or crash. This means you might also want to use an external long-term event store, such as S3, to be able to replay them, which adds complexity to your architecture but reduces costs.
The primary motivation for researching this issue is the ease of use and low cost involved in deploying and operating Redis. That's why it's attractive compared to Kafka. It's also a magical toolkit that stood the test of time, quite impressive. It turns out, with effort, it can also support the distributed stream processing paradigm.






