Event Sourcing: Why Kafka is not suitable as an Event Store

Don’t believe everything a company or some consultants want to sell you!

Before we start, if you only have a rough idea about what Event Sourcing is, you might have a look at this great article from Alexey Zimarev.

Update: I just published my own intro to ES.

Event Sourcing explained

The internet contains many confusing articles that mix Event Sourcing with EDA and CQRS. Let’s see what it is and what not.

medium.com

It’s not like nobody has ever written about exactly this topic. Many experts in Event Sourcing, CQRS (Command Query Responsibility Segregation), and EDA (Event-Driven Architecture) have written about it. But here we are, I just read another bad article that pretends Kafka is a great event store for a Microservice architecture. It seems this needs to be repeated 1000 times until everybody hears it.

Chances are that you have read one of those horror stories about how a big Event Sourcing / CQRS “project” failed. In most of those articles, you will find out that they used Kafka as an event store and shit hit the fan massively. To be fair, I have also read one such blog post that was very well written and also explained the advantages of Event Sourcing and which features they will miss, now that they revert to classical storage of full state instead of events. Sadly they drew the wrong conclusion and dumped ES in total, instead of just moving to a suitable event store. I read 20 times more bad articles about the topic, so let’s go.

Before I continue, please note how the title says “not suitable” and not “completely impossible”. I’ll come back to that later.

Event Sourcing is not a top-level architecture

This is a quote from Greg Young — Mr. Event Sourcing and CQRS himself — and here’s the first problem. Many projects that try to use Kafka as an event store build everything around Kafka. They try to develop an “event-sourced, event-driven microservices architecture”. This is compelling because you want to leverage your super powerful and complicated silver bullet solution as much as you can, right?

And Kafka even has one great attribute — you only have one storage/source of truth for your data (events) and your messages (published events) so that you don’t have to solve the problem of distributed transactions while other people might have to implement the outbox pattern or find another solution.

What they miss, though, is that now they have to event source everything, even parts of the application where Event Sourcing is not a good match. If they implement some parts with stateful storage they win nothing — they need to solve the problem of storing data and messages atomically if they don’t want to risk losing one or the other if things go south.

What they also miss is that such a solution is nothing but a distributed monolith with the death star Kafka to bind them all. Every service can read all the events of every other service, that’s no different than having a massive Oracle DB in the center of the world. Eventually, everything will be coupled and changes here can break things there.

Kafka lacks atomic writes

Imagine one command/request/use case results in multiple events. To keep your data consistent you must store them atomically — either one or nothing (think about ACID). As far as I can tell there is no such transaction-like mechanism in Kafka to guarantee atomicity, so we could end up with partially lost or at least delayed events, leaving the application in an inconsistent state.

Update: I missed that Kafka has some sort of transactions now. I had a quick look at Confluent’s blog how it works. Just look at the length of this blog post or where it says that picking a transactional.id “is a bit tricky”. In contrast, with EventStoreDB I just write multiple events without any extra config or tricks and that’s it.

Optimistic concurrency control?

Every application state that a human, or another service, “sees” is already stale. At least some nanoseconds will have passed since the data was requested from an event store (or any other storage). For that reason, it’s crucial that some sort of concurrency control can be implemented. If the system’s state has changed between reading the state and persisting the decision, the update must fail to guarantee that all business rules have been followed.

Typically, it’s better to implement optimistic concurrency control. That means no data locking happens before the read and write, but the write request detects that the data has changed in-between and rejects the update. In most systems, those concurrency conflicts are relatively rare, and optimistic concurrency control is much cheaper than pessimistically locking the data.

Kafka has no mechanism for either optimistic or pessimistic concurrency control. The only way to work around that is to introduce another sort of locking, for example with some key-value store like Redis. While this is possible, it’s error-prone, adds complexity, and nullifies the whole (bad) idea of having one Kafka to rule them all.

Update: I was also informed that Kafka has a way of locking a whole partition and in combination with the transactions mentioned above you can have pessimistic concurrency control. When I rolled my own event store with PostgreSQL we only had to lock one event stream (e.g. for an aggregate/entity) and I bet it was a tid bit simpler than the Kafka solution.

Global ordering of events

In a (partially) event-sourced system you often want to build some read models that aggregate events from multiple streams, or multiple event types. As a quick example, to aggregate what’s on stock (in a warehouse or whatever) you need to aggregate all event types that add and remove commodities to the stock, for each type of commodity. To properly project all the events into the proper current state for any point in time they need to be in the right order. Imagine you want to ask if any commodity was ever below X items. If the ordering of events is wrong and you project an “ItemsRemovedFromStock” event before an “ItemsAddedToStock” event that happened first, you might get a wrong answer.

The problem with Kafka is that it only guarantees the order within partitions, not cross-partition, which leaves you with solving the ordering problem in some other way. And again, now you need to add complexity to solve a problem that you only have because you wanted to have a jack-of-all-trades service.

So, Event Sourcing is a bad idea?

Hell no, you should love it!!!

If done right, used for the problems it matches, and not used as a silver bullet it is a wonderful principle that offers many benefits over classical, stateful systems! That’s not the topic of this article, but some of those benefits are:

great changeability of implementation details of your domain model
excellent debugability — you can see which event introduced an error
time travel — you can see in which state the system was at any given point in the past
temporal features — your code can make decisions not only based on what has happened but also when, how often, and in which order it has happened
free audit — who made a change, when, and based on which state at that time
you won’t need an ORM that adds accidental complexity

Event Sourcing at its core is just a better way to store the state of a system!

It is not an “event-driven architecture”, it is not “Microservices”, and it is orthogonal to CQRS!

If not Kafka, what else?

https://www.eventstore.com/
https://martendb.io/
https://github.com/Eventuous/eventuous
https://eventide-project.org/
there are many more open-source libraries out there, just search and evaluate!
roll your own event store, probably using PostgreSQL with its strong JSONB data type

Warning: You probably should not roll your own for a production system, especially if you are new to Event Sourcing (actually, I did that two times, without any issues, but I still don’t recommend it). If your team has the time to do that in some “learning” session, it’s a great Kata to learn about ES.

You don’t have to take my word for it

I have written the content above out of my head. I might have missed some points, I might be slightly off with some. Generally, it’s a good idea to not trust the opinions of a single dude on the internet.

So here is what some highly respected people in the ES/DDD community have to say:

And another one — this is a long list of articles and videos Oskar has collected — I have not read/viewed them all but generally we should trust Oskar ;-)

Not all of them are about Kafka, but ES antipatterns in general

https://github.com/oskardudycz/EventSourcing.NetCore#this-is-not-event-sourcing

But, but … you mentioned it is not impossible?

I wanted to avoid this, but OTOH I want to play fair. It might be possible, there are tons of articles like this from Confluent, who are earning money by making people use Kafka as a jack-of-all-trades. It might work in many projects, but I bet one buck the chance of running into serious problems that finally cause you to write another “how event sourcing failed us” articles is very real. An article like this one.

The point is, why would you want to spend your time with accidental complexity because you choose a tool that is not built for the job, instead of picking a suitable Event Store and solve the inherent problems of your domain? Or with Alexey’s words:

Tl;dr

Repeat with me: Kafka is not suitable as an Event Store! Just use the right tool for the job!

Thank you for your time and attention! :-)

Questions and comments are very welcome and I would be very happy if you clap for this article (if you like it) or even follow me here on Medium or Twitter!

Join Medium with my referral link - Anton Stöckl

Read each of my stories and the ones of other great tech writers! Your membership fee directly supports me and other…

medium.com