Setting Up Your Local Event-Driven Environment Using Kafka and Docker

Spin up a Kafka cluster in Docker containers and learn how to create topics and produce and consume messages

Introduction

Event-driven architecture is a popular architectural style used in many applications today. Many tools have been developed over the past few years to support it, for example, AWS SNS, SQS, RabbitMQ, and Apache Kafka.

The main purpose of an event-driven architecture is to decouple your services by placing one or more message queues between them: services publish to the queues and poll or consume from them. This way, we can replace the producer or the consumer fairly easily, as they are decoupled from each other.

In this piece, we’re going to set up Kafka locally in Docker and use its CLI tools to perform the basic operations: creating a topic, publishing messages, and consuming them.

The Docker images we will be using are as follows:

confluentinc/cp-zookeeper:5.4.0
confluentinc/cp-server:5.4.0
confluentinc/cp-kafka:5.4.0

Prerequisite

The only prerequisite for this tutorial is Docker (including Docker Compose, which we use below). If you don’t have it installed yet, follow the official installation instructions.

Write a Docker Compose File

The first thing we’ll do is create a docker-compose.yml file. The advantage of a compose file is that it lets us group the services that make up our application in one place. We can then spin up all the containers defined in that file with a single command: docker-compose up.

An alternative would be to use the docker run command, but that would mean executing it three times, once for each Docker image we want to spin up a container from.

As mentioned in the Introduction, we will have three Docker containers running. So, our compose file will consist of three services with the following names: zookeeper, broker and kafka-tools.

Our local Kafka cluster is made up of the zookeeper and broker containers. broker runs Apache Kafka, which relies on Apache ZooKeeper; that is why we declare the dependency in the compose file.

services:
  ...
  ...
  broker:
    ...
    ...
    depends_on:
      - zookeeper

Lastly, we have the kafka-tools container, which contains the command-line tools that we can use to interact with the broker.

Notice that for the kafka-tools container, we added network_mode: "host". With this setting, the container shares the network stack of the Docker host, so localhost inside the container refers to the host itself.

That lets us connect to the broker easily, since it exposes itself on localhost:9092. We’ll see more about this later.
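
The compose file described above might look roughly like the following. This is a minimal sketch, not the exact file used in this piece: it is trimmed down from the Confluent cp-all-in-one example listed in the Reference section, and the environment values are assumptions chosen for a single-broker setup. The kafka-tools service simply keeps a container alive so we can run the CLI tools from it.

```shell
# Write a minimal docker-compose.yml into the current directory.
# Assumption: the environment values follow the Confluent 5.4.0
# cp-all-in-one example, trimmed to one broker. Adjust as needed.
cat > docker-compose.yml <<'EOF'
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.4.0
    hostname: zookeeper
    container_name: zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-server:5.4.0
    hostname: broker
    container_name: broker
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      # Clients on the host (and kafka-tools) connect via localhost:9092.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_METRIC_REPORTERS: io.confluent.metrics.reporter.ConfluentMetricsReporter
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_CONFLUENT_LICENSE_TOPIC_REPLICATION_FACTOR: 1
      CONFLUENT_METRICS_REPORTER_BOOTSTRAP_SERVERS: broker:29092
      CONFLUENT_METRICS_REPORTER_ZOOKEEPER_CONNECT: zookeeper:2181
      CONFLUENT_METRICS_REPORTER_TOPIC_REPLICAS: 1
      CONFLUENT_METRICS_ENABLE: 'true'
      CONFLUENT_SUPPORT_CUSTOMER_ID: 'anonymous'

  kafka-tools:
    image: confluentinc/cp-kafka:5.4.0
    hostname: kafka-tools
    container_name: kafka-tools
    # Keep the container running so we can docker exec into it.
    command: ["tail", "-f", "/dev/null"]
    network_mode: "host"
EOF
```

Every replication factor is set to 1 because there is only one broker; a higher value would need more brokers.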

Start the Containers

We are going to start the three containers we defined in the docker-compose.yml file. Go to your terminal, change directory to where you created the docker-compose.yml file, and execute the following command.

~/demo/kafka-local ❯ ls -l
total 8
-rw-r--r--  1 billyde  staff  1347 12 Feb 23:06 docker-compose.yml

~/demo/kafka-local ❯ docker-compose up -d
Creating network "kafka-local_default" with the default driver
Creating kafka-tools ... done
Creating zookeeper   ... done
Creating broker      ... done

Finally, to check whether all the containers are running, run this command.

~/demo/kafka-local ❯ docker ps
CONTAINER ID        IMAGE                             COMMAND                  CREATED             STATUS              PORTS                                        NAMES
748c5da81038        confluentinc/cp-server:5.4.0      "/etc/confluent/dock…"   5 seconds ago       Up 4 seconds        0.0.0.0:9092->9092/tcp                       broker
5044ae334235        confluentinc/cp-zookeeper:5.4.0   "/etc/confluent/dock…"   5 seconds ago       Up 4 seconds        2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp   zookeeper
ff1f9070dc42        confluentinc/cp-kafka:5.4.0       "tail -f /dev/null"      5 seconds ago       Up 4 seconds        9092/tcp                                     kafka-tools

If all of the containers are listed and their STATUS is Up xx seconds, we’re good. Happy days!

Create a Kafka Topic

Like other message brokers, Kafka works by having messages sent to topics. Each topic usually represents a particular kind of event. For example, a user-activity topic would hold messages related to user activities.

Let’s see how we can create a topic using the CLI tools.

First, we need to get into the kafka-tools container because that’s where our Kafka CLI tools reside. To do this, execute the following command from your terminal.

~/demo/kafka-local ❯ docker exec -ti kafka-tools bash
root@kafka-tools:/#

If you see root@kafka-tools:/#, you’re in! Let’s create a topic, and we’ll call it to-do-list because it will contain the list of things we need to do.

root@kafka-tools:/# kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 2 --topic to-do-list
root@kafka-tools:/#

The command prints nothing here, and no output means it succeeded. To check that the topic was actually created, we can run the following command.

root@kafka-tools:/# kafka-topics --list --bootstrap-server localhost:9092
__confluent.support.metrics
_confluent-license
_confluent-metrics
to-do-list

Great stuff, we can see that our topic to-do-list did get created. For more detailed information about the topic, we can run the following command.

root@kafka-tools:/# kafka-topics --describe --bootstrap-server localhost:9092 --topic to-do-list
Topic: to-do-list PartitionCount: 2 ReplicationFactor: 1 Configs:
 Topic: to-do-list Partition: 0 Leader: 1 Replicas: 1 Isr: 1
 Topic: to-do-list Partition: 1 Leader: 1 Replicas: 1 Isr: 1

Alright, before moving on to the next section, let’s discuss a few things here.

--bootstrap-server localhost:9092: Remember this from the earlier section? If we didn’t add network_mode: "host" to the kafka-tools service, it would not be able to connect to the broker. Instead, this is what we would see.

[2020-02-12 12:40:27,990] WARN [AdminClient clientId=adminclient-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

--replication-factor 1: This specifies how many replicas of each topic partition we want. Replication is for fault tolerance. For instance, with a factor of 3, three brokers each hold a copy of the same data (the topic’s partitions and their messages). For each partition, one of the three replicas is the leader and the others are followers. Since our cluster has a single broker, 1 is the only factor we can use here.
--partitions 2: This sets how many partitions the topic has. Whenever a Kafka producer sends a message (or record) to this topic, the record is written to exactly one of the partitions. The partition is chosen from the record’s key, so records with the same key end up in the same partition.
--topic to-do-list: As you might have guessed, this argument specifies the name of the topic we want to create.
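
To make the point about record keys concrete, here is a toy illustration of key-based placement. It is not Kafka’s real partitioner: Kafka’s default partitioner hashes the key bytes with murmur2, while this sketch uses the POSIX cksum utility as a stand-in, so the partition numbers will not match what the broker actually picks. The function name pick_partition is made up for the example. The only point is that the same key always maps to the same partition.

```shell
# pick_partition KEY NUM_PARTITIONS
# Toy stand-in for a key-based partitioner: hash the key, then take the
# remainder modulo the partition count. cksum is NOT the hash Kafka uses.
pick_partition() {
  key="$1"
  partitions="$2"
  # cksum prints "<checksum> <byte count>"; keep only the checksum.
  hash=$(printf '%s' "$key" | cksum | cut -d ' ' -f 1)
  echo $(( hash % partitions ))
}

# Key 1 appears twice and gets the same partition both times.
for key in 1 2 3 1; do
  echo "key=$key -> partition $(pick_partition "$key" 2)"
done
```

Because placement depends on the key alone, all records for a given key stay in one partition, which is what keeps them in order relative to each other.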

Send Messages to the Topic

Our topic is ready, which means we can put some messages into it. To do this, we are going to use kafka-console-producer. Let’s take a look at how we can send key-value messages.

root@kafka-tools:/# kafka-console-producer --broker-list localhost:9092 --topic to-do-list --property "parse.key=true" --property "key.separator=:"
>1:Wash dishes
>2:Clean bathroom
>3:Mop living room

The option --property "parse.key=true" tells the producer that we want to send messages with a key. We also need to tell the producer which separator divides the key from the value (the message itself), which is what the option --property "key.separator=:" does.

Now that we’ve finished sending messages, we can close the producer. Just press Ctrl+C to terminate it.
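
The console producer reads from standard input, so messages can also be piped in rather than typed at the prompt. The sketch below is meant to run from the host machine rather than from inside the container, and it wraps the call in a helper named produce_todos, a name invented for this example. The messages in the usage comment are likewise made up.

```shell
# produce_todos: send every "key:value" line read from stdin to the
# to-do-list topic, using the producer inside the kafka-tools container.
produce_todos() {
  # -i keeps stdin open so the piped lines reach the producer.
  docker exec -i kafka-tools kafka-console-producer \
    --broker-list localhost:9092 \
    --topic to-do-list \
    --property "parse.key=true" \
    --property "key.separator=:"
}

# Usage (needs the containers from this tutorial to be running):
#   printf '4:Water plants\n5:Buy groceries\n' | produce_todos
```

The producer exits on its own when it reaches the end of the input, so there is no need to press Ctrl+C in this case.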

Consume Messages from the Topic

We published some messages in the previous section, so let’s see if we can consume them. To do this, we are going to use kafka-console-consumer.

root@kafka-tools:/# kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic to-do-list --property "print.key=true"
1 Wash dishes
3 Mop living room
2 Clean bathroom

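Perfect! We can see all three messages we produced in the previous section. They don’t come back in the order we sent them, though: Kafka only guarantees ordering within a partition, and our topic has two partitions.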

A few things to note:

--from-beginning: A Kafka consumer tracks its progress by offset. This option tells the consumer to start from the earliest available offset if it does not already have an established offset (which, in this case, it doesn’t).
--property "print.key=true": This option tells the consumer to print the key of each message to the console. If it is not specified, the consumer prints only the message values.

As with the producer, press Ctrl+C to terminate the consumer.
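
To see how the records were spread over the topic’s two partitions, the console consumer can be pointed at a single partition with its --partition option. Adding --timeout-ms makes it stop on its own once no new message arrives within the given time. The helper name read_partition and the five-second timeout are choices made for this sketch, which is meant to run from the host machine.

```shell
# read_partition N: print every record stored in partition N of the
# to-do-list topic, then stop after 5 seconds without new messages.
read_partition() {
  docker exec kafka-tools kafka-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic to-do-list \
    --partition "$1" \
    --from-beginning \
    --timeout-ms 5000 \
    --property "print.key=true"
}

# Usage (needs the containers from this tutorial to be running):
#   read_partition 0
#   read_partition 1
```

When the timeout is reached, the consumer may log a TimeoutException before exiting. In this setup that is simply how it signals that it stopped waiting.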

Wrap Up

In this tutorial, we learnt how to spin up a local Kafka cluster with Docker. We also went through some basic operations, including creating a topic and producing and consuming messages, all via the Kafka command-line tools.

By now, we should all have a basic understanding of how Kafka works. I really hope that this inspires you to explore more and build event-driven applications with Kafka.

It’s not uncommon for a Kafka application to have DynamoDB coupled with it. For this reason, you might be interested in setting up a local instance of DynamoDB as well. You can check out this tutorial for that.

Also, check out Avro and Schema Registry. They are awesome tools that will allow you to define schemas that tell your producers and consumers how to serialise and deserialise (respectively) Kafka messages from and to your POJOs or data classes.

Reference

Docker compose file https://github.com/confluentinc/examples/blob/5.4.0-post/cp-all-in-one/docker-compose.yml

Official Kafka doc https://kafka.apache.org/quickstart