Make a mock “real-time” data stream with Python and Kafka
A Dockerized tutorial with everything you need to turn a .csv file of timestamped data into a Kafka stream

With more and more data science work moving toward real-time pipelines, data scientists increasingly need to learn how to write streaming analytics. While some great, user-friendly streaming data pipeline tools exist (my obvious favorite being Apache Kafka), it’s hard to develop the code for a streaming analytic without a friendly dev environment that actually produces a data stream you can test your analytics on.
A simple recipe for a real-time data stream
This post will walk through deploying a simple Python-based Kafka producer that reads from a .csv file of timestamped data, turns the data into a real-time (or, really, “back-in-time”) Kafka stream, and allows you to write your own consumer for applying functions/transformations/machine learning models/whatever you want to the data stream.
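The core trick behind a “back-in-time” stream is to pause between rows for as long as the gap between their timestamps. Here is a minimal sketch of that idea. It is not the repo’s sendStream.py: it assumes a CSV header with a `timestamp` column in ISO format (e.g. `2021-01-01 00:00:00`), and the `speed` factor and the `replay`/`replay_csv` names are hypothetical.

```python
import csv
import time
from datetime import datetime
from typing import Callable, Iterable, Iterator


def replay(rows: Iterable[dict], speed: float = 1.0,
           sleep: Callable[[float], None] = time.sleep,
           ts_field: str = "timestamp") -> Iterator[dict]:
    """Yield each row after pausing for the gap since the previous row's timestamp.

    speed > 1 shortens the pauses (speed=10 replays ten times faster).
    """
    prev = None
    for row in rows:
        ts = datetime.fromisoformat(row[ts_field])
        if prev is not None:
            gap = (ts - prev).total_seconds() / speed
            if gap > 0:  # duplicate or out-of-order timestamps are sent immediately
                sleep(gap)
        prev = ts
        yield row


def replay_csv(path: str, speed: float = 1.0) -> Iterator[dict]:
    """Stream rows from a CSV file whose header includes a 'timestamp' column."""
    with open(path, newline="") as f:
        yield from replay(csv.DictReader(f), speed=speed)
```

In a producer, each yielded row would be serialized and handed to the Kafka client (for example `Producer.produce(topic, value=...)` in confluent-kafka) instead of being used directly. Passing `sleep` in as a parameter keeps the timing logic easy to check without actually waiting.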
Ingredients
All materials are available in my GitHub time-series-kafka-demo repo. To follow along, clone the repo to your local environment. You can run the example with only Docker and Docker Compose on your system.
The repo has a few different components:
- A Dockerfile that can be used to build a Docker image for this tutorial (optionally, if you don’t want to install the requirements locally)
- A Docker Compose file to run Kafka and Zookeeper (Kafka’s friend)
- An example csv data file showing the format for input timestamped data
- The Python script for generating the data file
- The Python script that reads the data and produces the messages to Kafka
- An example Kafka consumer in Python that prints the data to screen
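The repo’s example file defines the real input format. As a stand-in, the sketch below assumes a hypothetical two-column layout (`timestamp`, `value`) with timestamps like `2021-01-01 00:00:00`, and writes a small synthetic file with irregular gaps between rows. The function name and columns are illustrative, not taken from the repo.

```python
import csv
import random
from datetime import datetime, timedelta


def make_csv(path: str, n_rows: int = 100, start: str = "2021-01-01 00:00:00",
             max_gap_seconds: int = 5, seed: int = 0) -> None:
    """Write n_rows of (timestamp, value) pairs with random 1..max_gap_seconds gaps."""
    rng = random.Random(seed)  # seeded so the file is reproducible
    ts = datetime.fromisoformat(start)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value"])  # hypothetical column names
        for _ in range(n_rows):
            writer.writerow([ts.isoformat(sep=" "), round(rng.gauss(0.0, 1.0), 3)])
            ts += timedelta(seconds=rng.randint(1, max_gap_seconds))
```

For example, `make_csv("data/example.csv", n_rows=50)` produces 50 rows whose timestamps strictly increase.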
Directions
Clone the repo and cd into the directory.
git clone https://github.com/mtpatter/time-series-kafka-demo.git
cd time-series-kafka-demo
Start the Kafka broker and Zookeeper
The Compose file pulls Docker images for Kafka and Zookeeper version 6.2.0 from Confluent’s Docker Hub repository. (Gotta pin your versions!)
docker compose up
This starts both Kafka and Zookeeper on the same Docker network so they can talk to each other. The Kafka broker will be accessible locally on port 9092, since the Compose file binds the local port to the internal image port.
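The containers can take a few seconds to come up, so it can help to check that something is listening on port 9092 before starting the scripts. This is a minimal, standard-library-only sketch (the `wait_for_port` helper is hypothetical). Note that an open port only shows that the listener is up, not that the broker has finished starting.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 9092,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```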
Build a Docker image (optionally, for the producer and consumer)
If you don’t want to install the Python modules in the requirements.txt file, you can use a Docker image for the producer and consumer scripts.
From the repo’s root directory:
docker build -t "kafkacsv" .
This command should now work:
docker run -it --rm kafkacsv python bin/sendStream.py -h
Start a consumer
We’ll start a consumer first, to print all messages from the stream “my-stream” in mock “real time”. We start the consumer before the producer because the producer reproduces all the “pauses” in time between the timestamped data points. If you start the consumer after the producer, the consumer will immediately process all the messages already in the queue. But go ahead and do that if you like. ¯\_(ツ)_/¯
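To see why the start order matters, here is a toy simulation that uses Python’s `queue.Queue` as an in-memory stand-in for the Kafka topic. This is an assumption for illustration only: a real topic persists messages and tracks consumer offsets. A producer thread emits items with the given pauses, and the function returns the gaps a consumer observes between receiving them.

```python
import queue
import threading
import time


def arrival_gaps(delays, consumer_start_delay):
    """Replay items into a queue with the given pauses; return the gaps the consumer sees."""
    q = queue.Queue()

    def produce():
        for i, d in enumerate(delays):
            time.sleep(d)  # reproduce the pause before this data point
            q.put(i)
        q.put(None)  # sentinel: end of stream

    producer = threading.Thread(target=produce)
    producer.start()
    time.sleep(consumer_start_delay)  # 0 = consumer already waiting
    arrivals = []
    while q.get() is not None:
        arrivals.append(time.monotonic())
    producer.join()
    return [b - a for a, b in zip(arrivals, arrivals[1:])]
```

With the consumer already running, `arrival_gaps([0, 0.2, 0.2], 0)` returns gaps of roughly 0.2 s. Starting it after the producer has finished, as in `arrival_gaps([0, 0.2, 0.2], 0.8)`, returns gaps close to zero because the backlog is drained at once.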
python bin/processStream.py my-stream
or with Docker:
docker run -it --rm \
-v $PWD:/home \
--network=host \
kafkacsv python bin/processStream.py my-stream
In the consumer’s main function, I catch the unknown topic error message and let the consumer create the new topic. The keyboard interrupt handling also lets you shut down the consumer (Ctrl+C) for this example.
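Here is a minimal sketch of a consumer loop along those lines. It assumes the confluent-kafka Python client, a broker on `localhost:9092`, and that the producer writes each row as a UTF-8 JSON object. The group id and helper names are hypothetical, and this is not the repo’s processStream.py. The loop only relies on `subscribe()`, `poll()` and `close()`, so it can be exercised with any stand-in object.

```python
import json
import sys


def decode_value(raw: bytes) -> dict:
    """Assumes the producer wrote each row as a UTF-8 encoded JSON object."""
    return json.loads(raw.decode("utf-8"))


def consume(consumer, topic: str, unknown_topic_code, max_polls=None) -> int:
    """Print messages from `topic` until Ctrl+C (or `max_polls` polls); return the count."""
    consumer.subscribe([topic])
    seen = polls = 0
    try:
        while max_polls is None or polls < max_polls:
            polls += 1
            msg = consumer.poll(1.0)  # wait up to 1 s for a message
            if msg is None:
                continue
            err = msg.error()
            if err is not None:
                if err.code() == unknown_topic_code:
                    # Topic doesn't exist yet: report it and keep polling instead of crashing.
                    print(f"Topic {topic!r} unknown, waiting for it...", file=sys.stderr)
                    continue
                raise RuntimeError(f"Kafka error: {err}")
            print(decode_value(msg.value()))
            seen += 1
    except KeyboardInterrupt:
        pass  # Ctrl+C is the intended way to stop this example
    finally:
        consumer.close()  # leave the consumer group cleanly
    return seen


def main() -> None:
    # Imported here so consume() can be tried without the client installed.
    from confluent_kafka import Consumer, KafkaError

    topic = sys.argv[1] if len(sys.argv) > 1 else "my-stream"
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "my-stream-printer",  # hypothetical group name
        "auto.offset.reset": "earliest",  # new group starts from the oldest message
    })
    consume(consumer, topic, KafkaError.UNKNOWN_TOPIC_OR_PART)
```

Calling `main()` from a script’s entry point runs it against the local broker. Passing the unknown-topic error code in as a parameter keeps `consume()` independent of the client library.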





