Vengateswaran Arunachalam

Summary

This article is a tutorial on building a real-time analytics dashboard using Flask and Kafka on an AWS EC2 instance.

Abstract

The article covers the creation of a real-time analytics dashboard by integrating Apache Kafka with a Flask web application. It begins with the growing need for real-time analytics and introduces Flask and Kafka as the key technologies for the task. The tutorial lists the prerequisites, such as an AWS account and basic knowledge of Python, Flask, and Kafka, and guides the reader through setting up Kafka on an AWS EC2 instance, including the installation and configuration of Kafka services. It also demonstrates how to implement Kafka producers and consumers in Python, then integrates Kafka with Flask to serve real-time data to a web frontend. The article concludes with a reflection on the importance of real-time analytics in the digital age and an invitation for further collaboration on technology projects.

Opinions

  • The author emphasizes the importance of real-time analytics in today's data-driven environment.
  • Flask is highlighted for its utility in microservices and rapid prototyping, particularly for web interfaces that require real-time data processing.
  • Kafka is presented as a robust solution for handling real-time event streaming, capable of managing large volumes of data efficiently.
  • The tutorial advocates for the use of AWS EC2 as a cost-effective platform for deploying such real-time analytics solutions.
  • The author encourages readers to engage with them through various contact channels for further exploration into full-stack development or data engineering projects.

Building a Real-time Analytics Dashboard with Flask and Kafka on AWS EC2

Kafka — Flask Real Time Analytics

Introduction:

Real-time analytics has become increasingly important in today’s data-driven world. This tutorial uses Flask, a Python web framework, to serve data from Kafka, a real-time event streaming platform, and deploys both on AWS EC2.

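The high-level flow we are trying to achieve in this use case: a Python producer publishes events to a Kafka topic on the EC2 instance, a Kafka consumer running inside the Flask application reads them, and Flask serves the most recent readings over HTTP.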

Prerequisites:

  • An active AWS account
  • Basic knowledge of Python, Flask, and Kafka
  • A running EC2 instance (see AWS’s official documentation for setting up an EC2 instance)

This use case uses a micro instance type, which is sufficient for learning purposes.

Setup Apache Kafka on AWS EC2:

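  1. Install Dependencies:
  • Install Java: sudo apt-get install default-jre

  2. Download and Extract Kafka:

  3. Start Kafka Services (run from the extracted Kafka directory, each in its own terminal session):
  • Start ZooKeeper: ./bin/zookeeper-server-start.sh config/zookeeper.properties
  • Start Kafka broker: ./bin/kafka-server-start.sh config/server.properties

Note: For clients that connect from outside the instance, the EC2 security group must allow inbound traffic on port 9092, and advertised.listeners in config/server.properties must point to an address those clients can reach.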
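
Before moving on to the Python clients, it can help to confirm that both services accept TCP connections. The following is a minimal sketch using only the standard library. The helper name port_open is hypothetical, and the ports assume the defaults shipped in Kafka's sample configuration (2181 for ZooKeeper, 9092 for the broker). A successful connection only shows that the port is open, not that the broker is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


# Example usage (replace the placeholder with your EC2 IP address):
# for name, port in (("ZooKeeper", 2181), ("Kafka broker", 9092)):
#     print(name, "reachable" if port_open("YOUR_EC2_IP_ADDRESS", port) else "not reachable")
```

If the check passes on the instance itself but fails from your own machine, the security group or the listener configuration is the likely cause.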

Kafka Producer and Consumer in Python

Setting Up Kafka Producer:

Role of a Kafka Producer: A Kafka producer sends data to Kafka topics. The data can be any type of event, message, reading, or log. The purpose of a producer is to collect data from various sources and send it to a topic without worrying about whether the data is consumed or which consumers process it.

Kafka Producer Example:

  1. Dependencies: First, we need to install the Kafka library for Python:
pip install kafka-python

2. Producer Code:

from kafka import KafkaProducer
import time
import json
import random

producer = KafkaProducer(
    bootstrap_servers='YOUR_EC2_IP_ADDRESS:9092',  # Replace with your EC2 IP address.
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

while True:
    data = {
        'timestamp': time.time(),
        'value': random.randint(1, 100)
    }
    print(f"Producing data: {data}")  # To see the produced data in the console.
    producer.send('realtime-analytics', value=data)
    time.sleep(1)
  • This script initializes a Kafka producer, which connects to the Kafka instance on your EC2.
  • It then continuously produces random data (a timestamp and a random value between 1 and 100) to the realtime-analytics topic.
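
To see what actually travels over the wire, the serializer can be tried on its own, without a broker. This is a small sketch that reuses the same lambda logic as the producer above; the function names serialize and deserialize and the sample reading are made up for illustration.

```python
import json


def serialize(value):
    """Same logic as the producer's value_serializer: dict -> UTF-8 JSON bytes."""
    return json.dumps(value).encode("utf-8")


def deserialize(raw):
    """Inverse step, as a consumer's value_deserializer would do: bytes -> dict."""
    return json.loads(raw.decode("utf-8"))


sample = {"timestamp": 1700000000.0, "value": 42}  # made-up reading for illustration
wire_bytes = serialize(sample)

print(wire_bytes)                          # the bytes Kafka stores for this message
print(deserialize(wire_bytes) == sample)   # the round trip should preserve the data
```

Kafka itself treats message values as opaque bytes, so the producer and every consumer have to agree on this encoding.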

Setting Up Kafka Consumer:

Role of a Kafka Consumer: A Kafka consumer fetches data from one or more Kafka topics. A consumer is typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic.
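
The idea of splitting partitions within a consumer group can be illustrated with a toy model. The sketch below is not Kafka's actual assignment logic (the real assignment is coordinated by the broker and depends on the configured assignor); it only shows the principle that each partition goes to exactly one consumer in the group. The function name and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin split of partitions across the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by two consumers in the same group.
plan = assign_partitions([0, 1, 2, 3, 4, 5], ["consumer-a", "consumer-b"])
print(plan)

# With more consumers than partitions, the extra consumer gets nothing to read.
crowded = assign_partitions([0], ["consumer-a", "consumer-b"])
print(crowded)
```

In this model, as in Kafka, adding consumers beyond the number of partitions does not increase parallelism.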

Kafka Consumer Example:

  1. Consumer Code:
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'realtime-analytics',
    bootstrap_servers='YOUR_EC2_IP_ADDRESS:9092',  # Replace with your EC2 IP address.
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    print(f"Consumed data: {message.value}")
  • This script initializes a Kafka consumer, which connects to the Kafka instance on your EC2.
  • It then starts listening to the realtime-analytics topic and prints each message it consumes.

Integrating Kafka with Flask:

Flask is a lightweight web framework for Python, which is particularly useful for microservices architecture and rapid prototyping. We’re going to leverage Flask to serve the data consumed from Kafka to a frontend dashboard in real-time.

Setting Up Flask:

  1. Dependencies:

Before we dive into the code, ensure you have Flask installed:

pip install flask

2. Building the Flask Application:

We’ll create a Flask application that starts a Kafka consumer in a separate thread to ensure the main thread (serving the HTTP requests) is not blocked:

from flask import Flask, jsonify
from kafka import KafkaConsumer
import threading
import json

app = Flask(__name__)

# List to hold the data for our dashboard.
dashboard_data = []

consumer = KafkaConsumer(
    'realtime-analytics',
    bootstrap_servers='YOUR_EC2_IP_ADDRESS:9092',  # Replace with your EC2 IP address.
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

def kafka_consumer():
    # Runs in a background thread; the list is mutated in place, so no `global` is needed.
    for message in consumer:
        dashboard_data.append(message.value)
        if len(dashboard_data) > 10:  # Keeping the last 10 data points.
            dashboard_data.pop(0)

# daemon=True lets the process exit on Ctrl+C instead of waiting on the endless consumer loop.
threading.Thread(target=kafka_consumer, daemon=True).start()

@app.route('/data', methods=['GET'])
def get_data():
    return jsonify(list(dashboard_data))  # Return a copy of the current data points.

@app.route('/')
def index():
    return "Real-time Analytics Dashboard"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)  # This will make the Flask app accessible via the EC2's public IP.

This Flask application listens for incoming HTTP requests and serves the last 10 messages consumed from Kafka when you access the /data endpoint.
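
The keep-the-last-10 buffering can be tried without a broker or Flask. The following is a minimal sketch under stated assumptions: a generator named fake_messages stands in for iterating the Kafka consumer, and collections.deque with maxlen=10 replaces the manual length check. The lock and the snapshot helper are additions for illustration, not part of the application above.

```python
import threading
from collections import deque

recent = deque(maxlen=10)   # bounded buffer: the oldest item is dropped automatically
lock = threading.Lock()     # guards the buffer between the worker and readers


def fake_messages(count):
    """Stand-in for iterating a KafkaConsumer; yields dicts shaped like the producer's."""
    for i in range(count):
        yield {"timestamp": float(i), "value": i}


def consume(source):
    """Background worker: append every incoming item to the bounded buffer."""
    for item in source:
        with lock:
            recent.append(item)


def snapshot():
    """Return a copy of the buffer, as a request handler would before serializing it."""
    with lock:
        return list(recent)


worker = threading.Thread(target=consume, args=(fake_messages(25),), daemon=True)
worker.start()
worker.join()  # only so this example finishes; a real consumer loop never ends

print([point["value"] for point in snapshot()])
```

After 25 items have passed through, only the 10 most recent remain in the buffer, which is the same behavior the /data endpoint exposes.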

Running the Flask App on EC2:

With the Flask application ready, start the application on your EC2 instance:

python your_flask_app_filename.py

Note: Ensure the security group of the EC2 instance allows inbound traffic on port 5000.

Access the Flask app via the public IP:

http://YOUR_EC2_IP_ADDRESS:5000/

Awesome! The real-time dashboard is now reachable in the browser, and the /data endpoint returns the latest data points as JSON.
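
A dashboard usually reduces the raw readings to a few summary figures. Here is a hedged sketch of that step, assuming the payload from /data is a list of objects with timestamp and value fields as produced earlier. The summarize helper is hypothetical and not part of the application above.

```python
from statistics import mean


def summarize(points):
    """Reduce data points shaped like the producer's output to summary figures."""
    values = [point["value"] for point in points]
    if not values:
        # Nothing consumed yet: report an empty summary instead of failing.
        return {"count": 0, "min": None, "max": None, "mean": None}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Made-up readings in the same shape the /data endpoint returns.
sample = [
    {"timestamp": 1.0, "value": 10},
    {"timestamp": 2.0, "value": 30},
    {"timestamp": 3.0, "value": 20},
]
print(summarize(sample))
```

A function like this could run on the server before the response is built or in the frontend after fetching /data; which is better depends on how many clients poll the endpoint.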

Conclusion:

In today’s fast-paced digital world, the demand for real-time analytics has never been greater. With the power of Kafka, we can effortlessly stream vast amounts of data in real-time. Flask, with its simplicity and flexibility, then enables us to present and analyze this data through web interfaces, making it accessible to a wide range of users.

If you found this article insightful and wish to delve deeper into full-stack development or data engineering projects, I’d be thrilled to guide and collaborate further. Feel free to reach out through the channels mentioned below, and let’s make technology work for your unique needs.

Contact Channels:

Thank you for embarking on this journey with me through the realms of real-time data processing. Looking forward to our future collaborations.
