avatarVengateswaran Arunachalam

Summary

The website content outlines a comprehensive guide to building a real-time web traffic visualization tool using ClickHouse, Flask, and AWS.

Abstract

The article "From Clicks to Charts: Unveiling Web Traffic with ClickHouse, Flask & AWS" provides a detailed tutorial on creating a web traffic analytics tool. It emphasizes the advantages of using ClickHouse, a columnar database designed for high-performance analytics, to process and analyze large datasets efficiently. The guide includes steps for setting up ClickHouse on AWS, simulating web traffic data, and developing a Flask application to visualize the data. The Flask application interacts with ClickHouse to fetch and display web traffic information on a dashboard, demonstrating the scalability and speed of real-time data analytics. The article also includes code snippets for generating fake data, inserting it into ClickHouse, and creating an HTML template for the dashboard. The project serves as an example of the potential of ClickHouse in data engineering and the synergy between ClickHouse, Flask, and AWS in building scalable analytics solutions.

Opinions

  • The author believes in the power of columnar databases like ClickHouse for real-time analytics due to their scalability and performance with large datasets.
  • There is a strong endorsement for the open-source nature of ClickHouse, which provides flexibility and community support.
  • The author suggests that integrating ClickHouse with Flask and AWS offers a robust solution for web traffic visualization, which can be particularly beneficial for businesses looking to optimize user experience.
  • The article conveys enthusiasm for full-stack development and data engineering projects, inviting readers to explore these fields further.
  • The author encourages reader engagement and offers to assist with similar projects, providing contact channels for further collaboration.
  • The article concludes with a call to action for readers to follow and support the writer's work and the broader technology community through various platforms and newsletters.

From Clicks to Charts: Unveiling Web Traffic with ClickHouse, Flask & AWS!

Introduction

In today’s digital era, web traffic data stands as a testament to user behavior. Analyzing this data helps businesses optimize user experience and drive growth. This article delves deep into building a web traffic visualization tool using ClickHouse, Flask, and AWS, showcasing the power and speed of columnar databases in real-time data analytics.

1. Why ClickHouse?

ClickHouse is an open-source columnar database that’s designed to perform real-time analytical queries on large datasets. Its columnar nature means that instead of reading data row by row, it fetches data column by column, making aggregations significantly faster.

  • Scalability: Can easily scale horizontally by adding more nodes.
  • Real-time analytics: Designed for sub-second query responses.
  • Open Source: Provides transparency, flexibility, and a strong community.

2. Setting Up ClickHouse on AWS

a. Launching EC2 Instance: Start by provisioning an EC2 instance.

b. Installing ClickHouse: Once the instance is running:

curl https://clickhouse.com/ | sh

c. Secure the Database: By default, ClickHouse doesn’t require a password. It’s recommended to set one up and also limit outside access.

Starting the ClickHouse Server

After you have installed ClickHouse on your machine or AWS EC2 instance, you can start the ClickHouse server by running:

./clickhouse server

Accessing the ClickHouse Client

Once the server is up, you can interact with ClickHouse using its client. To access the ClickHouse client, run:

./clickhouse client

Creating a Table in ClickHouse

Once inside the ClickHouse client, you can execute SQL commands. Let’s create a web_traffic table as described previously:

CREATE TABLE web_traffic (
    date Date DEFAULT today(),
    time DateTime DEFAULT now(),
    ip_address String,
    user_agent String,
    page_visited String,
    response_code Int32,
    response_time Float32
) ENGINE = MergeTree()
ORDER BY (date, time);

Simulating Web Traffic Data

We have table in Clickhouse. Now its time to generate some fake data.

from clickhouse_driver import Client
import random
import time

client = Client('localhost')
pages = ['/home', '/about', '/contact', '/products', '/blog']

while True:
    log = {
        'ip_address': f"192.168.0.{random.randint(1, 255)}",
        'user_agent': 'Mozilla/5.0',
        'page_visited': random.choice(pages),
        'response_code': random.choice([200, 200, 200, 404, 500]),
        'response_time': round(random.uniform(0.1, 2.5), 2)
    }

    client.execute('INSERT INTO web_traffic (ip_address, user_agent, page_visited, response_code, response_time) VALUES', [log])
    time.sleep(10)

Simulate a web server log reader and push logs to ClickHouse:

Just to make sure, whether our code is pushing the data into our Clickhouse table or not. Just quickly run the select SQL against it.

Create the Flask Application

Here’s a basic Flask application that connects to ClickHouse and fetches the web traffic data:

from flask import Flask, render_template
from clickhouse_driver import Client

app = Flask(__name__)
client = Client('localhost')  # Adjust as needed if not localhost

@app.route('/')
def dashboard():
    query = 'SELECT date, time, ip_address, user_agent, page_visited, response_code, response_time FROM web_traffic ORDER BY time DESC LIMIT 100'
    results = client.execute(query)
    return render_template('dashboard.html', records=results)

if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)

Create the HTML Template (dashboard.html)

Inside your project folder, create a folder named templates, and within that folder, create a file named dashboard.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Web Traffic Dashboard</title>
</head>
<body>
    <h1>Recent Web Traffic</h1>
    <table border="1">
        <thead>
            <tr>
                <th>Date</th>
                <th>Time</th>
                <th>IP Address</th>
                <th>User Agent</th>
                <th>Page Visited</th>
                <th>Response Code</th>
                <th>Response Time</th>
            </tr>
        </thead>
        <tbody>
            {% for record in records %}
            <tr>
                <td>{{ record[0] }}</td>
                <td>{{ record[1] }}</td>
                <td>{{ record[2] }}</td>
                <td>{{ record[3] }}</td>
                <td>{{ record[4] }}</td>
                <td>{{ record[5] }}</td>
                <td>{{ record[6] }}</td>
            </tr>
            {% endfor %}
        </tbody>
    </table>
</body>
</html>

This simple template will display the last 100 records from the web_traffic table in tabular form.

Run Flask Application

In your terminal, navigate to your project folder and execute:

python app.py

Now, open your browser and go to http://127.0.0.1:5000/ or your public IP address. You should see a table displaying the recent web traffic data.

This project underscores the potential of ClickHouse in processing vast amounts of data quickly. Coupled with Flask and AWS, we’ve built a scalable web traffic visualization tool, providing insights at a glance. The world of data analytics is vast; this is but a stepping stone into deeper explorations.

Excellent. Our code is perfectly working :)

If you found this article insightful and wish to delve deeper into full-stack development or data engineering projects, I’d be thrilled to guide and collaborate further. Feel free to reach out through the mentioned channels below, and let’s make technology work for your unique needs.

Contact Channels:

Thank you for embarking on this journey with me through the realms of real-time data processing. Looking forward to our future collaborations.

In Plain English

Thank you for being a part of our community! Before you go:

Python Programming
Clickhouse
Flask
AWS
Recommended from ReadMedium