From Clicks to Charts: Unveiling Web Traffic with ClickHouse, Flask & AWS!
Introduction
In today’s digital era, web traffic data stands as a testament to user behavior. Analyzing this data helps businesses optimize user experience and drive growth. This article delves deep into building a web traffic visualization tool using ClickHouse, Flask, and AWS, showcasing the power and speed of columnar databases in real-time data analytics.
1. Why ClickHouse?
ClickHouse is an open-source columnar database that’s designed to perform real-time analytical queries on large datasets. Its columnar nature means that instead of reading data row by row, it fetches data column by column, making aggregations significantly faster.
- Scalability: Can easily scale horizontally by adding more nodes.
- Real-time analytics: Designed for sub-second query responses.
- Open Source: Provides transparency, flexibility, and a strong community.
2. Setting Up ClickHouse on AWS
a. Launching EC2 Instance: Start by provisioning an EC2 instance.
b. Installing ClickHouse: Once the instance is running:
curl https://clickhouse.com/ | sh
c. Secure the Database: By default, ClickHouse doesn’t require a password. It’s recommended to set one up and also limit outside access.
Starting the ClickHouse Server
After you have installed ClickHouse on your machine or AWS EC2 instance, you can start the ClickHouse server by running:
./clickhouse server
Accessing the ClickHouse Client
Once the server is up, you can interact with ClickHouse using its client. To access the ClickHouse client, run:
./clickhouse client
Creating a Table in ClickHouse
Once inside the ClickHouse client, you can execute SQL commands. Let’s create a web_traffic
table as described previously:
CREATE TABLE web_traffic (
date Date DEFAULT today(),
time DateTime DEFAULT now(),
ip_address String,
user_agent String,
page_visited String,
response_code Int32,
response_time Float32
) ENGINE = MergeTree()
ORDER BY (date, time);
Simulating Web Traffic Data
We have table in Clickhouse. Now its time to generate some fake data.
from clickhouse_driver import Client
import random
import time
client = Client('localhost')
pages = ['/home', '/about', '/contact', '/products', '/blog']
while True:
log = {
'ip_address': f"192.168.0.{random.randint(1, 255)}",
'user_agent': 'Mozilla/5.0',
'page_visited': random.choice(pages),
'response_code': random.choice([200, 200, 200, 404, 500]),
'response_time': round(random.uniform(0.1, 2.5), 2)
}
client.execute('INSERT INTO web_traffic (ip_address, user_agent, page_visited, response_code, response_time) VALUES', [log])
time.sleep(10)
Simulate a web server log reader and push logs to ClickHouse:
Just to make sure, whether our code is pushing the data into our Clickhouse table or not. Just quickly run the select SQL against it.
Create the Flask Application
Here’s a basic Flask application that connects to ClickHouse and fetches the web traffic data:
from flask import Flask, render_template
from clickhouse_driver import Client
app = Flask(__name__)
client = Client('localhost') # Adjust as needed if not localhost
@app.route('/')
def dashboard():
query = 'SELECT date, time, ip_address, user_agent, page_visited, response_code, response_time FROM web_traffic ORDER BY time DESC LIMIT 100'
results = client.execute(query)
return render_template('dashboard.html', records=results)
if __name__ == '__main__':
app.run(host='0.0.0.0', debug=True)
Create the HTML Template (dashboard.html)
Inside your project folder, create a folder named templates
, and within that folder, create a file named dashboard.html
:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Web Traffic Dashboard</title>
</head>
<body>
<h1>Recent Web Traffic</h1>
<table border="1">
<thead>
<tr>
<th>Date</th>
<th>Time</th>
<th>IP Address</th>
<th>User Agent</th>
<th>Page Visited</th>
<th>Response Code</th>
<th>Response Time</th>
</tr>
</thead>
<tbody>
{% for record in records %}
<tr>
<td>{{ record[0] }}</td>
<td>{{ record[1] }}</td>
<td>{{ record[2] }}</td>
<td>{{ record[3] }}</td>
<td>{{ record[4] }}</td>
<td>{{ record[5] }}</td>
<td>{{ record[6] }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</body>
</html>
This simple template will display the last 100 records from the web_traffic
table in tabular form.
Run Flask Application
In your terminal, navigate to your project folder and execute:
python app.py
Now, open your browser and go to http://127.0.0.1:5000/
or your public IP address
. You should see a table displaying the recent web traffic data.
This project underscores the potential of ClickHouse in processing vast amounts of data quickly. Coupled with Flask and AWS, we’ve built a scalable web traffic visualization tool, providing insights at a glance. The world of data analytics is vast; this is but a stepping stone into deeper explorations.
Excellent. Our code is perfectly working :)
If you found this article insightful and wish to delve deeper into full-stack development or data engineering projects, I’d be thrilled to guide and collaborate further. Feel free to reach out through the mentioned channels below, and let’s make technology work for your unique needs.
Contact Channels:
Thank you for embarking on this journey with me through the realms of real-time data processing. Looking forward to our future collaborations.
In Plain English
Thank you for being a part of our community! Before you go:
- Be sure to clap and follow the writer! 👏
- You can find even more content at PlainEnglish.io 🚀
- Sign up for our free weekly newsletter. 🗞️
- Follow us: Twitter(X), LinkedIn, YouTube, Discord.
- Check out our other platforms: Stackademic, CoFeed, Venture.