6 Visualization Tricks with Python to Handle Ultra-Long Time-Series Data
Simple ideas using a few lines of Python code to deal with a long time-series plot
Typically, a time-series plot consists of an X-axis representing the timeline and a Y-axis showing data values. This visualization is commonly used to show how data changes over time. It helps extract insights such as trends and seasonal effects.
There is a concern when dealing with an ultra-long timeline. Even though long time-series data can be easily fitted into a plotting area using data visualization tools, the result can be messy. Let’s compare the two samples below.
The first image shows daily temperature data in 2021. The second image shows daily temperature data from 1990–2021. Dublin Airport daily data from Met Éireann. Images by the author.
While we can see the details on the first chart, the second one packs 32 years of daily values into the same plotting area and is too dense to read. The major drawback is that some interesting data points may be hidden.
To solve the problem, this article walks through six simple techniques that help present long time-series data more effectively.
An example of a method to deal with long time-series data. Image by the author.
Get data
As an example, this article will use Dublin Airport Daily Data, which contains meteorological data measured at Dublin Airport since 1942. The dataset consists of daily weather information, such as temperature, wind speed, and pressure.
For more information about Dublin Airport’s daily data, see the About the dataset section below.
Start by importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
%matplotlib inline
Fortunately, a quick look shows that the dataset has no missing values.
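The loading step is not shown here, so the following is only a minimal sketch of what it might look like. It assumes the data is available as a CSV with date, maxtp, and mintp columns; the inline sample and its values are placeholders made up for illustration, not real observations, and the real file may need extra options such as skipped header rows.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file. Replace io.StringIO(...)
# with the path to your own CSV. The values below are invented.
sample_csv = io.StringIO(
    "date,maxtp,mintp\n"
    "1990-01-01,9.1,3.2\n"
    "1990-01-02,10.4,5.0\n"
    "1990-01-03,8.7,1.9\n"
)

df = pd.read_csv(sample_csv)
df['date'] = pd.to_datetime(df['date'])

# Count missing values per column; all zeros means nothing is missing.
missing = df.isnull().sum()
print(missing)
```

If any count is above zero, the affected rows would need to be dropped or filled before plotting.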
Prepare data
We will work with the maximum and minimum temperature data. The plots below use an average daily temperature, stored in a column named meantp. The period used is from 1990 to 2021, which is 32 years in total. If you want to select other variables or ranges, please feel free to modify the code below.
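As a rough sketch of this step, the snippet below filters a DataFrame to 1990-2021 and derives the meantp column. Two things are assumptions here: the data is synthetic (generated only so the snippet runs on its own), and the daily mean is taken as the midpoint of maxtp and mintp, which may differ from how the original column was computed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loaded dataset (not real observations).
dates = pd.date_range('1985-01-01', '2022-12-31', freq='D')
rng = np.random.default_rng(0)
doy = dates.dayofyear.to_numpy()
seasonal = 9.5 + 5.5 * np.sin(2 * np.pi * (doy - 110) / 365.25)
df = pd.DataFrame({
    'date': dates,
    'maxtp': seasonal + 4 + rng.normal(0, 2, len(dates)),
    'mintp': seasonal - 4 + rng.normal(0, 2, len(dates)),
})

# Keep 1990-2021 and only the columns we need.
mask = (df['date'] >= '1990-01-01') & (df['date'] <= '2021-12-31')
df_temp = df.loc[mask, ['date', 'maxtp', 'mintp']].copy()

# Assumption: daily mean = midpoint of the daily maximum and minimum.
df_temp['meantp'] = (df_temp['maxtp'] + df_temp['mintp']) / 2
```

Changing the two date strings in the mask is enough to select a different range.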
Create month, year, and month-year columns for use later.
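One possible way to build these columns is with the pandas .dt accessor, as sketched below. The tiny date range is a placeholder, and the 'YYYY-MM' format chosen for month_year is an assumption; any label that sorts chronologically would work.

```python
import pandas as pd

# Tiny placeholder frame; in practice df_temp already holds the real dates.
df_temp = pd.DataFrame(
    {'date': pd.date_range('1990-01-01', '1990-03-02', freq='D')}
)

df_temp['month'] = df_temp['date'].dt.month
df_temp['year'] = df_temp['date'].dt.year
# Assumed label format: zero-padded 'YYYY-MM', so string order is time order.
df_temp['month_year'] = df_temp['date'].dt.strftime('%Y-%m')
```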
Plot the time-series plot
From the DataFrame, the code below shows how to plot a basic time-series plot. The result can be compared later with other visualizations in this article.
plt.figure(figsize=(16,9))
sns.set_style('darkgrid')
sns.lineplot(data=df_temp, x='date', y='meantp')
plt.show()
A time-series plot showing 32 years of the average daily temperature. Image by the author.
As previously mentioned, the obtained chart is too dense. In the next section, let’s see how we can deal with the problem.
Visualizations to handle ultra-long time-series data
Six simple tricks can be applied to present a long time-series plot:
#1 zoom in and zoom out
#2 focus on what matters
#3 draw lines
#4 use distribution
#5 group by and apply a color scale
#6 circle the line
Trick #1: Zoom in and zoom out
We can create an interactive chart that can be zoomed in or out to see more details. This is a good way to expand a dense area on the chart. Plotly is a library that makes creating interactive charts easy.
From the DataFrame we have, we can directly plot a simple interactive time-series plot with just one line of code.
px.line(df_temp, x='date', y='meantp')
Voilà!
A time-series plot with a zoom-in function using Plotly. Image by the author.
From the result, we can see the overall data while being able to zoom in on the area that we want to expand.
Trick #2: Focus on what matters
When some values need special attention, highlighting the data points with markers can be a good solution. Adding scatter markers to an interactive plot lets us mark interesting or critical data points and still zoom in to see more details.
Now let’s add scatter markers to the previous interactive plot. For example, we will highlight days with an average temperature above 20.5°C or below -5°C.
A time-series plot with markers and a zoom-in function using Plotly. Image by the author.
Trick #3: Draw lines
Like the previous technique, drawing lines can separate specific data values if some areas need to be focused on. For example, I will add two horizontal lines, at 20.5°C and -5°C, to separate the days with average temperatures above or below these thresholds.
A time-series plot with lines and a zoom-in function using Plotly. Image by the author.
From the result, we can focus on data points above or under the lines.
Trick #4: Use distribution
A box plot is a method for demonstrating data distribution through their quartiles. The information on a box plot shows the locality, spread, and skewness. This plot is also helpful in distinguishing outliers, data points that stand out significantly from other observations.
Since the DataFrame is already prepared, we can directly plot the box plot with just one line of code.
px.box(df_temp, x='month_year', y='meantp')
Box plots showing data distributions with a zoom-in function using Plotly. Image by the author.
Trick #5: Group by and apply a color scale
Basically, this method converts a time-series plot into a heat map. The result shows the average monthly temperatures, and the color scale lets us compare their magnitudes.
To make the plot, the DataFrame needs to be converted into a two-dimensional layout. First, let’s group the DataFrame by year and month.
An interactive heat map shows the average monthly temperature. Image by the author.
Trick #6: Circle the line
When visualizing time-series data, it’s common to think about continuous lines moving over time. However, we can change the point of view. These lines can be plotted in a circular graphic, like moving them around a clock. In this case, a radar chart can be a good choice.
Theoretically, a radar chart is a visualization used to compare data in the same categories. We can apply the concept by plotting the months around the circle to compare data values at the same time of year across different years.
Prepare a list of months, years, and colors for use in the next step.
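# Sort explicitly because set() does not guarantee order.
# '1' is repeated at the end so each line closes its loop.
months = [str(i) for i in sorted(set(df_mean.month))] + ['1']
years = sorted(set(df_mean.year))
pal = list(sns.color_palette(palette='viridis',
                             n_colors=len(years)).as_hex())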
Use a for loop to plot the lines on a radar chart.
Creating an interactive radar chart allows the result to be filtered, and the information can be shown by hovering the cursor over data points.
An interactive radar chart shows the average monthly temperature. Image by the author.
Summary
A time-series plot is a helpful chart for extracting insights such as trends or seasonal effects. However, showing ultra-long time-series data with a simple time-series plot can result in a messy chart because the lines overlap.
This article has shown six visualization ideas for plotting long time-series data. We can make the result reader-friendly by using interactive functions and changing the point of view. Moreover, some methods also help focus on important data points.
Lastly, these methods are just some ideas. I am sure that there are other visualizations that can also be used to solve the problem. If you have any suggestions, please feel free to leave a comment.
Thanks for reading.
These are other data visualization articles that you may find interesting:
8 Visualizations with Python to Handle Multiple Time-Series Data
9 Visualizations with Python that Catch More Attention than a Bar Chart
9 Visualizations with Python to show Proportions instead of a Pie chart
Maximizing Clustering’s Scatter Plot with Python
About the dataset
Dublin Airport Daily Data is retrieved from www.met.ie, copyright Met Éireann. The dataset is published under Creative Commons Attribution 4.0 International (CC BY 4.0). Disclaimer from the source: Met Éireann does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.