The article provides a comprehensive guide on creating an interactive datetime filter for time-series data using Pandas and Streamlit in Python.
Abstract
The article titled "Creating an Interactive Datetime Filter with Pandas and Streamlit" focuses on the implementation of a visual datetime filter for time-series data analysis in Python. It introduces Pandas as a robust tool for data manipulation and Streamlit as a versatile framework for developing web applications. The author explains the importance of filtering time-series data effectively and demonstrates how to use these tools to create an interactive filter that can be visualized in real-time. The article includes code snippets for setting up the necessary packages, describes the dataset used, and details the steps to implement the datetime filter, including formatting datetime, using slider widgets, and downloading the filtered dataset as a CSV file. The final Streamlit app integrates all components to provide a user-friendly interface for filtering and visualizing time-series data.
Opinions
The author expresses a high regard for Pandas, describing it as an agile, efficient, and user-friendly tool for data manipulation in Python.
Streamlit is portrayed as a powerful and multifaceted tool that simplifies the process of creating interactive web applications for data science purposes.
The author assumes familiarity with Pandas among readers but anticipates that many may be unfamiliar with Streamlit, suggesting it is an emerging technology in the data science community.
The article emphasizes the practicality of the interactive datetime filter for everyday data analysis tasks involving time-series data.
The author provides a subjective opinion on the ease of use and efficiency of the proposed solution, implying that it can significantly enhance the data analysis workflow.
By offering a downloadable CSV file of the filtered dataset, the author acknowledges the importance of data portability and sharing in collaborative environments.
The inclusion of affiliate-linked courses on Streamlit and Python data visualization suggests the author's endorsement of these resources for further learning.
Creating an Interactive Datetime Filter with Pandas and Streamlit
Implementing a visual datetime filter for time-series data in Python
Introduction
Perhaps the most prevalent type of data that we grapple with on a daily basis is time-series data. Basically, anything that is indexed by date, time, or both can be considered a time-series dataset. And more often than not, you will need to filter your time-series data by, well, date and time themselves. Filtering a data frame on any other kind of index is a rather trivial task; the same cannot be said of datetime, however, especially when the date and time are stored in separate columns. Even once you have worked out the filter, applying it to your data frame and visualizing the result instantaneously is another task altogether.
Luckily, we have Pandas and Streamlit to help us create and visualize interactive datetime filters. I assume that most of us are more than familiar with Pandas and possibly use it routinely in our data lives, but I suspect that many are unacquainted with Streamlit, given that it is the new kid on the block. Regardless, I am going to offer a quick introduction to both in case anyone needs it.
Pandas
Pandas is arguably the most agile, efficient, flexible, robust, resilient, and user-friendly library when it comes to wrestling with data in Python. And in case you think I threw far too much hyperbole into that previous sentence, then you have greatly underestimated Pandas. This mighty toolkit gives you the ability to manipulate, mutate, transform and, not least, visualize data in frames, all with a couple of lines of code. In this application, we will use Pandas to read our data from a CSV file, write it back to one, and slice our data frame based on selected start and end dates/times.
Streamlit
Streamlit, as characterized by its founders themselves, is a pure Python API that allows you to create machine learning applications. Wrong. It is actually a lot more than that. Streamlit is a web framework, a quasi-port forwarding proxy server, and a frontend UI library all mixed into one bundle of goodness. Simply put, you can develop and deploy countless web apps (or local apps) for a whole slew of purposes. For our application, we will use Streamlit to render an interactive sliding filter for our time-series data, with the filtered result visualized instantaneously.
Packages
Without further ado, let’s go ahead and import the stack of packages we’ll be using, namely Pandas, Streamlit, and Python’s built-in datetime module.
And in case you need to install any of these packages, you can do so with ‘pip install’ in Anaconda prompt.
pip install streamlit
Dataset
We will be using this randomly generated dataset, released under CC0 (No Rights Reserved, Public Domain) [1], which has one column each for the date, the time, and the value.
The date is formatted as follows:
YYYYMMDD
While the time is formatted as:
HHMM
You can use any other datetime format that suits your needs, but you will have to declare it in your script, as explained in the following section.
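As a rough sketch of how a file in this layout might be loaded, the snippet below reads a few made-up rows with Pandas. The column names date, time and value, the sample rows, and the constant names are all assumptions for illustration rather than the actual dataset. Reading the date and time columns as strings keeps leading zeroes such as 0015 intact, and declaring the format strings once makes them easy to swap for another layout.

```python
import io

import pandas as pd

# Hypothetical sample rows; the real dataset's column names may differ.
SAMPLE_CSV = """date,time,value
20210101,0000,1.5
20210101,0015,2.0
20210101,1230,3.25
"""

# Reading date and time as str keeps "0000" and "0015" from becoming 0 and 15.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), dtype={"date": str, "time": str})

# Declared once so another layout only needs these two strings changed.
DATE_FORMAT = "%Y%m%d"  # YYYYMMDD
TIME_FORMAT = "%H%M"    # HHMM
```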
Datetime Filter
To implement our filter, we will use a function that takes two arguments, message and df, which correspond to the message displayed by the slider widget and the raw data frame that needs to be filtered.
First, we will invoke the Streamlit slider widget, whose parameters are documented as follows.
label (str or None): A short label explaining to the user what this slider is for.
min_value (a supported type or None): The minimum permitted value. Defaults to 0 if the value is an int, 0.0 if a float, value - timedelta(days=14) if a date/datetime, time.min if a time
max_value (a supported type or None): The maximum permitted value. Defaults to 100 if the value is an int, 1.0 if a float, value + timedelta(days=14) if a date/datetime, time.max if a time
value (a supported type or a tuple/list of supported types or None): The value of the slider when it first renders. If a tuple/list of two values is passed here, then a range slider with those lower and upper bounds is rendered. For example, if set to (1, 10) the slider will have a selectable range between 1 and 10. Defaults to min_value.
step (int/float/timedelta or None): The stepping interval. Defaults to 1 if the value is an int, 0.01 if a float, timedelta(days=1) if a date/datetime, timedelta(minutes=15) if a time (or if max_value - min_value < 1 day)
Please note that our slider will return two values, i.e. the row positions of the start and end datetimes. Therefore, we must declare the initial value of the slider as a two-item list:
[0,len(df)-1]
We must then unpack the widget's return value into two variables, i.e. the start and end row indices that will be used to filter the data frame.
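A minimal sketch of that slider call might look like the following. The function name select_row_range is hypothetical, and importing Streamlit inside the function is just one way to keep the helper importable on a machine without Streamlit; a regular import at the top of your script works equally well.

```python
def select_row_range(message, df):
    """Render a range slider over row positions and return (start, end)."""
    # Imported here so this helper can be loaded without Streamlit installed.
    import streamlit as st

    last_row = len(df) - 1
    # Arguments are label, min_value, max_value, value, step.
    # A two-item value makes st.slider render a range slider with two handles.
    start, end = st.slider(message, 0, last_row, (0, last_row), 1)
    return start, end
```

Inside a Streamlit script you would call it as start, end = select_row_range('Move sliders to filter data frame', df), and both handles start at the extremes so that the whole data frame is selected on first render.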
Subsequently, we need to remove any trailing decimal places from the time value at the start and end indices and add leading zeroes whenever the time has fewer than four digits, e.g. 12:00 AM stored as 0, as shown below for the start index:
while len(str(df.iloc[slider_1, 1]).replace('.0', '')) < 4:
    df.iloc[slider_1, 1] = '0' + str(df.iloc[slider_1, 1]).replace('.0', '')
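If you would rather not loop, the same clean-up can be sketched as a small helper built on str.zfill. The name normalize_hhmm is my own and not part of any library, and the helper assumes the time is either a whole number or a string of digits.

```python
def normalize_hhmm(raw):
    """Return a time such as 0, 15.0 or '930' as a four-character HHMM string."""
    text = str(raw)
    # Drop the trailing '.0' left over when the column was read as floats.
    if text.endswith(".0"):
        text = text[:-2]
    # Left-pad with zeroes so that midnight (0) becomes '0000'.
    return text.zfill(4)
```

Unlike a blanket replace('.0', ''), checking with endswith only ever touches the end of the string.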
Then we need to append the time to the date and parse the combined string into a datetime object using Python's datetime.strptime function.
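For the YYYYMMDD and HHMM layout described earlier, the parsing step could be sketched like this. The helper name parse_datetime is an assumption, and both inputs are expected to be strings that are already zero-padded.

```python
from datetime import datetime


def parse_datetime(date_text, time_text):
    """Combine 'YYYYMMDD' and 'HHMM' strings into a single datetime object."""
    # A space between the two parts keeps the format string easy to read.
    return datetime.strptime(f"{date_text} {time_text}", "%Y%m%d %H%M")
```

If your data uses a different layout, only the format string passed to strptime needs to change.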
To use other datetime formats, please refer to this article. Finally, we will display the selected datetimes and apply the selected indices to our dataset to obtain the filtered data frame.
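Applying the two slider indices boils down to a positional slice. Here is a minimal sketch in which the function name filter_rows is hypothetical and start and end are assumed to be the integer row positions returned by the slider.

```python
import pandas as pd


def filter_rows(df, start, end):
    """Return the rows between two slider positions, both ends included."""
    # iloc slices exclude the stop position, hence end + 1.
    return df.iloc[start:end + 1].reset_index(drop=True)
```

Resetting the index means the filtered frame always starts again at row 0, which keeps the displayed table tidy.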
You may find it convenient to download your filtered data frame as a CSV file. If so, you can use a small function to create a downloadable file in your Streamlit app.
The function's arguments, name and df, correspond to the name of the downloadable file and the data frame that needs to be converted to a CSV file, respectively.
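One way such a function might be written is to embed the CSV in an HTML link as a base64 data URI. This is a sketch under that assumption rather than the only approach, and the name download_link is my own.

```python
import base64

import pandas as pd


def download_link(name, df):
    """Return an HTML anchor that downloads df as <name>.csv."""
    csv_text = df.to_csv(index=False)
    # Embedding the CSV in a data URI avoids writing a file on the server.
    encoded = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
    return (
        f'<a href="data:text/csv;base64,{encoded}" '
        f'download="{name}.csv">Download file</a>'
    )
```

The returned string can be rendered with st.markdown(link, unsafe_allow_html=True). Recent Streamlit releases also ship st.download_button, which may be the simpler choice if your version has it.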
Streamlit App
Finally, we can bind everything together in the form of a Streamlit application that will render the datetime filter, data frame, and a line chart that will all be updated instantaneously when we move our sliders.
You can run your final app by typing the following commands in Anaconda prompt. First, change your working directory to where your source code is saved:
cd C:/Users/...
Then type the following to run your app:
streamlit run file_name.py
Results
And there you have it, an interactive dashboard that allows you to visually filter your time-series data and visualize it at the same time!