Avra

Summary

This article discusses the use of ChatGPT, a language model developed by OpenAI, to build a data science web application using Streamlit-Python for exploratory data analysis.

Abstract

The content explains how ChatGPT can be used to create a data science web application with Streamlit-Python, focusing on exploratory data analysis (EDA) of the Iris dataset. The article mentions the importance of EDA in understanding the dataset's characteristics, patterns, trends, and outliers. The author describes the interactive process with ChatGPT, highlighting the AI's ability to suggest code snippets and guide users through coding challenges. The article also covers debugging and customizing the AI-generated code, and the deployment of the app using Streamlit Cloud.

Opinions

  1. ChatGPT is a valuable resource for developers looking for help with coding problems.
  2. The AI's ability to suggest code snippets and offer guidance is particularly useful for both beginner and experienced developers.
  3. The code generated by ChatGPT may require debugging and customization to meet the user's specific needs.
  4. The app created with ChatGPT's help is capable of displaying various types of plots, summaries, and descriptions for the Iris dataset.
  5. Deploying the app using Streamlit Cloud allows for global accessibility.
  6. The author believes that ChatGPT can be super useful in debugging and commenting the code.
  7. The article suggests that ChatGPT can be leveraged to build simple Machine Learning Web Apps.

ChatGPT helped me build this Data Science Web App using Streamlit-Python

Learn and build an Exploratory Data Science Web app in no time

👨🏾‍💻 GitHub ⭐️| 🐦 Twitter | 📹 YouTube | ☕️ BuyMeaCoffee | Ko-fi💜

If you’re a developer looking for help with a coding problem, ChatGPT is well worth trying. It has quickly become a go-to tool for developers: it can suggest code snippets and offer guidance on coding challenges, so it’s no surprise that it has taken the internet by storm. Whether you’re a beginner or an experienced developer, ChatGPT is a valuable resource for working through coding questions, as long as you review what it produces.


Streamlit

Streamlit is a popular open-source Python library that I used to build the app. It makes it easy to create interactive, web-based data applications, and it lets me build intuitive dashboards and data visualizations that can be shared with others. You will find a lot of other Streamlit-related posts on my Medium profile!

ChatGPT

To quote Wikipedia: ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot launched by OpenAI in November 2022. It is built on top of OpenAI’s GPT-3.5 family of large language models and is fine-tuned with both supervised and reinforcement learning techniques.

https://commons.wikimedia.org/wiki/File:ChatGPT.png

Here, we used ChatGPT to build a simple exploratory data analysis (EDA) app with Streamlit and Python. EDA is an essential step in the data science process: it involves examining and summarizing a dataset to better understand its characteristics and relationships. It helps identify patterns, trends, and outliers in the data, which can inform further analysis and modeling.
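To make the idea of EDA concrete before bringing in Streamlit, here is a minimal pandas sketch. The tiny `sample` table is made up for illustration (it is not the real Iris data), and the helper names `class_means` and `iqr_outliers` are hypothetical, not part of the app.

```python
import pandas as pd

# A tiny hand-made sample shaped like the Iris data.
# The values are illustrative only, not the real dataset.
sample = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "class": ["setosa", "setosa", "versicolor",
                  "versicolor", "virginica", "virginica"],
    }
)


def class_means(df: pd.DataFrame, label: str = "class") -> pd.DataFrame:
    """Mean of every numeric column, computed separately for each class."""
    return df.groupby(label).mean(numeric_only=True)


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Values lying more than k * IQR outside the first or third quartile."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Summary statistics (count, mean, std, min, quartiles, max) per numeric column
summary = sample.describe()
```

Calling `class_means(sample)` gives one row per class, which is often the quickest way to spot how the groups differ, and `iqr_outliers(sample["sepal_length"])` returns the values that the usual 1.5 × IQR rule would flag.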

In this blog post, I describe the interactive process with the AI in three parts:

  • 👨🏾‍💻 Prompt : Where we feed the AI (ChatGPT) with tasks or commands
  • 🤖 ChatGPT : The output returned by ChatGPT
  • 🤔 Author : Human intervention, in this case our review of the output supplied by the AI

Check out the full YouTube video here.

A. Interaction with ChatGPT (Part 1) :

The app’s basic design

It's necessary to be precise and explicit with the keywords fed to the ChatGPT prompt.

👨🏾‍💻 Prompt : Hey , help me to create a Streamlit web app , using pandas, Plotly , pandas, seaborn — showing exploratory data analysis of Iris dataset ? Make sure we show various types of plots , with summary and description etc . Lets build a dashboard.

🤖 ChatGPT : Certainly! Here is an example of how you can create a Streamlit web app that performs exploratory data analysis on the Iris dataset using pandas, Plotly, and seaborn:

1. First, you will need to install Streamlit, pandas, Plotly, and seaborn using pip. You can do this by running the following command:

pip install streamlit pandas plotly seaborn

2. Next, you will need to import the necessary libraries in your Python script. You can do this by adding the following lines at the top of your script:

import streamlit as st
import pandas as pd
import plotly.express as px
import seaborn as sns

3. Next, you can use pandas to read the Iris dataset from a CSV file and store it in a dataframe. You can do this by adding the following lines to your script:

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                 names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

4. Now, you can use Streamlit’s built-in functions to create a dashboard for exploring the Iris dataset. You can start by adding a title and a header to your app using the st.title and st.header functions:

st.title('Exploratory Data Analysis of the Iris Dataset')
st.header('This app allows you to explore the Iris dataset and visualize the data using various plots.')

5. Next, you can add a side panel to your app using the st.sidebar function. Inside the side panel, you can add a dropdown menu that allows the user to select which column to visualize using the st.selectbox function:

selected_column = st.sidebar.selectbox('Select a column to visualize', df.columns)

6. You can then use the st.histogram function to display a histogram of the selected column:

st.histogram(df[selected_column])

7. You can also use the px.scatter function from Plotly to display a scatterplot of the Iris data. You can specify the x and y axis using the st.selectbox function:

x_axis = st.sidebar.selectbox('Select the x-axis', df.columns)
y_axis = st.sidebar.selectbox('Select the y-axis', df.columns)
fig = px.scatter(df, x=x_axis, y=y_axis)
st.plotly_chart(fig)

8. You can also use the sns.pairplot function from seaborn to display a pairplot of the Iris data. You can specify the hue parameter to color the points by class:

sns.pairplot(df, hue='class')
st.pyplot()

9. Finally, you can use the df.describe function to display a summary of the Iris data. You can use

🤔 Author: Well, ChatGPT gave us the information we needed to start building our app! But are all these suggestions by the AI correct? Not really! A little debugging was necessary. However, the pieces of information from ChatGPT were still helpful in getting our app started.

For instance, `st.histogram(df[selected_column])` is not valid: Streamlit has no `st.histogram` function, so that line has to be replaced. Here's the debugged code snippet with added UI features 👇🏽:

# Import libraries
import streamlit as st
import pandas as pd
import plotly.express as px
import seaborn as sns

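# Disable SSL verification to allow data to be loaded from URL
# (Not necessary in all use-cases; it turns off certificate checks for
# every HTTPS request in the process, so avoid it whenever you can)
import ssl
ssl._create_default_https_context = ssl._create_unverified_context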

# Suppress warning about using Matplotlib in Streamlit
st.set_option('deprecation.showPyplotGlobalUse', False)

# Load Iris dataset from URL
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])

# Set app title and header
st.title('Exploratory Data Analysis of the Iris Dataset')
st.header('This app allows you to explore the Iris dataset and visualize the data using various plots.')

# Display DataFrame
st.subheader("DataFrame")
st.dataframe(df)

# Allow user to select a column to visualize
selected_column = st.sidebar.selectbox('Select a column to visualize', df.columns)

# Create histogram plot of selected column
st.write("Histogram Plot")
sns.histplot(df[selected_column])
st.pyplot()

# Allow user to select x and y axes for scatter plot
st.write("Scatter Plot")
x_axis = st.sidebar.selectbox('Select the x-axis', df.columns)
y_axis = st.sidebar.selectbox('Select the y-axis', df.columns)

# Create scatter plot using Plotly
fig = px.scatter(df, x=x_axis, y=y_axis)
st.plotly_chart(fig)

# Create pair plot using Seaborn
st.write("Pair Plot")
sns.pairplot(df, hue='class')
st.pyplot()

B. Interaction with ChatGPT (Part 2) :

Continuation …

ChatGPT stopped partway through the code. This probably happens when the answer exceeds the maximum response length.

👨🏾‍💻 Prompt : continue from where you left

🤖 ChatGPT: Certainly! Here is the rest of the code for creating a Streamlit web app that performs exploratory data analysis on the Iris dataset using pandas, Plotly, and seaborn:

st.table(df.describe())
st.header('Correlation Matrix')
corr = df.corr()
st.heatmap(corr, annot=True)
st.header('Boxplot')
fig = px.box(df, y=selected_column)
st.plotly_chart(fig)

With these lines of code, you can display a summary of the Iris data, a correlation matrix, and a boxplot of the selected column.

To run the app, you can use the streamlit run command followed by the name of your Python script:

streamlit run my_app.py

This will launch a web server and open the app in your default web browser. You can then use the dropdown menus and buttons to explore the Iris dataset and visualize the data using various plots.

I hope this helps! Let me know if you have any questions or need further assistance.

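🤔 Author : A few small tweaks were needed (for example, `st.heatmap` does not exist in Streamlit, so the heatmap is drawn with seaborn instead), and the app then ran pretty well at http://localhost:8501! Here is the altered code block: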

# Display description of the data
st.write("Description of the data")
st.table(df.describe())

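# Display correlation matrix
# (numeric columns only: the 'class' column holds strings, and newer
# pandas versions raise an error if it is passed to corr())
st.header('Correlation Matrix')
corr = df.select_dtypes(include='number').corr()
sns.heatmap(corr, annot=True)
st.pyplot()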

# Display boxplot
st.header('Boxplot')
fig = px.box(df, y=selected_column)
st.plotly_chart(fig)

This code displays the statistical description of the data, a heatmap of the correlation matrix, and a boxplot of the selected column (the boxplot is not shown in the figure). The heatmap is created using the Seaborn library.
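
One detail worth knowing about the correlation matrix: the `class` column holds strings, and pandas 2.0 and later no longer drops such columns silently when computing `corr()`. A minimal sketch of a version-independent way to handle this is to select the numeric columns first. `numeric_correlation` is a hypothetical helper name, and the small table is made-up data.

```python
import pandas as pd


def numeric_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation computed over the numeric columns only."""
    return df.select_dtypes(include="number").corr()


# Made-up data: y is exactly 2 * x, and 'class' is a non-numeric label.
demo = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
        "class": ["a", "a", "b", "b"],
    }
)

corr = numeric_correlation(demo)
```

The resulting `corr` has only `x` and `y` as rows and columns, so it can be passed straight to `sns.heatmap(corr, annot=True)`.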

Here's a quick glimpse of the app ⤵️

App demo

C. Interaction with ChatGPT (Part 3)

Additional needs for the app

How about asking ChatGPT to make the visualisation a bit more aesthetic?

👨🏾‍💻 Prompt : can you also add a violin plot in this same app ?

🤖 ChatGPT : Certainly! Here is an example of how you can add a violin plot to the Streamlit web app that performs exploratory data analysis on the Iris dataset using pandas, Plotly, and seaborn:

1. First, you will need to import the px.violin function from Plotly by adding the following line to the top of your script:

import plotly.express as px

2. Next, you can add a button to the side panel of the app using the st.sidebar.button function. When the button is clicked, the app will display a violin plot of the selected column:

if st.sidebar.button('Show Violin Plot'):
    fig = px.violin(df, y=selected_column)
    st.plotly_chart(fig)

3. You can also add a dropdown menu to the side panel that allows the user to select which class to visualize in the violin plot. You can do this by adding the following lines to the side panel:

selected_class = st.sidebar.selectbox('Select a class to visualize', df['class'].unique())

if st.sidebar.button('Show Violin Plot'):
    fig = px.violin(df[df['class'] == selected_class], y=selected_column)
    st.plotly_chart(fig)

With these lines of code, you can display a violin plot of the selected column for a specific class when the button is clicked.

I hope this helps! Let me know if you have any questions or need further assistance.

🤔 Author : It worked pretty well ! ✨
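
One thing to watch when combining ChatGPT's two snippets: if both `st.sidebar.button('Show Violin Plot')` calls end up in the same script, Streamlit reports a duplicate-widget error because the two buttons are indistinguishable. A hedged sketch of a single, combined version follows; `rows_for_class` and `render_violin` are hypothetical helper names, and the "All classes" option and the `key` value are my additions, not part of ChatGPT's answer.

```python
from typing import Optional

import pandas as pd


def rows_for_class(
    df: pd.DataFrame, selected_class: Optional[str], label: str = "class"
) -> pd.DataFrame:
    """Return the rows of one class, or every row when no class is chosen."""
    if selected_class is None:
        return df
    return df[df[label] == selected_class]


def render_violin(df: pd.DataFrame, column: str) -> None:
    """Sidebar controls plus a violin plot for the chosen column and class."""
    # Imported here so rows_for_class can be reused and tested
    # without Streamlit or Plotly being installed.
    import plotly.express as px
    import streamlit as st

    options = ["All classes", *df["class"].unique()]
    choice = st.sidebar.selectbox("Select a class to visualize", options)
    selected = None if choice == "All classes" else choice

    # An explicit key keeps this button distinct from any other button
    # that happens to share the same label.
    if st.sidebar.button("Show Violin Plot", key="violin_button"):
        fig = px.violin(rows_for_class(df, selected), y=column, box=True)
        st.plotly_chart(fig)
```

In the app, `render_violin(df, selected_column)` would replace both button blocks, so there is only one button and one class selector.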

Generated box and violin plots.

D. Interaction with ChatGPT (Part 4)

Deployment of the app

All this time we were interacting with the web app on our local host. To make it globally available, we need to deploy the app. Streamlit Cloud is one of the best places to deploy such web apps. To deploy the app there, we need to create a dependency file called requirements.txt and push our code to a GitHub repository. These steps are demonstrated in detail in the related video. We asked ChatGPT to create the dependency file for us.

👨🏾‍💻Prompt :Can you please create a requirements.txt file for this above app ?

🤖ChatGPT : Certainly! Here is a requirements.txt file for the Streamlit web app that performs exploratory data analysis on the Iris dataset using pandas, Plotly, and seaborn and includes a violin plot:

pandas==1.1.5
plotly==4.11.1
seaborn==0.11.1
streamlit==0.73.0

This file lists the required versions of pandas, Plotly, seaborn, and Streamlit for the app.

To create this file, you can run the following command:

pip freeze > requirements.txt

This command will scan your Python environment and create a requirements.txt file that lists all the installed packages and their versions.

I hope this helps! Let me know if you have any questions or need further assistance.

🤔 Author : This is amazing! Almost every aspect of the app was drafted by ChatGPT, and the app is ready to deploy 🚀. The entire code is well documented here and the demonstrated app is live here!
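
A caveat on the dependency file: ChatGPT cannot see the local environment, so the versions it pinned are best treated as placeholders, and `pip freeze` lists every package installed in the environment, not only the ones the app imports. As a rough sketch of a middle ground, the standard-library `importlib.metadata` module can look up the installed versions of just the packages the app uses. `APP_PACKAGES`, `installed_version`, `requirement_lines`, and `write_requirements` are hypothetical names chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List, Optional

# The packages this app imports directly (taken from the code above).
APP_PACKAGES = ["streamlit", "pandas", "plotly", "seaborn"]


def installed_version(package: str) -> Optional[str]:
    """Installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def requirement_lines(packages: Iterable[str]) -> List[str]:
    """One requirements.txt line per package, pinned when a version is found."""
    lines = []
    for name in packages:
        found = installed_version(name)
        lines.append(f"{name}=={found}" if found else name)
    return lines


def write_requirements(
    path: str = "requirements.txt", packages: Iterable[str] = APP_PACKAGES
) -> None:
    """Write the requirement lines to a file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(requirement_lines(packages)) + "\n")
```

Running `write_requirements()` from the project folder, in the environment where the app works locally, produces a short requirements.txt pinned to the versions that were actually tested.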

Conclusion

So we successfully built an Exploratory Data Analysis web app with the help of ChatGPT. Let's talk about the features of the app:

  • The app allows you to explore the Iris dataset, which is loaded from a URL and displayed as a DataFrame.
  • The app includes various plots for visualizing the data such as histograms, scatter plots, pair plots, a correlation matrix, box plots, and violin plots. The user can select which columns to visualize and which class to focus on using sidebar widgets.

ChatGPT can be super useful in debugging and commenting the code, as demonstrated in the video and in this blog. You can also build a simple Machine Learning web app leveraging the power of ChatGPT; you can find the video tutorial here ⤵️

🎈 You can also find my YouTube channel here for more demo tutorials on Streamlit, Python, and Data Science. And make sure to follow me here on Medium for more interesting blog posts! Cheers ✨🥳

Hi there! I’m always on the lookout for sponsorship, affiliate links, and writing gigs to keep growing my online content. Any support, feedback, and suggestions are very much appreciated! Drop an email here : [email protected]🤗

