Rapid Python Data Visualizations with ChatGPT: A Step-By-Step Guide
Prompting ChatGPT for quick and beautiful data visualizations
Through a well-thought-out process of prompting, ChatGPT can drastically simplify and expedite the complex process of creating Python data visualizations.
As a Comp Sci professor with 20+ years of computer programming experience, I have been working for months on refining this process — so that you don’t have to!
Here I demonstrate a concrete example of this process: creating multiple visualizations of an Olympic medals dataset. The 5 steps are:
STEP 1. Downloading and cleaning an “Olympic Medals by Country” dataset using Python code generated by ChatGPT.
STEP 2. Crafting a Plotly map to visually depict the total cumulative medals by country from 1992 to 2020.
STEP 3. Adding a stacked area chart to the dashboard, which displays the top 10 countries by total medal count over time.
STEP 4. Adding a second chart, a bar chart showcasing the top 10 countries with the most gold medals during the same period.
STEP 5. Refining the aesthetics of the dashboard for a more pleasing and insightful user experience.
Let’s get to it!
Step 1. Downloading and Cleaning the Dataset
The data used for this example is the “Olympic Medals by Country” dataset, which is available on Kaggle.
In this dataset, the data is organized by year and country, with a count of “Gold”, “Silver” and “Bronze” medals. For the sake of a more “complete” set of data points, we will only use the totals from 1992 onward (around the time the former Eastern Bloc countries began competing under their own flags).
Prompt to ChatGPT: Please write Python code (using the plotly library) to solve this problem: There is a data file called “Country_Medals.csv” and it uses “;” as a separator. In the CSV file Country_Medals.csv, in the column “Country_Name”, anywhere you find “United States”, replace it with “United States of America”. Save as a pandas dataframe.
ChatGPT Response: Certainly, here’s the code:
```python
import pandas as pd

# Load the CSV file
df = pd.read_csv('Country_Medals.csv', sep=';')

# Replace 'United States' with 'United States of America' in the 'Country_Name' column
df['Country_Name'] = df['Country_Name'].replace('United States', 'United States of America')
```
This creates a pandas DataFrame, df, in which every “United States” value in the “Country_Name” column has been replaced with “United States of America”. And it handles the semicolon separator with ease.
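One detail worth checking is the matching rule. Series.replace with a plain string only swaps cells whose *entire* value is “United States”. It does not do a substring search. The sketch below demonstrates this on a tiny, made-up semicolon-separated sample read from a string. io.StringIO stands in for the real Country_Medals.csv, the column layout is assumed from the article, and the numbers are placeholders rather than real results:

```python
import io

import pandas as pd

# Made-up sample in the layout the article assumes (placeholder numbers, not real results)
raw = """Year;Country_Name;Gold;Silver;Bronze
2016;United States;1;2;3
2016;United States Virgin Islands;0;0;0
2020;United States;4;5;6
"""

# sep=';' handles the semicolon separator, exactly as with the real file
df = pd.read_csv(io.StringIO(raw), sep=';')

# Series.replace with a scalar matches whole cell values only
df['Country_Name'] = df['Country_Name'].replace('United States', 'United States of America')
print(df['Country_Name'].tolist())
# ['United States of America', 'United States Virgin Islands', 'United States of America']

# By contrast, .str.replace does substring replacement and rewrites any name containing the text
substring = pd.Series(['United States Virgin Islands']).str.replace(
    'United States', 'United States of America', regex=False
)
print(substring.tolist())
# ['United States of America Virgin Islands']
```

Whole-value matching is what we want here: only the exact “United States” label gets renamed, and any other name that merely contains that text is left alone.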
Now that we have a nice, clean pandas data frame, we can build a Plotly map from it.
Step 2. Creating a Plotly Map to Display Medal Totals by Country
As the data in the CSV file is organized by country, and by year, we can use a map to display the total cumulative medals by country for the years 1992–2020.
The simplest solution here is to use a Plotly map.
Let’s prompt ChatGPT to do this for us.
Prompt to ChatGPT: Now using the code just created, and using Python plotly, create a map that displays total medals (Count of “Gold”, “Silver”, and “Bronze”) from the years 1992–2020 by country (using the “Country_Name” field)
ChatGPT Response: Sure, to do this we’ll:
- Filter the data between the years 1992 and 2020.
- Sum the medals by country.
- Create a plotly map using this data.
Here’s the code:
```python
import pandas as pd
import plotly.express as px

# Load the CSV file
df = pd.read_csv('Country_Medals.csv', sep=';')

# Replace 'United States' with 'United States of America' in the 'Country_Name' column
df['Country_Name'] = df['Country_Name'].replace('United States', 'United States of America')

# Filter the data between 1992 and 2020 (.copy() avoids a SettingWithCopyWarning below)
df = df[(df['Year'] >= 1992) & (df['Year'] <= 2020)].copy()

# Calculate total medals for each country
df['Total_Medals'] = df['Gold'] + df['Silver'] + df['Bronze']
df_country_medals = df.groupby('Country_Name')['Total_Medals'].sum().reset_index()

# Generate a plotly map
fig = px.choropleth(df_country_medals,
                    locations='Country_Name',      # DataFrame column with country names
                    locationmode='country names',  # Match locations by country name
                    color='Total_Medals',          # DataFrame column with color values
                    hover_name='Country_Name',     # DataFrame column for hover info
                    color_continuous_scale=px.colors.sequential.Plasma,  # Set the color scale
                    title='Total Medals from 1992 to 2020')  # Title of the plot
fig.show()
```
Now, because we’ve built this up in careful steps (and assuming you have the pandas and plotly libraries installed), you can simply copy/paste this into your Python editor, save it, and run it. The result:
Boom! Plotly map displayed in a web browser.
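The filter-and-sum logic can be sanity-checked without the CSV file. This minimal sketch uses a hypothetical four-row DataFrame with placeholder country names and numbers.

- Series.between(1992, 2020) is inclusive on both ends, so it is equivalent to the >= / <= filter above.
- .copy() keeps pandas from warning when the Total_Medals column is added.

```python
import pandas as pd

# Hypothetical rows with placeholder names and numbers (not real medal data)
df = pd.DataFrame({
    'Country_Name': ['Country A', 'Country A', 'Country B', 'Country B'],
    'Year': [1988, 1996, 2016, 2024],
    'Gold': [1, 2, 3, 4],
    'Silver': [0, 1, 1, 1],
    'Bronze': [0, 0, 2, 2],
})

# Keep 1992-2020 inclusive; the 1988 and 2024 rows fall outside the window
recent = df[df['Year'].between(1992, 2020)].copy()

# Row-wise medal total, then one cumulative total per country
recent['Total_Medals'] = recent[['Gold', 'Silver', 'Bronze']].sum(axis=1)
totals = recent.groupby('Country_Name', as_index=False)['Total_Medals'].sum()

print(totals)
```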
Now I am definitely not a big fan of the default “Plasma” color scheme. The colors are all over the place — hard to tell what is what. We can (and will) change this.
But for now, let’s move on to something more interesting here — let’s add in some charts to help us tell a more complete story about this data.
Step 3. Adding a Stacked Area Chart
Now we know the total count of medals for each country, but how has this changed over time?
A stacked area chart is a nice-looking time-series representation that can provide more insight into this.
Let’s create one:
Prompt to ChatGPT: Can you add to the code by creating a stacked area chart that shows the top 10 countries by total medal count from 1992 to 2020? Can you display the Plotly map and the stacked area chart in a dashboard using Plotly Dash?
ChatGPT Response: Certainly, here is the code:
```python
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html  # Dash 2+: dcc and html ship inside the dash package

# Load the CSV file
df = pd.read_csv('Country_Medals.csv', sep=';')

# Replace 'United States' with 'United States of America' in the 'Country_Name' column
df['Country_Name'] = df['Country_Name'].replace('United States', 'United States of America')

# Filter the data between 1992 and 2020
df = df[(df['Year'] >= 1992) & (df['Year'] <= 2020)].copy()

# Calculate total medals per country per year
df['Total_Medals'] = df['Gold'] + df['Silver'] + df['Bronze']
df_country_medals = df.groupby(['Country_Name', 'Year'])['Total_Medals'].sum().reset_index()

# Generate a plotly map of the cumulative totals per country
map_fig = px.choropleth(df_country_medals.groupby('Country_Name')['Total_Medals'].sum().reset_index(),
                        locations='Country_Name',
                        locationmode='country names',
                        color='Total_Medals',
                        hover_name='Country_Name',
                        color_continuous_scale=px.colors.sequential.Plasma,
                        title='Total Medals from 1992 to 2020')

# Create a stacked area chart for the top 10 countries by total medal count
top_countries = df_country_medals.groupby('Country_Name')['Total_Medals'].sum().nlargest(10).index
df_top_countries = df_country_medals[df_country_medals['Country_Name'].isin(top_countries)]
area_fig = px.area(df_top_countries, x='Year', y='Total_Medals', color='Country_Name',
                   title='Top 10 Countries by Total Medal Count from 1992 to 2020')

# Create a Dash application
app = Dash(__name__)

# Define the layout
app.layout = html.Div(children=[
    dcc.Graph(figure=map_fig),
    dcc.Graph(figure=area_fig)
])

# Run the application (app.run replaces app.run_server, which newer Dash releases removed)
if __name__ == '__main__':
    app.run(debug=True)
```
This Python code creates a dashboard with two plots. The map displays the total medals by country from 1992 to 2020, and the stacked area chart shows the top 10 countries by total medal count over the same period.
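The top-10 selection works in two passes:

- First, rank countries by their medals summed across all years.
- Then, keep every yearly row for those countries, so each one gets a full time series in the stacked area chart.

Here is a minimal sketch of the same pattern on a hypothetical three-country table with placeholder numbers. It keeps the top 2 instead of the top 10 so the result is easy to check by hand:

```python
import pandas as pd

# Hypothetical per-year totals with placeholder names and numbers (not real results)
df_country_medals = pd.DataFrame({
    'Country_Name': ['Country A', 'Country A', 'Country B', 'Country B', 'Country C', 'Country C'],
    'Year': [2016, 2020, 2016, 2020, 2016, 2020],
    'Total_Medals': [10, 12, 3, 4, 8, 1],
})

n = 2  # the article keeps 10; 2 makes the toy result easy to verify

# Pass 1: rank countries by medals summed over all years, keep the top n names
top_countries = (
    df_country_medals.groupby('Country_Name')['Total_Medals']
    .sum()
    .nlargest(n)
    .index
)

# Pass 2: keep every yearly row for those countries, giving each a full time series
df_top = df_country_medals[df_country_medals['Country_Name'].isin(top_countries)]

print(sorted(df_top['Country_Name'].unique()))  # ['Country A', 'Country C']
```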
So far, so good! Let’s add in one more visualization.
Step 4. Adding a Bar Chart for Gold Medals by Country
Everyone likes a winner, so let’s find the top 10 countries by gold medals and display them in a bar chart.
We can see from the previous example that it is probably best to display our Top 10 Countries by Gold Medal bar chart beside our stacked area chart. This will “balance” our layout nicely — and it should all fit within one window for display purposes.
Prompt to ChatGPT: Using the same data, let’s add in a bar chart showing the top 10 countries with the most gold medals (from the “Gold” column) from 1992 to 2020. Please display this chart on the same row as the stacked area chart (using dcc and html for layout).
ChatGPT Response: Here’s the code: (NOTE: I’ve only included new code here)
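```python
# New code only: assumes df, map_fig, and area_fig from the previous step
import plotly.express as px
from dash import Dash, dcc, html

# Create a bar chart for the top 10 countries with the most gold medals.
# Sum 'Gold' from df, because df_country_medals only kept the Total_Medals column.
df_gold = df.groupby('Country_Name')['Gold'].sum().nlargest(10).reset_index()
bar_fig = px.bar(df_gold, x='Country_Name', y='Gold',
                 title='Top 10 Countries with Most Gold Medals from 1992 to 2020')

# Create a Dash application
app = Dash(__name__)

# Define the layout: the map on top, the area and bar charts side by side below it
app.layout = html.Div([
    dcc.Graph(figure=map_fig, id='map'),
    html.Div([
        dcc.Graph(figure=area_fig, id='area-chart'),
        dcc.Graph(figure=bar_fig, id='bar-chart')
    ], style={'display': 'flex'})
])

if __name__ == '__main__':
    app.run(debug=True)
```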
It is also worth noting that when adding the second chart (the bar chart), ChatGPT wanted to pull in more libraries and use dash_bootstrap_components (which is more complicated). So I specified in the prompt that it should stick with the same dcc and html layout approach.
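The gold ranking has to be computed from df rather than df_country_medals. The per-year table built in Step 3 only aggregates Total_Medals, so the Gold column no longer exists there. Calling df_country_medals.groupby('Country_Name')['Gold'] would fail with a KeyError.

This sketch reproduces that on hypothetical data (placeholder names and numbers). It then ranks by gold from the original frame, keeping the top 2 instead of 10:

```python
import pandas as pd

# Hypothetical filtered rows with placeholder names and numbers (not real results)
df = pd.DataFrame({
    'Country_Name': ['Country A', 'Country A', 'Country B', 'Country C', 'Country C'],
    'Year': [2016, 2020, 2020, 2016, 2020],
    'Gold': [5, 6, 9, 1, 1],
    'Silver': [1, 1, 0, 4, 4],
    'Bronze': [0, 2, 1, 3, 3],
})

# Same per-year aggregation as Step 3: only Total_Medals survives the groupby
df['Total_Medals'] = df['Gold'] + df['Silver'] + df['Bronze']
df_country_medals = df.groupby(['Country_Name', 'Year'])['Total_Medals'].sum().reset_index()
print('Gold' in df_country_medals.columns)  # False

# So rank by gold using df, which still has the per-row Gold column
df_gold = df.groupby('Country_Name')['Gold'].sum().nlargest(2).reset_index()
print(df_gold['Country_Name'].tolist())  # ['Country A', 'Country B']
```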
Here is our working result after a copy/paste, save, and run:
Wow, so here it is, all working! Super-duper easy.
Still, this is not the best use of space, and our colors are “all over the map”. If these things bug you (like they bug me), applying some data visualization best practices can help clean it all up.
Step 5. Styling the Display
The display for this data visualization is not ideal — there are definitely some things we can improve on.
I want to change the “Plasma” default to something that is a bit easier to interpret, and maybe a bit more relevant to this Olympic theme. Looking at the code, I can see which property we need to modify. Let’s ask GPT for alternatives:
Prompt to ChatGPT: What are the options for Plotly choropleth color_continuous_scale property?
ChatGPT Response: The color_continuous_scale property of Plotly's choropleth maps sets the colorscale used for continuous color values. There are quite a few predefined color scales you can use directly. Here's a list of some common ones:
- YlOrRd
- Bluered
- Electric
- YlGnBu
This is where ChatGPT can really save you from nit-picky details. Let’s go with the “YlOrRd” theme.
And to match this color theme, let’s also change the bar chart from blue to “gold”.
Prompt to ChatGPT: How do I change the color of the bar chart bars for px.bar to “gold”?
ChatGPT Response: You can use the figure’s update_traces method to change the color without affecting the legend. Here’s the revised code to set the color of the bars in the bar chart to gold:
```python
bar_fig = px.bar(df_gold, x='Country_Name', y='Gold', title='Top 10 Countries with Most Gold Medals from 1992 to 2020')
bar_fig.update_traces(marker_color='gold')
```
Presto! Now you just need to copy/paste this change into the main Python code.
And one last change: let’s make better use of the space for the map (make it display a little larger within its div container).
Prompt to ChatGPT: One last change — can you make the map display more fully within its div container?
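ChatGPT Response: In Plotly, you can set a figure’s size with the height and width attributes of its layout. In Dash, you can also size the dcc.Graph component that holds the figure through its style property. Here’s the revised code, which takes the second approach: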
```python
app.layout = html.Div([
    dcc.Graph(figure=map_fig, id='map', style={'height': '50vh', 'width': '100%'}),
    html.Div([
        dcc.Graph(figure=area_fig, id='area-chart'),
        dcc.Graph(figure=bar_fig, id='bar-chart')
    ], style={'display': 'flex'})
])
```
The only change is in app.layout. The style {'height': '50vh', 'width': '100%'} on the map’s dcc.Graph component makes it take up 50% of the viewport height and 100% of the width of its parent container. You can adjust these values as necessary to fit your needs.
The final result:
Looks pretty darn good!
I mean, we could fiddle and diddle with it some more, but this gives you a really good idea of what you can do (and how you can do it) with ChatGPT.
Well done!
In Summary…
ChatGPT is changing the approach to coding by removing low-level, nit-picky tasks.
This paradigm shift allows developers to concentrate more on higher-level problem-solving and creativity. Programmers can now focus more on the “big picture”.
And generative AI is only going to get better at this. A lot better. And very soon.
The ability to create code using this sort of technique will be invaluable going forward, so jump on the bandwagon.
Level Up Coding