avatarJohn Loewen, PhD

Summary

The article demonstrates how to leverage GPT-4 to create interactive data visualizations using Python's folium library and Streamlit framework, showcasing the integration of AI with mapping and dashboard tools to enhance data analysis workflows.

Abstract

The article titled "Prompting GPT-4: Streamlit and Folium are a Match Made for Mapping" details a method for enhancing the Python coding process for complex data visualizations by utilizing GPT-4's prompt engineering capabilities. It guides readers through using the folium library for creating maps and the Streamlit framework for building interactive dashboards, with the streamlit-folium library enabling their integration. The author illustrates this by visualizing the Global Peace Index dataset, demonstrating how to prompt GPT-4 to generate code for displaying safety scores on an interactive map with year-based selection, and subsequently adding bar charts and scatter plots to provide a more comprehensive data narrative. The article emphasizes the efficiency gained by focusing on data-driven questions rather than the intricacies of coding syntax, suggesting that this approach can significantly streamline a data scientist's workflow.

Opinions

  • The author expresses that the combination of folium and Streamlit is particularly advantageous for creating interactive data visualizations.
  • GPT-4's ability to auto-generate working Python code for cross-framework workflows is considered a "super-cool development" by the author.
  • The author suggests that prompt engineering with GPT-4 can significantly speed up the coding process, allowing for more time to be spent on data interpretation.
  • There is an acknowledgment of the challenge in setting the width of the map relative to the screen and to the charts below, due to the use of different libraries (folium for maps and Plotly for charts).
  • The author believes that GPT-4 is a valuable tool for data scientists, as it allows them to focus more on what the data can do rather than on the technical aspects of coding.
  • The author encourages readers interested in data analysis and visualization with Python/AI to subscribe to their Substack for exclusive content.

Prompting GPT-4: Streamlit and Folium are a Match Made for Mapping

UN GPI data visuals with Python’s streamlit-folium library

Dall-E image: impressionist painting of multi-visual dashboard as a mural on a building

It’s just nuts how much prompt engineering with GPT-4 can speed up the Python coding process for complex data visualizations.

Just recently, I figured out how to leverage GPT-4 to implement interactive data visualizations using the Python folium library and the Streamlit framework.

Why is this a big deal? Let me explain:

  • Folium is a fantastic Python library that allows you to create beautiful maps. You can choose map layers and legends and all kinds of cool functionality.
  • Streamlit is a Python framework that is used to create fully functional interactive dashboards. For example, users can choose options from a dropdown menu to display associated maps and charts.
  • streamlit-folium is a library that allows you to integrate folium maps into a Streamlit dashboard — and GPT-4 can auto-create this code for you!

I know! This is a super-cool development — GPT-4 can now write working Python code for your cross-framework workflow.

Let‘s use a cool data set to step through this together and I’ll show you how it works!

Step 1: The Dataset

Today, let’s find a data set that gives us information about which countries are relatively safe to visit these days. We can go and dig around on Google, or how about we just ask GPT-4!

Prompt to GPT-4: Find the most relevant specific data set that has been used to determine the relative safety of a country for the past 15 years. Describe each data point in a detailed list.

Response from GPT-4: The most relevant dataset for determining the relative safety of a country for travel during the past 15 years is the Global Peace Index (GPI) dataset. The dataset contains the following data points:

  1. Country: The name of the country.
  2. iso3c: The three-digit ISO 3166–1 alpha-3 code for each country.
  3. Score for each year (2008–2022): The safety score of the country for each year, with lower scores indicating higher levels of safety.

By asking GPT-4 to describe each data point, we are setting ourselves up to be able to ask GPT-4 to have a better idea of how to use these data points later on.

You can initiate the download (in XSLX format) at this LINK

Once the file is downloaded, we can open it up in Excel (or Numbers) and click on the “Overall Scores” tab. You should see something similar to:

Overall Scores tab in Numbers — delete the first 3 rows and export to CSV format (global_peace_index.csv)

Next, you want to select and delete the first 3 rows (highlighted in the image above). Once these rows are removed, you can then Export this sheet as a CSV file called global_peace_index.csv.

Now we are ready to prompt GPT-4. Open a chat window, and from the main chat window, click on the attachments icon:

Click on attachments icon and find your newly created global_peace_index.csv file

You will be prompted to upload a file from your computer. Find your newly created global_peace_index.csv file and click Open.

Now we are ready to prompt GPT-4 to create our Python data visualization code.

Step 2: Display Results on a Folium Map

This is where ChatGPT really shines. If you don’t know the specifics of data visualization libraries, all you need to be able to do is to ask the right questions.

Prompt to ChatGPT-4: Access the data set (using only the relevant data from the dataset) and display the average for all the years’ results on an interactive map (using Python folium and StreamLit). The user must be able to select the year from a dropdown to see the results on the map.

Response from ChatGPT-4: To display the data on an interactive map, follow these steps:

  1. Read the cleaned CSV file using pandas.
  2. Create a Dash app with a dropdown menu for selecting years and an empty map container.
  3. Define a callback function that updates the map based on the selected year.
  4. Use folium to create a choropleth map, displaying the safety scores for the selected year.
  5. Display the map in the Dash app using an iframe.

GPT-4 then goes ahead and starts spewing out the code. It’s really that easy.

GPT-4 Response (Working Python Code!):

import streamlit as st
import pandas as pd
import folium
from streamlit_folium import st_folium

# Function to load and cache the dataset using st.cache_data
@st.cache_data
def load_data():
    data = pd.read_csv('global_peace_index.csv')
    return data

data = load_data()

# Streamlit app setup
st.title('Global Safety Score Interactive Map')

# Year selection
year = st.selectbox('Select Year', options=[str(y) for y in data.columns if y.isnumeric()])

# Function to create the map
def create_map(year):
    # Initialize a Folium map
    m = folium.Map(location=[20, 0], zoom_start=2)
    # Creating a choropleth map based on the selected year
    folium.Choropleth(
        geo_data='https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json',
        name='choropleth',
        data=data,
        columns=['iso3c', year],
        key_on='feature.id',
        fill_color='YlGnBu',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name='Safety Score ' + year
    ).add_to(m)
    return m

# Display the map in Streamlit
st_data = create_map(year)
st_folium(st_data, width=725, height=500)

Instructions for Running the Code

  1. Save the Code: Copy the above Python code into a file named streamlit_map01.py or any other filename you prefer.
  2. Run the Streamlit App: Open your terminal, navigate to the directory containing your Python script, and execute the command streamlit run streamlit_map01.py.
  3. Interact with the Dashboard: Use the dropdown menu in the Streamlit app to select different years. The map will update to reflect the safety scores for the chosen year.

NOTE 1: You need to make sure you have all of the libraries available for the code to run. If you are using PyCharm, you can add it by clicking Preferences/Project/Python Interpreter.

Then click on the ‘+’ icon and type in the library name (ie. plotly, folium, streamlit-folium), like so:

After clicking the ‘+’ icon, you can type in the name of the library, then click “Install”

Once you have found the library, don’t forget to click “Install”.

NOTE 2: I use PyCharm to edit/run my Python Code. You can open a terminal window within PyCharm to test your code, like so:

Open the terminal using the View menu. Type “streamlit run [filename]

The resulting interactive folium map, displayed by Streamlit in your default browser

Initial map with folium and Streamlit integration (without additional data visualization charts)

Yes, there it is!

We immediately have a fully interactive Streamlit Dashboard with a drop-down menu that allows us to select a year to update the folium map.

How cool is that? But we’re not done! Let’s add in a few charts to tell us a more complete story!

Which countries do we really NOT want to travel to, and is the world getting more safe or less safe over time?

Step 3: Adding Data Visualization Charts

To take it up a notch here (this is where GPT-4 makes it insanely easy), we are going to also ask GPT-4 to add in 2 more data visualizations to provide a more complete story for the data presented in the global map.

Prompt to GPT-4: Working from the successful code created in the previous step, below the map, display a horizontal bar chart showing the top 10 most dangerous countries for the selected year and a scatter plot displaying scores for all countries in the selected year (do not display country names on the x-axis) Display the charts side by side below the map. Use the “YlOrRd” color scheme for all visualizations.

Response from GPT-4 (Working Python Code!):

import streamlit as st
import pandas as pd
import folium
from streamlit_folium import st_folium
import plotly.express as px


# Streamlit app setup
st.set_page_config(layout="wide")  # Explicitly setting page layout to wide
st.title('Global Safety Score Interactive Map and Visualizations')

# Function to load and cache the dataset using st.cache_data
@st.cache_data
def load_data():
    data = pd.read_csv('global_peace_index.csv')
    return data

data = load_data()

# Year selection
year = st.selectbox('Select Year', options=[str(y) for y in data.columns if y.isnumeric()])

# Function to create the map
def create_map(year):
    m = folium.Map(location=[20, 0], zoom_start=2, width='100%', height='50%')
    folium.Choropleth(
        geo_data='https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json',
        name='choropleth',
        data=data,
        columns=['iso3c', year],
        key_on='feature.id',
        fill_color='YlOrRd',
        fill_opacity=0.7,
        line_opacity=0.2,
        legend_name=f'Safety Score {year}'
    ).add_to(m)
    return m

# Display the map in Streamlit
st_data = create_map(year)
st_folium(st_data, width=1600)  # Adjusted width to 1000 for the map

# Prepare data for charts
data_year = data[['Country', year]].dropna()
data_year[year] = pd.to_numeric(data_year[year], errors='coerce')

# 10 most dangerous countries
top_10_dangerous = data_year.nlargest(10, year)  # Correctly selecting the 10 most dangerous

# Scatter plot data
scatter_data = data_year

# Adjustments for chart width and alignment
fig_bar = px.bar(top_10_dangerous, x=year, y='Country', orientation='h',
                 color=year,  # Apply color based on the year's value
                 color_continuous_scale='YlOrRd',
                 title=f"10 Most Dangerous Countries in {year}",
                 width=1000)  # Explicit width for the bar chart

fig_scatter = px.scatter(scatter_data, y=year, x='Country',
                         title=f"Safety Scores for All Countries in {year}",
                         color=year,  # Apply color based on the year's value
                         color_continuous_scale='YlOrRd',
                         width=1000)  # Explicit width for the scatter plot

fig_scatter.update_xaxes(visible=False)  # Hide x-axis labels for clarity

# Display charts side by side
col1, col2 = st.columns([1, 1], gap="small")
with col1:
    st.plotly_chart(fig_bar, use_container_width=True)
with col2:
    st.plotly_chart(fig_scatter, use_container_width=True)

Additions to this interactive display:

  1. Top 10 Safest Countries: The bar chart visualizes the top 10 safest countries based on their safety scores, with the safest countries at the top.
  2. Scatter Plot: Displays safety scores for all countries. To comply with your request of not displaying country names on the x-axis, country names have been hidden by setting update_xaxes(visible=False).
  3. Layout: Streamlit’s columns functionality is used to display the charts side by side.

Copy/Paste and Save the code in a Python file and run it as a Streamlit application.

GPT-4 generated code: Streamlit dashboard with a dropdown menu by year, updating map and both charts

That looks pretty darn good. Looks great with a common color scheme.

You can click on the dropdown to select a year and the map, bar chart and scatterplot diagrams all update accordingly.

NOTE: The one issue that GPT-4 really has difficulty with here is setting the width of the map relative to the screen and to the charts below. The map is created using the folium library and the charts are created with the Plotly library. You may need to manually configure the width to get it to line up just right (if you care about such things).

In Summary…

The use of prompt engineering, a technique involving iterative refinement of prompts, allows for better collaboration with AI models like GPT-4.

As a data scientist and coder, GPT-4 is another tool in your toolbox to improve your workflow.

How, you ask? You can think more about what the data can do and how to ask the right question(s), rather than spending the bulk of your time fiddling with things like the data frame and the library-specific syntax.

This can only improve and expedite your workflow.

Thank you for reading.

If this type of story is right up your alley, and you want to support me as a writer, subscribe to my Substack.

On Substack, I publish a newsletter and articles that you won’t find on the other platforms where I create content.

ChatGPT
Prompt Engineering
Data Visualization
Python
Streamlit
Recommended from ReadMedium