John Loewen, PhD

Summary

The website content provides a tutorial on data storytelling using Python and Streamlit to analyze and visualize UN food security statistics from 2000 to 2022, revealing global trends and country-specific insights into undernourishment.

Abstract

The article presents a comprehensive guide to data storytelling with Python, focusing on the visualization of United Nations food security data. It begins with an introduction to the importance of data storytelling in understanding global food security challenges, such as ensuring access to sufficient, safe, and nutritious food. The tutorial uses the open-source app framework Streamlit to process and visualize data on global food security, including indicators like the prevalence of undernourishment and severe food insecurity. The author, whose wife works for the UN Food and Agriculture Organization, demonstrates how to download, clean, and preprocess the dataset for analysis. The article then delves into creating a Pandas data frame for visualizing the global prevalence of undernourishment over time, highlighting a successful reduction from 2001 to 2012, stability until 2018, and a concerning increase attributed to the COVID-19 pandemic. A country-level analysis follows, comparing trends in undernourishment for selected countries, with insights into the impact of government policies, economic growth, and conflict on food security. The conclusion summarizes the key findings, emphasizing the correlation between political stability and food security, and suggests the creation of interactive dashboards as a next step for more detailed analysis.

Opinions

  • The author believes that data storytelling is a powerful tool for exploring and communicating development data to the world.
  • There is an emphasis on the importance of understanding global food security to aid in ensuring food access for all.
  • The article suggests that the UN food security dataset is a valuable resource for analysis and visualization to uncover trends and progress in addressing hunger.
  • The author expresses a personal connection to the subject matter through his wife's work with the UN Food and Agriculture Organization.
  • The tutorial implies that Python, Pandas, and Streamlit are effective tools for data preprocessing and visualization.
  • The author points out the challenges in handling the dataset, such as the representation of years as ranges and the presence of comparison operators in value fields.
  • The analysis of global trends in undernourishment indicates a successful global effort up to 2012, followed by a setback during the COVID-19 pandemic.
  • The country-level analysis reveals that political stability, comprehensive agricultural policies, and economic growth are key factors in improving food security.
  • The author highlights the negative impact of conflict and political instability on food security, as seen in the case of Afghanistan.
  • The article concludes with a call to action for creating more interactive visualizations and dashboards to further explore food security data.
  • The author invites readers to subscribe to his Substack for exclusive content on data analysis and visualization, indicating his commitment to and enthusiasm for the subject.

Data Storytelling With Python: A Tutorial With UN Food Security Stats

From CSV to data frame to data visualization to the full story

Data storytelling offers a powerful means to explore and communicate development data to the world.

As an example, understanding global food security aids in ensuring that all people have access to sufficient, safe, and nutritious food.

Today, let’s take a look at Streamlit, an open-source app framework, to analyze and visualize data on global food security.

By examining indicators from the United Nations dataset over the past 20 years, we can uncover trends, challenges, and progress in the fight against hunger.

Through simple Python data visualizations, we can look at the data from different angles to tell a story about where the world stands in achieving food security for all.

You may find the story is a bit different than you initially thought.

Let’s take a look.

Finding and Downloading The Dataset

My wife works for the UN Food and Agriculture Organization and is always involved with decisions related to global food security.

To find common ground for dialog and to support her work, today I am going to use a dataset that is new to me. The UN food security dataset can be found HERE.

There are lots of options to consider when downloading the dataset:

UN food security indicators for download

For the CSV file to download, I have chosen all of the countries for all of the years available (2000–2022), along with 4 important indicators:

  1. Prevalence of undernourishment (%)
  2. Number of people undernourished (million)
  3. Prevalence of severe food insecurity in the total population (%)
  4. Number of severely food insecure people (million)

After downloading, I saved the file as UN_food_security.csv.

Cleaning the Data

The dataset consists of the following fields (only the relevant fields are displayed here):

  1. Area: The name of the geographical area, such as a country or region.
  2. Item: The name or description of the item or indicator.
  3. Year: The actual year or range of years as a string.
  4. Unit: The unit of measurement for the data values.
  5. Value: The data value.

On previewing the actual data, there are a few things to consider:

  1. Item: This is the indicator field that we specified during the download. We know that there are 4 indicators, so we can decide which indicator(s) we want to show with our data visuals.
  2. Year: This one is a bit tricky. It is shown as a “range of years” with the middle year being the “actual” year. We will need to kludge some code to handle this.
  3. Value: Some of the data use a comparison operator (e.g., >2.5). We need a way to handle this data. The simplest way is to just remove the operator.
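Before cleaning anything, it can help to confirm these quirks for yourself. Here is a minimal sketch of such a preview. It assumes the five column names listed above, and the sample rows (including the area name "Country A") are made up for illustration rather than copied from the UN file.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same shape as the fields listed above;
# the values are invented for illustration only.
sample_csv = io.StringIO(
    "Area,Item,Year,Unit,Value\n"
    "Country A,Prevalence of undernourishment (percent) (3-year average),2000-2002,%,12.3\n"
    "Country A,Prevalence of undernourishment (percent) (3-year average),2001-2003,%,<2.5\n"
    "Country A,Number of people undernourished (million) (3-year average),2000-2002,million,1.4\n"
)

# Swap sample_csv for 'UN_food_security.csv' to inspect the real download
preview = pd.read_csv(sample_csv, usecols=["Area", "Item", "Year", "Unit", "Value"])

print(preview.dtypes)            # Year and Value load as text when they hold ranges or operators
print(preview["Item"].unique())  # the indicators present in the file
```

If Year and Value show up as text rather than numbers, that is the signal that the cleaning steps are needed.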

So, to get the entire data frame ready, here is the Python pandas code:

import pandas as pd
# Load and preprocess the dataset
data = pd.read_csv('UN_food_security.csv')

# Convert the 'Year' column to the middle year of each range (e.g. '2000-2002' -> 2001)
def get_middle_year(year_range):
    # Single years (e.g. '2015') contain no '-' and simply map to themselves
    years = [int(y) for y in str(year_range).split('-')]
    start_year, end_year = years[0], years[-1]
    return start_year + (end_year - start_year) // 2

data['Year'] = data['Year'].apply(get_middle_year)

# Preprocess the 'Value' column to remove any comparison operators and convert to float
def clean_value(value):
    if isinstance(value, str):
        if value.startswith('<') or value.startswith('>'):
            # Assume values like '<0.1' to be '0.1' for simplicity
            return float(value[1:].strip())
        return float(value.strip())
    return value

data['Value'] = data['Value'].apply(clean_value)
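As an alternative, here is a slightly more defensive sketch of the same two cleaning steps. It is not the method used in the rest of this article, just one way to do it with pandas string methods, and the three sample rows are made up to cover a year range, a single year, and both comparison operators.

```python
import pandas as pd

# Made-up values covering the cases described above
df = pd.DataFrame({
    "Year": ["2000-2002", "2019-2021", "2015"],
    "Value": ["12.3", "<2.5", ">40"],
})

# Pull out every 4-digit year, then take the midpoint of the first and last
# (a single year is its own midpoint)
years = df["Year"].astype(str).str.findall(r"\d{4}")
df["Year"] = years.apply(lambda ys: (int(ys[0]) + int(ys[-1])) // 2)

# Strip a leading '<' or '>' and convert; anything unparseable becomes NaN instead of raising
df["Value"] = pd.to_numeric(
    df["Value"].astype(str).str.lstrip("<>").str.strip(), errors="coerce"
)

print(df)
```

The design choice here is errors="coerce": a stray footnote or empty cell turns into NaN, which the later mean() calls skip, rather than stopping the script with an exception.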

And then we can do a bit of fancy footwork with Python code — to create a Pandas data frame that we can use to visualize our data.

Looking at the dataset

Story 1. Global Prevalence of Undernourishment

For this exercise, we want to find the real story about the levels of undernourishment around the globe. Higher levels of undernourishment are a pretty clear indicator that food security in a particular area is low.

With our data set, to analyze the “Prevalence of undernourishment” indicator across all countries, we can first aggregate the data to calculate the global average prevalence of undernourishment per year.

This involves filtering the dataset for the specific indicator, converting the Year column to a numeric format for easier analysis, and then calculating the mean value of undernourishment for each year across all countries.

Our Python code to do this:

import pandas as pd
import matplotlib.pyplot as plt

# Load and preprocess the dataset
data = pd.read_csv('UN_food_security.csv')

# Paste the preprocessing code from the cleaning step above here
# (get_middle_year, clean_value, and the lines that apply them)

data['Value'] = data['Value'].apply(clean_value)

# Filtering and calculating global average for undernourishment
undernourishment_data = data[data['Item'] == 'Prevalence of undernourishment (percent) (3-year average)']
global_avg = undernourishment_data.groupby('Year')['Value'].mean().reset_index()

# Creating the Matplotlib plot
plt.figure(figsize=(10, 6))
plt.plot(global_avg['Year'], global_avg['Value'], marker='o', linestyle='-')
plt.xlabel('Year')
plt.ylabel('Prevalence of Undernourishment (%)')
plt.title('Global Prevalence of Undernourishment Over Time')
plt.grid(True)

# Correcting the display of years on the x-axis
plt.xticks(global_avg['Year'], rotation=45)

# Display the plot
plt.tight_layout()
plt.show()

The relevant code in this example includes:

  1. Create the Matplotlib plot: Plot the data using plt.plot, specifying 'Year' on the x-axis and 'Value' on the y-axis. Customize the plot by adding labels for the x and y-axes and setting a title.
  2. Display of years on the x-axis: Use plt.xticks to set the x-axis tick positions to match the years from the 'Year' column, and rotate the tick labels by 45 degrees for better readability.
  3. Display the plot: Use plt.tight_layout() to ensure that all elements of the plot fit nicely within the figure. Finally, use plt.show() to display the plot.
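The filter-then-aggregate step is the heart of this chart, so here is a small sketch of it wrapped in a function and run on toy data. The function name yearly_mean, the short item label "PoU", and the numbers are all hypothetical. Note that this is an unweighted mean: every area in the file counts equally, regardless of its population.

```python
import pandas as pd

def yearly_mean(data, item):
    """Mean of 'Value' per 'Year' across all areas for one indicator (unweighted)."""
    subset = data[data["Item"] == item]
    return subset.groupby("Year")["Value"].mean().reset_index()

# Made-up rows for two hypothetical areas
toy = pd.DataFrame({
    "Area": ["A", "B", "A", "B"],
    "Item": ["PoU"] * 4,
    "Year": [2001, 2001, 2002, 2002],
    "Value": [10.0, 20.0, 8.0, 18.0],
})

result = yearly_mean(toy, "PoU")
print(result)
```

With the real data frame, the call would look like yearly_mean(data, 'Prevalence of undernourishment (percent) (3-year average)').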

And the resulting time-series global average chart:

What Data Stories Can We See Here?

Wow, this is a very interesting chart. We can see a concerted and successful global effort to reduce the prevalence of undernourishment from 2001 to 2012.

From 2012 to 2018, we see the world maintaining the same level — then what the heck happened from 2018–2021?

According to the United Nations, the number of people in the world affected by hunger increased during this time, mostly because of the COVID-19 pandemic. It is estimated that between 720 and 811 million people in the world faced hunger in 2020 (somewhere around 118 million more people in 2020 than in 2019).

Story 2. What Is The Story at the Country Level?

Country focus

First, we’ll filter the dataset for the selected countries and the “Prevalence of undernourishment” indicator. Then, we’ll create a line chart to compare these countries’ undernourishment trends over time.

Our Python Code:

import pandas as pd
import matplotlib.pyplot as plt

# Load and preprocess the dataset
data = pd.read_csv('UN_food_security.csv')

# Paste the preprocessing code from the cleaning step above here
# (get_middle_year, clean_value, and the lines that apply them)

# Select countries for comparison
selected_countries = ['Afghanistan', 'Brazil', 'China', 'Egypt', 'France', 'India']

# Filter data for selected countries and undernourishment indicator
selected_data = data[(data['Area'].isin(selected_countries)) &
                     (data['Item'] == 'Prevalence of undernourishment (percent) (3-year average)')]

# Calculate the average undernourishment for each country per year
country_avg = selected_data.groupby(['Year', 'Area'])['Value'].mean().unstack()

# Create the plot
plt.figure(figsize=(10, 6))
for country in selected_countries:
    plt.plot(country_avg.index, country_avg[country], marker='o', label=country)

plt.xlabel('Year')
plt.ylabel('Prevalence of Undernourishment (%)')
plt.title('Undernourishment Trends Across Selected Countries')
plt.grid(False)
plt.legend(title='Country')

plt.show()

This code snippet performs the following actions:

  1. Choose a set of countries: You can hard-code any valid country into this list to display as you like. The selected_data variable then contains only the data for those countries in the list.
  2. Matplotlib Figure Initialization: plt.figure(figsize=(10, 6)): A blank canvas for the plot is created with specific dimensions (10 inches in width and 6 inches in height).
  3. Loop Through Selected Countries: The for country in selected_countries: loop iterates through the selected countries for comparison.
  4. Creating Line Plots: plt.plot(country_avg.index, country_avg[country], marker='o', label=country): Inside the loop, this line generates individual line plots for each country's undernourishment data. It uses markers to indicate data points and labels each line with the country's name.
  5. Labeling Axes and Title: plt.xlabel, plt.ylabel, and plt.title set the x-axis label, the y-axis label, and the chart title.
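The one step that may be less obvious is the groupby followed by unstack, which reshapes the data so that each country becomes its own column. Here is a small sketch on made-up numbers (these are not real values for France or India) to show what the resulting table looks like.

```python
import pandas as pd

# Made-up values, for illustrating the reshape only
toy = pd.DataFrame({
    "Area": ["France", "India", "France", "India"],
    "Year": [2001, 2001, 2002, 2002],
    "Value": [2.5, 18.0, 2.5, 17.0],
})

# Rows become years, columns become countries
wide = toy.groupby(["Year", "Area"])["Value"].mean().unstack()
print(wide)

# Looping over wide.columns plots only the countries actually found in the data
for country in wide.columns:
    print(country, wide[country].tolist())
```

One thing to watch for: if a name in the hard-coded list does not match the Area spelling in the file, looking up that column raises a KeyError, so looping over the columns that exist is a safer option.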

And presto:

What Data Stories Can We See Here?

The trends for countries such as Afghanistan, India, and Brazil follow the same pattern we saw in the previous global average chart. It is interesting to note that France maintains a stable food supply, even during COVID-19.

China and Brazil have demonstrated notable decreases in undernourishment rates, indicating the success of their comprehensive agricultural policies and economic growth that enhanced food access.

This suggests strong governmental interventions and investments in agriculture have been effective in these nations.

Conversely, Afghanistan’s data reflect persistently high levels of undernourishment, underscoring the detrimental impacts of political instability and conflict on food security.

This highlights a clear (and known) correlation between peace, political stability, and food security, where conflict directly impedes agricultural productivity and food distribution.

In Summary…

Two main stories emerge from this analysis:

  1. Global Prevalence of Undernourishment: The analysis shows a reduction in global undernourishment from 2001 to 2012, followed by stability until a spike from 2018 to 2021, attributed to the COVID-19 pandemic’s impact.
  2. Undernourishment at the Country Level: France exhibits stable food security, while China and Brazil show significant improvements due to effective agricultural policies and economic growth. India’s situation reflects a rapidly increasing population outpacing economic growth that strains natural resources and land availability. Afghanistan’s situation reflects the negative impact of political instability and conflict on food security.

I hope you were successful in generating the data visualizations to tell this story. If you are looking for more detailed interactive visuals, a great next step would be the creation of an interactive dashboard showing maps and charts.

For example, the dashboard could let the user select a year and show the global food security landscape on a choropleth map, along with a bar chart of the top 10 countries with the highest undernourished population (I will write this article next!).
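As a head start on the bar chart part of that idea, here is a rough sketch of how the "top N for a selected year" step might look. The function name top_undernourished, the item label, and the toy numbers are all hypothetical. It assumes the data frame has already been cleaned so that Year and Value are numeric.

```python
import pandas as pd

def top_undernourished(data, year, item, n=10):
    """Return the n areas with the largest 'Value' for one indicator and year."""
    rows = data[(data["Item"] == item) & (data["Year"] == year)]
    return rows.nlargest(n, "Value")[["Area", "Value"]].reset_index(drop=True)

# Made-up rows for three hypothetical areas
toy = pd.DataFrame({
    "Area": ["A", "B", "C"],
    "Item": ["Number undernourished"] * 3,
    "Year": [2020, 2020, 2020],
    "Value": [5.0, 30.0, 12.0],
})

top2 = top_undernourished(toy, 2020, "Number undernourished", n=2)
print(top2)
```

From there, something like top2.plot.barh(x="Area", y="Value") would draw the bars. With the real file, you would also want to filter out regional aggregates so that only countries appear in the ranking.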

Thank you for reading!

If this type of story is right up your alley, and you want to support me as a writer, subscribe to my Substack.

On Substack, I publish a newsletter and articles that you won’t find on the other platforms where I create content.

I would love to hear/read your comments!
