Benjamin Obi Tayo Ph.D.

Summary

The web content provides a comprehensive tutorial on data visualization using Python's Matplotlib library, focusing on the analysis and visualization of weather data from Ann Arbor, Michigan, over a ten-year period.

Abstract

The tutorial presented on the website demonstrates the process of data visualization in Python, specifically for weather data analysis. It utilizes a dataset from NOAA, which includes daily temperature records from 2005 to 2015 near Ann Arbor, Michigan. The code examples provided in the tutorial cover the importation of necessary libraries, data preparation, and the creation of a line graph illustrating record high and low temperatures. The graph includes a shaded area representing the temperature range and scatter points indicating where the ten-year record was broken in 2015. The tutorial emphasizes the artistry involved in data visualization and showcases how Python's Matplotlib library can be used to generate insightful plots.

Opinions

  • Data visualization is considered more of an art than a science, requiring a combination of various code elements to produce an effective visualization.
  • The tutorial suggests that an excellent end result in data visualization is achieved by carefully piecing together code for an artistic representation of data.
  • The author implies that the provided visualization of weather data is a good example of data visualization, as it clearly communicates the temperature trends and record breaks over a decade.

Data Visualization, Python

Tutorial on Data Visualization: Weather Data

Weather data analysis and visualization using Python’s Matplotlib

Data visualization is more of an art than a science. Producing a good visualization means putting several pieces of code together carefully. This tutorial demonstrates how to do that by analyzing and plotting weather data.

This code performs the following:

  1. It plots a line graph of the record high and record low temperatures for each day of the year over the period 2005–2014, and shades the area between the two lines.
  2. It overlays a scatter plot of the 2015 data points (highs and lows) that broke the ten-year (2005–2014) record high or record low.
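The two steps above can be sketched on a toy example before touching the real dataset. This is only an illustration under stated assumptions: the arrays below are made-up numbers (three baseline years, four days), not values from the NOAA file, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 3 baseline years x 4 days of daily highs and lows (deg C).
highs = np.array([[1.0, 3.0, 5.0, 2.0],
                  [2.0, 2.5, 4.0, 6.0],
                  [0.5, 4.0, 4.5, 3.0]])
lows = np.array([[-5.0, -2.0, 0.0, -4.0],
                 [-6.0, -1.0, -1.5, -3.0],
                 [-4.5, -3.0, 0.5, -2.0]])

# Step 1: per-day record high and record low across the baseline years.
record_high = highs.max(axis=0)
record_low = lows.min(axis=0)

# Step 2: indices of the days on which a later year beats either record.
new_highs = np.array([1.5, 4.5, 4.0, 7.0])
new_lows = np.array([-7.0, -2.5, -1.0, -4.0])
broke_high = np.flatnonzero(new_highs > record_high)
broke_low = np.flatnonzero(new_lows < record_low)
```

Note that a tie does not count as a broken record here, because the comparisons are strict.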

Dataset: The NOAA dataset used for this project is stored in the file weather_data.csv. It is a subset of the Daily Global Historical Climatology Network (GHCN-Daily) from the National Centers for Environmental Information (NCEI). GHCN-Daily comprises daily climate records from thousands of land surface stations across the globe. The records used here were collected from stations near Ann Arbor, Michigan, United States.

The complete code for this article can be downloaded from this repository: https://github.com/bot13956/weather_pattern.

1. Import necessary libraries and dataset

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df=pd.read_csv('weather_data.csv')
df.head()
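The rest of the code relies on three columns: Date, Element, and Data_Value. For readers without the CSV at hand, here is a minimal stand-in with the same column names. The rows are invented for illustration, and the assumption that Date is a 'YYYY-MM-DD' string and Element holds TMAX or TMIN is inferred from how the later code uses them.

```python
import pandas as pd

# Hypothetical rows mimicking the columns the tutorial code relies on:
# Date (YYYY-MM-DD string), Element (TMAX or TMIN), Data_Value (tenths of deg C).
sample = pd.DataFrame({
    "Date": ["2005-01-01", "2005-01-01", "2015-07-04", "2015-07-04"],
    "Element": ["TMAX", "TMIN", "TMAX", "TMIN"],
    "Data_Value": [22, -56, 311, 178],
})

# Quick structural checks before any analysis.
missing = {"Date", "Element", "Data_Value"} - set(sample.columns)
elements = sorted(sample["Element"].unique())
date_span = (sample["Date"].min(), sample["Date"].max())
```

Running the same three checks on the real `df` is a cheap way to confirm that the file loaded as expected.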

2. Data preparation and analysis

# Convert temperature from tenths of a degree C to degrees C
df['Data_Value'] = 0.1 * df['Data_Value']
# Split each 'YYYY-MM-DD' date string into a month-day key and a year
df['Days'] = df['Date'].str.split('-', n=1).str[1]
df['Years'] = df['Date'].str.split('-').str[0]
# Drop leap days so every year has 365 days, then separate 2005-2014 from 2015
no_leap = df[df['Days'] != '02-29']
df_2005_to_2014 = no_leap[no_leap['Years'] != '2015']
df_2015 = no_leap[no_leap['Years'] == '2015']
# Highest and lowest Data_Value for each element and calendar day
df_max = df_2005_to_2014.groupby(['Element', 'Days'])['Data_Value'].max()
df_min = df_2005_to_2014.groupby(['Element', 'Days'])['Data_Value'].min()
df_2015_max = df_2015.groupby(['Element', 'Days'])['Data_Value'].max()
df_2015_min = df_2015.groupby(['Element', 'Days'])['Data_Value'].min()
# Record highs come from TMAX, record lows from TMIN (both indexed by 'MM-DD')
record_max = df_max.loc['TMAX']
record_min = df_min.loc['TMIN']
record_2015_max = df_2015_max.loc['TMAX']
record_2015_min = df_2015_min.loc['TMIN']
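To see what the grouping step produces, it can help to run it on a handful of rows whose answer is obvious by inspection. The sketch below uses made-up values, and `daily_records` is a hypothetical helper, not part of the original code.

```python
import pandas as pd

def daily_records(df, element, how):
    """Per-calendar-day extreme of Data_Value for one element (hypothetical helper)."""
    subset = df[df["Element"] == element]
    return subset.groupby("Days")["Data_Value"].agg(how)

# Invented data: two years, two calendar days, values already in deg C.
toy = pd.DataFrame({
    "Date": ["2005-01-01", "2006-01-01", "2005-01-02", "2006-01-02",
             "2005-01-01", "2006-01-01", "2005-01-02", "2006-01-02"],
    "Element": ["TMAX"] * 4 + ["TMIN"] * 4,
    "Data_Value": [2.0, 3.5, 1.0, 0.5, -4.0, -6.5, -3.0, -2.0],
})
# Month-day key, so the same calendar day groups together across years.
toy["Days"] = toy["Date"].str.split("-", n=1).str[1]

toy_high = daily_records(toy, "TMAX", "max")  # 01-01 -> 3.5, 01-02 -> 1.0
toy_low = daily_records(toy, "TMIN", "min")   # 01-01 -> -6.5, 01-02 -> -3.0
```

Each result is a Series indexed by the 'MM-DD' key, which is the same shape as `record_max` and `record_min` in the tutorial code.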

3. Generate Data Visualization

plt.figure(figsize=(10, 7))
x = np.arange(len(record_max))  # day-of-year index, 0..364
plt.plot(x, record_max, '--k', label='record high')
plt.plot(x, record_min, '-k', label='record low')
# Boolean masks for the 2015 days that broke the 2005-2014 records
low_broken = record_2015_min.values < record_min.values
high_broken = record_2015_max.values > record_max.values
plt.scatter(x[low_broken], record_2015_min.values[low_broken], c='b', label='2015 break low')
plt.scatter(x[high_broken], record_2015_max.values[high_broken], c='r', label='2015 break high')
plt.xlabel('month', size=14)
# Raw string so that \circ is not treated as a Python escape sequence
plt.ylabel(r'temperature ($^\circ C$)', size=14)
# Place each label at the day index where that month starts in a 365-day year
plt.xticks([0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334], ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
ax = plt.gca()
ax.axis([0, 365, -40, 40])
ax.fill_between(x, record_min, record_max, facecolor='blue', alpha=0.25)
plt.title('Record temperatures for different months between 2005-2014', size=14)
plt.legend(loc=0)
plt.show()
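Month labels line up with the data only when each tick sits at the day-of-year index where that month begins. Here is a minimal sketch of deriving those offsets with the standard library instead of typing them by hand. It assumes a 365-day year, which matches the code above since leap days are dropped; the variable names are illustrative.

```python
import calendar

# Days per month in a non-leap year (2015 is used only because it has 365 days).
days_in_month = [calendar.monthrange(2015, m)[1] for m in range(1, 13)]

# Zero-based day-of-year index at which each month starts.
month_starts = [sum(days_in_month[:m]) for m in range(12)]

# Abbreviated month names; these follow the active locale.
month_labels = list(calendar.month_abbr)[1:]
```

The two lists can then be passed straight to `plt.xticks(month_starts, month_labels)`.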

In summary, we’ve shown how a simple data visualization plot can be generated using Python’s Matplotlib library.

