avatarShu Lee

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

5492

Abstract

yword">import</span> date <span class="hljs-keyword">import</span> matplotlib.ticker <span class="hljs-keyword">as</span> ticker</pre></div><p id="9741">Now, paste your API key into the code. Then paste in the series IDs you want to pull. Separate your series IDs by commas. In the code below, the “PADD_NAMES” variable serves as the column names you want in your final dataframe. You can also choose the date range you want to pull using the “startDate” and “endDate” variables.</p><div id="957f"><pre><span class="hljs-comment"># API Key from EIA</span> <span class="hljs-attr">api_key</span> = <span class="hljs-string">'PASTE YOUR API KEY HERE'</span></pre></div><div id="a97a"><pre><span class="hljs-comment"># PADD Names to Label Columns</span> <span class="hljs-comment"># Change to whatever column labels you want to use.</span></pre></div><div id="c53f"><pre><span class="hljs-attr">PADD_NAMES</span> = [<span class="hljs-string">'PADD 1'</span>,<span class="hljs-string">'PADD 2'</span>,<span class="hljs-string">'PADD 3'</span>,<span class="hljs-string">'PADD 4'</span>,<span class="hljs-string">'PADD 5'</span>]</pre></div><div id="b9cf"><pre><span class="hljs-comment"># Enter all your Series IDs here separated by commas</span> <span class="hljs-attr">PADD_KEY</span> = [<span class="hljs-string">'PET.MCRRIP12.M'</span>, <span class="hljs-string">'PET.MCRRIP22.M'</span>, <span class="hljs-string">'PET.MCRRIP32.M'</span>, <span class="hljs-string">'PET.MCRRIP42.M'</span>, <span class="hljs-string">'PET.MCRRIP52.M'</span>]</pre></div><div id="918c"><pre><span class="hljs-comment"># Initialize list - this is the final list that you will store all the data from the json pull. Then you will use this list to concat into a pandas dataframe. </span></pre></div><div id="690a"><pre><span class="hljs-attr">final_data</span> = []</pre></div><div id="9dc3"><pre><span class="hljs-comment"># Choose start and end dates</span> <span class="hljs-attribute">startDate</span> = '<span class="hljs-number">2009</span>-<span class="hljs-number">01</span>-<span class="hljs-number">01</span>' <span class="hljs-attribute">endDate</span> = '<span class="hljs-number">2021</span>-<span class="hljs-number">01</span>-<span class="hljs-number">01</span>'</pre></div><p id="02d7">Finally, make calls to the API to pull the data in json format. Your url link may change depending on what data set you are pulling. To check the exact url needed, look at the “API CALL TO USE” link in the API Query Browser.</p><figure id="fbe7"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*Xk0HSC-EJxIL8tOOyFvX_A.png"><figcaption>API CALL TO USE Link is the link you need to use to pull in the API. May be different than the code provided below.</figcaption></figure><div id="ca3f"><pre><span class="hljs-comment"># Pull in data via EIA API</span></pre></div><div id="9d18"><pre><span class="hljs-keyword">for</span> <span class="hljs-selector-tag">i</span> <span class="hljs-keyword">in</span> <span class="hljs-built_in">range</span>(<span class="hljs-built_in">len</span>(PADD_KEY)):</pre></div><div id="aabc"><pre><span class="hljs-attribute"> url</span> = <span class="hljs-string">'http://api.eia.gov/series/?api_key='</span> + api_key + <span class="hljs-string">'&series_id='</span> + PADD_KEY[i]</pre></div><div id="7d1d"><pre> r = requests.<span class="hljs-built_in">get</span>(url) json_data = r.json()

<span class="hljs-keyword">if</span> r.status_code == 200:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">'Success!'</span>)
<span class="hljs-keyword">else</span>:
    <span class="hljs-built_in">print</span>(<span class="hljs-string">'Error'</span>)

df = pd.DataFrame(json_data.<span class="hljs-built_in">get</span>(<span class="hljs-string">'series'</span>)[0].<span class="hljs-built_in">get</span>(<span class="hljs-string">'data'</span>),
                  columns = [<span class="hljs-string">'Date'</span>, PADD_NAMES[i]])</pre></div><div id="0f6d"><pre>    df.set_index(<span class="hljs-string">'Date'</span>, <span class="hljs-attribute">drop</span>=<span class="hljs-literal">True</span>, <span class="hljs-attribute">inplace</span>=<span class="hljs-literal">True</span>)
final_data.append(df)</pre></div><h1 id="4b13">Creating a Pandas DataFrame</h1><p id="0de1">Now combine all your data into one pandas dataframe and edit the dates to create a time series.</p><div id="a78c"><pre><span class="hljs-comment"># Combine all the data into one dataframe</span>

<span class="hljs-attr">crude</span> = pd.concat(final_data, axis=<span class="hljs-number">1</span>)</pre></div><div id="3436"><pre># <span class="hljs-symbol">Create</span> date as datetype datatype crude[<span class="hljs-string">'Year'</span>] = crude.index.astype(str).str[:<span class="hljs-number">4</span>] crude[<span class="hljs-string">'Month'</span>] = crude.index.astype(str).str[<span class="hljs-number">4</span>:] crude[<span class="hljs-string">'Day'</span>] = <span class="hljs-number">1</span> crude[<span class="hljs-string">'Date'</span>] = pd.to_datetime(crude[[<span class="hljs-string">'Year'</span>,<span class="hljs-string">'Month'</span>,<span class="hljs-string">'Day'</span>]]) crude.set_index(<span class="hljs-string">'Date'</span>,drop=<span class="hljs-symbol">True</span>,inplace=<span class="hljs-symbol">True</span>) crude.sort_index(inplace=<span class="hljs-symbol">True</span>)

Options

crude = crude[startDate:endDate] crude = crude.iloc[:,:<span class="hljs-number">5</span>]</pre></div><p id="380b">Now you have a pandas dataframe that is easy to manipulate, analyze, and visualize however you want!</p><figure id="6bd5"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*XC7l2IhG1k6p6YGnkVOKRA.png"><figcaption></figcaption></figure><p id="dd06">Below is the graph of the dataframe above. Again, the code for the visualization comes from <a href="https://towardsdatascience.com/visualizing-covid-19-data-beautifully-in-python-in-5-minutes-or-less-affc361b2c6a">this article</a>, but I’ve reproduced my exact code below.</p><figure id="250b"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*YhaXtU7ZSM5qv6kQrmjxIQ.jpeg"><figcaption></figcaption></figure><div id="55a0"><pre><span class="hljs-attr"># Generating Colours and Style colors = {'PADD 1</span><span class="hljs-string">':'</span><span class="hljs-attr">#045275</span><span class="hljs-string">', '</span>PADD <span class="hljs-number">2</span><span class="hljs-string">':'</span><span class="hljs-attr">#089099</span><span class="hljs-string">', '</span>PADD <span class="hljs-number">3</span><span class="hljs-string">':'</span><span class="hljs-attr">#7</span>CCBA<span class="hljs-number">2</span><span class="hljs-string">', '</span>PADD <span class="hljs-number">4</span><span class="hljs-string">':'</span><span class="hljs-attr">#7</span>C<span class="hljs-number">1</span>D<span class="hljs-number">6</span>F<span class="hljs-string">', '</span>PADD <span class="hljs-number">5</span><span class="hljs-string">':'</span><span class="hljs-attr">#DC3977</span><span class="hljs-string">'}</span></pre></div><div id="1a52"><pre>plt<span class="hljs-selector-class">.style</span><span class="hljs-selector-class">.use</span>(<span class="hljs-string">'fivethirtyeight'</span>)</pre></div><div id="6aa7"><pre><span class="hljs-comment"># Creating the Visualization</span> <span class="hljs-attr">plot</span> = crude.plot(figsize=(<span class="hljs-number">12</span>,<span class="hljs-number">8</span>), <span class="hljs-attr">color</span>=list(colors.values()), <span class="hljs-attr">linewidth</span>=<span class="hljs-number">5</span>, <span class="hljs-attr">legend</span>=<span class="hljs-literal">False</span>)</pre></div><div id="1b34"><pre>plot<span class="hljs-selector-class">.yaxis</span><span class="hljs-selector-class">.set_major_formatter</span>(ticker<span class="hljs-selector-class">.StrMethodFormatter</span>(<span class="hljs-string">'{x:,.0f}'</span>)) plot<span class="hljs-selector-class">.grid</span>(<span class="hljs-attribute">color</span>=<span class="hljs-string">'#d4d4d4'</span>) plot<span class="hljs-selector-class">.set_xlabel</span>(<span class="hljs-string">'Date'</span>) plot<span class="hljs-selector-class">.set_ylabel</span>(<span class="hljs-string">'Crude Inventory Levels (KBD)'</span>)</pre></div><div id="845b"><pre><span class="hljs-comment"># Assigning Colour</span> <span class="hljs-keyword">for</span> padd <span class="hljs-keyword">in</span> list(colors.<span class="hljs-built_in">keys</span>()): plot.<span class="hljs-keyword">text</span>(x = crude.index[<span class="hljs-number">-1</span>], y = crude[padd].<span class="hljs-built_in">max</span>(), color = colors[padd], s = padd, weight = <span class="hljs-string">'bold'</span>)</pre></div><div id="f694"><pre># Adding Labels plot.text(x = crude.<span class="hljs-keyword">index</span>[<span class="hljs-number">1</span>], y = <span class="hljs-type">int</span>(crude.max().max())+<span class="hljs-number">1300</span>, s = "Crude Consumption by PADD (thousand barrels per day)", fontsize = <span class="hljs-number">23</span>, weight = <span class="hljs-string">'bold'</span>, alpha = <span class="hljs-number">.75</span>)</pre></div><div id="8b52"><pre>plot.text(<span class="hljs-keyword">x</span> = crude.<span class="hljs-built_in">index</span>[<span class="hljs-number">1</span>], <span class="hljs-keyword">y</span> = <span class="hljs-keyword">int</span>(crude.<span class="hljs-built_in">max</span>().<span class="hljs-built_in">max</span>())+<span class="hljs-number">900</span>, s = <span class="hljs-comment">"Crude consumption by refineries and blenders is a </span> proxy <span class="hljs-keyword">for</span> crude demand in each region<span class="hljs-comment">", </span> fontsize = <span class="hljs-number">16</span>, alpha = .<span class="hljs-number">75</span>)</pre></div><div id="425d"><pre>plot.text<span class="hljs-params">(<span class="hljs-attr">x</span> = crude.index[1], <span class="hljs-attr">y</span> = -1000, <span class="hljs-attr">s</span> = 'Source: Energy Information Administration www.eia.gov', <span class="hljs-attr">fontsize</span> = 10)</span></pre></div><h1 id="b3db">If you found this interesting:</h1><p id="eff5"><i>You can <a href="https://shuzlee.medium.com/subscribe"><b>subscribe</b></a> to get notified when new articles are published. You can also find me on <a href="https://www.linkedin.com/in/shuzlee/"><b>LinkedIn</b></a><b> </b>or at my <a href="https://github.com/shuzlee"><b>GitHub repository</b></a><b>.</b></i></p></article></body>

Use Python To Pull Energy Data From The Department of Energy’s EIA API

Photo by Appolinary Kalashnikova on Unsplash

The Energy Information Administration (EIA) is responsible for the US Department of Energy’s statistics and data. There is a wealth of data and information about all things related to energy in the United States including data on renewable energy, nuclear energy, crude oil, gasoline, jet fuel, diesel, electricity, and natural gas.

However, navigating and finding data in the EIA’s website can be tricky. To help with this, the EIA created an API for ease of access. You can find the API at www.eia.com/opendata. The EIA releases new data weekly, monthly, or annually depending on the different types of data. The API is very useful for refreshing weekly and monthly data when they are released.

For example, I created the graph below using data pulled from the EIA’s API. Every time monthly data is released, I can re-run the code and the graph will update automatically. For the visualization, I got the code from this article.

API Access — API Key

To begin, you need to obtain an API key from the EIA. Go to www.eia.com/opendata and click on the “Register Now” button.

This will bring you to a registration form that looks like this. Enter your information and click “Register”. You will be sent an email containing your new API key. It might be sent to your junk folder so be sure to check there if you haven’t received the API key within a few minutes.

Finding Data Sets

Now you can search for the data sets you want to pull via the API. You can browse the data sets by clicking on “API Query Browse” on the API’s homepage.

On the next page, you can click on the links under “Children Categories” to search for the data you need.

Another way to search for data is through clicking “Sources & Uses” at the top of the page and browsing the website. When you come across a data set you want, the EIA typically publishes a link to that data set in the API.

Obtaining the Series ID

Every series of data comes with a particular Series ID. You’ll use this Series ID along with your API key to pull the data set from the database. For example, the picture below shows the Series ID for crude oil consumption data in PADD 3 (US Gulf Coast). The code allows you to pull multiple series in at once. Keep track of all the series IDs you want to use.

Fun fact: PADD stands for Petroleum Administration for Defense District. These districts were created in 1942 during World War II to organize the distribution of fuel like gasoline and diesel. Today PADDs are still used to organize data by region.

Source: EIA

Pulling in Data Using Python

Now that we have our API key and the Series IDs, we can write the Python code to access the data. First, import the necessary libraries. We’ll use pandas, numpy, requests, matplotlib, and datetime. The data is in json format so we’ll need the requests library to read the data correctly.

#Import libraries
import pandas as pd
import requests
import matplotlib.pyplot as plt
import numpy as np
from datetime import date
import matplotlib.ticker as ticker

Now, paste your API key into the code. Then paste in the series IDs you want to pull. Separate your series IDs by commas. In the code below, the “PADD_NAMES” variable serves as the column names you want in your final dataframe. You can also choose the date range you want to pull using the “startDate” and “endDate” variables.

# API Key from EIA
api_key = 'PASTE YOUR API KEY HERE'
# PADD Names to Label Columns
# Change to whatever column labels you want to use.
PADD_NAMES = ['PADD 1','PADD 2','PADD 3','PADD 4','PADD 5']
# Enter all your Series IDs here separated by commas
PADD_KEY = ['PET.MCRRIP12.M',
'PET.MCRRIP22.M',
'PET.MCRRIP32.M',
'PET.MCRRIP42.M',
'PET.MCRRIP52.M']
# Initialize list - this is the final list that you will store all the data from the json pull. Then you will use this list to concat into a pandas dataframe. 
final_data = []
# Choose start and end dates
startDate = '2009-01-01'
endDate = '2021-01-01'

Finally, make calls to the API to pull the data in json format. Your url link may change depending on what data set you are pulling. To check the exact url needed, look at the “API CALL TO USE” link in the API Query Browser.

API CALL TO USE Link is the link you need to use to pull in the API. May be different than the code provided below.
# Pull in data via EIA API
for i in range(len(PADD_KEY)):
    url = 'http://api.eia.gov/series/?api_key=' + api_key +
          '&series_id=' + PADD_KEY[i]
    r = requests.get(url)
    json_data = r.json()
    
    if r.status_code == 200:
        print('Success!')
    else:
        print('Error')
    
    df = pd.DataFrame(json_data.get('series')[0].get('data'),
                      columns = ['Date', PADD_NAMES[i]])
    df.set_index('Date', drop=True, inplace=True)
    final_data.append(df)

Creating a Pandas DataFrame

Now combine all your data into one pandas dataframe and edit the dates to create a time series.

# Combine all the data into one dataframe
crude = pd.concat(final_data, axis=1)
# Create date as datetype datatype
crude['Year'] = crude.index.astype(str).str[:4]
crude['Month'] = crude.index.astype(str).str[4:]
crude['Day'] = 1
crude['Date'] = pd.to_datetime(crude[['Year','Month','Day']])
crude.set_index('Date',drop=True,inplace=True)
crude.sort_index(inplace=True)
crude = crude[startDate:endDate]
crude = crude.iloc[:,:5]

Now you have a pandas dataframe that is easy to manipulate, analyze, and visualize however you want!

Below is the graph of the dataframe above. Again, the code for the visualization comes from this article, but I’ve reproduced my exact code below.

# Generating Colours and Style
colors = {'PADD 1':'#045275', 
          'PADD 2':'#089099', 
          'PADD 3':'#7CCBA2', 
          'PADD 4':'#7C1D6F', 
          'PADD 5':'#DC3977'}
plt.style.use('fivethirtyeight')
# Creating the Visualization
plot = crude.plot(figsize=(12,8), 
                  color=list(colors.values()), 
                  linewidth=5, 
                  legend=False)
plot.yaxis.set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
plot.grid(color='#d4d4d4')
plot.set_xlabel('Date')
plot.set_ylabel('Crude Inventory Levels (KBD)')
# Assigning Colour
for padd in list(colors.keys()):
    plot.text(x = crude.index[-1], y = crude[padd].max(), color = 
    colors[padd], s = padd, weight = 'bold')
# Adding Labels
plot.text(x = crude.index[1], 
          y = int(crude.max().max())+1300, 
          s = "Crude Consumption by PADD (thousand barrels per
               day)", 
          fontsize = 23, 
          weight = 'bold', 
          alpha = .75)
plot.text(x = crude.index[1], 
          y = int(crude.max().max())+900, 
          s = "Crude consumption by refineries and blenders is a  
          proxy for crude demand in each region", 
          fontsize = 16, 
          alpha = .75)
plot.text(x = crude.index[1], 
          y = -1000,
          s = 'Source: Energy Information Administration
          www.eia.gov', 
          fontsize = 10)

If you found this interesting:

You can subscribe to get notified when new articles are published. You can also find me on LinkedIn or at my GitHub repository.

Data Science
Python
API
Energy
Machine Learning
Recommended from ReadMedium