Abdishakur

Summary

The article explains how to perform geocoding in Python using Geopy and Geopandas libraries to convert physical addresses to geographic locations (latitude and longitude).

Abstract

The article begins by defining geocoding as the process of converting physical addresses to geographic locations using computational methods. It then introduces Geopy and Geopandas libraries for performing geocoding in Python. The article provides step-by-step instructions on how to install these libraries, geocode single addresses, and geocode addresses from Pandas DataFrame. The article also includes code snippets and visualizations to help readers understand the process.

Opinions

  • The article suggests using Geopy library for geocoding single addresses and Geopandas library for geocoding addresses from Pandas DataFrame.
  • The article recommends using Nominatim Geocoding service built on top of OpenStreetMap data for geocoding addresses.
  • The article advises delaying geocoding by 1 second between each address when geocoding a large number of physical addresses to avoid being denied access to the service.
  • The article suggests using Folium for mapping out the geocoded points but encourages readers to use any other Geovisualization tool of their choice.
  • The article concludes by mentioning that there are other services that provide either free or paid geocoding services that readers can experiment with in GeoPy.
  • The article recommends using Google Maps geocoding services for more powerful results, but it requires an API key.
  • The article provides a GitHub repository and a Jupyter notebook Binder link for readers to interact and experiment with the tutorial without any installation.

Geocode with Python

How to convert physical addresses to geographic locations → latitude and longitude

Photo by Thor Alvis on Unsplash

Datasets are rarely complete and often require pre-processing. Imagine a dataset that has only an address column, with no latitude and longitude columns to represent the data geographically. In that case, you need to convert the addresses into a geographic format. The process of converting addresses to geographic information (latitude and longitude) so that their locations can be mapped is called geocoding.

Geocoding is the computational process of transforming a physical address description to a location on the Earth’s surface (spatial representation in numerical coordinates) — Wikipedia

In this tutorial, I will show you how to perform geocoding in Python with the help of the Geopy and Geopandas libraries. Let us install these libraries with pip, assuming you already have an Anaconda environment set up.

pip install geopandas
pip install geopy

If you do not want to install the libraries and would rather interact directly with the Jupyter notebook that accompanies this tutorial, there is a GitHub link with MyBinder at the bottom of this article. This is a containerised environment that lets you experiment with the tutorial directly on the web without any installation. The dataset is also included in this environment, so there is no need to download it.

Geocoding Single Address

To geocode a single address, you can use the Geopy Python library. Geopy supports several geocoding services that you can choose from, including Google Maps, ArcGIS, AzureMaps, Bing, etc. Some of them require API keys, while others do not.

Geopy

As our first example, we use the Nominatim geocoding service, which is built on top of OpenStreetMap data. Let us geocode a single address, the Eiffel Tower in Paris.

from geopy.geocoders import Nominatim
locator = Nominatim(user_agent="myGeocoder")
location = locator.geocode("Champ de Mars, Paris, France")

We create a locator that holds the geocoding service, Nominatim. Then we call the locator's geocode method with the address we want to look up, in this example the Eiffel Tower address.

print("Latitude = {}, Longitude = {}".format(location.latitude, location.longitude))

The print statement above outputs the coordinates of the location we created:

Latitude = 48.85614465, Longitude = 2.29782039332223

Try some different addresses of your own. In the next section, we will cover how to geocode many addresses from a Pandas DataFrame.
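When you try your own addresses, keep in mind that geocode returns None when the service finds no match, so reading location.latitude would then raise an AttributeError. Here is a minimal sketch of a guard. The helper name coordinates_or_none is hypothetical (it is not part of Geopy), and it accepts any geocoding callable, such as locator.geocode:

```python
from typing import Callable, Optional, Tuple


def coordinates_or_none(geocode: Callable, address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if nothing was found.

    `geocode` is any callable that behaves like geopy's `locator.geocode`:
    it returns an object with `.latitude` and `.longitude`, or None.
    The helper name is hypothetical and not part of geopy.
    """
    location = geocode(address)
    if location is None:
        # No match for this address, so there are no coordinates to return
        return None
    return (location.latitude, location.longitude)
```

With the locator created earlier, you could call it as coordinates_or_none(locator.geocode, "Champ de Mars, Paris, France").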

Geocoding addresses from Pandas

Let us read the dataset for this tutorial. We use an example dataset of store addresses. The CSV file is available in this link.

Download the CSV file and read it in Pandas.

import pandas as pd
df = pd.read_csv("addresses.csv")
df.head()

The following table provides the first five rows of the DataFrame table. As you can see, there are no latitude and longitude columns to map the data.

DataFrame

We concatenate address columns into one that is appropriate for geocoding. For example, the first address is:

Karlaplan 13,115 20,STOCKHOLM,Stockholms län, Sweden

We can join the address columns in pandas like this to create a single address column for geocoding:
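The original snippet is not reproduced here, so the following is a minimal sketch of one way to do it. It assumes the address parts live in the columns Address1, Address3, Address4 and Address5 (the names that are dropped later in this tutorial) and that every store is in Sweden. The function name add_full_address and the one-row sample are hypothetical:

```python
import pandas as pd


def add_full_address(frame: pd.DataFrame, country: str = "Sweden") -> pd.DataFrame:
    """Return a copy of `frame` with an ADDRESS column built from the address parts."""
    out = frame.copy()
    # Cast each part to str so numeric-looking values (e.g. postcodes) concatenate cleanly
    out["ADDRESS"] = (
        out["Address1"].astype(str) + ","
        + out["Address3"].astype(str) + ","
        + out["Address4"].astype(str) + ","
        + out["Address5"].astype(str) + ", " + country
    )
    return out


# Hypothetical one-row sample; the split of the first address into columns is assumed
sample = pd.DataFrame({
    "Address1": ["Karlaplan 13"],
    "Address3": ["115 20"],
    "Address4": ["STOCKHOLM"],
    "Address5": ["Stockholms län"],
})
sample = add_full_address(sample)
print(sample["ADDRESS"].iloc[0])
```

On the tutorial's data, the equivalent call would be df = add_full_address(df).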

Once we have created the address column, we can start geocoding, following the steps below.

  • #1: We first delay our geocoding by 1 second between each address. This matters when you are geocoding a large number of physical addresses, because the geocoding service provider can deny access to the service if requests arrive too quickly.
  • #2: We create a df['location'] column by applying the rate-limited geocode function to the address column.
  • #3: We create a single tuple column (df['point']) that holds latitude, longitude, and altitude.
  • #4: Finally, we split the tuple column into three separate latitude, longitude, and altitude columns.
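The four steps above can be sketched as follows. This is an illustration under stated assumptions rather than the original snippet: the function names are hypothetical, the geocoder is passed in as an argument so the DataFrame logic can be tried without network access, and unmatched addresses are given empty coordinates instead of raising an error. RateLimiter, Nominatim and the location's point attribute are part of Geopy.

```python
import pandas as pd


def make_rate_limited_geocoder(user_agent: str = "myGeocoder", delay_seconds: float = 1.0):
    """#1: wrap Nominatim's geocode so calls are at least `delay_seconds` apart."""
    # Imported inside the function so the DataFrame logic below can be tried without geopy
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    locator = Nominatim(user_agent=user_agent)
    return RateLimiter(locator.geocode, min_delay_seconds=delay_seconds)


def geocode_addresses(frame: pd.DataFrame, geocode) -> pd.DataFrame:
    """Return a copy of `frame` with location, point, latitude, longitude, altitude."""
    out = frame.copy()
    # #2: one geocoding call per address; an unmatched address yields None
    out["location"] = out["ADDRESS"].apply(geocode)
    # #3: (latitude, longitude, altitude) as one tuple, or Nones when there was no match
    out["point"] = out["location"].apply(
        lambda loc: tuple(loc.point) if loc is not None else (None, None, None)
    )
    # #4: split the tuple column into three separate columns
    coords = pd.DataFrame(
        out["point"].tolist(),
        index=out.index,
        columns=["latitude", "longitude", "altitude"],
    )
    return out.join(coords)
```

On the tutorial's data this would be used as df = geocode_addresses(df, make_rate_limited_geocoder()). With a 1-second delay, expect the run to take at least as many seconds as there are addresses.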

The above code produces a DataFrame with latitude and longitude columns that you can map with any geographic visualisation tool of your choice. Let us look at the first few rows of our DataFrame, but first, we will drop the columns we no longer need.

df = df.drop(['Address1', 'Address3', 'Address4', 'Address5', 'Telefon', 'ADDRESS', 'location', 'point'], axis=1)
df.head()
cleaned table with latitude and longitude

I will use Folium to map out the points we created but feel free to use any other Geovisualization tool of your choice. First, we display the locations as a circle map with Folium.

The map produced below shows the geocoded addresses as circles.

Map

Or if you prefer a dark background with an aggregated cluster of points, you can do the following:

Below is a dark-background map with clustered points in Folium.

Clustered map

Conclusion

Geocoding is a critical step in many location-based tasks that require coordinates. In this article, we have seen how to do geocoding in Python. There are many other services, free or paid, that you can experiment with in GeoPy. I find the Google Maps geocoding service more powerful than the OpenStreetMap service we have used in this tutorial, but it requires an API key.

To interact and experiment with this tutorial without any installation, I created a Binder. Go to this GitHub repository and click on launch binder.

Or go directly to the Jupyter notebook Binder link here:
