Python Hands-on Tutorial
Comparison of Geocoding Services Applied to Stroke-Care Facilities in Vietnam with Python
With a close look at match rate and spatial accuracy
This work was co-authored with Kai Kaiser and Mahdi Fayazbakhsh. All errors and omissions are those of the author(s).
Geocoding is the process of converting addresses in text format into geographic coordinates such as latitude and longitude. An address is the information needed to locate a building, a plot of land, or a structure; it generally follows a specific format and contains elements such as political boundaries, street names, building numbers, organization names, and postal codes. When only addresses are available, geocoding is the first step toward location validation (e.g., by satellite overlay) and analytics (e.g., access analysis or climate exposure). Any spatially based study requires geocoding if the solution depends on knowing where points of interest (e.g., health facilities, population, schools, roads) are located.
When existing geocoding tools are applied in developing countries, analysts need to pay close attention to both the process and the pitfalls of this conversion. In settings with weak addressing in particular, addresses may be matched to the wrong location or not matched at all.
We demonstrate this using a data set of stroke facility addresses in Vietnam that was manually geocoded and validated by the World Bank team in Vietnam.
In this blog, we will geocode the addresses of stroke care facilities in Vietnam with multiple services available through OpenStreetMap, Mapbox, and Google in Python and demonstrate a method to calculate the quality of geocoding. There are various metrics reported in the literature for measuring the quality of geocoding.

We report on two of them: the match rate (the percentage of all records that can be geocoded) and the spatial accuracy (the frequency distribution of the distances between matched geocodes and their ground truth locations).
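As a rough illustration of these two metrics, the sketch below computes a match rate and a binned distribution of error distances using only the Python standard library. The function names, the use of `None` to mark an unmatched record, the haversine great-circle distance, and the bin edges are all assumptions made for this example; they are not necessarily what the accompanying notebook uses.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A geocoding result is (latitude, longitude), or None when the service found no match.
Coord = Optional[Tuple[float, float]]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def match_rate(geocoded: Sequence[Coord]) -> float:
    """Percentage of records for which the geocoder returned coordinates."""
    if not geocoded:
        raise ValueError("no records to evaluate")
    matched = sum(1 for coord in geocoded if coord is not None)
    return 100.0 * matched / len(geocoded)


def error_distances_km(geocoded: Sequence[Coord],
                       truth: Sequence[Tuple[float, float]]) -> List[float]:
    """Distance to ground truth for every matched record (unmatched ones are skipped)."""
    return [
        haversine_km(g[0], g[1], t[0], t[1])
        for g, t in zip(geocoded, truth)
        if g is not None
    ]


def distance_histogram(distances: Sequence[float], edges_km: Sequence[float]) -> List[int]:
    """Count distances per bin [edge_i, edge_{i+1}); the last bin is open-ended.

    edges_km must be ascending and start at 0 so that every distance falls in a bin.
    """
    counts = [0] * len(edges_km)
    for d in distances:
        # Index of the last edge that is less than or equal to the distance.
        idx = max(i for i, edge in enumerate(edges_km) if d >= edge)
        counts[idx] += 1
    return counts
```

With hypothetical bin edges such as `[0, 1, 5]`, `distance_histogram` returns the number of matched facilities within 1 km of the ground truth, between 1 and 5 km, and 5 km or farther.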
The goal is not to recommend the use of one over the other but to showcase how such an evaluation can be easily done in a Jupyter Notebook environment (JPNE) in Python for different country contexts.
The dataset used for this exercise is a list of 106 stroke care facilities in Vietnam for which the complete address is known. Our team has also manually compiled ground truth data for these locations, with verified latitude and longitude.

1. Geocoding using OpenStreetMap (OSM) Data through Geopy
Geopy is a Python client that lets developers locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources. Nominatim is a tool to search OpenStreetMap data by name and address (geocoding) and to generate synthetic addresses of OSM points (reverse geocoding). Through the geopy client in Python, Nominatim can be used to query OSM data for geocoding addresses.
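A minimal sketch of this workflow is given here, assuming geopy is installed and the public Nominatim endpoint is reachable. The helper names, the `user_agent` string, and the example address in the comments are hypothetical choices for illustration and are not taken from the original notebook. The one-second delay reflects the public Nominatim usage policy of at most one request per second.

```python
from typing import Callable, Dict, Iterable, List, Optional


def build_nominatim_geocoder(user_agent: str = "stroke-facility-geocoding-demo") -> Callable:
    """Return a rate-limited geocode function backed by Nominatim (OSM data)."""
    # Imported here so the helper below can be used and tested without geopy installed.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)  # Nominatim requires a user agent
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def geocode_addresses(addresses: Iterable[str], geocode: Callable) -> List[Dict[str, Optional[float]]]:
    """Geocode each address; unmatched addresses get None for lat and lon."""
    results = []
    for address in addresses:
        location = geocode(address)  # geopy returns None when nothing is found
        if location is None:
            results.append({"address": address, "lat": None, "lon": None})
        else:
            results.append({
                "address": address,
                "lat": location.latitude,
                "lon": location.longitude,
            })
    return results


# Example usage (makes a network request; the address is illustrative only):
# geocode = build_nominatim_geocoder()
# rows = geocode_addresses(["Bach Mai Hospital, Hanoi, Vietnam"], geocode)
```

Keeping unmatched addresses in the output with `None` coordinates, rather than dropping them, makes it straightforward to compute the match rate afterwards.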
