avatarMatthew Gazzano

Summary

The web content describes the Haversine Formula as a crucial tool for calculating geospatial distances, particularly in SQL-based geospatial reporting and logistics planning, without the need for external APIs.

Abstract

The Haversine Formula is presented as an essential method for accurately determining the distance between two points on Earth's surface, taking into account the planet's curvature. This formula, which can be implemented using SQL, is particularly useful for professionals involved in sales territory mapping, logistics planning, and geospatial reporting. The article explains the mathematical components of the formula, including the use of trigonometry and the conversion of latitude and longitude into radians. It also provides a practical example of how to apply the Haversine Formula in SQL, specifically within Google's BigQuery platform, to calculate distances and identify the nearest points of interest, such as finding the closest New York City Subway station to a Citibike station. The author acknowledges that while the Haversine Formula is not perfectly accurate due to Earth's oblate spheroid shape, it is sufficiently precise for many applications, distinguishing between the calculated "Great Circle Distance" and actual "Driving Distance."

Opinions

  • The Haversine Formula is preferred over API calls for its simplicity and independence from third-party services, which may have rate limits or costs associated with them.
  • The author emphasizes the importance of considering Earth's curvature when calculating distances, which the Haversine Formula accounts for.
  • There is an acknowledgment that Earth is not a perfect sphere but an oblate spheroid, which introduces a slight inaccuracy in the Haversine Formula's calculations.
  • The distinction between "Great Circle Distance" and "Driving Distance" is highlighted, with the former being the shortest path over the Earth's surface and the latter being the actual travelable distance on roadways.
  • The article suggests that the Haversine Formula is a valuable asset for logistics professionals, aiding in route optimization, resource allocation, and identifying hotspots for operational efficiency.

The Haversine Formula: A Must-Have for Geospatial Reporting

Determining the Distance from Point A to Point B using SQL

Photo by NASA on Unsplash

If you’ve ever been tasked with sales territory mapping, logistics planning, or general geospatial reporting, you’ve likely needed to calculate the distance from point A to point B. Sure, you can pull distance calculations from the Google Maps API, but who needs to worry about rate limits or budgeting for another API connection when you can just apply logic in your code?

Income the Haversine Formula — a powerful formula using spherical trigonometry that you can quickly implement to calculate distances in scale. This formula takes into account earth’s curvature when given latitude and longitude inputs, where the output can be expressed in the measurement of your choosing (miles / kilometers etc.)

The Formula

Image of the Law of Haversines, by Steven G. Johnson, published under the GNU Free Documentation License

The haversine formula can be expressed in 3 calculations:

a = sin²(φB — φA/2) + cos φA * cos φB * sin²(λB — λA/2) c = 2 * atan2( √a, √(1−a) ) d = R ⋅ c

a, c, and d represent a series of calculations built on top of each other, where we will need to plug in both series of our latitude and longitude points and apply some trigonometry calculations to them.

To make these mathematical symbols more accessible, let’s break them down in the following section, especially if you’re not familiar with trigonometric notation.

Understand how to calculate these:

π = The value of Pi, or 3.141592653589793238462643383279502884197

√ = The Square Root

  • The Square root denotes what number multiplied by itself will equal the given number.

φ = latitude in radians

  • A radian is a measurement of angles but uses the properties of circles instead of straight lines.
  • So, in this case φ = latitude * (π / 180)
  • simplified further: φ = latitude * (3.1415 / 180)
  • simplified once more: φ = latitude * 0.017453

λ = longitude in radians

  • λ = latitude * (π / 180)

R = earth’s average radius, expressed in miles or kilometers of your choosing

  • 3958.8 is earth’s average radius in miles.
  • 6371.04 is earth’s average radius in kilometers.

Use your programming language to calculate these:

sin²( ) = The squared sine of the value within the parenthesis

  • The sine function shows the ratio of the length of the side opposite to an angle to the length of the hypotenuse in a right triangle.
  • In the haversine formula, it enables us to calculate the angular distance between the points, which is crucial for determining their great-circle distance (shortest distance) accurately.
Photo by Author

cos = the cosine

  • The cosine is the ratio of the length of the adjacent side to the length of the hypotenuse in a right triangle
Photo by Author

atan2 = The arctangent of the ratio from 2 arguments

  • The arctangent is also known as the inverse of the tangent.
  • a Tangent is the ratio of the length of the side opposite to an angle to the length of the adjacent side in a right triangle.
Photo by Author

Disclaimers

The Haversine formula isn’t perfectly accurate, but it's very close.

You’ll notice I mentioned Earth’s average radius above. We use the average because the formula assumes that Earth is a perfect sphere. While it is a sphere (hopefully you can agree with this), it's not a perfect sphere. Earth is actually considered an oblate spheroid, which means it is mostly spherical but slightly flattened. This is due to earth’s rotation, which causes Earth to slightly flatten at the poles and bulge at the equator.

Image by Author

The Haversine Formula calculates “Great Circle Distance”, not “Driving Distance”

These are two separate metrics.

The Great Circle Distance calculates distances as a plane would fly from point A to point B and does not factor in available roadways. You can't calculate driving distance with a standalone formula- so if that's what you need, you’re best suited using a tool like the Google Maps API. As a general rule, the Great Circle Distance will always be less than or equal to the Driving Distance. See the below example of calculating distances from New York, NY to Chicago, IL to illustrate this:

Driving Distance: 790 Miles / 1271.38 Kilometers

Screenshot from Google Maps

Great Circle Distance: 711.19 Miles / 1144.54936 Kilometers

Screenshot from Google Maps

A full example using SQL against 2 location-based datasets

We can follow the below template using BigQuery SQL to execute the Haversine formula:

SELECT
latitude_1
, longitude_1
, latitude_2
, longitude_2
, ROUND(3958.8 * ACOS(COS(latitude_1 * 0.017453) * COS(latitude_2 * 0.017453) * COS((longitude_1 - longitude_2) * 0.017453) + SIN(latitude_1 * 0.017453) * SIN(latitude_2 * 0.017453)), 2) AS haversine_distance_in_miles
, ROUND(6371.04 * ACOS(COS(latitude_1 * 0.017453) * COS(latitude_2 * 0.017453) * COS((longitude_1 - longitude_2) * 0.017453) + SIN(latitude_1 * 0.017453) * SIN(latitude_2 * 0.017453)), 2) AS haversine_distance_in_kilometers
FROM table_with_locations

In the following example, I make use of the following BigQuery public datasets to determine the nearest New York City Subway station against a given New York City Citibike station:

Table 1: a list of NYC Citibike stations with their corresponding latitude and longitude

`bigquery-public-data.new_york_citibike.citibike_stations`

Screenshot from BigQuery

Table 2: a list of NYC Subway stations with their corresponding latitude and longitude

`bigquery-public-data.new_york_subway.station_entrances`

The below query is the full example of how I determine the closest subway station for every citibike station:

WITH distances AS (
  SELECT
  cb.station_id AS citibike_station_id
  , cb.name AS citibike_station_name
  , cb.latitude AS citibike_lat
  , cb.longitude AS citibike_lon
  , sub.station_name AS subway_station_name
  , sub.entrance_lat AS subway_lat
  , sub.entrance_lon AS subway_lon
  , ROUND(3958.8 * ACOS(COS(cb.latitude * 0.017453) * COS(sub.entrance_lat * 0.017453) * COS((cb.longitude - sub.entrance_lon) * 0.017453) + SIN(cb.latitude * 0.017453) * SIN(sub.entrance_lat * 0.017453)), 2) AS haversine_distance_in_miles
  , ROUND(6371.04 * ACOS(COS(cb.latitude * 0.017453) * COS(sub.entrance_lat * 0.017453) * COS((cb.longitude - sub.entrance_lon) * 0.017453) + SIN(cb.latitude * 0.017453) * SIN(sub.entrance_lat * 0.017453)), 2) AS haversine_distance_in_kilometers
  FROM `bigquery-public-data.new_york_citibike.citibike_stations` cb
  CROSS JOIN `bigquery-public-data.new_york_subway.station_entrances` sub
)

, distances_with_min AS (
  SELECT
  *
  , MIN(haversine_distance_in_miles) OVER (PARTITION BY citibike_station_id) AS min_distance_cal
  FROM distances
)

SELECT
citibike_station_id
, citibike_station_name
, citibike_lat
, citibike_lon
, subway_station_name AS nearest_subway_station_name
, subway_lat
, subway_lon
, haversine_distance_in_miles
, haversine_distance_in_kilometers
FROM distances_with_min
WHERE haversine_distance_in_miles = min_distance_cal
ORDER BY citibike_station_name

Breaking this query down:

  • In the distances common table expression, I perform a cross join to get both tables into one output, creating the cartesian product of all rows. Within this, I create two columns using the haversine formula haversine_distance_in_miles and haversine_distance_in_kilometers.
  • Then in distances_with_min, I determine the minimum value in miles of a related subway location to a given citibike station using a windows function. I gave it an alias of min_distance_cal.
  • Finally, I return all related fields from distances_with_min, and filter the output where the haversine_distance_in_miles = min_distance_cal. This only takes rows with the smallest distance between each set of latitude and longitude, effectively giving us the closest subway station to each citibike station.

Proof that this checks out:

Let's compare one record to ensure that we’re accurate — according to my query the Citi Bike station at Troy Ave & Park Pl, Brooklyn, NY 11213 is closest to the Utica Ave Subway Station on Utica Ave, Brooklyn, NY 11213. The query also outputs a value of 0.26 miles.

Screenshot from Google Maps

Google Maps calculates the distance as 1,818.08 feet or .0287 miles. The small delta between these numbers could be due to a number of factors:

  • The exact coordinates of where I placed each pin manually in the Google Maps UI (I couldn’t type in the coordinates for the “Measure Distance” option)
  • The rounding of the Pi value within my SQL query (I'd suggest using a Pi function if there is one available in your SQL environment, BigQuery Standard SQL doesn't explicitly have one, and I rounded the values for simplicity)
  • Small differences in recorded coordinate values in the latitude and longitude within the BigQuery datasets.

Conclusion

The Haversine formula is an invaluable tool in data analysis for logistics planning. Its ability to accurately calculate distances between geographical coordinates helps logistics professionals make informed decisions regarding route optimization, resource allocation, and hotspot identification. By leveraging this formula, organizations can streamline their operations, reduce costs, and enhance overall efficiency in the complex landscape of logistics management. Its simplicity and accuracy make it a dependable asset to navigate the challenges of geospatial planning with greater precision and effectiveness.

Geospatial
Sql
Bigquery
Maps
Trigonometry
Recommended from ReadMedium