Jan Kammerath

Summary

The article discusses extracting data on sushi restaurants in Manhattan using OpenStreetMap (OSM) and the Go programming language.

Abstract

The article delves into the process of using OpenStreetMap, a community-driven mapping service, to extract comprehensive data on sushi restaurants located in Manhattan, New York. It explains the challenges of obtaining such data from commercial map services due to cost and complexity, and introduces OSM as a cost-effective alternative. The author provides a detailed Go code example that filters OSM data to identify sushi restaurants in Manhattan by parsing Protobuf (PBF) files, which contain all the map data in a compact format. The code example demonstrates how to extract relevant information such as the restaurant's name, address, and cuisine type, while ensuring the data is specific to Manhattan by filtering based on zip codes. The article concludes by highlighting the potential of OSM for data science applications and encourages readers to support the OSM Foundation.

Opinions

  • The author suggests that commercial map services can be expensive for large-scale data extraction, positioning OSM as a more accessible and community-supported alternative.
  • OpenStreetMap is likened to the "Wikipedia of maps" due to its reliance on volunteer contributions and detailed information.
  • The complexity of OSM data, with its nodes, ways, and relations, is acknowledged, but the article demonstrates that with the right tools and knowledge, it can be effectively navigated and utilized.
  • The author expresses that while extracting data from OSM can be straightforward, understanding the various tags and community-managed data can be tricky.
  • There is an emphasis on the performance and speed of Go for processing large datasets, which is beneficial for data analytics purposes.
  • The article promotes the idea of contributing to OSM, either through donations or participation, to support the project's sustainability and growth.
  • The author recommends considering OSM and tools like Leaflet as alternatives to commercial map services for future projects.

Extracting OpenStreetMap With Go: Sushi 🍣 Restaurants In Manhattan

Do you know how many sushi restaurants exist in Manhattan, New York? It's 67 in total, at least according to OpenStreetMap. Try to Google it and you won't easily find the number. A complete list of them is even harder to find.

I love sushi 🍣, but places are sometimes hard to find!

There are various map services out there that provide APIs to retrieve such information. However, if you want to extract a large amount of information, such as all public toilets in the United States, it'll become pricey. Luckily, there's OpenStreetMap.

What is OpenStreetMap?

OpenStreetMap (OSM for short) is often referred to as the "Wikipedia of maps" because it relies on a community of volunteers to contribute and maintain map data. Founded in 2004, it has reached a level of detail in information about our planet that very few commercial services can match. Even better, many places in OSM link to the related Wikipedia articles in their metadata.

OpenStreetMap provides a variety of services and APIs. Most of them are rate-limited, since it's a volunteer project that relies on donations, just like Wikipedia. Hammering their servers would be plain evil, and you don't need to.

What OSM data can you download?

The answer is simple: absolutely everything. I won't go into the details of great software like Leaflet, which essentially allows you to drop Google Maps. Instead, I'll focus on OSM and PBF files. The OSM files are essentially XML files of the individual data points (here's an example of a sushi restaurant: Rolls & Bowls on West 62nd Street, NY). A data point can be a node, a way or a relation. As an alternative to the pretty large XML files, there are Protobuf (Protocol Buffers) files. The PBF file of the entire state of New York is "just" 414 MB: New York PBF file provided by Geofabrik. Note that both the OSM and the PBF file contain absolutely everything there is.
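To make the XML representation more concrete, here is a minimal sketch that decodes a single node with Go's encoding/xml package. The element and attribute names (node, tag, k, v, id, lat, lon) follow the OSM XML format; the ID, coordinates and tag values in the sample are invented for illustration, and the type and function names are my own.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Tag is one key/value pair attached to an OSM element.
type Tag struct {
	Key   string `xml:"k,attr"`
	Value string `xml:"v,attr"`
}

// Node holds the attributes of an OSM <node> element used in this sketch.
type Node struct {
	ID   int64   `xml:"id,attr"`
	Lat  float64 `xml:"lat,attr"`
	Lon  float64 `xml:"lon,attr"`
	Tags []Tag   `xml:"tag"`
}

// sampleXML is an illustrative node; ID, coordinates and tags are made up.
const sampleXML = `<node id="1" lat="40.7712" lon="-73.9857">
  <tag k="amenity" v="restaurant"/>
  <tag k="cuisine" v="sushi"/>
  <tag k="name" v="Example Sushi"/>
</node>`

// parseNode decodes a single <node> element into a Node.
func parseNode(data string) (Node, error) {
	var n Node
	err := xml.Unmarshal([]byte(data), &n)
	return n, err
}

func main() {
	n, err := parseNode(sampleXML)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("node %d at %.4f, %.4f\n", n.ID, n.Lat, n.Lon)
	for _, t := range n.Tags {
		fmt.Printf("  %s = %s\n", t.Key, t.Value)
	}
}
```

The PBF format carries the same information in a binary encoding, which is why it is so much smaller than the XML.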

What nodes, relations and ways are

Nodes are the basic point features in OSM. Each node is defined by a unique identifier and has a set of geographic coordinates (latitude and longitude). Ways are ordered lists of nodes that form a polyline or polygon. A way can represent linear features (like roads, rivers, or trails) or area features (like parks or buildings). Relations are used to describe how other elements work together. They can be used to model complex relationships between nodes, ways, and even other relations.

The OSM relation of JFK Terminal 8 AirTran station in NY

A sushi restaurant in Manhattan is mostly either a way (the complete building structure) or a node (single point of interest). More complex things like JFK's Terminal 8 are a relation containing ways and nodes.
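The three element types can be pictured as plain Go structs. This is a simplified sketch with hypothetical type names, not the types of any particular library; real decoders carry additional fields such as version and author information. The IsClosed method illustrates how a way becomes a ring that can outline an area: its first and last node references are identical.

```go
package main

import "fmt"

// Node is a single point with coordinates.
type Node struct {
	ID       int64
	Lat, Lon float64
	Tags     map[string]string
}

// Way is an ordered list of references to nodes.
type Way struct {
	ID      int64
	NodeIDs []int64
	Tags    map[string]string
}

// Member is one entry of a relation.
type Member struct {
	Type string // "node", "way" or "relation"
	Ref  int64  // ID of the referenced element
	Role string // free-form role, may be empty
}

// Relation groups nodes, ways and other relations.
type Relation struct {
	ID      int64
	Members []Member
	Tags    map[string]string
}

// IsClosed reports whether the way starts and ends at the same node.
// The smallest ring is a triangle: three distinct nodes plus the
// repeated first node, hence at least four references.
func (w Way) IsClosed() bool {
	n := len(w.NodeIDs)
	return n >= 4 && w.NodeIDs[0] == w.NodeIDs[n-1]
}

func main() {
	// Illustrative IDs only.
	building := Way{ID: 100, NodeIDs: []int64{1, 2, 3, 4, 1}}
	road := Way{ID: 101, NodeIDs: []int64{5, 6, 7}}
	fmt.Println("building closed:", building.IsClosed())
	fmt.Println("road closed:", road.IsClosed())
}
```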

Extracting the data with Go

The following code takes in the "new-york-latest.osm.pbf" file and decodes every entity in the file using the "osmpbf" package. It checks whether the entity is either a way or a node. It then ensures that the entity has a tag "amenity" with a value of "restaurant". The documentation on amenities is in the OSM wiki: Key:amenity. Further, the code extracts the zip code to ensure that the restaurant is in Manhattan. Remember that our PBF file contains the entire state of New York, so we need to filter out everything else. Finally, it ensures there's a "cuisine" tag that contains "sushi".

package main

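import (
 "fmt"
 "io"
 "log"
 "os"
 "sort"
 "strconv"
 "strings"
 "text/tabwriter"

 "github.com/qedus/osmpbf"
)

// Place is a single restaurant extracted from the OSM data.
type Place struct {
 ID          int64
 Name        string
 Street      string
 ZipCode     int
 HouseNumber string
 City        string
}

// GetAddressText formats the address as "<house number> <street>, <zip> <city>".
func (p Place) GetAddressText() string {
 return fmt.Sprintf("%s %s, %d %s", p.HouseNumber, p.Street, p.ZipCode, p.City)
}

// parsePBF reads the given PBF file and returns all matching places.
func parsePBF(file string) ([]Place, error) {
 places := []Place{}

 r, err := os.Open(file)
 if err != nil {
  return nil, err
 }
 defer r.Close()

 d := osmpbf.NewDecoder(r)
 // start decoding with a single worker goroutine
 err = d.Start(1)
 if err != nil {
  return nil, err
 }

 for {
  if v, err := d.Decode(); err == io.EOF {
   // end of file: every entity has been decoded
   break
  } else if err != nil {
   // a real decoding error: report it instead of silently stopping
   return nil, err
  } else {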
   isPlace := false
   placeId := int64(0)
   tags := make(map[string]string)

   switch obj := v.(type) {
   case *osmpbf.Node:
    tags = obj.Tags
    placeId = obj.ID
    isPlace = true
   case *osmpbf.Way:
    tags = obj.Tags
    placeId = obj.ID
    isPlace = true
   }

   if isPlace {
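    // Manhattan zip codes run from 10001 to 10282, both inclusive.
    // A missing or non-numeric postcode parses to 0 and is filtered out.
    zipCode, _ := strconv.Atoi(tags["addr:postcode"])
    if strings.ToLower(tags["amenity"]) == "restaurant" &&
     strings.Contains(strings.ToLower(tags["cuisine"]), "sushi") &&
     zipCode >= 10001 && zipCode <= 10282 {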
     placeName := tags["name"]
     if placeName == "" {
      for k, v := range tags {
       if strings.HasPrefix(k, "name:") {
        placeName = v
        break
       }
      }
     }

     if placeName != "" {
      places = append(places, Place{
       ID:          placeId,
       Name:        placeName,
       Street:      tags["addr:street"],
       ZipCode:     zipCode,
       HouseNumber: tags["addr:housenumber"],
       City:        tags["addr:city"],
      })
     }
    }
   }
  }
 }

 return places, nil
}

func main() {
 pbfFile := "osm/new-york-latest.osm.pbf"
 places, err := parsePBF(pbfFile)
 if err != nil {
  log.Fatal(err)
 }

 // sort by name asc
 sort.Slice(places, func(i, j int) bool {
  return places[i].Name < places[j].Name
 })

 w := tabwriter.NewWriter(os.Stdout, 1, 1, 1, ' ', 0)
 fmt.Fprintln(w, "OSM ID\t\tName\t\tAddress")
 for _, place := range places {
  fmt.Fprintln(w, strconv.FormatInt(place.ID, 10)+"\t\t"+place.Name+"\t\t"+place.GetAddressText())
 }
 w.Flush()

 fmt.Printf("Total places: %d\n", len(places))
}
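The filter at the heart of the loop can also be pulled out into a small function, which makes it testable without a PBF file. The following is a minimal sketch under a few assumptions: isManhattanSushi is a hypothetical helper name, the sample tag sets are invented, and the zip code range 10001 to 10282 is treated as inclusive.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isManhattanSushi reports whether a tag set describes a restaurant
// with a cuisine containing "sushi" and a zip code in the assumed
// Manhattan range of 10001 to 10282 (inclusive).
func isManhattanSushi(tags map[string]string) bool {
	if strings.ToLower(tags["amenity"]) != "restaurant" {
		return false
	}
	if !strings.Contains(strings.ToLower(tags["cuisine"]), "sushi") {
		return false
	}
	zip, err := strconv.Atoi(tags["addr:postcode"])
	if err != nil {
		// missing or non-numeric postcode
		return false
	}
	return zip >= 10001 && zip <= 10282
}

func main() {
	// Invented tag sets for illustration.
	samples := []map[string]string{
		{"amenity": "restaurant", "cuisine": "sushi", "addr:postcode": "10019"},
		{"amenity": "restaurant", "cuisine": "sushi", "addr:postcode": "11201"},
		{"amenity": "cafe", "cuisine": "sushi", "addr:postcode": "10019"},
		{"amenity": "restaurant", "cuisine": "sushi"},
	}
	for _, tags := range samples {
		fmt.Println(isManhattanSushi(tags), tags)
	}
}
```

Note that a zip code filter only works for entities that carry an "addr:postcode" tag; anything without one is dropped.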

Pretty straightforward, isn't it? Depending on what you want to extract, it can be tricky at times to figure out the different tags, how they are managed by the community and what to look out for. Opening hours in particular can be really challenging to extract given their complex format. The output of the code would be the following:

OSM ID       Name                        Address
6216923972   Ajisai                      795 Lexington Avenue, 10065 
3676993343   Akina Sushi                 424 East 14th Street, 10009 New York
2724189347   Amber                       1406 3rd Avenue, 10075 
6219132403   Bayard Sushi                83A Bayard Street, 10013 
2081537413   Bonchon Chicken             525 5th Avenue, 10016 New York
5810383921   Bravo Kosher Pizza          17 Trinity Place, 10006 New York
9630352308   Douska                      63 Delancey Street, 10002 
8375908251   Dragon Sushi                1272 Amsterdam Avenue, 10027 New York
2111692824   Gan Asia                    691 Amsterdam Avenue, 10025 New York
3452345665   Geisha Sushi                3468 Broadway, 10031 
9654088078   Ger Sushi                   926 Amsterdam Avenue, 10025 New York
8250837717   Hanabi                      1450 2nd Avenue, 10021 
2827035685   Hane                        346 1st Avenue, 10009 
8669771128   Haru Sushi                  229 West 43rd Street, 10036 
6270650582   Hiramasa                    1312 Madison Avenue, 10128 
5707209574   IKYU                        1718 2nd Avenue, 10128 
11048684505  Ikyu                        1718 2nd Avenue, 10128 
2033992917   Imi Sushi                   656 Amsterdam Avenue, 10025 New York
9690757650   Inase                       1586 1st Avenue, 10028 
2494161089   Ise                         63 Cooper Square, 10003 
10619086494  Ito                         75 Barclay Street, 10007 New York
2641379134   JR Sushi 2                  86A West Broadway, 10007 
8599156665   Junko Sushi                 33-02 33rd Street, 10010 Astoria
10224616960  Kanoyama                    175 2nd Avenue, 10003 
7161597623   Kitaro Sushi                510 Amsterdam Avenue, 10024 New York
1383955419   Lili's 57                   200 West 57th Street, 10019 New York
5090752521   Lilli and Loo               1026 3rd Avenue, 10065 
2565861094   M & J                       600 East 14th Street, 10009 
3932353664   Masa                        10 Columbus Circle, 10019 New York
7159234141   Miyako                      642 Amsterdam Avenue, 10025 New York
7360048872   MoMo                        239 Park Avenue South, 10003 
10982822001  Momokawa                    1466 1st Avenue, 10075 
10115105172  Nakaji                      48 Bowery, 10013 
3450977039   Nara Sushi                  76 Pearl Street, 10004 New York
2768405698   Nikko Hibachi Asian Fusion  1280 Amsterdam Avenue, 10027 
8792950866   Nobi Sushi                  437 3rd Avenue, 10016 
8791561182   Nobu                        40 West 57th Street, 10019 New York
9579615886   Ooki Sushi                  1575 3rd Avenue, 10128 
10893513469  Poke                        343 East 85th Street, 10128 
7657278602   Rolls & Bowls               150 West 62nd Street, 10023 New York
5757232056   SUGARFISH by sushi nozawa   33 East 22nd Street, 10003 New York
6178461400   Sabi Sushi                  2 Pennsylvania Plaza, 10121 New York
9611990778   Sake Bar Hagi               245A West 51st Street, 10019 New York
10991141052  Sasabune                    401 East 73rd Street, 10021 
6285668319   Sushi By Bou                132 W 47th St, 10036 New York
8202108197   Sushi Dairo                 208 3rd Avenue, 10003 
2717836837   Sushi Ginza                 1065 1st Avenue, 10022 
5781268723   Sushi Ginza Onodera         461 5th Avenue, 10017 
10777014284  Sushi Goda                  1574 3rd Avenue, 10128 
9497022154   Sushi Jin                   316 East 84th Street, 10028 
7098499225   Sushi Kai                   332 East 9th Street, 10003 New York
10830817799  Sushi Mumi                  130 Saint Mark's Place, 10009 New York
10830817798  Sushi Nakazawa              23 Saint Mark's Place, 10009 New York
248757328    Sushi On Jones              217 Eldridge Street, 10002 New York
2724057626   Sushi Ren                   1584 2nd Avenue, 10028 
266206761    Sushi Seki                  1143 1st Avenue, 10065 New York
6852273047   Sushi Zo                    127 East 39th Street, 10016 
3142935176   Sushi of Gari               402 East 78th Street, 10075 
5189146499   Sushi teria                 350 5th Avenue, 10118 New York
2716012023   Sushiann                    38 East 51st Street, 10022 
2797681668   Tampopo Kitchen             805 West 187th Street, 10040 
5703494368   Tenzan 89                   1714 2nd Avenue, 10128 
6342212317   The Lobster Club            98 East 53rd Street, 10022 New York
2717845637   Totoya                      1144 1st Avenue, 10065 New York
266199261    Totoya                      1144 1st Avenue, 10065 New York
6109307685   Unique Omakase              120 1st Avenue, 10009 
3676873366   Yaki Sushi                  225 East 14th Street, 10003 New York
Total places: 67

There you have it: 67 places within the Manhattan zip code range that are tagged as sushi restaurants. You could do exactly the same with absolutely any data there is in OpenStreetMap. The possibilities of that data are endless since you don't have the limitations that exist with commercial services.
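The list also contains places such as Bonchon Chicken or Bravo Kosher Pizza. They pass the filter because it accepts any "cuisine" value that contains "sushi", and OSM tags can hold several values separated by semicolons. If you only want places that list sushi as their sole cuisine, the tag can be split first. A minimal sketch, with hypothetical helper names and invented tag values:

```go
package main

import (
	"fmt"
	"strings"
)

// splitValues splits a semicolon-separated OSM tag value into
// trimmed, lower-cased entries and skips empty ones.
func splitValues(value string) []string {
	var out []string
	for _, part := range strings.Split(value, ";") {
		part = strings.ToLower(strings.TrimSpace(part))
		if part != "" {
			out = append(out, part)
		}
	}
	return out
}

// servesOnly reports whether want is the single cuisine listed in the tag.
func servesOnly(cuisineTag, want string) bool {
	values := splitValues(cuisineTag)
	return len(values) == 1 && values[0] == want
}

func main() {
	// Illustrative tag values, not taken from the dataset.
	for _, tag := range []string{"sushi", "japanese;sushi", "pizza; sushi"} {
		fmt.Printf("%q values=%v sushiOnly=%t\n", tag, splitValues(tag), servesOnly(tag, "sushi"))
	}
}
```

Whether the stricter check is desirable depends on the question you are asking; it would drop restaurants that serve sushi alongside other dishes.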

OpenStreetMap is great for data science

If you need geographic information for data analytics purposes, such as the number of fast food restaurants by chain and zip code, the OSM data is great. The performance of Go also allows you to extract large datasets very quickly. My export of sushi restaurants only took a few seconds on my MacBook with an M2 Pro. You can also create ORC or Parquet files to feed the data into your analytics software. Whenever you need large geographic datasets, OSM is pretty much the best choice out there.

If you like OSM, I'd highly recommend donating to the folks: Make a donation to the OpenStreetMap Foundation. There are also tons of other ways to participate and help the OpenStreetMap community. Just have a look at their website. Next time you consider using commercial map services, have a look at OpenStreetMap and tools like Leaflet.
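As a small illustration of such an aggregation, the extracted places can be counted per zip code with a map. This is a sketch rather than part of the original program: the trimmed-down Place type and the countByZip helper are assumptions, and the three sample rows are taken from the output above.

```go
package main

import (
	"fmt"
	"sort"
)

// Place is reduced to the fields needed for the aggregation.
type Place struct {
	Name    string
	ZipCode int
}

// ZipCount is the number of places found in one zip code.
type ZipCount struct {
	ZipCode int
	Count   int
}

// countByZip groups places by zip code and returns the counts,
// ordered by count (descending) and then by zip code (ascending).
func countByZip(places []Place) []ZipCount {
	counts := make(map[int]int)
	for _, p := range places {
		counts[p.ZipCode]++
	}
	result := make([]ZipCount, 0, len(counts))
	for zip, n := range counts {
		result = append(result, ZipCount{ZipCode: zip, Count: n})
	}
	sort.Slice(result, func(i, j int) bool {
		if result[i].Count != result[j].Count {
			return result[i].Count > result[j].Count
		}
		return result[i].ZipCode < result[j].ZipCode
	})
	return result
}

func main() {
	places := []Place{
		{Name: "Ooki Sushi", ZipCode: 10128},
		{Name: "Sushi Goda", ZipCode: 10128},
		{Name: "Masa", ZipCode: 10019},
	}
	for _, zc := range countByZip(places) {
		fmt.Printf("%d: %d\n", zc.ZipCode, zc.Count)
	}
}
```

The same pattern extends to a composite key, for example chain name plus zip code, by using a struct as the map key.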

Thanks for reading. Jan
