avatarChristianlauer

Free AI web copilot to create summaries, insights and extended knowledge, download it at here

1778

Abstract

png"><figcaption>Illustration of nested and recurring data — Image by Author</figcaption></figure><p id="da94">The address column contains an array of values. The different addresses in the array are the recurring data. The different fields within each address are the nested data.</p><h2 id="335d">First Example</h2><p id="2000">So we start with our first query, in this record nested fields are all_tags.key and all_tags.value in which information like the name, address or additional information about what kind of object it is is stored. While the fields like osm_id and osm_timestamp are normal fields.</p><figure id="a016"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/1*G7vfjaBvQrJbIRXmeRC28Q.png"><figcaption>Data Example — Image by Author</figcaption></figure><p id="549d">In this example I want to query points which represent a supermarket:</p><div id="be80"><pre><span class="hljs-keyword">SELECT</span> *</pre></div><div id="f2b2"><pre> FROM bigquery-<span class="hljs-keyword">public</span>-<span class="hljs-keyword">data</span>.geo_openstreetmap.planet_features</pre></div><div id="b7d5"><pre><span class="hljs-keyword">Where</span> (‘shop’,’supermarket’) <span class="hljs-keyword">IN</span></pre></div><div id="b3e7"><pre> (<span class="hljs-built_in">SELECT</span> (<span class="hljs-built_in">KEY</span>, value)</pre></div><div id="5e5e"><pre> <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">unnest</span> (all_tags))</pre></div><p id="24ad">This would give the above result. But feel free to test it yourself. More keys and values you can query you will find on the OpenStreetMaps <a href="https://www.openstreetmap.de/">documentary</a>[2].</p><h2 id="1b2d">Second Example</h2><p id="e585">It would be too simple, howev

Options

er, if it were to always follow the same pattern. In JSON formats, there are also double-nested fields — that is, an array within an array. This is mapped in nested field like here in OpenStreetMap dataset with colons:</p><div id="2f1f"><pre><span class="hljs-keyword">SELECT</span> *</pre></div><div id="ac7d"><pre> FROM bigquery-<span class="hljs-keyword">public</span>-<span class="hljs-keyword">data</span>.geo_openstreetmap.planet_features</pre></div><div id="79c9"><pre>Where (‘addr<span class="hljs-symbol">:city</span>’,’Berlin’) IN</pre></div><div id="11ff"><pre> (<span class="hljs-built_in">SELECT</span> (<span class="hljs-built_in">KEY</span>, value)</pre></div><div id="49d3"><pre> <span class="hljs-keyword">FROM</span> <span class="hljs-built_in">unnest</span> (all_tags))</pre></div><p id="0b49">So here I have to specify the desired connected object city after the key addr and the colon. The result would then be all objects that are assigned to the address:city = Berlin. Actually the same pattern as in the first example, but you have to know it. Here you usually have to study the data source carefully, so that you do not stumble over such pitfalls.</p><h2 id="d535">Summary</h2><p id="afe3">New systems like BigQuery offer extremely high computing power and fast results through column-based databases. With the above examples, you should be able to query and process most of your nested data use cases very well.</p><h2 id="2c8e">Sources and further Readings</h2><p id="9f16">[1] Google, <a href="https://console.cloud.google.com/marketplace/details/openstreetmap/geo-openstreetmap">OpenStreetMap Public Dataset</a></p><p id="7a1c">[2] OpenStreetMap, <a href="https://www.openstreetmap.de/">https://www.openstreetmap.de/</a> (2021)</p></article></body>

How to work with Nested Data

Cheat Sheet when working with Nested Data Structures

Photo by S Migaj on Unsplash

In the field of Data Science or Business Intelligence, one often comes into contact with nested data, for example when working with data from Google Maps or its open source counterpart OpenStreetMap and many other data sources. How you can easily query this data I will show shortly so you can use it as a template for your own use cases.

Toolset

In this walkthrough, I will use Google’s BigQuery (You can also use it in the FreeTier), SQL and the free dataset OpenStreetMap Public Dataset. It’s great that Google published it for free in their public data sets. So you can easily query geo information with SQL [1]. Keep in mind that Google uploaded this data set once without any updates yet.

What is Nested Data?

BigQuery but also many other newer database systems supports loading and querying nested and recurring data from source formats that support object-based schemas for example JSON. Here is a short illustration of nested data:

Illustration of nested and recurring data — Image by Author

The address column contains an array of values. The different addresses in the array are the recurring data. The different fields within each address are the nested data.

First Example

So we start with our first query, in this record nested fields are all_tags.key and all_tags.value in which information like the name, address or additional information about what kind of object it is is stored. While the fields like osm_id and osm_timestamp are normal fields.

Data Example — Image by Author

In this example I want to query points which represent a supermarket:

SELECT *
 FROM `bigquery-public-data.geo_openstreetmap.planet_features`
Where (‘shop’,’supermarket’) IN
 (SELECT (KEY, value)
 FROM unnest (all_tags))

This would give the above result. But feel free to test it yourself. More keys and values you can query you will find on the OpenStreetMaps documentary[2].

Second Example

It would be too simple, however, if it were to always follow the same pattern. In JSON formats, there are also double-nested fields — that is, an array within an array. This is mapped in nested field like here in OpenStreetMap dataset with colons:

SELECT *
 FROM `bigquery-public-data.geo_openstreetmap.planet_features`
Where (‘addr:city’,’Berlin’) IN
 (SELECT (KEY, value)
 FROM unnest (all_tags))

So here I have to specify the desired connected object city after the key addr and the colon. The result would then be all objects that are assigned to the address:city = Berlin. Actually the same pattern as in the first example, but you have to know it. Here you usually have to study the data source carefully, so that you do not stumble over such pitfalls.

Summary

New systems like BigQuery offer extremely high computing power and fast results through column-based databases. With the above examples, you should be able to query and process most of your nested data use cases very well.

Sources and further Readings

[1] Google, OpenStreetMap Public Dataset

[2] OpenStreetMap, https://www.openstreetmap.de/ (2021)

Nested Data Structures
Json
Data Science
Bigquery
Technology
Recommended from ReadMedium