How To Call REST API & Store Data in Databricks

Summary

The article provides a comprehensive guide on how to call a REST API from Databricks, process the JSON response, and store the data in Delta tables within Databricks.

Abstract

The article walks through calling a REST API from Databricks, a cloud-based data analytics platform. It begins by introducing REST APIs and their role in modern web interactions, emphasizing their popularity and usefulness in software development. The author then shows how to use Databricks to process JSON data returned by a REST service, using the open-source project postcodes.io, which provides UK postcode data, as the example. The step-by-step guide covers importing the required Python packages, setting up the parameters for the REST call, making the GET request, and parsing the response as JSON. It then explains how to convert the JSON data into a DataFrame, select the relevant columns, and store the processed data in a Delta table in Databricks. The article concludes with a view of the results and invites readers to explore further data engineering topics through a series of learning articles.

Opinions

The author positions Databricks as a powerful and user-friendly platform for handling large volumes of data from diverse sources, including REST APIs.
There is an emphasis on the importance of understanding how to work with JSON data structures, which are commonly returned by REST APIs.
The author suggests that converting JSON data into a DataFrame is a critical step that is often done incorrectly, highlighting a common pitfall in data processing.
By showcasing the use of postcodes.io, the author implies that leveraging open-source projects can be beneficial for developers seeking to integrate various data sets into their analytics workflows.

Databricks

Databricks is a popular cloud-based computing platform for data science and analytics. It simplifies processing large volumes of data from files, streams, databases, and REST services.

One of the main reasons for its success is that it makes it easy to identify and work with large amounts of data stored in disparate formats, databases, and other storage systems.

What is a REST service?

A REST API is an application programming interface that follows the REST (Representational State Transfer) architectural style and lets computer systems interact over the Internet, typically via HTTP. Data exchanged with a REST API is usually serialized as JSON or XML, though YAML or other formats are sometimes used.

REST APIs have become very popular, and programmers increasingly use them to build software applications. A RESTful system is organized around resources, each identified by a URL; these can include, but are not limited to, records in a database.
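To make the idea of a resource concrete, here is a minimal standard-library sketch of a GET request against postcodes.io, the open-source UK postcode API the article uses. It assumes the single-postcode lookup endpoint https://api.postcodes.io/postcodes/{postcode}; the function names are hypothetical, and the article's own request code (which may use a different HTTP library) is not reproduced here.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the postcodes.io single-postcode lookup resource.
BASE_URL = "https://api.postcodes.io/postcodes/"


def postcode_url(postcode: str) -> str:
    """Build the URL of the resource that represents one postcode."""
    # Trim surrounding whitespace and percent-encode the inner space as %20.
    return BASE_URL + urllib.parse.quote(postcode.strip())


def parse_response(body: str) -> dict:
    """Decode the JSON body returned by the API into a Python dict."""
    return json.loads(body)


def get_postcode(postcode: str, timeout: float = 10.0) -> dict:
    """Issue an HTTP GET for the postcode resource and return the decoded JSON."""
    request = urllib.request.Request(
        postcode_url(postcode), headers={"Accept": "application/json"}
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx answers.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

On a cluster with outbound internet access, get_postcode("SW1A 1AA") should return a dict whose status and result keys correspond to the two columns discussed in the next section. An unknown postcode is answered with a 404, which urlopen surfaces as an HTTPError, so production code would catch it.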

Select only required columns

The previous step yields a DataFrame with two columns, status and result. The result column is itself nested JSON, so extract the values from it as follows:

# Keep status, pull selected fields out of the nested result column, and add two empty validation columns
df_temp = df.selectExpr(
    "string(status) as status",
    "result['country'] as country",
    "result['european_electoral_region'] as european_electoral_region",
    "string(result['latitude']) as latitude",
    "string(result['longitude']) as longitude",
    "result['parliamentary_constituency'] as parliamentary_constituency",
    "result['region'] as region",
    "'' as vld_status",
    "'' as vld_status_reason",
)
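The column mapping can also be checked outside Spark. The plain-Python sketch below applies the same flattening to one decoded postcodes.io response held in a dict. It only illustrates what the selectExpr call produces per row and is not a replacement for it; the function names are hypothetical.

```python
from typing import Any, Dict, Optional


def _to_str(value: Any) -> Optional[str]:
    """Mimic a string(...) cast: None stays None, anything else becomes str."""
    return None if value is None else str(value)


def flatten_postcode_response(response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Flatten one decoded response into the columns chosen by selectExpr."""
    # The result object may be missing or null when a lookup fails.
    result = response.get("result") or {}
    return {
        "status": _to_str(response.get("status")),
        "country": result.get("country"),
        "european_electoral_region": result.get("european_electoral_region"),
        "latitude": _to_str(result.get("latitude")),
        "longitude": _to_str(result.get("longitude")),
        "parliamentary_constituency": result.get("parliamentary_constituency"),
        "region": result.get("region"),
        # Empty placeholder validation columns, as in the Spark expression.
        "vld_status": "",
        "vld_status_reason": "",
    }
```

For instance, a response whose latitude is the number 51.5 yields the string "51.5" in the latitude column, in line with the string(...) casts in the Spark version.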

The result looks as follows:

REST call results in Databricks

Here is the full code snippet:

I hope you found this walkthrough of calling a REST API from Databricks and storing the results in a Delta table insightful.

Curious to learn more about graph databases, Neo4j, Python, Databricks Spark, and data engineering? Follow this series of learning articles on Python, Spark, and data engineering, follow Ramesh Nelluri, and subscribe on Medium.


Databricks

What is a REST service?

Credits to postcodes.io

Code snippet explanation

Import required Python packages

Set up input parameters for the REST call

Make the call to the API endpoint

Receive the response and store it as JSON

Convert the JSON data into a DataFrame

Select only required columns

Finally, write the data into a table

The result looks as follows
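The body above does not include the code for the DataFrame conversion and Delta write steps listed in this outline. A minimal sketch of those two steps follows, assuming a Databricks notebook where spark is the active SparkSession and Delta Lake is available. The helper names and the table name used below are hypothetical, not the author's.

```python
import json
from typing import Any, Dict, List


def to_json_lines(responses: List[Dict[str, Any]]) -> List[str]:
    """Serialize each decoded API response to a single-line JSON string."""
    return [json.dumps(response) for response in responses]


def responses_to_dataframe(spark, responses: List[Dict[str, Any]]):
    """Build a DataFrame from decoded responses via spark.read.json.

    Reading JSON strings lets Spark infer the nested schema, so the
    "result" object becomes a struct column next to "status".
    """
    rdd = spark.sparkContext.parallelize(to_json_lines(responses))
    return spark.read.json(rdd)


def save_as_delta_table(df, table_name: str, mode: str = "append") -> None:
    """Write the DataFrame to a managed Delta table."""
    df.write.format("delta").mode(mode).saveAsTable(table_name)
```

In a notebook this might be used as df = responses_to_dataframe(spark, [response]), followed by the selectExpr step and save_as_delta_table(df_temp, "default.postcodes"), where default.postcodes is a placeholder name. Going through spark.read.json rather than calling spark.createDataFrame on the raw dicts avoids PySpark possibly inferring the nested object as a map instead of a struct, which is one way the JSON-to-DataFrame step can go wrong.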