Antonio Cachuan

Building Serverless Python Data APIs with Docker on Google Cloud

In this article, I’ll show you a simple way to build, in minutes, a few Data APIs that expose data from a BigQuery dataset. These APIs will be deployed as Docker containers using a serverless GCP service called Cloud Run.

Architecture


The idea behind this design is to work with serverless components. First, let’s understand these services and their purpose in the architecture.

  • Cloud Run: Cloud Run is a fully managed compute platform that automatically scales your stateless containers [Cloud Run Doc]. It will handle all the API requests; since it’s fully managed, we don’t need to worry about scaling. To achieve that, a Docker image must first be deployed.
  • Cloud Shell: Cloud Shell provides you with command-line access to your cloud resources directly from your browser [Cloud Shell Doc]. A development environment is always needed, and in this case it’s Cloud Shell. The Python code will be developed here.
  • Cloud Build: Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build executes a build to your specifications and produces artifacts such as Docker containers [Doc]. When we are ready to build our Docker image, we’ll call Cloud Build to do the job; by default, the image is published to the GCP Container Registry.
  • Container Registry: provides secure, private Docker image storage on Google Cloud Platform [Doc]. This is where our Docker image will be hosted, ready to be deployed by Cloud Run.
  • BigQuery: BigQuery is Google’s fully managed, petabyte-scale, low-cost analytics data warehouse. BigQuery is NoOps — there is no infrastructure to manage and you don’t need a database administrator [BigQuery Doc]. BigQuery is our data warehouse, so all the data needed by the APIs lives here.

Getting the data

For this project, we will be using a BigQuery public dataset called covid19_ecdc, which contains confirmed cases and deaths by country and by date.

Dataset

Create Dataset and View

Let’s create a BigQuery dataset and a view (or a materialized view) to bring the data into our project.
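As a sketch, the view can be created with a DDL statement run in the BigQuery console. Here it is built as a plain Python string so it runs without a GCP connection; the public table name (covid_19_geographic_distribution_worldwide), the target dataset name (covid_dataset), and the column list are assumptions for illustration, not confirmed by the article.

```python
# Hypothetical source table in the covid19_ecdc public dataset and
# hypothetical target dataset/view names -- adjust to your project.
SOURCE_TABLE = "bigquery-public-data.covid19_ecdc.covid_19_geographic_distribution_worldwide"
TARGET_VIEW = "YOUR_PROJECT_ID.covid_dataset.covid"

# BigQuery standard-SQL DDL for the view
create_view_sql = f"""
CREATE VIEW IF NOT EXISTS `{TARGET_VIEW}` AS
SELECT date, geo_id, countries_and_territories, confirmed_cases, deaths
FROM `{SOURCE_TABLE}`
"""

print(create_view_sql)
```

You can paste the resulting statement into the BigQuery query editor, or submit it with the `google-cloud-bigquery` client.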

Data

With a quick exploration, we can identify the schema and the data. For this project, our API will extract:

  • All the data
  • Cases from all countries on a specific day
  • All the Cases reported from a specific country
Table used

Boilerplate Project

Building a REST API with Python is not difficult. To make it simpler, I published a GitHub project containing all the files.

Let’s understand the files

Project files

App.py

The main file. It’s a Flask base project that pulls in flask_sqlalchemy and flask_marshmallow to make the API simpler.

The lines of code you need to modify include your GCP project ID, your dataset, and your GCP credentials. Find more information about how to get your GCP credentials in the GCP documentation.

Dockerfile

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image [Docker doc].

This project uses a slim Python image, installs all the required libraries, and finally starts an HTTP server with gunicorn to handle the API requests.
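A minimal sketch of such a Dockerfile, assuming the app lives in app.py and exposes a Flask object named app (the base image tag, module path app:app, and worker counts are illustrative assumptions, not necessarily what the repo uses):

```dockerfile
# Slim Python base image (tag is illustrative)
FROM python:3.8-slim

# Install the required libraries
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source and start an HTTP server with gunicorn;
# Cloud Run injects the listening port via the $PORT environment variable
COPY . .
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
```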

Dockerfile

Requirements.txt

A list of all the Python libraries required for the project.
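A plausible set of entries, based on the libraries the article mentions (package names are assumptions; the repo’s own requirements.txt is authoritative):

```text
flask
flask-sqlalchemy
flask-marshmallow
marshmallow-sqlalchemy
pybigquery
gunicorn
```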

Designing API Requests

Object Relational Mapping

We use SQLAlchemy as an ORM to translate Python classes into BigQuery tables. Marshmallow is used to make object serialization and deserialization easier.

To make this code work, it’s important that the class name matches your BigQuery table (by default, Covid is internally lowercased to covid when searching for the table in BigQuery). If your table has a different name, set __tablename__='YOUR_TABLE_NAME' on line 23.

CovidSchema lets you define all the columns you want to expose through the API.
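An illustrative sketch of the mapping described above, using plain SQLAlchemy so it runs without a BigQuery connection. The column names are assumptions for illustration; Flask-SQLAlchemy derives the table name from the class name automatically (Covid becomes covid), while plain SQLAlchemy needs __tablename__ set explicitly, as shown here.

```python
from sqlalchemy import Column, Date, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Covid(Base):
    # Matches the lowercased class name; override with your own
    # table name if it differs (e.g. __tablename__ = 'YOUR_TABLE_NAME')
    __tablename__ = "covid"

    # Composite key: one row per country per day (assumed columns)
    date = Column(Date, primary_key=True)
    geo_id = Column(String, primary_key=True)
    countries_and_territories = Column(String)
    confirmed_cases = Column(Integer)
    deaths = Column(Integer)
```

A marshmallow schema over this model would then list the same columns to control what the API returns.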

Requests

To achieve the three data requirements defined above, we code three functions.

First, our Data API returns all the data from the BigQuery table.

The function get_day() receives a parameter like 2020-04-04 and returns the data from all countries on that specific date.

The function country_detail() receives a country geo code like PE and returns the data for that specific country.
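The three behaviours can be sketched framework-agnostically. This is a minimal stand-in that filters an in-memory sample instead of querying BigQuery, so it runs anywhere; the column names and sample values are assumptions for illustration, and in the real project these functions would be Flask routes backed by the ORM.

```python
import json
from datetime import datetime

# Tiny stand-in for the BigQuery table (assumed columns and values)
SAMPLE = [
    {"date": "2020-04-04", "geo_id": "PE", "cases": 480},
    {"date": "2020-04-04", "geo_id": "US", "cases": 33510},
    {"date": "2020-05-01", "geo_id": "PE", "cases": 2929},
]

def get_all():
    # GET /countries -> every row in the table
    return json.dumps(SAMPLE)

def get_day(day):
    # GET /day/<day> -> rows for one date; rejects anything not YYYY-MM-DD
    datetime.strptime(day, "%Y-%m-%d")  # raises ValueError on bad input
    return json.dumps([r for r in SAMPLE if r["date"] == day])

def country_detail(geo_id):
    # GET /country/<geo_id> -> rows for one country code, e.g. PE
    return json.dumps([r for r in SAMPLE if r["geo_id"] == geo_id.upper()])
```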

Build a docker image with Cloud Build

Each time we update our project, we need to build a new version of the Docker image. Many alternatives exist; here we use Cloud Build, a tool that builds your Docker image and publishes it to the Container Registry. Don’t forget to define your DOCKER_IMAGE_NAME.

gcloud builds submit --tag gcr.io/YOUR_PROJECT_ID/DOCKER_IMAGE_NAME

I run it on Cloud Shell; remember to be at the same level as your Dockerfile, and your project ID must be set (you should see it in yellow in the prompt).

Build Docker image

If everything goes well, you will see your Docker image published.

Container Registry

Deploy docker using Cloud Run

The last step is to deploy your Docker image on the web! Cloud Run will be in charge of everything: choose a SERVICENAME and update the reference to your Docker image.

gcloud run deploy SERVICENAME --image gcr.io/YOUR_PROJECT_ID/DOCKER_IMAGE_NAME --platform managed --region us-central1 --allow-unauthenticated

Finally, you get a public URL to start getting the data!

Data API Working

It’s time to get the data by making requests through our browser.

Data from all countries

https://datahackservice-xx-uc.a.run.app/countries

Data from a specific date

https://datahackservice-xx-uc.a.run.app/day/2020-05-01

Data from a specific country

https://datahackservice-xx-uc.a.run.app/country/PE

Conclusion and Future work

This article showed you how easy it is to develop a Data API with Python and serverless products like Cloud Run and BigQuery.

Having this API enables us to build excellent data visualizations. A great example of the power of a Data API comes from the World Bank; review the article from Sébastien Pierre here.

A special thanks to Martin Omander and Sagar Chand for the excellent repositories and articles that helped me develop this article.

PS: if you have any questions, or have an interesting data idea, you can find me on Twitter and LinkedIn. Also, if you are considering taking a Google Cloud certification, I wrote a technical article describing my experiences and recommendations.

Python
Data
Serverless
Docker
Towards Data Science