This article demonstrates how to build serverless Python Data APIs using Docker containers on Google Cloud, utilizing services such as Cloud Run, Cloud Shell, Cloud Build, Container Registry, and BigQuery.
Abstract
The article, "Building Serverless Python Data APIs with Docker on Google Cloud," explains how to create serverless data APIs using Python and deploy them with Docker containers on Google Cloud. The architecture utilizes several Google Cloud Platform (GCP) services: Cloud Run for handling API requests, Cloud Shell for development, Cloud Build for building Docker images, Container Registry for hosting those images, and BigQuery as the data warehouse.
The article uses a public BigQuery dataset called "covid19_ecdc" for demonstration purposes, and references a GitHub project containing the necessary files for building the REST API. The main files are App.py for the Flask-based project, Dockerfile for creating the Docker image, and requirements.txt for listing the required Python libraries.
Three functions are coded to meet the data requirements: returning all the data from the BigQuery table, the data from all countries on a specific date, and the data from a specific country. The Docker image is built with Cloud Build, published to the Container Registry, and then deployed to the web with Cloud Run.
Bullet points
Article demonstrates building serverless Python Data APIs with Docker on Google Cloud
Utilizes GCP services like Cloud Run, Cloud Shell, Cloud Build, Container Registry, and BigQuery
Uses public BigQuery dataset "covid19_ecdc" for demonstration
GitHub project containing necessary files for building REST API is referenced
Three functions coded to meet the data requirements
Docker image built using Cloud Build and published to Container Registry
Docker image deployed on the web using Cloud Run
Building Serverless Python Data APIs with Docker on Google Cloud
In this article, I’ll show you a simple way to build, in minutes, a few Data APIs that expose data from a BigQuery dataset. These APIs will be deployed as Docker containers using a GCP serverless service called Cloud Run.
Architecture
The idea behind this architecture is to work with serverless components. First, let’s understand these services and their roles in the architecture.
Cloud Run: Cloud Run is a fully managed compute platform that automatically scales your stateless containers [Cloud Run Doc]. It will handle all the API requests, and since it’s fully managed, we don’t need to worry about scaling. To achieve that, a Docker image must first be deployed.
Cloud Shell: Cloud Shell provides you with command-line access to your cloud resources directly from your browser [Cloud Shell Doc]. A development environment is always needed, and in this case it’s Cloud Shell. The Python code will be developed here.
Cloud Build: Cloud Build is a service that executes your builds on Google Cloud Platform infrastructure. Cloud Build executes a build to your specifications and produces artifacts such as Docker containers [Doc]. When we are ready to build our Docker image, we’ll call Cloud Build to do the job, and by default the image will be published to the GCP Container Registry.
Container Registry: provides secure, private Docker image storage on Google Cloud Platform [Doc]. This is where our Docker image will be hosted once Cloud Build publishes it, ready to be deployed by Cloud Run.
BigQuery: BigQuery is Google’s fully managed, petabyte-scale, low-cost analytics data warehouse. BigQuery is NoOps: there is no infrastructure to manage and you don’t need a database administrator [BigQuery Doc]. BigQuery is our data warehouse, so all the data needed by the APIs lives here.
Getting the data
For this project, we will be using a BigQuery public dataset called covid19_ecdc, which contains confirmed cases and deaths by country and by date.
Dataset
Create Dataset and View
Let’s create a BigQuery dataset and a view (or a materialized view) to bring the data into our project.
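As a reference, a minimal sketch of how the dataset and view could be created with the google-cloud-bigquery client is shown below; the dataset name, view name, public table name, and column list are assumptions, so adapt them to your project.

from google.cloud import bigquery

client = bigquery.Client(project="YOUR_PROJECT_ID")

# Create the dataset that will hold the view (exists_ok makes it idempotent)
dataset = bigquery.Dataset("YOUR_PROJECT_ID.covid")
dataset.location = "US"
client.create_dataset(dataset, exists_ok=True)

# Create a view over the public covid19_ecdc table
view = bigquery.Table("YOUR_PROJECT_ID.covid.covid")
view.view_query = (
    "SELECT date, geo_id, countries_and_territories, confirmed_cases, deaths "
    "FROM `bigquery-public-data.covid19_ecdc.covid_19_geographic_distribution_worldwide`"
)
client.create_table(view, exists_ok=True)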
Data
With a quick exploration, we can identify the schema and the data. For this project, our API will extract:
All the data
Cases from all countries on a specific day
All the cases reported for a specific country
Table used
Boilerplate Project
Building a REST API with Python is not difficult. To make it simpler, I published a GitHub project containing all the files.
The lines of code you need to modify include your GCP project ID, your dataset, and your GCP credentials. Find more information about how to get your GCP credentials.
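For illustration only, the edits could look like the snippet below; the variable names and the credentials file path are hypothetical, not the exact ones in the repository.

import os

# Hypothetical configuration values to adapt in the project
PROJECT_ID = "YOUR_PROJECT_ID"
DATASET = "YOUR_DATASET"

# Point the Google client libraries at your service account key
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials.json"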
Dockerfile
A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image [Docker doc].
This project uses a slim Python image, installs all the required libraries, and finally starts an HTTP server with gunicorn to handle the API requests.
Dockerfile
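A minimal sketch of such a Dockerfile, assuming Python 3.9 and a Flask object named app living in app.py; the image tag, port handling, and gunicorn settings are choices, not requirements:

FROM python:3.9-slim

WORKDIR /app

# Install the Python libraries first to take advantage of layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Cloud Run injects $PORT (8080 by default); gunicorn serves the Flask app
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app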
requirements.txt
A list of all the Python libraries required for the project.
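The exact list lives in the GitHub project; assuming the stack used throughout this article (Flask, gunicorn, SQLAlchemy, Marshmallow, and the BigQuery dialect), it would look roughly like this, with version pins omitted:

Flask
gunicorn
SQLAlchemy
flask-sqlalchemy
flask-marshmallow
marshmallow-sqlalchemy
pybigquery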
Designing API Requests
Object Relational Mapping
We use SQLAlchemy as an ORM to translate Python classes to BigQuery tables. Marshmallow is applied to make object serialization and deserialization easier.
To make this code work, it is important to name the class after your BigQuery table (by default, Covid is internally converted to covid when searching for the table in BigQuery). If your table has a different name, use __tablename__ = 'YOUR_TABLE_NAME' on line 23.
CovidSchema lets you define all the columns you want to expose through the API.
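Putting the pieces together, a sketch of the mapping could look like the code below; the bigquery:// connection string comes from the pybigquery dialect, and the column names are assumptions based on the covid19_ecdc schema:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_marshmallow import Marshmallow

app = Flask(__name__)
# The pybigquery dialect translates SQLAlchemy queries into BigQuery SQL
app.config["SQLALCHEMY_DATABASE_URI"] = "bigquery://YOUR_PROJECT_ID/YOUR_DATASET"
db = SQLAlchemy(app)
ma = Marshmallow(app)

class Covid(db.Model):
    # The class name Covid is lower-cased to covid to locate the table;
    # set __tablename__ = 'YOUR_TABLE_NAME' if your table is named differently
    date = db.Column(db.Date, primary_key=True)
    geo_id = db.Column(db.String, primary_key=True)
    countries_and_territories = db.Column(db.String)
    confirmed_cases = db.Column(db.Integer)
    deaths = db.Column(db.Integer)

class CovidSchema(ma.Schema):
    class Meta:
        # Only the fields listed here are exposed by the API
        fields = ("date", "geo_id", "countries_and_territories",
                  "confirmed_cases", "deaths")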
Requests
To meet the three data requirements defined above, we code three functions.
First, our Data API will return all the data from the BigQuery table.
The function get_day() receives a date parameter like 2020-04-04 and returns the data from all countries on that specific date.
The function country_detail() receives a country geo code like PE and returns the data for a specific country.
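Continuing the sketch above (reusing app, Covid, and CovidSchema), the three functions could be wired to Flask routes as follows; the route paths are assumptions, not necessarily the ones used in the repository:

from datetime import date

covid_schema = CovidSchema(many=True)

@app.route("/", methods=["GET"])
def get_all():
    # Requirement 1: return every row of the BigQuery table
    records = Covid.query.all()
    return covid_schema.jsonify(records)

@app.route("/day/<day>", methods=["GET"])
def get_day(day):
    # Requirement 2: e.g. /day/2020-04-04 returns all countries on that date
    records = Covid.query.filter_by(date=date.fromisoformat(day)).all()
    return covid_schema.jsonify(records)

@app.route("/country/<geo_id>", methods=["GET"])
def country_detail(geo_id):
    # Requirement 3: e.g. /country/PE returns every row for that geo code
    records = Covid.query.filter_by(geo_id=geo_id).all()
    return covid_schema.jsonify(records)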
Build a Docker image with Cloud Build
Each time you update the project, you need to build a new version of your Docker image. Many alternatives exist; here we use Cloud Build, a tool that builds your Docker image and publishes it to the Container Registry. Don’t forget to define your DOCKER_IMAGE_NAME.
I run it in Cloud Shell; remember to be located at the same level as your Dockerfile, and your project_id must be set (you should see it in the yellow letters of the prompt).
Build Docker image
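For reference, the command is a one-liner; this sketch assumes the gcr.io registry and the $GOOGLE_CLOUD_PROJECT variable that Cloud Shell sets for you:

gcloud builds submit --tag gcr.io/$GOOGLE_CLOUD_PROJECT/DOCKER_IMAGE_NAME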
If everything goes well, you will see your Docker image published.
Container Registry
Deploy the Docker image using Cloud Run
The last step is to deploy your Docker image to the web! Cloud Run will be in charge of everything; decide on a SERVICENAME and update the reference to your Docker image.
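A sketch of the deploy command, assuming the fully managed platform and a publicly reachable (unauthenticated) API; the region is a choice, not a requirement:

gcloud run deploy SERVICENAME \
  --image gcr.io/$GOOGLE_CLOUD_PROJECT/DOCKER_IMAGE_NAME \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated

Cloud Run prints the service URL when the deployment finishes, and the three endpoints are then live on the web.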
This article showed you how easy it is to develop a Data API with Python and serverless products like Cloud Run and BigQuery.
Having this API enables us to build excellent data visualizations. A great example of the power of a Data API comes from the World Bank; review the article by Sébastien Pierre here.
A special thanks to Martin Omander and Sagar Chand for their excellent repositories and articles, which helped me develop this article.
PS: if you have any questions or an interesting data idea, you can find me on Twitter and LinkedIn. Also, if you are considering taking a Google Cloud certification, I wrote a technical article describing my experiences and recommendations.