Aman Ranjan Verma

Summary

This guide shows how to call REST APIs in Airflow using three techniques: HttpOperator, PythonOperator, and BashOperator.

Abstract

This is an extensive guide on how to call REST APIs in Airflow, a platform designed to programmatically author, schedule, and monitor workflows. The guide discusses three techniques for incorporating REST API requests into Airflow DAGs: HttpOperator, PythonOperator, and BashOperator. Each technique is demonstrated with an example that calls an open REST endpoint provided by httpbin.org. The guide also lists the advantages and disadvantages of each technique, helping readers choose based on their specific requirements.

Opinions

  • The HttpOperator provides a high-level abstraction that simplifies the process of making HTTP requests, but it lacks extensive control over handling requests and responses.
  • The PythonOperator allows complete freedom to write custom Python code for interacting with APIs, but it requires a greater amount of coding and a deeper understanding of Python.
  • The BashOperator enables the execution of shell commands, such as cURL or other command-line tools, but it has limitations in terms of control and flexibility compared to Python-based approaches.
  • The guide is intended for technical readers who are familiar with Airflow and have a basic understanding of REST APIs.
  • The guide encourages readers to choose the most suitable method for invoking REST APIs in Airflow based on their specific requirements.
  • The guide concludes by recommending that readers weigh the pros and cons of each operator in order to make well-informed choices and optimize their DAGs.
  • The guide is written by Aman Ranjan Verma, who encourages readers to subscribe to his upcoming blogs, applaud this article, and follow him on Medium, LinkedIn, and Twitter.

Airflow: 3 ways to call a REST API

Learn how to use HttpOperator, PythonOperator, and BashOperator to call a REST API in your DAG

Welcome to this extensive guide on how to call REST APIs in Airflow! In this blog post, we will discuss three effective techniques (HttpOperator, PythonOperator, and BashOperator) for incorporating REST API requests into your Airflow DAGs. Whether you’re new to Airflow or have experience using it, this tutorial will give you the knowledge and tips you need to make the most of these operators.

ARV Original Creation, Airflow: 3 ways to call a REST API

Note: This blog is intended for technical readers who are familiar with Airflow and have a basic understanding of REST APIs.

Table of contents

· REST API example
· Use HttpOperator to call a REST API endpoint
    · Advantages of HttpOperator
    · Disadvantages of HttpOperator
· Use PythonOperator to call a REST API endpoint
    · Advantages of PythonOperator
    · Disadvantages of PythonOperator
· Use BashOperator to call a REST API endpoint
    · Advantages of BashOperator
    · Disadvantages of BashOperator
· Airflow Native way of making an API call: HttpHook
· Conclusion

REST API example

In this blog, we will use an open REST endpoint provided by httpbin. httpbin.org is a simple HTTP request and response service.

We will be using its GET route in all of our examples. You don’t need to sign up to use this endpoint: https://www.httpbin.org/get

curl -X GET "https://www.httpbin.org/get" -H "accept: application/json"

Use HttpOperator to call a REST API endpoint

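from datetime import datetime

from airflow import DAG
# Newer releases of the HTTP provider rename this class to HttpOperator
# and deprecate SimpleHttpOperator.
from airflow.providers.http.operators.http import SimpleHttpOperator

default_args = {
    "owner": "arv",
    "start_date": datetime(2023, 1, 1),
    "email_on_failure": False,
    "email_on_retry": False,
}


with DAG('api_call_dag', default_args=default_args, schedule_interval='@daily') as dag:
    # No http_conn_id is passed, so the operator uses the "http_default"
    # connection; its host must point at https://www.httpbin.org/.
    http_call_api = SimpleHttpOperator(
        task_id='http_call_api',
        method='GET',
        endpoint='get',  # appended to the connection's host
        headers={"accept": "application/json"},
        log_response=True,  # write the response body to the task log
    )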

Advantages of HttpOperator:

  • It provides a high-level abstraction that simplifies the process of making HTTP requests.
  • Airflow connections are used to handle authentication and headers.
  • It supports different HTTP methods and validates the responses.
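
The response validation mentioned above is typically wired in through the operator's response_check parameter, while response_filter controls what the task returns (and therefore what is pushed to XCom). The following is a minimal sketch rather than the author's code: the function names has_expected_shape and extract_origin are hypothetical, and the fields they read (url, headers, origin) assume the JSON body that httpbin's GET route returns.

```python
def has_expected_shape(response) -> bool:
    """Return True when the response looks like a valid httpbin GET reply.

    Intended for the operator's `response_check` argument: returning False
    makes the task fail.
    """
    if response.status_code != 200:
        return False
    body = response.json()
    return "url" in body and "headers" in body


def extract_origin(response) -> str:
    """Keep only the caller's IP address (httpbin's "origin" field).

    Intended for `response_filter`: the value returned here becomes the
    task's return value.
    """
    return response.json()["origin"]


# Wiring inside a DAG, assuming the same task settings as the example above:
#   SimpleHttpOperator(
#       task_id="http_call_api",
#       method="GET",
#       endpoint="get",
#       response_check=has_expected_shape,
#       response_filter=extract_origin,
#   )
```

Both callables receive the underlying response object, so they can be unit-tested with a small stand-in that exposes status_code and json().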

Disadvantages of HttpOperator:

  • It offers less flexibility than calling a lower-level HTTP library, such as requests, directly.
  • Complex API interactions may require additional configuration.
  • There is a lack of extensive control over handling requests and responses.

Use PythonOperator to call a REST API endpoint

import json
from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

default_args = {
    "owner": "arv",
    "start_date": datetime(2023, 1, 1),
    "email_on_failure": False,
    "email_on_retry": False,
}


def call_api():
    url = 'https://www.httpbin.org/get'
    headers = {'accept': 'application/json'}
    # A timeout keeps the task from hanging if the server never answers.
    response = requests.get(url, headers=headers, timeout=30)
    # Raise on 4xx/5xx responses so that Airflow marks the task as failed.
    response.raise_for_status()
    print("Response", json.dumps(response.json(), indent=2))


with DAG('api_call_dag', default_args=default_args, schedule_interval='@daily') as dag:
    python_call_api = PythonOperator(
        task_id='python_call_api',
        python_callable=call_api,
    )

Advantages of PythonOperator

  • Allows complete freedom to write custom Python code for interacting with APIs.
  • It enables smooth integration with any Python library or framework.
  • It allows for intricate data processing and manipulation both before and after making API calls.

Disadvantages of PythonOperator

  • It requires a greater amount of coding and a deeper understanding of Python.
  • Additional error handling and exception management may be necessary.
  • There is a lack of native features for managing authentication and headers.
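
The extra error handling mentioned above might look like the following sketch. To keep it independent of any one HTTP client, the function takes the request function as a parameter (in a real task that would be requests.get). The name fetch_json and its parameters are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
def fetch_json(http_get, url, headers=None, timeout=30):
    """Call a JSON endpoint and return the decoded body, failing loudly.

    `http_get` is any callable with the same signature as `requests.get`.
    Exceptions are deliberately not swallowed: an uncaught exception is what
    makes Airflow mark the task as failed and apply its retry settings.
    """
    response = http_get(url, headers=headers or {}, timeout=timeout)
    # Raises for 4xx/5xx responses when `http_get` is requests.get
    response.raise_for_status()
    body = response.json()
    if not isinstance(body, dict):
        raise ValueError(
            f"Expected a JSON object from {url}, got {type(body).__name__}"
        )
    # The return value of a PythonOperator callable is pushed to XCom
    return body
```

One way to use it is from a small wrapper that calls fetch_json(requests.get, 'https://www.httpbin.org/get') and is passed to the PythonOperator as python_callable. Injecting the request function also makes the logic easy to test without network access.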

Use BashOperator to call a REST API endpoint

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "arv",
    "start_date": datetime(2023, 1, 1),
    "email_on_failure": False,
    "email_on_retry": False,
}


with DAG('api_call_dag', default_args=default_args, schedule_interval='@daily') as dag:
    bash_call_api = BashOperator(
        task_id='bash_call_api',
        # --fail makes curl exit non-zero on HTTP 4xx/5xx, so the task fails
        # instead of succeeding on an error response.
        bash_command='curl --fail "https://www.httpbin.org/get" -H "accept: application/json"'
    )

Advantages of BashOperator

  • Enables the execution of shell commands, such as cURL or other command-line tools.
  • It is convenient for simple API calls that need no complex logic.
  • It is straightforward for developers who are familiar with shell scripting to use.

Disadvantages of BashOperator

  • It offers less control and flexibility than Python-based approaches.
  • Additional configuration may be necessary for authentication and headers.
  • Handling complex data processing or error handling can be challenging.
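
Part of the header configuration and error handling noted above can be pushed into the curl command itself. The sketch below builds the bash_command string in Python. build_curl_command is a hypothetical helper, and the 30-second --max-time value is an arbitrary choice. The --fail flag makes curl exit non-zero on HTTP errors so that the task fails, and shlex.join quotes every argument for the shell.

```python
import shlex


def build_curl_command(url, headers=None, max_time=30):
    """Build a shell-safe curl command string for use as `bash_command`."""
    args = [
        "curl",
        "--fail",          # exit non-zero on HTTP 4xx/5xx
        "--silent",        # hide the progress meter in the task log
        "--show-error",    # but still print error messages
        "--max-time", str(max_time),
        url,
    ]
    for name, value in (headers or {}).items():
        args += ["-H", f"{name}: {value}"]
    # shlex.join quotes each argument so spaces and quotes survive the shell
    return shlex.join(args)


command = build_curl_command(
    "https://www.httpbin.org/get",
    headers={"accept": "application/json"},
)
```

The resulting string can then be passed as bash_command to a BashOperator in place of a hand-written command.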

Airflow Native way of making an API call: HttpHook
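
One way this could look is sketched below, as an illustration rather than the author's own code. It assumes the HTTP provider's HttpHook and an "http_default" connection whose host points at https://www.httpbin.org/. The function name call_api_with_hook and its hook parameter are hypothetical.

```python
def call_api_with_hook(hook=None, endpoint="get"):
    """Call the endpoint through an HttpHook and return the decoded JSON.

    `hook` exists only so a stand-in can be injected in tests; in a real task
    it is left as None and built from the "http_default" connection.
    """
    if hook is None:
        # Imported lazily so this module can be loaded without Airflow installed
        from airflow.providers.http.hooks.http import HttpHook

        hook = HttpHook(method="GET", http_conn_id="http_default")
    # The base URL comes from the connection; `endpoint` is appended to it
    response = hook.run(endpoint=endpoint, headers={"accept": "application/json"})
    return response.json()
```

The function can be passed as python_callable to a PythonOperator. Compared with calling requests directly, the hook reads the host and credentials from the Airflow connection, so they stay out of the DAG code.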

Conclusion

So far, you have learned three distinct methods for invoking REST APIs in Airflow. The HttpOperator, PythonOperator, and BashOperator each have their own strengths and weaknesses, so you can choose the most suitable method based on your specific requirements. With this knowledge, you can confidently include REST API calls in your Airflow workflows, allowing smooth integration with external systems. Make sure to take into account the pros and cons of each operator in order to make well-informed choices and optimize your DAGs. Have a great time coding with Airflow! 🚀 🤗

