Tech Zero

Summary

The article outlines the process of sending parameters from Azure Data Factory (ADF) to Databricks and receiving output back in ADF.

Abstract

This article provides a step-by-step guide for data engineers on integrating Databricks with Azure Data Factory (ADF) pipelines. It details how to initialize parameters and variables in ADF, configure the Databricks notebook to accept these parameters and return outputs using widgets and the dbutils API, and finally, how to set up ADF activities to send parameters to Databricks and receive outputs in return. The use case presented involves sending a 'country' parameter from ADF to Databricks, which then returns a 'continent' output based on the input value. By following these steps, users can effectively exchange data between ADF and Databricks.

Opinions

  • The author emphasizes the practicality of the guide, suggesting it is a "quick and easy way" to send parameters and receive outputs between ADF and Databricks.
  • The use of default values for parameters in ADF is presented as a convenient feature for running pipelines.
  • The author's approach to using Databricks widgets and the dbutils.notebook.exit function to send data back to ADF indicates a preference or recommended practice for this method of output handling.
  • The article assumes familiarity with both ADF and Databricks, indicating that it is written for readers with some experience in these tools.
  • By providing screenshots and code snippets, the author conveys a commitment to clarity and user assistance, aiming to facilitate an easier understanding and implementation of the process.

How To Send Parameters From ADF to Databricks and Receive Output From Databricks

This article explains how to send parameters to Databricks from ADF and receive output from Databricks in ADF.

Quite often as a Data Engineer, I need to use Databricks as part of my Azure Data Factory pipeline. This involves configuring the pipeline to send parameters to Databricks and, in turn, receive output from Databricks. This article shows a quick and easy way to do it through an example.

Use Case: A country parameter needs to be sent from ADF to Databricks, with the value Canada. Databricks will accept the parameter and send an output called continent, with the value North America, back to ADF.

Requirement: ADF pipeline should be able to send the parameter to Databricks and in turn, receive the output from Databricks.

Assumption: A Databricks Notebook is already available.

Step 1: Initialize a New Parameter and Variable in ADF

Open the canvas in ADF and create a new pipeline. To begin, create a new parameter called ‘country’ and a new variable called ‘continent’. Your pipeline should look like this:

(ADF create parameter)
(ADF create variable)

For the sake of this article, I have provided a default value of Canada for the country parameter. When the pipeline runs, it will send this default value to Databricks. When Databricks concludes, it will send a value back to ADF that I will store in the continent variable.
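As a rough illustration of what Step 1 sets up (the actual definitions live in the ADF pipeline, not in Python; the names and types here follow the article):

```python
# Sketch of the pipeline's parameter and variable from Step 1.
# The parameter carries a default value; the variable starts empty
# and will later hold the output returned by Databricks.
pipeline_parameters = {"country": {"type": "String", "defaultValue": "Canada"}}
pipeline_variables = {"continent": {"type": "String", "defaultValue": ""}}

print(pipeline_parameters["country"]["defaultValue"])
```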

Step 2: Open the Databricks Notebook

Inside the Databricks notebook, I have written a small piece of Python code that will accept the parameter from ADF and, in turn, send a value back to ADF.

(Databricks Notebook Code)

Databricks widgets, accessed through the dbutils.widgets API, can perform multiple tasks; here, dbutils.widgets.get() is used to accept the incoming parameter.

Next, the code checks whether the value of the incoming parameter is Canada; if so, it sets North America as the output. This output is stored in a new variable, called continent, defined inside Databricks.

Finally, using dbutils.notebook.exit, I send the value of continent back to ADF.
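The notebook logic described above can be sketched as follows. This is a minimal sketch: `continent_for` is a hypothetical helper (the article only covers the Canada case), and `dbutils` is only available inside the Databricks runtime, so those calls are shown as comments:

```python
# Sketch of the Databricks notebook logic from Step 2.

def continent_for(country: str) -> str:
    """Hypothetical mapping; the article only covers Canada -> North America."""
    return "North America" if country == "Canada" else "Unknown"

# Inside the Databricks notebook (dbutils is provided by the runtime):
# country = dbutils.widgets.get("country")   # reads the ADF base parameter
# continent = continent_for(country)
# dbutils.notebook.exit(continent)           # value ADF receives as runOutput
```

Note that dbutils.notebook.exit() ends the notebook run immediately, so it should be the last step of the notebook.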

Step 3: Configure ADF To Send Parameter to Databricks

Drag the Databricks Notebook activity onto the ADF canvas and connect to the Databricks notebook through your Linked Service. Once done, move to the Settings pane and use ‘Base parameters’ to send parameters to Databricks.

(ADF send parameter to Databricks)

The name of the parameter being sent from ADF is country, so the same name should be used in the dbutils.widgets.get("country") call in Databricks.

Step 4: Configure ADF To Receive Parameters From Databricks

I created a blank variable at the beginning called continent. This is now used to store the incoming output from Databricks.

Drag the Set variable activity onto the ADF canvas and connect it to the Notebook activity.

In the Set variable activity, set the variable named continent and assign it the dynamic value @activity('Notebook1').output.runOutput

This expression tells ADF to refer to the Databricks activity output (@activity('Notebook1').output) and, from that output, read the value of the runOutput element.
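To make the expression concrete, here is a hypothetical sketch of the structure the Notebook activity output takes (field names other than runOutput are illustrative); @activity('Notebook1').output.runOutput simply walks into this object:

```python
# Hypothetical shape of the Notebook activity's output as ADF exposes it.
# runOutput holds whatever value was passed to dbutils.notebook.exit().
notebook_activity_output = {
    "runOutput": "North America",  # value returned by the Databricks notebook
}

# Equivalent of @activity('Notebook1').output.runOutput
continent = notebook_activity_output["runOutput"]
print(continent)
```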

(ADF receive Databricks output)

Once you complete all of the above, your pipeline should be able to send parameters from ADF to Databricks and receive output from Databricks to ADF.
