How To Send Parameters From ADF to Databricks and Receive Output From Databricks
This article explains how to send parameters to Databricks from ADF and receive output from Databricks in ADF.
Quite often as a Data Engineer, I need to use Databricks as part of my Azure Data Factory (ADF) data pipeline. This involves configuring the pipeline to send parameters to Databricks and, in turn, receive output from Databricks. This article shows a quick and easy way to do it through an example.
Use Case: A country parameter needs to be sent from ADF to Databricks. The country value is Canada. Databricks will accept the parameter and send an output called continent, with a value of North America, back to ADF.
Requirement: ADF pipeline should be able to send the parameter to Databricks and in turn, receive the output from Databricks.
Assumption: A Databricks Notebook is already available.
Step 1: Initialize a New Parameter and Variable in ADF
Open the ADF canvas and create a new pipeline. To begin, create a new parameter called ‘country’ and a new variable called ‘continent’ on the new pipeline. Your pipeline should look like this:
For the sake of this article, I have provided a default value of Canada for the country parameter. When the pipeline runs, it will send this default value to Databricks. When Databricks concludes, it will send a value back to ADF that I will store in the continent variable.
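If you inspect the pipeline’s JSON definition, the parameter and variable should look roughly like this (a minimal sketch based on the standard ADF pipeline schema; surrounding properties are omitted):

"parameters": {
    "country": {
        "type": "string",
        "defaultValue": "Canada"
    }
},
"variables": {
    "continent": {
        "type": "String"
    }
}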
Step 2: Open the Databricks Notebook
Inside the Databricks notebook, I have written a small piece of Python code that will accept the parameter from ADF and, in turn, send a value back to ADF.
We use widgets in Databricks, accessed through the Widgets API (dbutils.widgets.get()), to perform a variety of tasks; in this case, the task is to accept an incoming parameter.
Next, the code checks whether the value of the incoming parameter is Canada and, if so, sets North America as the output. This output is stored in a new variable called continent, defined inside the Databricks notebook.
Finally, using dbutils.notebook.exit(), I send the value of continent back to ADF.
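For reference, a minimal sketch of the notebook code might look like this (the widget name country and the variable continent match the names used in this example; your actual logic may differ):

# Read the incoming parameter sent from ADF
country = dbutils.widgets.get("country")

# Map the incoming country to its continent
continent = ""
if country == "Canada":
    continent = "North America"

# Send the value of continent back to ADF as the notebook's exit value
dbutils.notebook.exit(continent)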
Step 3: Configure ADF To Send Parameter to Databricks
Drag a Databricks Notebook activity onto the ADF canvas and connect it to the Databricks notebook through your Linked Service. Once done, move to the Settings pane and use ‘Base parameters’ to send parameters to Databricks.
The name of the parameter being sent from ADF is country, so the same name should be used in the dbutils.widgets.get("country") call in Databricks.
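In the Base parameters grid, add an entry named country and set its value to a dynamic expression that reads the pipeline parameter:

@pipeline().parameters.country

With this in place, every pipeline run passes the current value of the country parameter (Canada by default) to the notebook’s country widget.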
Step 4: Configure ADF To Receive Parameters From Databricks
At the beginning, I created a blank variable called continent. This is now used to store the incoming output from Databricks.
Drag a Set variable activity onto the ADF canvas and connect it to the Notebook activity.
In the Set variable activity, select the variable named continent and assign it a dynamic value: @activity('Notebook1').output.runOutput
This expression tells ADF to refer to the Databricks Notebook activity’s output (@activity('Notebook1').output) and, from that output, read the value of the runOutput element.
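For context, when the notebook calls dbutils.notebook.exit(continent), the Notebook activity’s output contains the exit value under the runOutput field, roughly like this (simplified; the real output includes additional fields such as the run page URL):

{
    "runOutput": "North America"
}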
Once you complete all of the above, your pipeline will be able to send parameters from ADF to Databricks and receive output from Databricks in ADF.