avatarAmy @GrabNGoInfo

Summary

The web content provides a comprehensive guide on using Databricks widgets with SQL code to create interactive notebooks and dashboards, including the creation, configuration, and application of various widget types for data filtering and parameter passing.

Abstract

The article is a tutorial that explains how to leverage Databricks widgets within SQL notebooks to enhance interactivity and reusability of code. It covers the creation of different widget types such as text, dropdown, combobox, and multiselect, and demonstrates how to pass their values to SQL queries and dashboards. The tutorial also includes instructions on configuring widget behavior, using widgets to filter dashboard visualizations, and deleting widgets when necessary. Additionally, it provides insights on how to run a notebook with specific parameters from another notebook and discusses the limitations of widgets when used in scheduled jobs. The guide is supplemented with visual aids, code snippets, and references to external resources, including video tutorials and Databricks documentation.

Opinions

  • The author emphasizes the utility of widgets for applying different parameters to notebooks and dashboards, facilitating data exploration with minimal code changes.
  • Widgets are presented as a means to create more dynamic and user-friendly dashboards by allowing end-users to interact with the data through filters.
  • The tutorial suggests that the default setting for widget changes, "Run Accessed Commands," may not be sufficient for SQL cells, and recommends rerunning the entire notebook to ensure all dependencies are met.
  • The author provides a clear recommendation to use the "Run Notebook" option for widget change actions to avoid missing important steps in the code execution flow.
  • The article implies that widgets are a powerful feature for data analysts and engineers working with Databricks, enhancing the functionality of SQL notebooks and improving the user experience for dashboard consumers.

Databricks Widgets in SQL Notebook

Apply different parameters for notebooks and dashboards in Databricks using SQL Code

Photo by Igor Tudoran on Unsplash

Databricks widget API enables users to apply different parameters for notebooks and dashboards. It’s best for re-running the same code using different parameter values. When used in dashboards, it creates filters for the dashboard charts.

Databricks supports both python and SQL code for the widget API. In this tutorial, we will talk about how to create widgets using SQL. We will cover:

  • How to create different types of widgets using SQL?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

Resources for this post:

To learn about python widgets, please refer to my tutorial Databricks Widgets in Python Notebook.

Let’s get started!

Step 0: Databricks Widget Types

There are four types of Databricks widgets:

  • text takes text as inputs.
  • dropdown creates a dropdown list with values.
  • combobox is a combination of text and dropdown. Users can either select values from the dropdown list or input their own values.
  • multiselect creates a list of values. Users can select one or more values from the list.

To get the help information about widgets, use dbutils.widgets.help(). We need to add %python at the beginning of the cell because the default language for the notebook is SQL. The output has the methods available for widgets and their syntax.

%python
# Get documentation about widgets
dbutils.widgets.help()

Output:

dbutils.widgets provides utilities for working with notebook widgets. You can create different types of widgets and get their bound value. For more info about a method, use dbutils.widgets.help("methodName").
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget a with given name, default value and choices
get(name: String): String -> Retrieves current value of an input widget
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value

To get the help information about a specific widget method, use dbutils.widgets.help(methodName). For example, I can use dbutils.widgets.help('combobox') to get information about the combobox method.

%python
# Get help informatuon for one method
dbutils.widgets.help('combobox')

Output:

/**
* Creates a combobox input widget with a given name, default value and choices. Optionally, you
* can provide a label for the combobox widget that will be rendered in place of the name. If a
* widget with a given name already exists, its properties will be overwritten. The default
* value does not have to be one choices.
*
* Example: dbutils.widgets.combobox("product", "Other", Seq("Camera", "GPS", "Smartphone"))
*
* @param name unique name identifying the widget
* @param defaultValue value value which widget is populated by default
* @param choices possible choices for the dropdown menu
* @param label optional widget label
*/
combobox(name: java.lang.String, defaultValue: java.lang.String, choices: scala.collection.Seq, label: java.lang.String): void

Step 1: Read In Dataset

In step 1, A CSV dataset on cryptocurrency prices is read from a mounted S3 bucket. The dataset is a subset of the Kaggle G-Research Crypto Forecasting dataset. To learn how to mount an AWS S3 bucket to Databricks, please refer to my previous tutorial Databricks Mount To AWS S3 And Import Data.

-- mode "FAILFAST" will abort file parsing with a RuntimeException if any malformed lines are encountered
CREATE OR REPLACE TEMPORARY VIEW crypto_100k_records
USING CSV
OPTIONS (path "/mnt/demo4tutorial/data/crypto_100k_records.csv", header "true", mode "FAILFAST");
-- Take a look at the data
SELECT * FROM crypto_100k_records

After reading the data, we will do some data processing. The timestamp is in UNIX epoch format, which is the number of seconds since January 1st of 1970 Coordinated Universal Time (UTC). Using from_unixtime, we changed it to a DateTime format. The columns that are not used in the visualization are dropped. We also created a new column for asset names.

-- Data processing
CREATE OR REPLACE TEMPORARY VIEW df AS
SELECT 
  asset_id,
  volume,
  --Change epoch to datetime format
  from_unixtime(timestamp) AS datetime, 
  -- Create asset name
  CASE WHEN asset_id = 1 THEN 'Bitcoin' WHEN asset_id = 6 THEN 'Ethereum' ELSE 'Other' END AS asset_name
FROM crypto_100k_records;
-- Take a look at the data
SELECT * FROM df

Output:

Dataset for Databricks SQL Widgets — Image from GrabNGoInfo.com

Step 2: Create Databricks Widgets Using SQL

In step 2, we use SQL to create different types of Databricks widgets.

  • The dropdown widget is for the Asset_Name column. It has the name of dropdown_filter and the default value of Bitcoin. There are four choices in the dropdown. Bitcoin, Ethereum, Other, and All. All means selecting all the asset names.
  • The multiselect widget is based on the Asset_Name column too. It has the name of multiselect_filter and the default value of Bitcoin. The three choices, Bitcoin, Ethereum, and Other are the three unique values for the Asset_Name column. We can select multi-values using the multiselect widget.
  • The combobox widget is based on the Asset_ID column. It has the name of combobox_filter and the default value of 0. The unique values of the Asset_ID column is pulled using SELECT DISTINCT.
  • The text widget is based on the Asset_ID column too. It has the name of text_filter and the default value of 0. Users can enter the asset ID into the box.
-- Create a dropdown widget
CREATE WIDGET DROPDOWN dropdown_filter DEFAULT "Bitcoin"  CHOICES (VALUES 'Bitcoin', 'Ethereum', 'Other', 'All');
-- Create a multiselect widget
CREATE WIDGET MULTISELECT multiselect_filter DEFAULT "Bitcoin"  CHOICES (VALUES 'Bitcoin', 'Ethereum', 'Other');
-- Create a combobox widget
CREATE WIDGET COMBOBOX combobox_filter DEFAULT "0"  CHOICES SELECT DISTINCT asset_id FROM df;
-- Create a text widget
CREATE WIDGET TEXT text_filter DEFAULT "0";

After running the code, the widgets show on the top of the notebook.

Databricks SQL Widgets — Image from GrabNGoInfo.com

Step 3: Pass Widget Values in SQL Code

After creating the widgets, in step 3, we will talk about how to pass the widget values using SQL. A widget value can be retrieved by passing the widget name into the getArgument() function.

-- Pass widget value to SQL code
SELECT * 
FROM df
WHERE asset_id = getArgument('text_filter')

Output:

Databricks Pass Widget Value in SQL Code — Image from GrabNGoInfo.com

When the dropdown widget has All as an option, we need to treat All and other options differently because All is not a value in the dataframe column. One way to do this is to use the CASE WHEN conditions to include all records if the dropdown widget value is All, and filter by the widget value otherwise.

-- Dropdown widget with All as one option
SELECT 
  datetime,
  volume
FROM df
WHERE CASE getArgument('dropdown_filter') WHEN 'Bitcoin' THEN asset_name = 'Bitcoin'
                                          WHEN 'Ethereum' THEN asset_name = 'Ethereum'
                                          WHEN 'Other' THEN asset_name = 'Other'    
                                          ELSE asset_name IN (SELECT DISTINCT asset_name FROM df) END

Output:

Databricks SQL Widget Dropdown with ‘All’ as an Option — Image from GrabNGoInfo.com

Step 4: Use Widgets As Filters For Dashboard

In step 4, we will talk about how to use widgets as filters for Databricks dashboard.

Firstly, let’s create a chart using Databricks’ built-in tool.

-- Dropdown widget with All as one option
SELECT 
  datetime,
  volume
FROM df
WHERE CASE getArgument('dropdown_filter') WHEN 'Bitcoin' THEN asset_name = 'Bitcoin'
                                          WHEN 'Ethereum' THEN asset_name = 'Ethereum'
                                          WHEN 'Other' THEN asset_name = 'Other'    
                                          ELSE asset_name IN (SELECT DISTINCT asset_name FROM df) END

Click the downward triangle next to the bar chart icon, then select the chart type.

Databricks SQL Widgets Dashboard — Image from GrabNGoInfo.com

Next, click the Plot Options icon to check if the settings for the chart are correct and make changes if necessary.

Databricks SQL Widgets Dashboard — Image from GrabNGoInfo.com

To create a dashboard, click the bar chart icon on the upper right corner of the cell, then click Add to New Dashboard.

Databricks SQL Widgets Create New Dashboard — Image from GrabNGoInfo.com

This opens the dashboard. We can see the filters on the top of the dashboard.

Databricks SQL Widgets as Filters for Dashboard — Image from GrabNGoInfo.com

Step 5: Configure Databricks Widgets

In step 5, we will configure the Databricks widgets by clicking the gear icon on the top right of the notebook.

Databricks Configure Databricks Widgets — Image from GrabNGoInfo.com

The Widgets Panel Settings window will pop up. Under On Widget Change, there are three options.

  • Run Accessed Commands means that when the widget values change, only the cells that directly retrieve the changed widget are rerun. This is the default setting, but it does not work for SQL cells.
  • Do Nothing means that the notebook will not rerun based on the new widget values.
  • Run Notebook means rerun the whole notebook. I recommend choosing this option to prevent missing some of the important steps in the code.
Databricks Widgets Panel Settings — Image from GrabNGoInfo.com

The widgets panel is pin to the top of the notebook by default, but we can uncheck the Pinned to top option to show it above the first cell.

Databricks Pin Widgets — Image from GrabNGoInfo.com

Step 6: Pass Values to Widgets in Another Notebook

Sometimes we may need to run the notebook with specific parameters from another notebook. In that case, we can use %run to run the notebook and pass the parameters at the same time. The sample code below is from the Databricks documentation for widgets.

%python
# Pass parameters to widgets in another notebook
%run /path/to/notebook $X="10" $Y="1"

Step 7: Delete Databricks Widgets

In step 7, we will talk about how to delete Databricks widgets. To delete one widget, use REMOVE WIDGET widget_name.

-- Remove one widget
REMOVE WIDGET text_filter

To remove all widgets, we need to use python code dbutils.widgets.removeAll().

%python
# Remove all widgets
dbutils.widgets.removeAll()

Summary

In this tutorial, we talked about how to create widgets using SQL. You learned:

  • How to create different types of widgets using SQL?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

Note that you cannot use widgets to pass arguments between different languages within a notebook and run it in a scheduled job. So when the widgets are created in SQL, the value of the widgets needs to be retrieved by SQL. To learn about how to run Databricks jobs, please refer to my tutorial Databricks Multi-Task Job Scheduling

Recommended Tutorials

References

Databricks
Databricks Sql
Databricks Widget
Databricks Filters
Databricks Dashboard
Recommended from ReadMedium