avatarAmy @GrabNGoInfo

Summary

The web content provides a comprehensive guide on using Databricks widgets with Python to create interactive notebooks and dashboards, allowing users to apply different parameters for dynamic data visualization and analysis.

Abstract

The article is a detailed tutorial on implementing Databricks widgets within Python notebooks to enhance data processing and visualization. It covers the creation of various widget types, such as text, dropdown, combobox, and multiselect, and explains how to retrieve and use widget values in notebooks and dashboards. The guide includes steps for importing necessary libraries, reading in datasets, configuring widgets, and passing values between notebooks. It also demonstrates how to use widgets as filters in dashboards and how to delete widgets when necessary. The tutorial emphasizes the practical application of widgets for re-running code with different parameters and for creating interactive dashboards with filters that enable users to explore data more effectively.

Opinions

  • The author, Amy @GrabNGoInfo, encourages joining Medium through her referral link to support her work and access full content.
  • Widgets are highly recommended for their utility in re-running code with different parameters, making them valuable for data analysis.
  • The tutorial suggests that using the Run Notebook option for widget changes is preferable to ensure all relevant code is executed.
  • The author provides additional resources, such as video tutorials and blog posts, to complement the written guide and enhance learning.
  • The article promotes the use of Databricks' built-in tools for creating interactive charts and dashboards, highlighting their user-friendly nature.
  • The author emphasizes the importance of understanding how to run Databricks jobs and manage widgets when scheduling jobs or passing parameters between notebooks.

Databricks Widgets In Python Notebook

Apply different parameters for notebooks and dashboards in Databricks Using Python Code

Photo by Dave Hoefler on Unsplash

Databricks widget API enables users to apply different parameters for notebooks and dashboards. It’s best for re-running the same code using different parameter values. When used in dashboards, it creates filters for the dashboard charts.

Databricks supports both python and SQL code for the widget API. In this tutorial, we will talk about how to create widgets using python. We will cover:

  • How to create different types of widgets using python?
  • How to retrieve widgets values?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

Resources for this post:

To learn about SQL widgets, please refer to my tutorial Databricks Widgets in SQL Notebook.

Let’s get started!

Step 0: Databricks Widget Types

There are four types of Databricks widgets:

  • text takes text as inputs.
  • dropdown creates a dropdown list with values.
  • combobox is a combination of text and dropdown. Users can either select values from the dropdown list or input their own values.
  • multiselect creates a list of values. Users can select one or more values from the list.

To get the help information about widgets, use dbutils.widgets.help(). The output has the methods available for widgets and their syntax.

# Get documentation about widgets
dbutils.widgets.help()

Output:

dbutils.widgets provides utilities for working with notebook widgets. You can create different types of widgets and get their bound value. For more info about a method, use dbutils.widgets.help("methodName").
combobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices
dropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget a with given name, default value and choices
get(name: String): String -> Retrieves current value of an input widget
getArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get
multiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices
remove(name: String): void -> Removes an input widget from the notebook
removeAll: void -> Removes all widgets in the notebook
text(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value

To get the help information about a specific widget method, use dbutils.widgets.help(methodName). For example, I can use dbutils.widgets.help('combobox') to get information about the combobox method.

# Get help informatuon for one method
dbutils.widgets.help('combobox')

Output:

/**
* Creates a combobox input widget with a given name, default value and choices. Optionally, you
* can provide a label for the combobox widget that will be rendered in place of the name. If a
* widget with a given name already exists, its properties will be overwritten. The default
* value does not have to be one choices.
*
* Example: dbutils.widgets.combobox("product", "Other", Seq("Camera", "GPS", "Smartphone"))
*
* @param name unique name identifying the widget
* @param defaultValue value value which widget is populated by default
* @param choices possible choices for the dropdown menu
* @param label optional widget label
*/
combobox(name: java.lang.String, defaultValue: java.lang.String, choices: scala.collection.Seq, label: java.lang.String): void

Step 1: Import Libraries

In the first step, we will import the pyspark SQL functions for data processing. to_timestamp is for processing time data, when is for creating new columns based on conditions, and col is for working with columns.

# Functions for data processing
from pyspark.sql.functions import to_timestamp, when, col

Step 2: Read In Dataset

In step 2, A CSV dataset on cryptocurrency prices is read from a mounted S3 bucket. The dataset is a subset of the Kaggle G-Research Crypto Forecasting dataset. To learn how to mount an AWS S3 bucket to Databricks, please refer to my previous tutorial Databricks Mount To AWS S3 And Import Data.

# Read in CSV data
df = (spark.read.format('csv')
  .option("inferSchema", True)
  .option("header", True)
  .option("sep", ',')
  .load("/mnt/demo4tutorial/data/crypto_100k_records.csv"))
 
# Take a look at the data
display(df)

After reading the data, we will do some data processing. The timestamp is in UNIX epoch format, which is the number of seconds since January 1st of 1970 Coordinated Universal Time (UTC). Using to_timestamp, we changed it to a DateTime format. The columns that are not used in the visualization are dropped. We also created a new column for asset names.

# Change epoch to datetime format and drop unwanted columns
df = df.withColumn('DateTime', to_timestamp(df['timestamp'])).drop('timestamp', 'Count', 'Open', 'High', 'Low', 'Close', 'VWAP', 'Target')
 
# Create asset name
df = df.withColumn('Asset_Name', when(df['Asset_ID']==1, 'Bitcoin')
                                  .when(df['Asset_ID']==6, 'Ethereum')
                                  .otherwise('Other'))
# Take a look at the data    
display(df)
Dataset for Databricks Python Widgets — Image from GrabNGoInfo.com

Step 3: Create Databricks Widgets Using Python

In step 3, we use python to create different types of Databricks widgets.

  • The dropdown widget is for the Asset_Name column. It has the name of dropdown_filter and the default value of Bitcoin. There are four choices in the dropdown. Bitcoin, Ethereum, Other, and All. All means selecting all the asset names.
  • The multiselect widget is based on the Asset_Name column too. It has the name of multiselect_filter and the default value of Bitcoin. The three choices, Bitcoin, Ethereum, and Other are the three unique values for the Asset_Name column. We can select multi-values using the multiselect widget.
  • The combobox widget is based on the Asset_ID column. It has the name of combobox_filter and the default value of 0. The unique values of the Asset_ID column is pulled using the distinct() function.
  • The text widget is based on the Asset_ID column too. It has the name of text_filter and the default value of 0. Users can enter the asset ID into the box.
# Create a dropdown widget
dbutils.widgets.dropdown(name='dropdown_filter', defaultValue='Bitcoin', choices=['Bitcoin', 'Ethereum', 'Other', 'All'], label='Select asset from the dropdown')
# Create a multiselect widget
dbutils.widgets.multiselect(name='multiselect_filter', defaultValue='Bitcoin', choices=['Bitcoin', 'Ethereum', 'Other'], label='Select multiple assets')
# Get unique asset id
unique_asset_id = [str(df.select('Asset_ID').distinct().collect()[i][0]) for i in range(len(df.select('Asset_ID').distinct().collect()))]
# Create a combobox widget
dbutils.widgets.combobox(name='combobox_filter', defaultValue='0', choices=unique_asset_id, label='Select or enter asset ID')
# Create a text widget
dbutils.widgets.text(name='text_filter', defaultValue='0', label='Enter asset ID')

Step 4: Get Values From Databricks Widgets

After creating the widgets, in step 4, we will check the value of each widget. A widget value can be retrieved by passing the widget name into dbutils.widgets.get(). We saved the retrieved widget values into variables and printed it out.

# Save the dropdown widget value into a variable
dropdown_filter_value = dbutils.widgets.get("dropdown_filter")
# Save the multi-select widget value into a variable
multiselect_filter_value = dbutils.widgets.get("multiselect_filter")
# Save the combobox widget value into a variable
combobox_filter_value = dbutils.widgets.get("combobox_filter")
# Save the text widget value into a variable
text_filter_value = dbutils.widgets.get("text_filter")
# Print the widget values
print(f'The dropdown filter value is {dropdown_filter_value}.\nThe multiselect filter value is {multiselect_filter_value}.\nThe combobox filter value is {combobox_filter_value}.\nThe text filter value is {text_filter_value}.')

Output:

The dropdown filter value is Bitcoin.
The multiselect filter value is Bitcoin,Ethereum.
The combobox filter value is 9.
The text filter value is 0.

Step 5: Pass Widget Values in Python Code

In step 5, we will talk about how to pass the widget values using python.

We can use dbutils.widgets.get() function to pull the value directly from the widget like the code below.

# Pass widget values using get()
display(df.filter(col('Asset_ID')==dbutils.widgets.get('text_filter')))
Pass Widget Values in Databricks Python Code— Image from GrabNGoInfo.com

Or alternatively, we can pass the value using the variable created in step 4. In this example, we take the variable called multiselect_filter_value, split the string using comma as the delimeter, and filter the dataframe by checking if the asset name is in the multiselect filter list.

# Pass widget values using the varaibles created in step 4.
display(df.filter(col('Asset_Name').isin(multiselect_filter_value.split(','))))
Pass Widget Values in Databricks Python Code — Image from GrabNGoInfo.com

When the dropdown widget has All as an option, we need to treat All and other options differently because All is not a value in the dataframe column. One way to do this is to use if else conditions to display all records if the dropdown widget value is All, and filter by the widget value otherwise.

# Pass dropdown value with All as one option
if dropdown_filter_value == 'All':
    display(df)
else:
    display(df.filter(col('Asset_Name').isin(dropdown_filter_value)))
Pass Dropdown Widget Values in Databricks Python Code — Image from GrabNGoInfo.com

Step 6: Use Widgets As Filters For Dashboard

In step 6, we will talk about how to use widgets as filters for Databricks dashboard.

Firstly, let’s create a time series chart for volume using Databricks’s built-in tool.

# Example for dashboard filter
if dropdown_filter_value == 'All':
    display(df)
else:
    display(df.filter(col('Asset_Name').isin(dropdown_filter_value)))

Click the downward triangle next to the bar chart icon, then select the line chart option.

Pass Widget Values in Databricks Python Code for Dashboard— Image from GrabNGoInfo.com

Next, click the Plot Options icon to check if the settings for the chart are correct and make changes if necessary.

Pass Widget Values in Databricks Dashboard — Image from GrabNGoInfo.com

To create a dashboard, click the bar chart icon on the upper right corner of the cell, then click Add to New Dashboard.

Databricks Create New Dashboard— Image from GrabNGoInfo.com

This opens the dashboard. We can see the filters on the top of the dashboard.

Databricks Create Dashboard Filters Using Widgets — Image from GrabNGoInfo.com

To learn more about how to create a Databricks dashboard, please refer to my tutorial Databricks Dashboard For Big Data

Step 7: Configure Databricks Widgets

In step 7, we will configure the Databricks widgets by clicking the gear icon on the top right of the notebook.

Databricks Configure Databricks Widgets — Image from GrabNGoInfo.com

The Widgets Panel Settings window will pop up. Under On Widget Change, there are three options.

  • Run Accessed Commands means that when the widget values change, only the cells that directly retrieve the changed widget are rerun. This is the default setting, but it does not work for SQL cells.
  • Do Nothing means that the notebook will not rerun based on the new widget values.
  • Run Notebook means rerun the whole notebook. I recommend choosing this option to prevent missing some of the important steps in the code.
Databricks Widgets Panel Settings— Image from GrabNGoInfo.com

The widgets panel is pinned to the top of the notebook by default, but we can uncheck the Pinned to top option to show it above the first cell.

Databricks Pin Widgets — Image from GrabNGoInfo.com

Step 8: Pass Values to Widgets in Another Notebook

Sometimes we may need to run the notebook with specific parameters from another notebook. In that case, we can use %run to run the notebook and pass the parameters at the same time. The sample code below is from the Databricks documentation for widgets.

# Pass parameters to widgets in another notebook
%run /path/to/notebook $X="10" $Y="1"

Step 9: Delete Databricks Widgets

In step 9, we will talk about how to delete Databricks widgets. To delete one widget, use dbutils.widgets.remove("widget_name").

# Remove one widget
dbutils.widgets.remove("text_filter")

To remove all widgets, use dbutils.widgets.removeAll().

# Remove all widgets
dbutils.widgets.removeAll()

After removing a widget, we cannot create new widgets in the same cell. New widgets need to be created in a separate cell.

Summary

In this tutorial, we talked about how to create widgets using python. You learned:

  • How to create different types of widgets using python?
  • How to retrieve widgets values?
  • How to pass the widget values to notebooks and dashboards?
  • How to configure the widget setup?
  • How to delete widgets?

Note that you cannot use widgets to pass arguments between different languages within a notebook and run it in a scheduled job. So when the widgets are created in python, the value of the widgets needs to be retrieved by python as well. To learn about how to run Databricks jobs, please refer to my tutorial Databricks Multi-Task Job Scheduling

More tutorials are available on GrabNGoInfo YouTube Channel and GrabNGoInfo.com.

Recommended Tutorials

References

Databricks
Databricks Widget
Databricks Basics
Databricks Python Widgets
Recommended from ReadMedium