Summary

The provided content outlines a step-by-step guide for creating a custom Databricks cluster policy to manage and control cluster creation options for users, ensuring adherence to specific requirements such as runtime version, worker types, and cost management through tags.

Abstract

The article details the process of implementing a custom Databricks cluster policy, which is crucial for Senior/Lead Engineers and Databricks Administrators to enforce cluster configuration standards. It explains how to clone an existing policy template, modify it to enforce the use of Databricks runtime version 11.3 LTS, restrict to a single worker type (Standard_DS3_v2), define worker number ranges, disable spot instances, enable autoscaling, set auto-termination after 20 minutes of inactivity, and require mandatory tags for cost tracking. The guide emphasizes the importance of these policies in streamlining governance and performance while also managing costs effectively. It concludes with instructions on assigning the policy to users and testing the policy by creating a new cluster.

Opinions

The author emphasizes the importance of having a policy in place to control cluster creation options, which reflects a belief in the need for governance and standardization in a Databricks environment.
By providing a fixed Databricks runtime version, the author suggests that maintaining consistency and stability in the environment is a priority.
The requirement for a single worker type and defined worker number ranges indicates a preference for predictability in performance and cost.
The exclusion of spot instances and the use of on-demand VMs show a conservative approach that favors resource availability and reliability over cost savings.
Enabling autoscaling and auto-termination demonstrates a balanced approach to resource utilization and cost optimization.
Mandating tags for cluster creation reflects a commitment to cost accountability and the ability to track and assign costs to specific functions or projects within the organization.

Tutorial: Create a Custom Policy To Control Access To Databricks Cluster

This article describes how to create a Databricks cluster policy and implement the policy on a cluster.

(Pic credit — https://employmentlawhandbook.com.au/bulletin/10-things-you-must-include-in-your-workplace-policies/)

In my previous article, I introduced the different Databricks cluster policies and explained the JSON format of one of them.

Databricks Cluster Policies You Need to Know Before Giving Access to Your Clusters (gbamezai.medium.com)

In this article, let’s put that knowledge into action and implement a custom policy in our Databricks environment.

Requirements

As a Senior/Lead Engineer or a Databricks Administrator, you need a policy in place that controls which options are available to users whenever they create a new cluster. Our policy has to enforce the following:

✔️ Databricks runtime version 11.3 LTS only
✔️ Only one worker type: Standard_DS3_v2
✔️ Min workers: 2 and max workers: 16
✔️ No spot instances
✔️ Autoscaling enabled
✔️ Auto-termination of the cluster after 20 minutes of inactivity
✔️ Mandatory tags that a user must add when creating a cluster

Let’s Create Our Custom Policy

The good thing about creating a custom policy is that we don’t need to begin from scratch. We can reuse the JSON template of an existing policy and modify it to fit our requirements. In this article, I am going to use the Job Compute policy.

From the Policies tab under the Compute section of Databricks, click Job Compute.

Step 1: Clone an Existing Policy Template

Click the Clone button at the top, since we will start from the policy’s template and then make alterations to it.

Notice that the Name, Family and Description fields become editable. The JSON definition, however, is still read-only. To make it editable, select the Custom option from the Family drop-down menu.

I named my policy My_Job_Compute_Policy and kept the description as General-purpose for running non-interactive workloads.

Step 2: Edit Policy Attributes

Next, under the Definitions section, we can edit the attributes of our custom policy. The first attribute is the cluster profile. For our purposes, we will keep the current selection: with the type set to forbidden, users cannot set a cluster profile (spark.databricks.cluster.profile) on their clusters.

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": false
  }
}

Note that we are keeping the hidden attribute as false, which means this setting will be shown to the user in the UI.
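To make the dotted attribute paths concrete, here is a minimal Python sketch of how a path such as spark_conf.spark.databricks.cluster.profile can be matched against a nested cluster specification. It rests on a simplified model, not on Databricks’ own evaluator: the hypothetical helper flatten_spec joins nested keys with dots, and a forbidden attribute counts as violated whenever its path is present in the spec.

```python
from typing import Any


def flatten_spec(spec: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten a nested cluster spec into dotted, policy-style paths (hypothetical helper).

    Keys that already contain dots (as in spark_conf) are kept as-is, so the
    joined path matches the policy path, e.g. "spark_conf.spark.databricks.cluster.profile".
    """
    flat: dict[str, Any] = {}
    for key, value in spec.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects such as spark_conf, autoscale or azure_attributes.
            flat.update(flatten_spec(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def violates_forbidden(spec: dict[str, Any], path: str) -> bool:
    """In this simplified model, a forbidden attribute is violated if the spec sets it at all."""
    return path in flatten_spec(spec)


PROFILE = "spark_conf.spark.databricks.cluster.profile"

ok_spec = {"spark_version": "11.3.x-scala2.12"}
bad_spec = {"spark_conf": {"spark.databricks.cluster.profile": "singleNode"}}

print(violates_forbidden(ok_spec, PROFILE))   # False
print(violates_forbidden(bad_spec, PROFILE))  # True
```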

The next requirement is to offer users only the 11.3 LTS Databricks runtime whenever they create a cluster, so we configure the spark_version attribute as follows:

"spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12",
    "hidden": false
  }
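The fixed type can be sketched the same way. In this simplified model (an assumption for illustration, not documented Databricks behavior), an omitted value takes the policy’s value and any other value is rejected; resolve_fixed is a hypothetical helper, and the same logic would apply to the other fixed attributes in this policy.

```python
from typing import Any, Optional


def resolve_fixed(rule: dict[str, Any], requested: Optional[Any]) -> Any:
    """Return the effective value for a "fixed" policy attribute (simplified model)."""
    if rule["type"] != "fixed":
        raise ValueError("expected a fixed rule")
    # An omitted value, or one equal to the policy value, resolves to the policy value.
    if requested is None or requested == rule["value"]:
        return rule["value"]
    # Anything else conflicts with the policy.
    raise ValueError(f"value {requested!r} not allowed; policy fixes {rule['value']!r}")


spark_version_rule = {"type": "fixed", "value": "11.3.x-scala2.12", "hidden": False}

print(resolve_fixed(spark_version_rule, None))  # 11.3.x-scala2.12
try:
    resolve_fixed(spark_version_rule, "12.2.x-scala2.12")
except ValueError as err:
    print("rejected:", err)
```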

Moving along, the next requirement is to allow only the Standard_DS3_v2 worker type.

"node_type_id": {
    "type": "fixed",
    "value": "Standard_DS3_v2",
    "hidden": false
  }

As per the requirements, the cluster should have a minimum of 2 and a maximum of 16 workers, so the next attribute looks like this:

"num_workers": {
    "type": "range",
    "minValue": 2,
    "maxValue": 16,
    "hidden": false
  }

👉 Note that since we are providing a range for the number of workers, the policy type used here is range.
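A range rule can be modeled as a bounds check. This sketch assumes both minValue and maxValue are inclusive and treats a missing bound as open-ended; in_range is a hypothetical helper for illustration.

```python
from typing import Any


def in_range(rule: dict[str, Any], value: int) -> bool:
    """Check a value against a "range" rule, treating both bounds as inclusive."""
    if rule["type"] != "range":
        raise ValueError("expected a range rule")
    # A missing bound is treated as open-ended in this sketch.
    low = rule.get("minValue", float("-inf"))
    high = rule.get("maxValue", float("inf"))
    return low <= value <= high


num_workers_rule = {"type": "range", "minValue": 2, "maxValue": 16, "hidden": False}

for workers in (1, 2, 8, 16, 17):
    print(workers, in_range(num_workers_rule, workers))
# Prints one pair per line: 1 False, 2 True, 8 True, 16 True, 17 False
```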

The next requirement is for no spot instances to be used with this policy.

"azure_attributes.availability": {
    "type": "fixed",
    "value": "ON_DEMAND_AZURE",
    "hidden": false
  }

👉 Since we do not use spot instances, the cluster’s Azure VMs are provisioned on demand, so they are not exposed to the eviction risk that spot VMs carry.

The next requirement is for autoscaling to be enabled.

"autoscale.min_workers": {
    "type": "fixed",
    "value": 2,
    "hidden": false
  },
  "autoscale.max_workers": {
    "type": "fixed",
    "value": 16,
    "hidden": false
  }

👉 Since both autoscale attributes are fixed, the cluster autoscales between 2 and 16 worker nodes and users cannot change these bounds. If the attributes were of type unlimited instead, we could set a defaultValue for the minimum and maximum number of worker nodes and leave the final choice to the user.
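The difference between the two types can be sketched in a few lines. This is a simplified model under stated assumptions: a fixed rule rejects any conflicting user value, while an unlimited rule with a defaultValue only pre-fills the field and an explicit user value wins. effective_value is a hypothetical name.

```python
from typing import Any, Optional


def effective_value(rule: dict[str, Any], user_value: Optional[Any]) -> Any:
    """Resolve an attribute's value under a simplified model of two policy types."""
    if rule["type"] == "fixed":
        # A fixed value cannot be changed by the user.
        if user_value is not None and user_value != rule["value"]:
            raise ValueError(f"{user_value!r} conflicts with fixed value {rule['value']!r}")
        return rule["value"]
    if rule["type"] == "unlimited":
        # A defaultValue only pre-fills the field; an explicit user value wins.
        return rule.get("defaultValue") if user_value is None else user_value
    raise ValueError(f"policy type {rule['type']!r} is not covered by this sketch")


fixed_max = {"type": "fixed", "value": 16}
unlimited_max = {"type": "unlimited", "defaultValue": 16}

print(effective_value(fixed_max, None))      # 16
print(effective_value(unlimited_max, None))  # 16
print(effective_value(unlimited_max, 40))    # 40
```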

Moving on, the cluster must auto-terminate after 20 minutes of inactivity.

"autotermination_minutes": {
    "type": "fixed",
    "value": 20,
    "hidden": false
  }
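As a worked example of the 20-minute rule, the sketch below computes when an idle cluster becomes eligible for termination. It assumes inactivity is measured from a single last-activity timestamp, which simplifies however Databricks actually tracks activity; termination_deadline and the timestamps are illustrative.

```python
from datetime import datetime, timedelta


def termination_deadline(last_activity: datetime, autotermination_minutes: int) -> datetime:
    """Time after which an idle cluster becomes eligible for auto-termination (hypothetical helper)."""
    return last_activity + timedelta(minutes=autotermination_minutes)


# Illustrative timestamp: last activity at 09:42.
last_activity = datetime(2024, 1, 15, 9, 42)
deadline = termination_deadline(last_activity, 20)
print(deadline)  # 2024-01-15 10:02:00

# A check at 10:05 with no activity since 09:42 is past the deadline.
print(datetime(2024, 1, 15, 10, 5) >= deadline)  # True
```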

And finally, we want users creating a cluster to add tags to it, so that we can keep track of our compute costs by tag at any point in time.

"custom_tags.Please enter your function name here": {
    "type": "unlimited",
    "isOptional": false
  }

The tag key shown to a user in the UI will be Please enter your function name here. Since we set isOptional to false, users must fill in their function name as the tag value when creating a cluster with this policy.
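The effect of isOptional: false can be sketched as a check for missing tag values, which also mirrors why the Create Cluster button stays disabled in Step 4 until a tag is entered. This is a hypothetical helper under two assumptions: only an explicit isOptional: false marks a tag as required, and an empty or whitespace-only value counts as missing.

```python
from typing import Any

TAG_PREFIX = "custom_tags."


def missing_required_tags(policy: dict[str, Any], custom_tags: dict[str, str]) -> list[str]:
    """Return required tag keys (isOptional: false) that are absent or blank."""
    missing = []
    for path, rule in policy.items():
        if not path.startswith(TAG_PREFIX):
            continue  # only custom tag attributes matter here
        tag_key = path[len(TAG_PREFIX):]
        # Treat an empty or whitespace-only value as missing (an assumption).
        if rule.get("isOptional") is False and not custom_tags.get(tag_key, "").strip():
            missing.append(tag_key)
    return missing


policy = {
    "custom_tags.Please enter your function name here": {"type": "unlimited", "isOptional": False},
    "autotermination_minutes": {"type": "fixed", "value": 20},
}

print(missing_required_tags(policy, {}))
# ['Please enter your function name here'], so creation would be blocked
print(missing_required_tags(policy, {"Please enter your function name here": "finance"}))
# [], so nothing is missing
```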

The only other attribute remaining is the cluster type. We set it to all-purpose so that the policy applies to all-purpose clusters, the kind created from the Compute page, rather than to job clusters created by the job scheduler:

"cluster_type": {
    "type": "fixed",
    "value": "all-purpose"
  }

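Now, with everything configured, the final policy JSON should look like the one below. Note that because autoscaling is enforced, the final policy marks num_workers as forbidden, so users cannot set a fixed worker count; the 2 to 16 worker bounds are carried by autoscale.min_workers and autoscale.max_workers instead.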

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": false
  },
  "spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12",
    "hidden": false
  },
  "node_type_id": {
    "type": "fixed",
    "value": "Standard_DS3_v2",
    "hidden": false
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 2,
    "hidden": false
  },
  "autoscale.max_workers": {
    "type": "fixed",
    "value": 16,
    "hidden": false
  },
  "num_workers": {
    "type": "forbidden",
    "hidden": false
  },
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "ON_DEMAND_AZURE",
    "hidden": false
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 20,
    "hidden": false
  },
  "custom_tags.Please enter your function name here": {
    "type": "unlimited",
    "isOptional": false
  },
  "cluster_type": {
    "type": "fixed",
    "value": "all-purpose"
  }
}
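If you prefer to create the policy programmatically rather than through the UI, the definition can be assembled in Python and wrapped in a request body. This minimal sketch only builds and prints the payload. It assumes the Cluster Policies API create endpoint (POST /api/2.0/policies/clusters/create), which takes the definition as a JSON-encoded string; sending the request with your workspace URL and token is left out.

```python
import json

# Policy definition as built in Step 2 (one entry per policy attribute path).
definition = {
    "spark_conf.spark.databricks.cluster.profile": {"type": "forbidden", "hidden": False},
    "spark_version": {"type": "fixed", "value": "11.3.x-scala2.12", "hidden": False},
    "node_type_id": {"type": "fixed", "value": "Standard_DS3_v2", "hidden": False},
    "autoscale.min_workers": {"type": "fixed", "value": 2, "hidden": False},
    "autoscale.max_workers": {"type": "fixed", "value": 16, "hidden": False},
    "num_workers": {"type": "forbidden", "hidden": False},
    "azure_attributes.availability": {"type": "fixed", "value": "ON_DEMAND_AZURE", "hidden": False},
    "autotermination_minutes": {"type": "fixed", "value": 20, "hidden": False},
    "custom_tags.Please enter your function name here": {"type": "unlimited", "isOptional": False},
    "cluster_type": {"type": "fixed", "value": "all-purpose"},
}

# Request body for creating the policy through the Cluster Policies API.
payload = {
    "name": "My_Job_Compute_Policy",
    "description": "General-purpose for running non-interactive workloads",
    # The definition is sent as a JSON-encoded string, not as a nested object.
    "definition": json.dumps(definition),
    # Mirrors the "Max clusters per user" example from Step 3.
    "max_clusters_per_user": 2,
}

print(json.dumps(payload, indent=2))
```

Keeping the definition as a Python dict and serializing it at the last moment avoids hand-escaping quotes inside the definition string.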

Step 3: Assign Policy to Users

Click the Permissions tab to open the policy’s permissions screen.

Here, we can do the following:

Under Max clusters per user, provide a number. For example, if I enter 2, each user can have at most 2 active clusters created with this policy; if a user tries to create a 3rd one, the operation fails.
Select the users who should be able to use our custom policy: pick an individual name, or a group that multiple users belong to, and add them here. Once done, click the Create button at the top.
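The per-user limit can be pictured as a simple count. This sketch assumes the limit applies to a user’s active clusters that use this policy; can_create, the sample cluster list and the policy IDs are hypothetical.

```python
from collections import Counter


def can_create(active_clusters: list[dict[str, str]], user: str, policy_id: str,
               max_clusters_per_user: int) -> bool:
    """Hypothetical check: may `user` start one more cluster under `policy_id`?"""
    # Count only active clusters that use this policy, grouped by creator.
    in_use = Counter(
        c["creator"] for c in active_clusters if c["policy_id"] == policy_id
    )
    return in_use[user] < max_clusters_per_user


active = [
    {"creator": "alice", "policy_id": "my-policy"},
    {"creator": "alice", "policy_id": "my-policy"},
    {"creator": "bob", "policy_id": "my-policy"},
    {"creator": "alice", "policy_id": "other-policy"},
]

print(can_create(active, "alice", "my-policy", 2))  # False: already at the limit
print(can_create(active, "bob", "my-policy", 2))    # True
```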

Step 4: Put Custom Policy to Test

When I navigate back to the main Policies page, I now see My_Job_Compute_Policy listed.

Click Create Compute to set up a new cluster.
From the Policy drop-down, select My_Job_Compute_Policy.

As a user, we now see most of the fields greyed out, since their values are preconfigured by our policy. I have highlighted all the custom properties we configured through this policy.

👉 An important observation here is that the Create Cluster button (at the bottom) is disabled. This is because the policy requires the user to enter the tag information before they can create a cluster. Tags, in this case, help us track the cost of each cluster and assign those costs to the functions that created the clusters.

Once we add the tag information, Create Cluster becomes active again and we can create a cluster that conforms to the defined policy.
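The same test can also be expressed as a Clusters API create request. The sketch below only builds and prints the request body (for POST /api/2.0/clusters/create); POLICY_ID is a placeholder for your policy’s ID, and the cluster name and tag value are hypothetical. The apply_policy_default_values field asks Databricks to use the policy’s fixed and default values for any omitted fields.

```python
import json

# Placeholder: replace with the ID of My_Job_Compute_Policy in your workspace.
POLICY_ID = "<your-policy-id>"

cluster_request = {
    "cluster_name": "finance-etl",  # hypothetical name
    "policy_id": POLICY_ID,
    # Use the policy's fixed and default values for any omitted fields.
    "apply_policy_default_values": True,
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "autoscale": {"min_workers": 2, "max_workers": 16},
    "autotermination_minutes": 20,
    "azure_attributes": {"availability": "ON_DEMAND_AZURE"},
    # The required tag from the policy, with a hypothetical function name as its value.
    "custom_tags": {"Please enter your function name here": "finance"},
}

print(json.dumps(cluster_request, indent=2))
```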

And that’s it. In this article, we learned how to create a custom cluster policy, assign it to users and create a cluster using the policy.

I hope you enjoyed reading this article as much as I enjoyed writing it.