avatarGanesh Chandrasekaran

Summary

The article outlines methods for passing Python or Scala variables into Spark SQL queries without using the spark.sql wrapper.

Abstract

This article provides a guide on how to integrate Python or Scala variables directly within Spark SQL queries. It begins by instructing the creation of a new table in Spark SQL, followed by demonstrating how to assign values to Scala variables and correctly set them in the Spark configuration using a two-part namespace. It emphasizes the importance of this namespace to avoid getting NULL values in the SQL queries. The article then shows how to access these configuration-set variables within Spark SQL INSERT statements, with examples for both numeric and string data types. A similar process is detailed for Python, including variable assignment and setting these variables in the Spark configuration. The final steps involve inserting the new values into the Spark SQL table and querying the data to verify the results. The article also includes images illustrating the select query results and encourages readers to join Medium or subscribe for updates on new content.

Opinions

  • The author stresses the necessity of using a two-part namespace when setting variables in spark.conf to ensure they can be correctly referenced in Spark SQL queries.
  • It is noted that numeric values can be used in Spark SQL with or without quotes, providing flexibility in query formulation.
  • The article suggests that the methods described are efficient and useful for integrating dynamic variables into Spark SQL workflows, which can enhance the usability and flexibility of data processing pipelines.
  • By providing examples in both Scala and Python, the author acknowledges the multi-language support of Databricks Spark and caters to a broader audience of developers.
  • The inclusion of Medium membership and subscription links indicates the author's interest in building a readership and potentially monetizing their content.

Databricks Spark: How to pass value from Python/Scala to Spark SQL

This article will explain how to use Python or Scala variables in Spark SQL without wrapping the SQL statement with spark.sql.

Step 1: Create a new table

%sql
drop table if exists tbl_friends;
create table tbl_friends(name string, age int);

Step 2: Scala variable assignment.

One important thing to remember is to use a two-part namespace inside spark.conf.

Using a variable without namespace will result in NULL.

In this example, we have used myapplication.name, feel free to use change based on your requirement.

%scala
val age = 30
val name = "Rachel Green"
spark.conf.set("myapplication.name", name)
spark.conf.set("myapplication.age", age)

Step 3: Access the spark.conf.set variable from Spark SQL

Numeric values can be used with or without quotes.

'${myapplication.age}' or ${myapplication.age}

Inserting data into table tbl_friends

%sql
INSERT INTO tbl_friends values ('${myapplication.name}',${myapplication.age})

Step 4: Python variable assignment

%python
age = 31
name = "Ross Geller"
spark.conf.set("myapplication.name", name)
spark.conf.set("myapplication.age", age)

Step 5: Insert new values into Table

%sql
INSERT INTO tbl_friends values ('${myapplication.name}',${myapplication.age})

Step 6: Query the data

%sql
select * from tbl_friends
Select All Rows
%sql
select * from tbl_friends where age = ${myapplication.age}
Select based on condition
Spark Sql
Databricks
Python
Scala
Sparkconf
Recommended from ReadMedium