How to integrate Python into your SQL scripts in Snowflake
How to use Snowflake UDFs to combine SQL and Python

Snowflake is a cloud-based data warehousing platform designed to handle large-scale data analytics workloads. It is known for its elasticity, scalability, and performance, providing organizations with a powerful solution for storing, processing, and analyzing vast amounts of data.
One of its many benefits is the ease at which you can integrate other programming languages alongside your staple SQL. Python, for example, can be used within the Snowflake ecosystem in a number of ways, such as Snowpark or the Snowflake Connector for Python.
One particularly fast and easy method is to write a Python UDF. UDF stands for User-Defined Function. It is a feature that allows you to define custom functions within the Snowflake data warehouse platform. UDFs in Snowflake enable you to extend the built-in SQL functionality and create custom logic for data manipulation, calculations, transformations, and aggregations. Snowflake supports different types of UDFs, such as those that return scalar values or a table.
Snowflake UDFs are a convenient way to integrate Python logic into a SQL script. For example, you could search a string for keywords using Python’s list comprehension. The following example will take a list of strings as input and filter that list to return a list of just the keywords present.
CREATE OR REPLACE FUNCTION keyword_filter(input_text STRING, search_terms ARRAY)
RETURNS ARRAY
LANGUAGE python
RUNTIME_VERSION = '3.8'
HANDLER = 'keyword_filter'
AS
$$
def keyword_filter(input_text, search_terms):
split_text = [
word.replace(",", "").replace(".", "") for word in input_text.split(" ")
]
filtered_list = list(
set(
[
word.lower()
for word in split_text
if word.lower() in search_terms
]
)
)
return filtered_list
$$Let's break this down and look at each part.
First, we write a create statement, telling Snowflake we are creating a function, provide the name of our function (keyword_filter), and define the inputs it will take, which in this case, is a string and an array (list)
CREATE OR REPLACE FUNCTION keyword_filter(input_text STRING, search_terms ARRAY)Then we define what data type we are returning, what language we are using and which runtime version for that language
RETURNS ARRAY
LANGUAGE python
RUNTIME_VERSION = '3.8'Then we define the ‘handler’ (the logic), using ‘$$’ to bookend our code. Everything within the ‘$$’ symbols is Python code.
HANDLER = 'keyword_filter'
AS
$$
def keyword_filter(input_text, search_terms):
split_text = [
word.replace(",", "").replace(".", "") for word in input_text.split(" ")
]
filtered_list = list(
set(
[
word.lower()
for word in split_text
if word.lower() in search_terms
]
)
)
return filtered_list
$$After executing this create statement Snowflake, our Python UDF is available to use in SQL scripts. For example:
SELECT
jobs_id
, keyword_filter(description, STRTOK_TO_ARRAY('python,sql', ','))
, job_title
, company_name
, job_type
, load_timestamp
FROM
jobs_rawIf you had a much larger list of keywords, it might make sense to store these in a table, and then query that table as an array using Snowflake’s ARRAY_AGG() to use as input to our keyword_filter() UDF:
First, we create a table to hold our search_terms:
CREATE SEQUENCE IF NOT EXISTS MY_KEYWORD_SEQ;
CREATE TABLE IF NOT EXISTS search_terms (
keyword_id INTEGER DEFAULT MY_KEYWORD_SEQ.NEXTVAL PRIMARY KEY,
keyword VARCHAR(50)
);Then we define a list of keywords we want to search for and load them to our search_terms table
INSERT INTO search_terms (keyword)
VALUES
('sql'),
('nosql'),
('mysql'),
('postgres'),
('mongodb'),
('snowflake'),
('databricks'),
('airflow'),
('prefect'),
('java'),
('dbt'),
('python'),
('pytorch'),
('dax'),
('bash'),
('shell'),
('aws'),
('azure'),
('tensorflow'),Then we can query this table, using ARRAY_AGG() to return an array, and feed this to our UDF:
WITH search_terms AS (
SELECT keyword FROM my_db.my_schema.search_terms
)
SELECT
jobs_id
, keyword_filter(description, (SELECT ARRAY_AGG(keyword) from search_terms))
, job_title
, company_name
, job_type
, load_timestamp
FROM
jobs_rawOr you could take this a bit further and flatten the list into one row per keyword:
WITH search_terms AS (
SELECT keyword FROM my_db.my_schema.search_terms
)
SELECT
jobskill_id
, F.value AS jobskill
, job_title
, company_name
, job_location
, job_type
, posted_via
, load_timestamp
FROM
my_raw_data,
Table(
Flatten(
keyword_filter(
description,
(SELECT ARRAY_AGG(keyword) from search_terms)
)
)
) FAnd that's it. Snowflake makes it super easy to integrate Python into a SQL script!
