avatarAntonello Benedetto

Summary

The website content provides a comprehensive guide on how to connect to a Snowflake database using Python, detailing methods with both the Snowflake Connector and SQLAlchemy, and discussing the benefits and drawbacks of different authentication methods.

Abstract

The article "How To Connect To A Snowflake Database With Python" is a technical tutorial aimed at data engineers and analysts who work with Snowflake, a cloud-based data warehousing platform. It outlines two primary methods for connecting to Snowflake using Python: through the Snowflake Connector package and through SQLAlchemy with the snowflake-sqlalchemy dialect. The author explains the installation process, required parameters, and provides code templates for both external browser authentication (using Okta or similar SAML 2.0-compliant identity providers) and default authentication. The article also includes examples of common operations such as creating tables, inserting data, and querying data with fetchall() or converting results to pandas DataFrames. The author emphasizes the flexibility of external browser authentication for local script testing and the suitability of default authentication for production environments. Additionally, the article highlights the growing popularity of Snowflake in the data industry and the convenience of using SQLAlchemy for pandas integration.

Opinions

  • The author prefers the external browser authentication method for testing scripts locally due to its flexibility and the autonomy it provides by using personal user credentials.
  • Despite the inconvenience of a new browser window opening with each connection, the author considers external browser authentication to be the preferred option for individual testing.
  • The author suggests that default authentication, which requires role and password credentials, is typically more suitable for production environments due to its non-intrusive nature and lack of browser-based prompts.
  • The author recommends using SQLAlchemy for seamless integration with pandas DataFrames, which is particularly useful for data analysts and scientists who frequently use pandas for data manipulation.
  • The author endorses the Snowflake SQLAlchemy toolkit for its ability to bridge Snowflake databases with Python applications, enhancing the developer experience with built-in methods for data retrieval and manipulation.
  • The article subtly promotes the Snowflake platform as a scalable solution for analytical requirements and as a robust data source for BI tools, indicating the author's positive view of Snowflake's capabilities and industry relevance.

How To Connect To A Snowflake Database With Python

Learn how to interact with Snowflake using the Snowflake Connector or SQLAlchemy Python packages and take advantage of the templates presented in the article.

Suggested On-Demand Courses:

Are you trying to advance your career as a Data Engineer or Machine Learning Engineer? I would highly recommend for you to check the following courses:

>>> Not a Medium member yet? Consider signing up with my referral link to gain access to everything Medium has to offer for as little as $5 a month!

Introduction

I was firstly introduced to Snowflake in 2017, when the company for which I worked at the time, decided to migrate from Redshift to a more scalable computing-based claud data warehouse, to meet their growing analytical requirements and to feed BI tools.

Back then, I was a data analyst and I mainly queried Snowflake through a DBMS. Fast forward a few years and now, as a data engineer, I invest time building data pipelines that automatically connect, query, insert and update data into Snowflake through Python. Are you curious to understand how it is done?

Fast forward a few years and now, as a data engineer, I invest time building data pipelines that automatically connect, query, insert and update data into Snowflake through Python. Are you curious to understand how it is done?

In this article, you will learn how to interact with a Snowflake database through Python, either using the snowflake.connector or the sqlalchemy packages.

Both packages supports a number of technologies when it takes to enable secure authentication to Snowflake via Python, but according to my experience two of the most popular options are the externalBrowser authentication and default authentication, therefore I will use these in my examples.

Now, let’s explore the two methods together by creating some useful code templates you could use in your own projects.

Method # 1: Connect Using Snowflake Connector

The first step to use a Snowflake Connector is downloading the package as suggested by the official documentation:

pip install snowflake-connector-python
or 
pip install snowflake-connector-python==<version>

Then, you will need to import it in your code:

import snowflake.connector

External Browser Authentication

The snowflake.connector includes a connect() function that can be used as a constructor for creating a connection to the database.

The function allows you to specify a number of parameters, of which some are required to establish a connection and some are optional.

One of these parameters is the authenticator. Quoting the official documentation, if you set authenticator = 'externalbrowser' then the authentication will be attempted using:

“your web browser and Okta, ADFS, or any other SAML 2.0-compliant 
identity provider (IdP) that has been defined for your account”.

For instance, if your team is using Okta as an identity provider, you will just need to specify the USER_EMAIL without the need of also passing a user password. This is a template you could use straightaway, by replacing your credentials:

PROS: the 'externalbrowser' authentication allows you to establish a connection to SF with your personal user. Its flexibility makes it the preferred connection option while you are testing a new script locally, as you won’t need credentials for any shared user(this increases your autonomy).

CONS: Every time you establish a new connection, a new window will be automatically opened in your browser and this is somewhat annoying, but nothing major…

When you run the code above, it returns a connection object con_eb that in turn has a method named cursor(). Before you can actually interact with SF, you also need to create a cursor object con_eb.cursor() that, for ease of use, has been saved under the db_cursor_eb variable.

Example #1: Create Table

In the first example, I have created a TRANSACTIONS table in my preferred schema, by passing the SQL command to the cursor object execute() method:

As soon as the command is executed, you can verify if it was successful or not by running db_cursor_eb.fetchone()[0] that in this case returns the message Table TRANSACTIONS successfully created.

INFO: snowflake.connector.cursor:query: [CREATE OR REPLACE TABLE SCHEMA_NAME.TRANSACTIONS ( TR...]
INFO: snowflake.connector.cursor:query execution done
INFO: root:Table TRANSACTIONS successfully created.

Usually fetchone() retrieves the next row of a query result set, but because in this case I was just creating an empty table, it simply returned the execution state. Pretty handy message to display in production!

Default Authentication

If you don’t specify the authenticator parameter, under the hood, Snowflake will use its internal (default) authenticator.

However, in this case, you are required to pass the role USER_NAME and PASSWORD. These should be your personal credentials or the credentials of a shared user. Apart from these two little changes, the template is very similar to the previous one:

PROS: This is usually the connection method used in production, as it does not trigger a noisy authentication through your browser and does not require any additional identity checks.

CONS: You will need to request to be assigned with a role and password before being able to establish a connection. But this is a little price to pay, really…

As before, I have created a cursor object named db_cursor_def.

Example # 2: Insert Into Table

I will now use the db_cursor_def.execute() method to insert data into the TRANSACTIONS table created in the first example.

The records to be inserted will be retrieved from an existent SOURCE_TABLE and queried with a SELECT command:

Note that I set the query LIMIT to 1500 records.

Then I called the cursor object rowcount attribute, that returns the number of rows produced by the last execution, to check that this was exactly the number of rows inserted in the TRANSACTIONS table:

INFO: snowflake.connector.cursor:query: [INSERT INTO SCHEMA_NAME.TRANSACTIONS (TRX_ID, SOURCE_...]
INFO: snowflake.connector.cursor:query execution done
INFO: root: Total rows inserted: 1500

Method # 2: Connect Using SQLAlchemy

An alternative method to connect and interact with Snowflake is by using SQLAlchemy. To begin, you will need to run this command:

pip install --upgrade snowflake-sqlalchemy

It will install both snowflake.sqlalchemy as well as sqlalchemy in case it wasn’t already available.

Then, you should import the create_engine method from sqlalchemy, and the URL method from snowflake.sqlalchemy:

from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine 

External Browser Authentication

In effect, to establish a connection with sqlalchemy you need to call the create_engine method and pass a connection string to it.

However, for convenience, you can use the snowflake.sqlalchemy.e>URL method to construct the connection string and connect to the database”:

As you can see, in order to connect with an externalbrowser authentication, the parameters you should specify are exactly the same as with the snowflake.connector.

Before executing a query, its best practice to first create a connection object: engine_1_con = engine1.connect(). However, it’s worth mentioning that, even calling the execute() method directly on engine_1 would work just fine.

Example # 3: Select From Table + FetchAll( )

The purpose of this example, is to show you how to retrieve the entire dataset inserted into the TRANSACTIONS table with a SELECT statement, fetch all the rows in the dataset with fetchall() into data and save the content of the first column to a list named trx_ids:

I wanted to show you a straightforward example of how to use fetchall(), because, while building data pipelines, you might end up calling this method again and again in your code.

As expected, the query retrieved 1500 rows and the first five trx_ids in the list I created, were displayed:

INFO: snowflake.connector.cursor:query: [SELECT * FROM SCHEMA_NAME.TRANSACTIONS;]
INFO: snowflake.connector.cursor:query execution done
INFO: root:Number of rows retrieved: 1500
Output:
[3440508402, 482408596, 476831653, 3224085714, 3871714048]

Default Authentication

Connecting to Snowflake with sqlalchemy via default authentication employs exactly the same syntax as before and exactly like the snowflake.connector, you will need to replace USER_EMAIL with USER_NAME as well as specify a PASSWORD:

Example # 4: Select From Table + Convert To Pandas DF

In this last example, I would like to show you how to create pandas DataFrames using a sqlalchemy engine.

This can be easily achieved by passing both engine_2 and a sql query to the pandas read_sql() function:

Therefore, If you plan to work with pandas a lot, I would recommend you to connect to Snowflake with sqlalchemy to take advantage of these built-n methods.

Conclusion

In this article, you have learnt that two main ways exist to connect and interact with a Snowflake DB via Python, using different types of authentication.

The popularity of Snowflake is growing in the data industry, hence I hope the code templates I shared above, will help speeding up your daily task while querying Snowflake as Data Analysts, creating training datasets as Data Scientists or building automated ETL pipelines as Data Engineers.

Sources

A Note For My Readers

This post includes affiliate links for which I may make a small commission at no extra cost to you, should you make a purchase

Data Science
Snowflake
Data Engineering
Python Programming
Database
Recommended from ReadMedium