Matthew Henderson

Setting up a local environment for dbt (BigQuery) using conda

Recently, dbt has been growing in popularity as a structured and robust framework for writing SQL logic. Although the dbt documentation is a great resource, it is not very specific about local configuration. So here is a quick article on setting up a local dbt-bigquery environment with Anaconda that works on both Windows and Mac.

Introduction

Before we begin, I assume you already have:

  1. The gcloud CLI installed and configured.
  2. Anaconda installed and configured.
  3. A billable GCP project set up.

In this tutorial we are going to create a conda environment called dbt-demo. Once the tutorial is complete, you can create a new conda environment for your own projects by exporting dbt-demo and creating a copy from the exported file:

conda env export -n dbt-demo -f /path/to/environment.yml
conda env create -n <NAME_OF_ENVIRONMENT> -f /path/to/environment.yml

Setting up a conda environment

To set up a conda environment for dbt-bigquery, run the following commands in the Anaconda Prompt (or a terminal on Mac):

conda create --name dbt-demo python
conda activate dbt-demo
conda install pip 
pip install dbt-bigquery

When the installation is complete, run dbt --version to confirm dbt is working. The output should look like:

installed version: [installed version of dbt] (e.g. 1.0.4)
latest version: [latest version of dbt] (e.g. 1.0.4)
Up to date!
Plugins:
- bigquery: 1.0.0 - Up to date!

BigQuery Setup

A new dbt project must be connected to a GCP project and to a BigQuery dataset within that project.

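Assuming you already have a billable GCP project set up, open the BigQuery tab in the console and create a new dataset in the project called dbt_demo. (BigQuery dataset names may contain only letters, numbers, and underscores, so a hyphenated name such as dbt-demo is not accepted.)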
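
If you prefer the command line to the console, the dataset can also be created with the bq tool that is normally installed alongside the gcloud CLI. The following is a minimal sketch for a bash-compatible shell, not the only way to do it. The project ID (my-gcp-project) and the location (EU) are placeholder assumptions, and the dataset is named dbt_demo with an underscore because BigQuery dataset IDs accept only letters, numbers, and underscores. The run helper is a hypothetical convenience: by default it only prints the command, and it executes it when RUN=1 is set.

```shell
# Placeholder values (assumptions for illustration): replace with your own.
PROJECT_ID="my-gcp-project"
DATASET="dbt_demo"   # underscores only; hyphens are not valid in dataset IDs
LOCATION="EU"        # any BigQuery location, e.g. US or EU

# Hypothetical helper: dry run by default (prints the command).
# Set RUN=1 in the environment to actually execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Create the dataset inside the chosen project.
run bq --location="$LOCATION" mk --dataset "${PROJECT_ID}:${DATASET}"
```

Whichever location you choose here should match the location you later give dbt.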

Authentication

Once you have created the dataset, you will need to create a service account (unless you would rather use OAuth). dbt uses this account's credentials to connect to GCP from your local machine.

Figure: Setting up a service account for the dbt demo within GCP (screenshot)

Once the service account is set up, create a new JSON key for it and download the key to your machine.
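
The same steps can be done from the gcloud CLI instead of the console. The sketch below is for a bash-compatible shell and rests on several assumptions: the project ID, the service account name, and the key file path are placeholders, and the two BigQuery roles are one common choice rather than something the original article specifies. The run helper is hypothetical: it prints each command by default and executes them only when RUN=1 is set.

```shell
# Placeholder values (assumptions for illustration): replace with your own.
PROJECT_ID="my-gcp-project"
SA_NAME="dbt-demo"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
KEY_FILE="$HOME/dbt-demo-keyfile.json"

# Hypothetical helper: dry run by default (prints the command).
# Set RUN=1 in the environment to actually execute the commands.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# 1. Create the service account.
run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"

# 2. Grant BigQuery roles (the role choice is an assumption; adjust as needed).
for ROLE in roles/bigquery.dataEditor roles/bigquery.jobUser; do
  run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member "serviceAccount:${SA_EMAIL}" --role "$ROLE"
done

# 3. Create a JSON key and save it locally. Keep it out of version control.
run gcloud iam service-accounts keys create "$KEY_FILE" --iam-account "$SA_EMAIL"
```

If you would rather use OAuth than a key file, running gcloud auth application-default login is the usual starting point; the key-related steps above are then unnecessary.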

Configuring your profile

dbt stores connection settings in a hidden configuration file at ~/.dbt/profiles.yml, which dbt init creates if it does not already exist. Each dbt project needs its own block in this file containing the details required to connect to your data warehouse, in this case BigQuery. The dbt CLI helps to fill out these details by prompting you to choose an option for each required setting.

Tip: If you use Visual Studio Code, you can open profiles.yml in the editor by running code ~/.dbt/profiles.yml in a terminal.

To create a new dbt project, first make a new directory called dbt-demo on your computer. Next, open an Anaconda Prompt or terminal, activate the environment, and cd into the new directory:

conda activate dbt-demo
cd /path/to/dbt-demo

Once you are in the directory, initialize a new dbt project by running:

dbt init dbt_demo

When you run this, you will be asked a number of questions; your answers are used to fill out the profiles.yml configuration file. We want the final block in profiles.yml to look like the one below, so answer the questions accordingly:

Figure: profiles.yml configuration for dbt-demo (screenshot)
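
Because the screenshot is not reproduced here, the following is a sketch of what a service-account profile for dbt-bigquery typically looks like; it is not a copy of the author's file. The project ID, dataset name, key file path, thread count, and location are placeholder assumptions. The snippet writes the example to a scratch file called profiles.example.yml in the current directory so that it can be compared with the block dbt init generates, and it does not modify ~/.dbt/profiles.yml.

```shell
# Write an example profile to a scratch file for comparison only.
# All values below are placeholders (assumptions), not the author's settings.
cat > profiles.example.yml <<'EOF'
dbt_demo:                 # should match the profile name in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account
      project: my-gcp-project          # your GCP project ID
      dataset: dbt_demo                # the BigQuery dataset created earlier
      keyfile: /path/to/keyfile.json   # the downloaded service account key
      threads: 1
      location: EU                     # optional; match your dataset's location
EOF

# Show the result so it can be compared with ~/.dbt/profiles.yml.
cat profiles.example.yml
```

Once the real profile is in place, running dbt debug from inside the project directory is a quick way to check that dbt can find the profile and connect to BigQuery.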

Once you have answered the prompts, the dbt project structure will appear in your directory and you are ready to start using dbt locally. Happy modeling!
