Python HOW: Using Poetry, Make, and pre-commit-hooks to Setup a Repo Template for your Team
Bring consistency, rigour, and best practices to your messy data science team
Last update 05 Aug 2022
If part of your job is to constantly nudge your fellow data scientists to isolate project environments, update requirements, clean code, write consistent docstrings, etc., then you should definitely follow along 💊
All the work shown here is for Windows and PowerShell (PS), but you can adapt it for Mac and your favourite Command Line Interface (CLI)
1. Prerequisites
You need to have Python, Make, and Poetry installed on your machine
Install Python 🐍
Download the latest Python 3.9 release for Windows. During installation, select Customize installation and tick py launcher so that it is installed too
You can now use the py launcher in the CLI to list all installed versions of Python (you can have as many as you like), and to run a specific version of Python
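The py launcher commands mentioned above can be sketched as follows. This assumes Python 3.9 was installed with the launcher enabled; the versions listed will differ from machine to machine.

```powershell
# List every Python version the launcher can find (add "p" to also show install paths)
py -0
py -0p

# Run a specific version, here Python 3.9, and print the path of its interpreter
py -3.9 --version
py -3.9 -c "import sys; print(sys.executable)"
```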
Install Make 🐐
We will be using Make from the GNU Project to set up and manage our repo using a Makefile. Think of Make as a tool for automating processes
To install Make for Windows, first install Chocolatey, then use it to install make. Open a new CLI and check that make is working
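Assuming Chocolatey is already installed and the shell is running with administrator rights, the two steps might look like this:

```powershell
# Install GNU Make through Chocolatey (run from an elevated PowerShell)
choco install make

# In a new terminal, confirm that make is on the PATH
make --version
```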
Install Poetry 👨🏭
We use Poetry to manage the project virtual environment and resolve dependencies. Install Poetry as described here. Open a new CLI and check that Poetry is working
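A quick check, assuming the installer has added Poetry to your PATH:

```powershell
# Prints the installed Poetry version if the command can be found
poetry --version
```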
2. Initial setup
Clone Repo 🤡
Clone the template locally, and copy it (without the .git folder) to your newly created project’s repo (which has its own .git). The template has the following structure:
Run setup 🔨
Make sure you don’t have any virtual environment activated in the CLI. Run the setup target using make, and you are done!
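Assuming the Makefile sits in the project’s root and no virtual environment is active, the command is:

```powershell
# Run from the project's root, where the Makefile lives
make setup
```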
Three things have happened!
💻 An isolated .venv is created in the project’s directory
📦 Some packages are installed in .venv
🧷 The pre-commit hooks are installed
Let’s look into these things in detail 👇
3. Makefile 📜
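make looks for a Makefile in the project’s root that contains a set of rules to run. Each rule has 3 parts: a target, a list of prerequisites, and a recipe. The recipe is the list of commands make runs to build the target, after first making sure its prerequisites exist or are up to date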
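The general shape of a rule can be sketched as below. The names are placeholders, not part of the template.

```makefile
# General shape of a rule; every recipe line must start with a tab character
target: prerequisite_1 prerequisite_2
	recipe_command_1
	recipe_command_2
```

Running `make target` first checks (or builds) each prerequisite and then runs the recipe commands one by one.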
This is what our Makefile looks like:
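A minimal sketch of such a Makefile, reconstructed from the descriptions in this section rather than copied from the template. The target and variable names follow the text, while the variable definitions, echo messages, and recipe flags are assumptions. The line references in the text (for example Makefile > line 8) refer to the template’s own file, so they will not line up with this sketch.

```makefile
# Sketch only: names follow the text, definitions and flags are assumptions
.DEFAULT_GOAL := setup
.PHONY: setup venv install pre-commit

# Full path to the globally installed Python 3.9, resolved through the py launcher
GLOBAL_PYTHON = $(shell py -3.9 -c "import sys; print(sys.executable)")
# Paths that should exist once the environment has been created (Windows layout)
LOCAL_PYTHON = .venv/Scripts/python.exe
LOCAL_PRE_COMMIT = .venv/Lib/site-packages/pre_commit

# No recipe of its own: setup just runs the three targets listed as prerequisites
setup: venv install pre-commit

# Fails if the global interpreter path does not exist
venv: $(GLOBAL_PYTHON)
	@echo "Creating .venv..."
	poetry env use $(GLOBAL_PYTHON)

# Fails if .venv has not been created yet
install: $(LOCAL_PYTHON)
	@echo "Installing dependencies..."
	poetry install

# Fails if the pre-commit package has not been installed in .venv
pre-commit: $(LOCAL_PYTHON) $(LOCAL_PRE_COMMIT)
	@echo "Installing pre-commit hooks..."
	poetry run pre-commit install
```

Note that an interpreter path containing spaces would need extra care when used as a prerequisite, which this sketch does not handle.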
The setup target in Makefile > line 8 has no recipe of its own. Instead it has 3 prerequisites that are themselves make targets, so running setup simply runs those 3 targets in order. Let’s have a look at them 👇
3.1 💻Create virtual environment
The venv target in Makefile > line 10 has one prerequisite $(GLOBAL_PYTHON) which is the value of a variable defined earlier in Makefile > line 4. The variable GLOBAL_PYTHON grabs the full path to the python interpreter which we installed earlier. If the prerequisite interpreter path doesn’t exist, you will get an error when running the venv target
Makefile > line 12 is where poetry creates an isolated .venv folder in the project’s root using the interpreter full path. To make sure .venv is created in the root directory of the project, the following configuration is added in the poetry.toml 📃 (where all poetry configurations go):
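The setting in question is Poetry’s `virtualenvs.in-project` option. A minimal poetry.toml, assuming no other project-local settings are needed, would be:

```toml
# poetry.toml: project-local Poetry configuration
[virtualenvs]
in-project = true  # create the environment as .venv inside the project's root
```

Running `poetry config virtualenvs.in-project true --local` writes the same setting to poetry.toml for you.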
To understand how poetry manages environments check 🔗
3.2 📦Install dependencies
The install target in Makefile > line 14 has one prerequisite $(LOCAL_PYTHON) which is the value of a variable defined earlier in Makefile > line 5. The variable LOCAL_PYTHON holds the path where the python interpreter in .venv should be. If that interpreter path doesn’t exist, you will get an error when running the install target
Makefile > line 16 is where poetry installs the project’s dependencies found in the pyproject.toml file. This is what our pyproject.toml looks like:
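A minimal sketch of the dependency part of such a pyproject.toml. The project metadata and every version constraint are placeholders, and the package list is taken from the dev packages described further down, not from the template file itself. Tool-specific tables such as `[tool.black]` live in the same file.

```toml
# Sketch only: project metadata and version constraints are placeholders
[tool.poetry]
name = "project-template"        # hypothetical name
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]       # packages the project needs at run time
python = "^3.9"                  # assumed; caret means >=3.9 <4.0

[tool.poetry.dev-dependencies]   # tooling used only during development
black = "*"
flake8 = "*"
isort = "*"
nbstripout = "*"
pydocstyle = "*"
notebook = "*"
rich = "*"
pre-commit = "*"
```

Newer Poetry releases (1.2 and later) prefer a `[tool.poetry.group.dev.dependencies]` table for development packages, so the table name depends on the Poetry version you have installed.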
Poetry separates packages into dependencies (pyproject.toml > line 7) and development dependencies (pyproject.toml > line 11). When Poetry has finished installing all packages in .venv, it writes their exact versions to a poetry.lock file. You should commit this file to the project’s repo 🔗 so that the team working on the project is locked to the same versions of dependencies 🔗
Our packages have different version constraints. For example, "*" means any version (so Poetry picks the latest one it can resolve), while "^1" means >=1.0.0 <2.0.0. To understand dependency specification, check 🔗
Included Dev Packages 📦📦📦
These are the dev packages I’m currently using for our team:
Black
Black is “the uncompromising Python code formatter” with “a strict subset of PEP 8 coding style”. Black has a very opinionated code style 🔗. Black defaults to 88 characters per line 🔗. If you would rather change it, you can use a different number for the line-length option in pyproject.toml > line 26
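The relevant table in pyproject.toml would look roughly like this, shown here with Black’s default value:

```toml
[tool.black]
line-length = 88  # Black's default; change the number to use a different limit
```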
Flake8
Flake8 is a code linter that warns you of syntax errors, possible bugs, stylistic errors, etc. “There are a few deviations that cause incompatibilities with black”. To fix this, we can pass a few options in the .flake8 📃 file to make flake8 consistent with black (flake8 has not yet adopted pyproject.toml 📃)
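A minimal .flake8 along the lines Black’s documentation recommends. The template’s own file may carry more options than this sketch.

```ini
[flake8]
# Match Black's default line length
max-line-length = 88
# E203 (whitespace before ':') conflicts with how Black formats slices
extend-ignore = E203
```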
iSort
isort sorts imports alphabetically and automatically separates them into sections and by type. “Black also formats imports, but in a different way from isort defaults which leads to conflicting changes” 😵. To fix this, we can tell isort to use black as its profile option in pyproject.toml > line 29
More details on using isort with black 🔗 and a full list of isort CLI flags 🔗
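The profile option mentioned above is a single line in pyproject.toml:

```toml
[tool.isort]
profile = "black"  # make isort's output compatible with Black's formatting
```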
nbStripOut
nbstripout strips the output from jupyter and ipython notebooks
PyDocStyle
Pydocstyle is a static analysis tool for checking compliance with Python docstring conventions. Three conventions are available: pep257, numpy, and google. The pep257 convention is enabled by default. To change it, you can use the convention option in pyproject.toml > line 32. You can also ignore specific error codes (e.g., missing docstrings) by using the add-ignore option in pyproject.toml > line 33
More details on supported conventions 🔗 and error codes 🔗
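A sketch of the two options in pyproject.toml. The chosen convention and the ignored codes are assumptions for illustration, not the template’s actual values.

```toml
[tool.pydocstyle]
convention = "google"     # assumed choice; "pep257" and "numpy" are the alternatives
add-ignore = "D100,D104"  # assumed example: skip missing docstrings in public modules and packages
```

Depending on the pydocstyle version, reading pyproject.toml may require installing it with the optional toml extra.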
Notebook
Your beloved classic Jupyter notebook
Rich
Rich renders pretty tables, progress bars, markdown, syntax-highlighted source code, tracebacks, and more in the terminal. For a video introduction, check 🔗
Pre-commit
To be able to install the pre-commit hooks in the next step, first you need to install the pre-commit framework
What are these hooks for? Every time you commit a code change, the hooks are run on it to automatically point out issues (e.g. is it black compliant?). By pointing issues out before a code review, it allows reviewers to focus on the architecture of a change while not wasting time with trivial style nitpicks
3.3 🧷Install pre-commit hooks
The pre-commit target in Makefile > line 18 has one additional prerequisite $(LOCAL_PRE_COMMIT) which is the value of a variable defined earlier in Makefile > line 6. The variable LOCAL_PRE_COMMIT holds the path where the pre-commit package should be in .venv, so the target fails if the package has not been installed yet
Makefile > line 20 is where pre-commit installs a git hook script in .git/hooks/pre-commit if it finds the configuration file .pre-commit-config.yaml ⚙️ (which we have 🤘)
The .pre-commit-config.yaml defines the hooks to use. Each hook has the following format:
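The general layout of an entry in .pre-commit-config.yaml, with placeholders in angle brackets where you fill in your own values:

```yaml
repos:
  - repo: <URL of the repository that provides the hook>
    rev: <tag or commit to pin to>
    hooks:
      - id: <hook id>
        args: [<optional extra arguments>]
```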
For example, to add a hook for pydocstyle, go to the package repo, look for .pre-commit-hooks.yaml to find the hook id, look for the release you would like to use, and copy the details over to your .pre-commit-config.yaml ⚙️ with any additional arguments:
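A sketch of what the resulting entry might look like. The pinned release and the extra argument are assumptions chosen for illustration; use whichever release and options suit your team.

```yaml
repos:
  - repo: https://github.com/PyCQA/pydocstyle
    rev: 6.1.1  # assumed; pin to whichever release you want to use
    hooks:
      - id: pydocstyle
        args: ["--convention=google"]  # assumed example of an additional argument
```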
Included hooks 🧷🧷🧷
These are the hooks I’m currently using for our team: