Thuwarakesh Murallie

Summary

The provided content outlines essential practices for structuring Python projects to enhance maintainability, scalability, and understandability.

Abstract

The article emphasizes the importance of a well-organized project structure for Python projects, advocating for version control, dependency management, clean code practices, clear documentation, and secure handling of secrets. It introduces tools and practices such as git for version control, Poetry for dependency management, and pre-commit hooks for automated code formatting. The use of configuration files, environment files, and comprehensive documentation is also highlighted to facilitate project setup and maintenance. The author provides a blueprint for Python project structures, including a GitHub repository template, to help developers adhere to these best practices and streamline the development process.

Opinions

  • The author believes that a project's structure is a critical factor in its success and usability by others.
  • The author suggests that even simple scripts can evolve into complex projects, necessitating good practices from the start.
  • The use of git and a .gitignore file is strongly recommended for version control.
  • Poetry is favored over Virtualenv for dependency management due to its additional capabilities.
  • Automated clean code practices using tools like black, autoflake, and isort are encouraged for code consistency.
  • The author promotes the use of TOML files for configuration management, considering them user-friendly and well-suited for the task.
  • Environment files are recommended for storing secrets to prevent them from being committed to the code repository.
  • A README file is considered essential for providing context and setup instructions for the project.
  • Documentation is seen as optional but beneficial for explaining complex aspects of the project beyond the README.
  • The author provides practical examples and commands to implement the recommended practices.
  • Hosting documentation on GitHub Pages is suggested for easy access and sharing.
  • The author has created Python project blueprints, available on GitHub, to serve as a starting point for developers to apply these best practices.

7 Ways to Make Your Python Project Structure More Elegant

Here are the best practices for a manageable, scalable, and easily understandable Python project structure

Blueprint for a perfect Python project structure. — Photo by Designecologist from Pexels

Great projects start as a single file script and evolve into a community-maintained framework. But few projects make it to this level. Most, regardless of their usefulness to others, end up not being used by anyone.

The critical factor that makes your project convenient (or miserable) for others is its structure.

What is a perfect Python project structure that works well?

This article walks through seven practices that set up your Python project for maximum maintainability. If this seems overwhelming, you can go straight to the blueprint I have created for you.

Make every project a git repository.

You can make a project a git repository with a single command. It takes less than a minute.

git init

But we don't do this often enough. In most cases, we tend to think that what we're writing is a simple script for a simple problem. That's okay.

What's simple for you is complex for others. Someone on the other side of the planet is working on a problem for which your simple script is the solution.

Also, your simple problem may absorb other simple issues along the way and become monstrous.

As it grows, your code becomes increasingly difficult to maintain. You have no visibility into what changes you made and why you made them. If you work in a team, you also lose track of who made each change and when.

Suddenly, you start to worry about the script you initially wrote instead of feeling proud of it.

Start every single line of code as if it's the beginning of the next Facebook. And you need a version control system to do it.

As a first step, make your project a git repository and include a .gitignore file as well. You can generate an ignore file using the online tool gitignore.io.

.
├── .git
│   ├── <Git managed files>
│  
├── .gitignore

As you progress, make sure you commit your changes with an illustrative message. Commits are checkpoints at different times. They are indeed versions of your software you can check out at any time.

How to write a good commit message?

A good commit message should complete the sentence, "if applied, this commit will …" It should be in sentence case, without a trailing period. An optimal length for the title of a commit message is about 50 characters.

The following is an example commit message using the git CLI.

git commit -am 'Print a hello world message'

You can also add more detail. Run git commit without the -m option, and git opens an editor where you can write a multi-line commit message.

Still, use the above convention for the title of your commit message, and separate the title from the body with a blank line.

Print a hello <user> message

Print a hello world message and a hello <user> message

The main function was hardcoded with a 'hello world' message.
But we need a dynamic message that takes an argument and greets
the user.

Amend the main function to take an argument and use string
formatting to print a hello <user> message

These commit message conventions will make it easy to skim through the git log of all the changes you made.

Clean git history with clean commit messages — from the author’s blog
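The title conventions above are simple enough to check mechanically. Here is a minimal sketch of such a check. The function name check_commit_title and its messages are made up for illustration, and they are not part of git or any tool mentioned here.

```python
def check_commit_title(title: str, max_length: int = 50) -> list[str]:
    """Return a list of convention problems found in a commit title."""
    problems = []
    if len(title) > max_length:
        problems.append(f"title is {len(title)} characters; aim for about {max_length}")
    if title and not title[0].isupper():
        problems.append("title should start with a capital letter")
    if title.endswith("."):
        problems.append("title should not end with a period")
    return problems


print(check_commit_title("Print a hello world message"))  # []
print(check_commit_title("fixed the bug."))  # reports two problems
```

You could run a check like this by hand, or wire it into a hook later if you find it useful.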

Use a dependency management tool.

Most developers, especially the new ones, don't pay enough attention to the project dependencies. What's dependency management in the first place?

The software you develop may depend on packages other developers created. They may, in turn, have dependencies on several different packages. This modular approach helps create software products quickly without reinventing the wheel all the time.

Dependencies, even within the same project, may vary between different environments. Your development team may have a set of dependencies that don't go into the production system.

A sound dependency management system should be able to distinguish between these sets.

Python developers typically use a virtual (or conda) environment to install project dependencies. But Virtualenv is not a dependency management tool. It doesn't have the benefits discussed above. It only helps to isolate the project environment from your system.

Poetry is a perfect dependency management tool for Python projects. It allows you to:

  • separate development and production dependencies;
  • set the Python version for each project separately;
  • create entry points to your software; and
  • package and publish it to repositories such as PyPI.

Poetry is not a replacement for virtual environments. It creates and manages them for you with convenient utility commands.

If you love the idea, I published a full-length tutorial about how you can use Poetry to manage project dependencies efficiently.

Automate clean code practices in your Python projects.

Python is one of the most straightforward programming languages. It's close to natural languages yet powerful in its applications.

But that doesn't mean your code is always readable. You may end up writing code that is too lengthy and in a style that is too difficult for others to digest. To address this problem, Python introduced a common standard called PEP 8.

PEP 8 is a set of guidelines for Python programmers to code concisely and consistently. It covers:

  • naming conventions of Python classes, functions, and variables;
  • proper usage of whitespace;
  • code layout, such as optimal line length; and
  • conventions about comments.
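As a small illustration of these conventions, here is a short snippet written the way PEP 8 suggests. The names are invented for this example.

```python
# Constants: upper case with underscores.
DEFAULT_USER_NAME = "world"


class GreetingService:
    """Build greeting messages (class names use CapWords)."""

    def build_greeting(self, user_name: str = DEFAULT_USER_NAME) -> str:
        # Functions and variables use lowercase_with_underscores.
        return f"Hello, {user_name}!"


service = GreetingService()
print(service.build_greeting())       # Hello, world!
print(service.build_greeting("Ada"))  # Hello, Ada!
```

Note the two blank lines around top-level definitions, the spaces around operators, and lines kept under 79 characters.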

Though this guideline solves a huge problem for Python programmers, it's challenging to maintain this manually in a large project.

Luckily, packages such as black and autopep8 make it easy with a single command. Here's a command that formats every file inside the blueprint folder.

black blueprint

Autoflake is another tool that helps you get rid of unused variables in your script. Variables we declare but never use make the code harder to read. The following line does the magic.

autoflake --in-place --remove-unused-variables blueprint/main.py
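To show what "unused" means in practice, here is a rough sketch that detects unused imports with Python's built-in ast module. This is only an illustration of the idea under simplifying assumptions (it ignores star imports, for instance). It is not how autoflake is implemented, and find_unused_imports is a name I made up.

```python
import ast

SOURCE = '''
import os
import sys
from math import sqrt

print(sqrt(16), sys.argv)
'''


def find_unused_imports(source: str) -> list[str]:
    """Return imported names that never appear as identifiers in the source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import x as y" binds "y".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


print(find_unused_imports(SOURCE))  # ['os']
```

A dedicated tool handles many more cases, which is why I'd still reach for autoflake rather than a script like this.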

Lastly, I'd like to mention isort, a Python package that optimizes your imports.

isort blueprint/main.py

All these packages clean up your code with a single command. But, even then, running them every time you change your script is more of a chore than we think.

This is why I prefer Git pre-commit hooks.

Using pre-commit hooks, you can configure black, autoflake, and isort to format your codebase every time you commit a change.

How to configure pre-commit hooks to format Python code automatically?

You can install the pre-commit package using poetry add or pip. You need a .pre-commit-config.yaml file in your project root, where you configure which hooks run just before every commit. Then you'll have to install pre-commit into the git repository by running the pre-commit install command from your project root.

poetry add pre-commit
# Create the .pre-commit-config.yaml file
poetry run pre-commit install

Here's what your .pre-commit-config.yaml file should look like.

repos:
  - repo: local
    hooks:
      - id: autoflake
        name: Remove unused variables and imports
        entry: bash -c 'autoflake "$@"; git add -u' --
        language: python
        args:
          [
            "--in-place",
            "--remove-all-unused-imports",
            "--remove-unused-variables",
            "--expand-star-imports",
            "--ignore-init-module-imports",
          ]
        files: \.py$
      - id: isort
        name: Sorting import statements
        entry: bash -c 'isort "$@"; git add -u' --
        language: python
        args: ["--filter-files"]
        files: \.py$
      - id: black
        name: Black Python code formatting
        entry: bash -c 'black "$@"; git add -u' --
        language: python
        types: [python]
        args: ["--line-length=120"]

That's it. Now try making some changes to your code and commit them. You'll be amazed to see how it automatically corrects your coding style issues.

Use a configuration file to separate project parameters.

A configuration file is like the central control panel for your application. A new user to your code will only have to change the configuration file to get it running.

What goes in a configuration file?

We know hard-coding static variables is a bad practice. For example, if you need to set a server URL, you shouldn't put it in the code directly. Instead, put it in a separate file and read from it. If you or someone else wants to change it, they only have to do it once, and they know where to do it.

In the early days, we used to read configurations from text files. I've used JSON and CSV files too. But we now have better alternatives for managing configurations.

A perfect configuration file should be easy to understand and allow comments. I've found TOML files excellent for this purpose. Poetry already creates a TOML file to manage its configuration, so I don't have to create a new one.

You can read a TOML file with the toml Python package. It takes only a single line of code to convert your configuration file into a dictionary.

Here's how to read a TOML configuration file.

1. Install the toml package. If you're using Poetry to manage dependencies, you can install it using the add command. If not, plain old pip works.

poetry add toml
# If you're still using pip and virtualenv,
# pip install toml

2. Create (or edit, if it's already there) a TOML file. You can use any name. If you're using Poetry, it creates one called pyproject.toml.

[app]
name='blueprint'

3. Load it in your Python project.

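import toml

# Parse the whole file into a nested dictionary.
app_config = toml.load('pyproject.toml')
# Now you can access the configuration parameters by section.
# (pyproject.toml also holds Poetry's own tables, so read just [app].)
print(app_config['app'])
# {'name': 'blueprint'}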

Store secrets in an environment file

Confidential items should never go into your code repository. A common practice is to put them in a .env file and read them at run time. You don't commit the .env file to your code repository either.

You can use this file to store information such as API keys and database credentials.

It's perfectly fine to use your configuration file to store secrets if you don't commit it to your repository. Also, you could use the .env file as your project configuration file if you don't have many complex configurations.

A critical difference between environment and config files is how you read their values. You can access environment variables anywhere in your project using Python's built-in os module. But configuration file values aren't visible to every module of your project. You'll have to either read the file in every module or read it once and pass the values along as function arguments.

But I strongly recommend using two separate files as they both serve different purposes. The config file conveniently configures your project without hard coding anything, and .env files store secrets.

You can use .gitignore to stop your env files from accidentally sneaking into your repository. This is one of the things you should do on day 0.

Here's how to create and read an environment file in Python.

1. Create a .env file in the project root.

SECRET_KEY='R9p9BRDshkwzpsooPEmZS86OWjWxQvn7aPunVexFoDw'

2. Install python-dotenv.

poetry add python-dotenv
# pip install python-dotenv

3. Load the env file into your project.

from dotenv import load_dotenv
load_dotenv()
# If your env file is different from .env you can specify that too,
# load_dotenv('.secrets/.environ')

4. Access environment variables from anywhere in your project.

import os
print(os.getenv('SECRET_KEY'))
# R9p9BRDshkwzpsooPEmZS86OWjWxQvn7aPunVexFoDw

Environment files are an age-old convention. Hence most technologies support them out of the box. Also, you can set environment variables directly on your OS.
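Since a missing secret usually means the application can't work, it may help to fail early with a clear message instead of passing None around. Here is a small sketch of that idea using only the os module. The get_secret helper is my own naming, not part of python-dotenv.

```python
import os


def get_secret(name: str) -> str:
    """Return a required environment variable or fail with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Set here only to keep the demo self-contained; normally the value comes
# from the .env file or from the operating system.
os.environ["SECRET_KEY"] = "not-a-real-secret"
print(get_secret("SECRET_KEY"))  # not-a-real-secret
```

The same helper works whether the variable was loaded from a .env file or set directly on the OS.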

Use a README and give additional context.

You should always give some context to whoever reads your code. What is it about, and why did you write it?

A README file is a short piece of documentation for your project. It should include instructions for another person to set up your project on their system without your help.

README files are usually markdown files. GitHub, Bitbucket, and GitLab render them as styled documentation on the project repository page.

Markdown adds a few conventions to make ordinary text appear special. For instance, you may add a # mark in front of a line to make it a title and ## to make it a subtitle. Here's a cheat sheet to learn more about markdown.

Markdown cheatsheet to write better README — from the Author's blog.

Help readers with accompanying docs (optional).

You don't need a multipage documentation site for every project. But it's a good idea to have one. Thus, I've made this step optional.

README files hold basic information about your application. Separate docs are helpful for giving specific details about your project.

For instance, you may talk about package installation and configuration in the README. But describing the 101 API endpoints of your application there is not recommended. You may have to organize that better in an HTML doc and host it separately.

Python has several excellent tools to create HTML docs. MkDocs (and its Material theme), Sphinx, and Swagger are the popular ones. How would you choose the right documentation tool?

If your application exposes many API endpoints, Swagger is the best way to generate docs. If not, MkDocs works well. With MkDocs, you can create custom pages. You can also convert your docstrings using the mkdocstrings extension. But in my opinion, you rarely need to create documentation for your code. YOUR CODE IS ITSELF DOCUMENTATION!
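For code to document itself, the docstrings need to carry the details. Here is a minimal sketch of a function with a Google-style docstring, a format that, as far as I know, docstring-rendering extensions such as mkdocstrings can parse. The greet function is just an example I made up.

```python
def greet(user: str = "world") -> str:
    """Build a greeting for a user.

    Args:
        user: Name of the person to greet.

    Returns:
        The greeting text.
    """
    return f"Hello, {user}!"


print(greet("Ada"))  # Hello, Ada!
```

The same docstring also shows up in help(greet) and in most editors, so it pays off even if you never generate HTML docs.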

Here's how to use MkDocs on Python projects.

1. Install mkdocs as a dev dependency. You don't need it on a production system.

poetry add -D mkdocs-material

2. Create a new docs project, then edit its index.md file (this too is a markdown file).

mkdocs new docs
cd docs

3. Start the mkdocs server to see it in a browser.

mkdocs serve

Did you know you can host your docs with GitHub Pages for free?

You can host your documentation if you have a GitHub account and have pushed your master branch to a GitHub repository. It takes one (and only one) command and a few seconds.

mkdocs gh-deploy

The above command builds a static version of your documentation and hosts it on GitHub Pages. The hosted URL usually looks like this:

https://<Your username>.github.io/<Your repository name>/

You can also serve it from your custom domain or a subdomain of your main site. Here’s a guide from GitHub for custom domains.

You can see the project blueprint's hosted documentation here.

How to use the Python project blueprints?

I've created a couple of GitHub repositories that you can use as a starting point for your Python project. The first is a blueprint for Python projects in general, and the other is specifically for Django applications.

Python project blueprint

Django project blueprint

You can clone the repository and start working on it. Whenever you think you should have it on a remote GitHub (or Bitbucket, GitLab) repository, you can create one and connect it with your local copy. Here's how to do it.

# Clone the blueprint to your computer.
git clone git@github.com:thuwarakeshm/blueprint.git
cd blueprint
# Point the existing "origin" remote at your own repository.
git remote set-url origin <Your New Repository>

But there is a better way.

Go to the GitHub links I’ve given above, and you'll see a button at the top right corner saying "fork." Forking allows you to create a copy of the repository that you can own. You can make changes without affecting the original sources at all.

So do fork the repository and clone the new one to your local computer.

These repositories are built with all the best practices I discussed above. But feel free to make it yours. You can change anything you want and the way you want.

Final thoughts

Python is an elegant language. But we need more discipline to build excellent projects with it. What if most of that discipline were automated?

That's what we've just finished discussing.

We talked about why and how important it is to have a git repository for all of your projects. Then we discussed managing dependencies with Poetry. We also learned how to use README files, create documentation, and manage configuration and environment files.

Each of the practices mentioned above is itself worth a 30-day course. But I believe this article has given you an idea of these techniques and why we use them.

Once you realize the benefits of these practices, you'll want to create projects that support them. I've already made a blueprint so you don't have to start everything from scratch.

This guide is based on my understanding and experience. If you have anything to add or correct, I'd be more than willing to discuss it with you.

Thanks for reading. Say hi to me on LinkedIn, Twitter, and Medium.

