Xiaoxu Gao

Summary

This article provides an in-depth guide on how to write a configuration file in Python, covering best practices, format choices, and various options for managing configuration files.

Abstract

The article discusses the importance of configuration management in software development and the role of configuration files in executing software in different environments without changing the code. It highlights the need for environment-dependent parameters to be stored in external files, as recommended by The Twelve-Factor App methodology. The guide explores different formats for configuration files, such as YAML, JSON, TOML, and INI, and their respective advantages and disadvantages. It also covers various Python packages and libraries for managing configuration files, including the built-in Configparser package and third-party libraries like python-dotenv, Dynaconf, and Hydra. The article emphasizes the importance of validation, readability, and maintainability in configuration files.

Bullet points

  • Configuration management is crucial in software development to allow software to be executed in any environment without changing the code.
  • Environment-dependent parameters should be stored in external files, as recommended by The Twelve-Factor App methodology.
  • Common formats for configuration files include YAML, JSON, TOML, and INI, each with its own advantages and disadvantages.
  • Python's built-in package Configparser is primarily used for reading and writing INI files, but it also accepts dictionaries, strings, and iterable file objects as input.
  • Third-party libraries like python-dotenv, Dynaconf, and Hydra offer more advanced features for managing configuration files.
  • Validation, readability, and maintainability are key considerations when choosing a format and library for managing configuration files.

From Novice to Expert: How to Write a Configuration file in Python

Treat your config file like production code


When we design software, we normally put a lot of effort into writing high-quality code. But that’s not enough. Good software should also take care of its ecosystem: testing, deployment, networking, etc. One of the most important aspects is configuration management.

Good configuration management should allow the software to be executed in any environment without changing the code. It helps Ops manage all the troublesome settings, gives them a view of what can happen during the process, and even allows them to change the behavior at runtime.

The most common configuration includes credentials to the database or an external service, the hostname of the deployed server, dynamic parameters, etc.

In this article, I want to share with you some good practices of configuration management and how we can implement them in Python. If you have more ideas, please leave your comments below.

When do we need a separate configuration file?

Before writing any configuration file, we should ask ourselves why we need an external file. Can’t we just make the values constants in the code? Actually, the famous Twelve-Factor App methodology has answered this question for us:

A litmus test for whether an app has all config correctly factored out of the code is whether the codebase could be made open source at any moment, without compromising any credentials. Note that this definition of “config” does not include internal application config, such as config/routes.rb in Rails, or how code modules are connected in Spring. This type of config does not vary between deploys, and so is best done in the code.

It recommends that any environment-dependent parameters, such as database credentials, should sit in an external file. Otherwise, they are just normal constants in the code. Another use case I see a lot is storing dynamic variables in the external file, for instance, a blacklist or whitelist. But it can also be a number within a certain range (e.g. a timeout) or some free text. These variables may be the same in every environment, but the configuration file makes the software much more flexible and easier to edit. However, if the file grows too much, we might consider moving it to a database instead.

Which format of the configuration file should I use?

In fact, there are no constraints on the format of the configuration file as long as the code can read and parse it. But there are some good practices.

The most common and standardized formats are YAML, JSON, TOML and INI. A good configuration file should meet at least these 3 criteria:

  1. Easy to read and edit: It should be text-based and structured in a way that is easy to understand. Even non-developers should be able to read it.
  2. Allow comments: A configuration file is not something that will only be read by developers. It is extremely important in production, when non-developers try to understand the process and modify the software behavior. Writing comments is a way to quickly explain certain things, thus making the config file more expressive.
  3. Easy to deploy: A configuration file should be accepted by all operating systems and environments. It should also be easily shipped to the server via a CDaaS pipeline.

Maybe you still don’t know which one is better. But if you think about it in the context of Python, then the answer would be YAML or INI. YAML and INI are well accepted by most Python programs and packages. INI is probably the most straightforward solution, with only one level of hierarchy. However, INI has no data types: everything is encoded as a string.
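As a minimal sketch of the INI format, the snippet below parses a small INI document with the built-in configparser module. The section and key names are my own illustrative assumptions, loosely based on the .env example later in this article. It shows the single level of hierarchy (section, then key) and that every value comes back as a string.

```python
import configparser

# Illustrative INI content; section and key names are assumptions.
INI_TEXT = """
[app]
environment = test
debug = true

[database]
username = xiaoxu
host = 127.0.0.1
port = 5432
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# One level of hierarchy: section -> key.
port = parser["database"]["port"]
debug = parser["app"]["debug"]

# INI has no data types, so both values are plain strings.
print(type(port).__name__, repr(port))
print(type(debug).__name__, repr(debug))
```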

The same configuration in YAML looks like this. As you can see, YAML supports nested structures quite well (like JSON). Besides, YAML natively encodes some data types such as string, integer, double, boolean, list, dictionary, etc.

JSON is very similar to YAML and is extremely popular as well. However, it’s not possible to add comments in JSON. I use JSON a lot for internal config inside the program, but not when I want to share the config with other people.

TOML, on the other hand, is similar to INI, but supports more data types and has a defined syntax for nested structures. It’s used a lot by Python packaging tools like pip or Poetry. But if the config file has too many nested structures, YAML is a better choice. A TOML file looks like INI, except that every string value is quoted.

So far, I’ve explained WHY and WHAT. In the next sections, I will show you the HOW.

Option1: YAML/JSON — Simply read an external file

As usual, we start from the most basic approach, which is simply creating an external file and reading it. Python ships with a built-in json module, while YAML files need a third-party package such as PyYAML. Both parsers return the same kind of dict object, so the attribute access will be the same for both files.

Read

For security reasons, it is recommended to use yaml.safe_load() instead of yaml.load() to avoid code injection through the configuration file.
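A minimal sketch of the read step, assuming PyYAML is installed (pip install pyyaml) and using inline strings instead of files to keep it self-contained. The keys are illustrative assumptions. With real files you would call yaml.safe_load(f) and json.load(f) on an open file object instead.

```python
import json

import yaml  # third-party package: PyYAML

# The same illustrative configuration in both formats.
YAML_TEXT = """
app:
  environment: test
  debug: true
database:
  host: "127.0.0.1"
  port: 5432
"""

JSON_TEXT = """
{
  "app": {"environment": "test", "debug": true},
  "database": {"host": "127.0.0.1", "port": 5432}
}
"""

yaml_config = yaml.safe_load(YAML_TEXT)  # safe_load never builds arbitrary objects
json_config = json.loads(JSON_TEXT)

# Both parsers return plain dicts, so access looks identical.
print(yaml_config == json_config)
print(yaml_config["database"]["port"], json_config["database"]["port"])
```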

Validation

Opening a non-existing file raises a FileNotFoundError in both cases. YAML throws different exceptions for a non-YAML file and an invalid YAML file, while JSON throws JSONDecodeError for both errors.
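Here is a small sketch of how these errors could be handled for the JSON case, using only the standard library. The helper name load_json_config and the file names are hypothetical, and returning an error message instead of re-raising is just one possible design.

```python
import json
import tempfile
from pathlib import Path


def load_json_config(path):
    """Hypothetical helper: return (config, error_message) for a JSON file."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f), None
    except FileNotFoundError:
        return None, f"config file not found: {path}"
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"


# Case 1: the file does not exist.
missing_config, missing_error = load_json_config(Path("does_not_exist_12345.json"))
print(missing_error)

# Case 2: the file exists but is not valid JSON (trailing comma).
with tempfile.TemporaryDirectory() as tmp:
    broken_file = Path(tmp) / "broken.json"
    broken_file.write_text('{"port": 5432,}', encoding="utf-8")
    broken_config, broken_error = load_json_config(broken_file)
print(broken_error)
```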

Option2: Configparser — Python built-in package

From here onwards, I will introduce packages designed for configuration management. We start with a Python built-in package: Configparser.

Configparser is primarily used for reading and writing INI files, but it also supports dictionary and iterable file objects as input. Each INI file consists of multiple sections where there are multiple key, value pairs. Below is an example of accessing the fields.

Read

Configparser doesn’t guess datatypes in the config file, so every config is stored as a string. But it provides a few methods to convert strings to the correct data type. The most interesting one is the Boolean type as it’s able to recognize Boolean values from 'yes'/'no', 'on'/'off', 'true'/'false' and '1'/'0'.

As mentioned before, it could also be read from a dictionary using read_dict(), or a string using read_string() or an iterable file object using read_file().
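The conversion helpers and read_dict() can be sketched as follows. The section and key names are illustrative assumptions, and the missing timeout key is there only to show the fallback argument.

```python
import configparser

parser = configparser.ConfigParser()

# read_dict() takes a mapping of section -> {key: value}.
parser.read_dict({
    "app": {"environment": "test", "debug": "yes"},
    "database": {"host": "127.0.0.1", "port": "5432"},
})

environment = parser.get("app", "environment")   # plain string
debug = parser.getboolean("app", "debug")        # 'yes' becomes True
port = parser.getint("database", "port")         # '5432' becomes 5432
# 'timeout' is not defined, so the fallback value is returned.
timeout = parser.getfloat("database", "timeout", fallback=30.0)

print(environment, debug, port, timeout)
```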

Validation

The validation of Configparser is not as straightforward as YAML and JSON. First, it doesn’t raise a FileNotFoundError if the file doesn’t exist, but instead, it raises a KeyError when it tries to access a key.

Besides, the package “ignores” indentation errors. If you have an extra tab or space before “DEBUG”, the line is read as a continuation of the previous value, so you get a wrong value for both ENVIRONMENT and DEBUG.

Nevertheless, Configparser is able to collect multiple errors into a single ParsingError. This helps us to solve problems in one shot.
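The three behaviours described in this section can be reproduced with a short sketch. The file name and the INI content are made up for illustration.

```python
import configparser

# 1) A missing file is skipped silently: read() returns the files it parsed.
parser = configparser.ConfigParser()
loaded_files = parser.read("no_such_config_file.ini")
print(loaded_files)

# 2) An indented line is treated as a continuation of the previous value.
indented = configparser.ConfigParser()
indented.read_string("[app]\nenvironment = test\n  debug = true\n")
print(repr(indented["app"]["environment"]))
print("debug" in indented["app"])

# 3) Lines without a delimiter are collected into one ParsingError.
broken = configparser.ConfigParser()
parsing_errors = []
try:
    broken.read_string("[app]\nenvironment = test\nfirst bad line\nsecond bad line\n")
except configparser.ParsingError as exc:
    parsing_errors = exc.errors  # list of (line number, line) tuples
print(len(parsing_errors))
```

Because read() returns the list of files it actually loaded, checking that list is a simple way to detect a missing file early instead of waiting for a KeyError.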

Option3: python-dotenv — Make configurations as environment variables

Now we move to third-party libraries. So far, I have actually missed one type of configuration file which is .env. The variables inside .env file will be loaded as environment variables by python-dotenv and can be accessed by os.getenv.

A .env file basically looks like this. The default path is the root folder of your project.

ENVIRONMENT=test
DEBUG=true
USERNAME=xiaoxu
PASSWORD=xiaoxu
HOST=127.0.0.1
PORT=5432

Read

It is extremely easy to use. You can decide whether you want to override the existing variable in the environment with the parameter override.

Validation

However, python-dotenv doesn’t validate the .env file. If you have a .env file like this, and you want to access DEBUG, you will get None as the return without an exception.

# .env
ENVIRONMENT=test
DEBUG
# load.py
import os
from dotenv import load_dotenv

load_dotenv()
print('DEBUG' in os.environ.keys())
# False
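Since python-dotenv does not validate anything, one option is a small check of your own after loading. The sketch below uses only the standard library. The helper name require_env is hypothetical, and the dictionary stands in for os.environ after load_dotenv() has processed the .env file above.

```python
import os


def require_env(names, environ=os.environ):
    """Hypothetical helper: return the requested variables or raise if any is missing."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return {name: environ[name] for name in names}


# Stand-in for the environment after load_dotenv():
# ENVIRONMENT was set, DEBUG was not.
fake_environ = {"ENVIRONMENT": "test"}

error_message = None
try:
    require_env(["ENVIRONMENT", "DEBUG"], environ=fake_environ)
except RuntimeError as exc:
    error_message = str(exc)
print(error_message)
```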

Option4: Dynaconf — Powerful settings configuration for Python

Dynaconf is a very powerful settings library for Python that supports multiple file formats: yaml, json, ini, toml and python. It can automatically load a .env file and supports custom validation rules. In short, it covers pretty much all the functionality of the previous 3 options and even goes beyond that. For example, you can store an encrypted password and use a custom loader to decrypt it. It’s also nicely integrated with Flask, Django, and Pytest. I will not mention all the functionality in this article; for more details, please refer to the documentation.

Read

Dynaconf uses .env to find all the settings files and populates the settings object with their fields. If 2 settings files have the same variable, then the value will be overridden by the latest settings file.

Validation

One of the most interesting features to me is the custom validator. As mentioned before, Configparser doesn’t validate INI files strictly enough, but this can be achieved with Dynaconf. For example, you can check whether certain keys exist in the file and whether a certain key has the correct value. If you read from a YAML or TOML file, which supports multiple data types, you can even check if a number is in a certain range.

Integration with Pytest

Another interesting feature is the integration with pytest. The settings for unit testing are normally different from other environments. You can use FORCE_ENV_FOR_DYNACONF to let the application read a different section in your settings file, or use monkeypatch to replace a specific key and value pair in the settings file.

Refresh the config during Runtime

Dynaconf also supports reload(), which cleans and executes all the loaders. This is helpful if you want the application to reload the settings file during runtime. For example, the application could automatically reload the settings when the config file has been modified.

Option5: Hydra- Simplify the development by dynamically creating a hierarchical configuration

The last option is much more than just a file loader. Hydra is a framework developed by Facebook for elegantly configuring complex applications.

Besides reading, writing, and validating config files, Hydra also comes with a strategy to simplify the management of multiple config files, override them through a command line interface, create a snapshot of each run, etc.

Read

Here is the basic use of Hydra. On the command line, +APP.NAME adds a new field to the config, while APP.NAME=hydra1.1 overrides an existing field.

Validation

Hydra nicely integrates with @dataclass to perform basic validations such as type-checking and read-only fields. But it doesn’t support __post_init__ method for advanced value checking described in my previous article.

Config group

Hydra introduces a concept called config group. The idea is to group configs with the same type and choose one of them during the execution. For example, you can have a group “database” with one config for Postgres, and another one for MySQL.

When it gets more complex, you might have a layout like this in your program (an example from Hydra documentation)

and you want to benchmark your application with different combinations of db, schema and ui, then you can run:

python my_app.py db=postgresql schema=school.yaml

More …

Hydra supports parameter sweeps with --multirun, which launches one job for each combination of the given config values. For instance, for the previous example, we can run:

python my_app.py schema=warehouse,support,school db=mysql,postgresql -m

This launches 6 jobs, one for each combination of schema and db. Note that Hydra's default launcher runs them locally one after another:

[2019-10-01 14:44:16,254] - Launching 6 jobs locally
[2019-10-01 14:44:16,254] - Sweep output dir : multirun/2019-10-01/14-44-16
[2019-10-01 14:44:16,254] -     #0 : schema=warehouse db=mysql
[2019-10-01 14:44:16,321] -     #1 : schema=warehouse db=postgresql
[2019-10-01 14:44:16,390] -     #2 : schema=support db=mysql
[2019-10-01 14:44:16,458] -     #3 : schema=support db=postgresql
[2019-10-01 14:44:16,527] -     #4 : schema=school db=mysql
[2019-10-01 14:44:16,602] -     #5 : schema=school db=postgresql
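The 6 jobs are simply the Cartesian product of the two lists given on the command line. The sketch below reproduces that enumeration with itertools.product. It only illustrates the idea and is not how Hydra is implemented internally.

```python
from itertools import product

# The values passed on the command line in the example above.
schemas = ["warehouse", "support", "school"]
dbs = ["mysql", "postgresql"]

# One job per (schema, db) combination, in the same order as the log.
jobs = [f"schema={schema} db={db}" for schema, db in product(schemas, dbs)]

for index, overrides in enumerate(jobs):
    print(f"#{index} : {overrides}")
```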

Conclusion

In this article, I’ve talked about configuration management in Python in terms of WHY, WHAT, and HOW. Depending on the use case, a complex tool/framework isn’t always better than a simple package. No matter which one you choose, you should always think about its readability, maintainability, and how to spot errors as easily as possible. In fact, a config file is just another type of code.

I hope you enjoy this article and feel free to leave your comments below.
