Lynn Kwong

Summary

The article discusses Python typing and validation using the mypy and pydantic libraries.

Abstract

The article starts by explaining the difference between statically and dynamically typed languages, emphasizing that Python is dynamically but strongly typed. The author discusses the advantages of adding type hints to Python code, mentioning mypy for static type checking and pydantic for runtime type validation. The author then shows how to add type hints to functions, classes, generators, and dictionaries, before explaining how to use mypy for type checking. Finally, the author introduces the pydantic library and its use cases.

Opinions

  • Adding type hints to Python code can make the code easier to read, maintain, and debug, especially when working in teams.
  • mypy is an excellent tool for static type checking, and it supports various Python versions and IDEs.
  • pydantic is a powerful library for data validation and settings management, enforcing type hints at runtime, and providing user-friendly errors when data is invalid.
  • mypy and pydantic complement each other, with the former checking types at development time and the latter ensuring data validation at runtime.
  • The author highly recommends using pydantic for validating and normalizing responses of APIs, especially when working with JSON schemas.

Python typing and validation with mypy and pydantic

Let’s make our Python code more readable with typing

Python is a dynamically typed programming language: types are only checked at runtime, and a variable can refer to values of different types over its lifetime. In a statically typed language like Java, types are checked at compile time, and a variable cannot change its type. At the same time, Python is a strongly typed language because it does not implicitly convert between unrelated types. For example, adding the integer 1 and the string "2" raises a TypeError, whereas a weakly typed language such as JavaScript silently converts the number and produces the string "12".
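A minimal sketch of both properties in plain Python (the variable names are illustrative only): rebinding a name to a value of another type is accepted, but adding an int and a str raises a TypeError instead of converting one of them implicitly.

```python
# Dynamic typing: a name can be rebound to a value of a different type.
value = 1
value = "one"  # accepted at runtime (mypy, however, would flag this reassignment)

# Strong typing: Python does not silently convert between int and str.
try:
    1 + "2"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # unsupported operand type(s) for +: 'int' and 'str'

# Conversions have to be written explicitly.
print(1 + int("2"))  # 3
print(str(1) + "2")  # 12
```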

Photo by Hitesh Choudhary (Python programming) from Unsplash.

Even though dynamic typing can make Python code faster to write during development, it also makes it easy to introduce bugs that only surface at runtime. Besides, without type definitions, the code can be harder to read and maintain. For example, you may need to read through an entire function to find out what type of data it returns, whereas with type hints (also called type annotations) the return type is visible at a glance. Once a program has been developed, it is rarely rewritten or redesigned from scratch, but it is very common for you or your colleagues to read and maintain it later. Making code easy to read is therefore very important, especially in a team where people review each other's code.

Typing has become more and more important in Python, and the type hint standard introduced in PEP 484 makes it easy to add type annotations to your Python code. Once type hints have been added to a Python file, mypy can check the types statically, before the code is run. In addition, pydantic, a data validation library built on Python type annotations, can enforce type hints at runtime and provide user-friendly errors when data is invalid.

How to add type hints to your Python code?

The typing system of Python is very similar to that of TypeScript, which is a strict syntactical superset of JavaScript and adds optional static typing to the language. Similar to the use cases in TypeScript, we normally only add types to functions and classes and don’t need to add them to variables as they can be inferred from the values.

For a function, type hints should be added to all parameters and also the return value.

For simple types such as int, float, bool, list, and dict, we can use the corresponding built-in types as annotations. Note that arbitrary arguments (*args and **kwargs) are annotated with the type of each expected value, not with the type of the container that collects them. For example, if **kwargs is annotated as bool, every keyword argument is expected to have a Boolean value.
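The function below is a hypothetical example (describe and top_scores are not from the original article) showing annotated parameters, annotated *args and **kwargs, and a return type. It assumes Python 3.9 or later, where built-in containers such as list can be parameterized directly.

```python
def describe(name: str, *scores: int, **options: bool) -> str:
    """Hypothetical example function with fully annotated parameters.

    Each extra positional argument is an int, so inside the function
    scores is a tuple[int, ...]. Each keyword argument value is a bool,
    so options is a dict[str, bool].
    """
    total = sum(scores)
    if options.get("shout", False):
        return f"{name.upper()}: {total}"
    return f"{name}: {total}"


def top_scores(scores: list[int], n: int = 3) -> list[int]:
    # Built-in container types can be parameterized directly on Python 3.9+.
    return sorted(scores, reverse=True)[:n]


print(describe("alice", 3, 4, shout=True))  # ALICE: 7
print(top_scores([5, 1, 9, 7]))  # [9, 7, 5]
```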

For a class, we need to add type hints to all its methods. Note that self does not need a type annotation, and the __init__ method should always be annotated as returning None.
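A sketch of such a class, assuming a Computer class with the brand, ram, storage, and ssd attributes that appear later in the article; the attribute types and the describe method are assumptions for illustration.

```python
class Computer:
    """Hypothetical reconstruction; attribute names come from later sections."""

    # self is left unannotated: type checkers infer it from the enclosing class.
    def __init__(self, brand: str, ram: str, storage: str, ssd: bool = True) -> None:
        # __init__ never returns a value, so its return type is None.
        self.brand = brand
        self.ram = ram
        self.storage = storage
        self.ssd = ssd

    def describe(self) -> str:
        kind = "SSD" if self.ssd else "HDD"
        return f"{self.brand} with {self.ram} RAM and {self.storage} {kind}"


laptop = Computer("HP", "4GB", "256GB")
print(laptop.describe())  # HP with 4GB RAM and 256GB SSD
```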

The type hints for a generator are special because they cover three types: the yield value, the send value, and the return value, written as Generator[yield_type, send_type, return_type]. In most cases, send_type and return_type are None, and you only need to specify the yield type, as in Generator[int, None, None].
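A minimal sketch with hypothetical generator names: count_up only yields, so its send and return types are None (Iterator[int] is an equivalent, shorter annotation in that case), while running_total uses all three type parameters. Treating a negative number as a stop signal is an arbitrary choice for this example.

```python
from typing import Generator, Iterator


def count_up(limit: int) -> Generator[int, None, None]:
    # Yields ints; nothing is sent in and nothing is returned.
    n = 0
    while n < limit:
        yield n
        n += 1


def count_up_short(limit: int) -> Iterator[int]:
    # Same behavior, annotated with the shorter Iterator form.
    yield from range(limit)


def running_total() -> Generator[int, int, str]:
    # Yields the running total, receives ints via send(), returns a summary.
    total = 0
    while True:
        value = yield total
        if value < 0:  # hypothetical stop signal
            return f"final total: {total}"
        total += value


gen = running_total()
print(next(gen))     # 0 (primes the generator)
print(gen.send(5))   # 5
print(gen.send(3))   # 8
try:
    gen.send(-1)
except StopIteration as stop:
    print(stop.value)  # final total: 8
```

The return value of a generator is delivered as the value attribute of the StopIteration exception, which is why the last step catches it.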

If you want to check how to send a value to a generator, this is a good tutorial.

Most type hints are quite straightforward, and PEP 484 documents them well. However, dictionary types deserve extra attention. If all the values of a dictionary have the same type (for example, integer), we can write the type as Dict[str, int], where str is the type of the keys and int is the type of the values. Since Python 3.9, the built-in form dict[str, int] can be used as well.
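A small hypothetical example of a uniform dictionary type: every key is a str and every value is an int, so Dict[str, int] describes it exactly.

```python
from typing import Dict


def word_counts(text: str) -> Dict[str, int]:
    # Keys are words (str), values are occurrence counts (int).
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


print(word_counts("a b a"))  # {'a': 2, 'b': 1}
```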

However, if different keys hold values of different types, this kind of type hint is not precise enough, and we should use TypedDict to define the type instead. Let's add a dict method to the Computer class that returns the __dict__ attribute of the instance, which is a dictionary of its attributes.

Here, the cast function is a helper that lets you override the inferred type of an expression. It only affects mypy; there is no runtime check. Without cast, mypy would reject returning self.__dict__, which it types as a plain dict[str, Any], from a method declared to return ComputerType; the alternative is to build the returned dictionary explicitly so that it matches ComputerType key by key.
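A sketch of both variants, assuming ComputerType has the four keys used elsewhere in the article (the value types are assumptions). dict uses cast, and dict_explicit is a hypothetical cast-free alternative that builds the dictionary literal so mypy can check every key.

```python
from typing import TypedDict, cast


class ComputerType(TypedDict):
    # Each key has its own value type, which Dict[str, X] cannot express.
    brand: str
    ram: str
    storage: str
    ssd: bool


class Computer:
    def __init__(self, brand: str, ram: str, storage: str, ssd: bool = True) -> None:
        self.brand = brand
        self.ram = ram
        self.storage = storage
        self.ssd = ssd

    def dict(self) -> ComputerType:
        # self.__dict__ is typed as dict[str, Any]; cast() tells mypy to treat it
        # as ComputerType. At runtime cast() returns its argument unchanged.
        return cast(ComputerType, self.__dict__)

    def dict_explicit(self) -> ComputerType:
        # Alternative without cast: mypy checks this literal against ComputerType.
        return {"brand": self.brand, "ram": self.ram, "storage": self.storage, "ssd": self.ssd}


computer = Computer("HP", "4GB", "256GB")
print(computer.dict())  # {'brand': 'HP', 'ram': '4GB', 'storage': '256GB', 'ssd': True}
```

Note that the cast version returns the instance's live __dict__, so mutating the result also mutates the object; the explicit version returns a fresh dictionary.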

When we have added type hints to our Python code, we can use the mypy library to check if the types are added properly. To use mypy, first, we need to install it:

$ python -m pip install mypy

Let’s put the code for the Computer class in a script called computer.py and use mypy to check the validity of the types added.

$ mypy computer.py

You can see that our code passes mypy's type check. If you remove the cast function, or change the types so that they no longer agree, mypy reports errors, and you can fix your code according to its error messages.

By default, mypy does not type check the bodies of dynamically typed functions, i.e. functions without any type annotations (the --check-untyped-defs option changes this). As a result, mypy normally reports no errors for regular Python code without type hints.
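A small script with hypothetical function names that illustrates both points. Run by the interpreter, it works: annotations are not enforced at runtime. Run through mypy, the call to add should be flagged, while the bug inside the unannotated legacy_label is only reported when --check-untyped-defs is passed.

```python
def add(a: int, b: int) -> int:
    return a + b


# Annotations are not enforced at runtime: this call works and returns "12",
# but mypy reports an incompatible argument type for it.
result = add("1", "2")


def legacy_label(items):
    # No annotations at all, so mypy skips this body by default. The str + int
    # bug below is only reported when mypy runs with --check-untyped-defs.
    return "total: " + len(items)


print(result)  # 12
```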

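mypy has many options, but most of them are rarely needed. The most important option is --ignore-missing-imports, which suppresses errors about imported modules that mypy cannot find or that ship without type hints or stubs. It is needed in most projects because many third-party and legacy packages have no type information, and mypy would otherwise report errors for every import of them:

$ mypy --ignore-missing-imports computer.py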

In the last section, let's introduce the pydantic library, which is used for data validation and settings management with Python type annotations. pydantic enforces type hints at runtime and provides user-friendly errors when data is invalid. Unlike the type hints introduced above, which have no effect at runtime, pydantic models act on your data: they validate it and may convert it to the declared types. pydantic is very commonly used to validate and normalize API responses, which are usually dictionaries that can be converted to JSON. The JSON should follow a uniform schema so that other programs can use it easily.

To use pydantic, we also need to install it first:

$ python -m pip install pydantic

Then we need to define some models, which are simply classes that inherit from pydantic's BaseModel. Let's define a model, ComputerModel, for our Computer class.
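A sketch of what ComputerModel might look like, assuming the same four fields as the Computer class; the field types, and the default of True for ssd (mentioned later in the article), are assumptions here.

```python
from pydantic import BaseModel


class ComputerModel(BaseModel):
    # Required fields: validation fails if any of them is missing.
    brand: str
    ram: str
    storage: str
    # A field with a default value, which a TypedDict cannot declare.
    ssd: bool = True


computer = ComputerModel(brand="HP", ram="4GB", storage="256GB")
print(computer.ssd)  # True
```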

A pydantic model looks very similar to a TypedDict type. The first difference is that a model inherits from BaseModel rather than TypedDict. Besides, fields in a model can have default values, which TypedDict does not support.

At runtime, a TypedDict value is just a built-in dict. A model, however, does more than that: it validates and normalizes the data based on the model definition. Let's pass a dictionary to our model and see what we get.

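Note that we need to unpack the dictionary with ** when passing it to the model's constructor. pydantic also offers class methods that accept the dictionary directly: ComputerModel.model_validate(data) in pydantic v2, or ComputerModel.parse_obj(data) in v1. Checking the content of the computer variable afterwards shows the effect of the validation.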
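A minimal sketch, assuming pydantic v2 and the ComputerModel fields described above; the input dictionary is made up. The string "false" is normalized to the bool False, which both pydantic v1 and v2 do for bool fields.

```python
from pydantic import BaseModel


class ComputerModel(BaseModel):
    brand: str
    ram: str
    storage: str
    ssd: bool = True


data = {"brand": "HP", "ram": "4GB", "storage": "256GB", "ssd": "false"}

# ** unpacks the dictionary into keyword arguments for the model constructor.
computer = ComputerModel(**data)
print(computer)  # brand='HP' ram='4GB' storage='256GB' ssd=False

# pydantic v2 alternative that takes the dict directly (v1: ComputerModel.parse_obj).
same_computer = ComputerModel.model_validate(data)
```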

The storage field has been converted to a string, and ssd gets its default value, True. (Converting a number to a string is pydantic v1 behavior; pydantic v2 no longer does this by default and raises a validation error instead.) If any required field is missing, or a value cannot be converted to the declared type (for example, a non-numeric string to float), pydantic raises a ValidationError. The error message is very user-friendly, so you can quickly identify the problem and fix your code accordingly.
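A sketch of the failure case, again assuming the ComputerModel fields above: the required storage field is left out, and the exception lists where validation failed. The exact wording of the printed message differs between pydantic versions, so it is not reproduced here.

```python
from pydantic import BaseModel, ValidationError


class ComputerModel(BaseModel):
    brand: str
    ram: str
    storage: str
    ssd: bool = True


try:
    ComputerModel(brand="HP", ram="4GB")  # the required storage field is missing
except ValidationError as exc:
    print(exc)  # human-readable report naming the storage field
    # errors() returns one dict per problem; "loc" is the path to the field.
    failed_fields = [error["loc"] for error in exc.errors()]
```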

The computer variable is now of type ComputerModel, and its fields are accessed as attributes (e.g. computer.brand) rather than with dictionary subscripts. To convert computer to a dictionary, we need to export the model. The .dict(…) method of a model is the primary way of converting it into a dictionary; in pydantic v2 it has been renamed to .model_dump(…), and .dict(…) is deprecated. Sub-models are recursively converted to dictionaries.
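A minimal export example, assuming pydantic v2 (on v1, replace model_dump() with dict()) and the same hypothetical ComputerModel.

```python
from pydantic import BaseModel


class ComputerModel(BaseModel):
    brand: str
    ram: str
    storage: str
    ssd: bool = True


computer = ComputerModel(brand="HP", ram="4GB", storage="256GB")

# Fields are read as attributes, not with subscripts like computer["brand"].
print(computer.brand)  # HP

# Export to a plain dict: model_dump() in pydantic v2, computer.dict() in v1.
as_dict = computer.model_dump()
print(as_dict)  # {'brand': 'HP', 'ram': '4GB', 'storage': '256GB', 'ssd': True}
```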

Besides exporting a model as a whole, we can use the include and exclude keywords to include or exclude specific fields from the resulting dictionary. For nested objects, we can also specify which fields of the sub-models to include or exclude by specifying the relevant keys. Since our model is very simple, let's just use the include keyword here, either with a set of field names such as {"brand", "ram"} or with a dictionary such as {"brand": ..., "ram": ...}.

These two commands give the same result. The ellipsis (...) indicates that we want to include or exclude an entire key, which is mostly useful for nested objects. Note that the argument is either a set of names or a dictionary: once you specify subfields or ... for one key, every key needs a value (subfields or ...), because a key without a value in a dictionary literal is a syntax error.
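A sketch of the two equivalent include forms plus exclude, assuming pydantic v2 (on v1, call dict(...) with the same arguments) and the hypothetical ComputerModel used above.

```python
from pydantic import BaseModel


class ComputerModel(BaseModel):
    brand: str
    ram: str
    storage: str
    ssd: bool = True


computer = ComputerModel(brand="HP", ram="4GB", storage="256GB")

# Include by a set of field names...
by_set = computer.model_dump(include={"brand", "ram"})
# ...or by a dict whose values are ... (meaning "the whole field").
by_dict = computer.model_dump(include={"brand": ..., "ram": ...})
# exclude works the same way, keeping everything except the listed fields.
without_ssd = computer.model_dump(exclude={"ssd"})

print(by_set)  # {'brand': 'HP', 'ram': '4GB'}
```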

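Special care should be taken when including or excluding fields from a list of sub-models or dictionaries. To include or exclude a field from every member of a list, use the dictionary key "__all__". For example, for a person model with a name field and a list of computers, passing include={"name": ..., "computers": {"__all__": {"brand", "ram"}}} keeps only the brand and ram of each computer.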
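A sketch that should reproduce the output shown below, assuming pydantic v2 and a hypothetical PersonModel with name and computers fields. The storage values are invented because the output does not show them; the nested dictionaries passed to the constructor are validated into ComputerModel instances.

```python
from typing import List

from pydantic import BaseModel


class ComputerModel(BaseModel):
    brand: str
    ram: str
    storage: str
    ssd: bool = True


class PersonModel(BaseModel):
    # Hypothetical container model holding a list of sub-models.
    name: str
    computers: List[ComputerModel]


person = PersonModel(
    name="John",
    computers=[
        {"brand": "HP", "ram": "4GB", "storage": "256GB"},
        {"brand": "Apple", "ram": "8GB", "storage": "512GB"},
    ],
)

# "__all__" applies the nested selection to every item of the computers list.
data = person.model_dump(include={"name": ..., "computers": {"__all__": {"brand", "ram"}}})
print(data)
```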

The data we get is:

{'name': 'John',
 'computers': [{'brand': 'HP', 'ram': '4GB'},
               {'brand': 'Apple', 'ram': '8GB'}]}

If you use mypy to check the types in this new script, pydantic-computer.py, you can see that it also passes the type check. pydantic is really powerful, and this article only covers its most commonly used features. If you want to learn more and fine-tune its behavior, the official documentation is a good starting point.

In this article, the basic concepts of the type hints in Python are introduced and some special use cases for adding type hints to functions, classes, generators, and dictionaries are discussed. When type hints are added to a Python file, we can use the mypy library to check the validity of the types added. Finally, the pydantic library which can be used to validate and normalize data in Python using type hints is briefly introduced. pydantic can be very useful for API response validation and cleaning.
