Do Not Use If-Else For Validating Data Objects In Python Anymore
Cerberus — A neat and readable way to validate attributes of a dictionary
It is common to use lots of dictionaries in our Python programs to hold data objects with attributes. One typical example is web development. Suppose we are using Python to develop backend web services: it is important to verify the JSON payload received from the frontend. Data science work may also need to verify data entries in some cases.
The most intuitive solution, but possibly the worst one, is to use numerous if-else conditions. This might be fine if we are only verifying one or two attributes with simple requirements, but it does not scale at all. Object-oriented approaches are considered more advanced. However, sometimes we may not want to over-engineer our applications.
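To make the scaling problem concrete, here is a rough sketch of what the if-else approach might look like for a profile with only two fields. The function name validate_profile and the error messages are made up for illustration.

```python
def validate_profile(profile):
    """Hand-written checks; returns a list of error messages."""
    errors = []
    if 'name' not in profile:
        errors.append('name is missing')
    elif not isinstance(profile['name'], str):
        errors.append('name must be a string')
    if 'age' not in profile:
        errors.append('age is missing')
    elif not isinstance(profile['age'], int):
        errors.append('age must be an integer')
    elif profile['age'] < 18:
        errors.append('age must be at least 18')
    return errors


print(validate_profile({'name': 'Chris', 'age': '34'}))  # ['age must be an integer']
```

Every new field or rule adds another branch, and the rules end up buried in control flow instead of living in one reusable place.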
In this article, I’ll introduce an amazing third-party library: Cerberus. It simplifies validation code to a large extent. It also makes the validation rules reusable and flexible, and it supports many complex scenarios.
1. Quick Start with Typical Examples
1.1 Installation
As usual, Python makes it quite easy to install a third-party library. We just need to run the pip command to install it.
pip install cerberus
Then, we can start to use it. For the first example, let me break it down into several pieces to clarify the terminology.
1.2 Basic Usage
Firstly, we need to import the Validator class from the cerberus module that we have just installed.
from cerberus import Validator
Then, we can define a “schema”. This schema will contain all the rules that we want to use to verify our data dictionary. This schema is also a dictionary.
schema = {
    'name': {
        'type': 'string'
    },
    'age': {
        'type': 'integer'
    }
}
The above schema tells Cerberus that we have 2 fields in our dictionary. The “name” field should be a string, and the “age” field should be an integer.
Then, we can initialise our “validator” using this schema. Our validator will be an instance of the Cerberus Validator class.
profile_validator = Validator(schema)
Now, let's create a sample dictionary to test this validator. For any validator instance, we can call its validate() method to validate a data dictionary.
my_profile = {'name': 'Christopher Tao', 'age': 34}
profile_validator.validate(my_profile)
It returns True, which means that the dictionary passed validation. In the next example, we change the age value to a string, so validation fails.
my_profile = {'name': 'Christopher Tao', 'age': '34'}
profile_validator.validate(my_profile)
You must be asking how we can find out why the validation failed. The reasons are stored in the validator. We can get all the errors (there may be more than one) from the validator's errors attribute.
profile_validator.errors
1.3 More Complex Validation Rules
Certainly, we can have more complex rules than just a data type. For example, if we would like our users to be at least 18 years old, we can add the min rule to the schema.
profile_validator = Validator()
my_profile = {'name': 'Alice', 'age': 16}
profile_validator.validate(document=my_profile, schema={
    'name': {
        'type': 'string'
    },
    'age': {
        'type': 'integer',
        'min': 18
    }
})
In the example above, the validator is initialised without a schema. Instead, the schema is passed on the fly to the validate() method. This is just a more flexible way to do the same thing. It can be very convenient when the schema rules may change during the validator's lifetime.
The error message states the reason for the failure clearly, without any extra effort on our part, which is one of the reasons I love this library.
1.4 Validate Nested Dictionaries
What if our dictionary is nested? In other words, there are sub-documents in a JSON document. Don’t worry, it is supported by Cerberus.
Suppose the profile dictionary needs to contain an address dictionary with a street number and a street name. We can define the schema as follows.
profile_validator = Validator()
profile_validator.validate({
    'name': 'Chris',
    'address': {
        'street_no': '1',
        'street_name': 'One St'
    }
}, {
    'name': {'type': 'string'},
    'address': {
        'type': 'dict',
        'schema': {
            'street_no': {'type': 'integer'},
            'street_name': {'type': 'string'}
        }
    }
})
For the “address” sub-document, we just need to tell the schema it is a dict type, and define a sub-schema inside it. Then, it will just work. The error also reflects the hierarchy, so that we can easily troubleshoot.
2. Into The Unknown
It is common that we cannot anticipate which fields we will actually receive, so we need to handle the validation of unknown fields. I would like to use this feature to show how flexible Cerberus is. So, let’s dive into the details about the unknown :)
2.1 Unknown is not Acceptable by Default
For example, our validator only knows that we will have a “name” field.
profile_validator = Validator({'name': {'type': 'string'}})
However, our dictionary has an unknown field “age”.
profile_validator.validate({'name':'Chris', 'age': 34})
This will cause validation to fail if we just leave it.
Don’t worry, Cerberus has a very comprehensive solution to handle unknown fields.
2.2 Allow Unknown
A very common requirement is that we want to ignore the unknown fields and just let them pass. In this case, we need to set the allow_unknown attribute of our validator to True.
profile_validator.allow_unknown = True
After that, the age field, being unknown to the schema, will not be validated and will simply be ignored.
profile_validator.validate({'name':'Chris', 'age': 34})
2.3 Allow Unknown for Particular Data Types
Another common requirement is that we may want to ignore unknown fields only if they are of certain data types. For example, string fields are fairly free-form, so we are happy to let unknown ones pass, but we don't want to accept unknown fields of any other type, such as integers.
In this case, instead of a boolean, we can assign a rule set to allow_unknown. Below, allow_unknown is first reset to False and then set to a rule that accepts only strings.
profile_validator.allow_unknown = False
profile_validator.allow_unknown = {'type': 'string'}
The validator schema has not changed, but let’s create a dictionary with firstname and lastname fields, which do not exist in the schema.
profile_validator.validate({'firstname':'Chris', 'lastname': 'Tao'})
It passed the validation because it allows any string type unknown fields. However, if we add an integer field, it will fail.
profile_validator.validate({
    'firstname': 'Chris',
    'lastname': 'Tao',
    'age': 34
})
2.4 Allow Unknown at Initialisation
If we know that we need to accept unknown fields, we can also add the flag when instantiating the validator as follows.
profile_validator = Validator({}, allow_unknown=True)
As shown above, the schema is empty but allow_unknown is set to True. So, it will accept any fields.
2.5 Allow Unknown at Certain Level
We can even set allow_unknown at a sub-document level. For example, we may still want the dictionary to be rigorously validated at the root level, but for the address object, we don’t want to add too many constraints, so that some uncertainty is allowed.
We can define the schema as follows.
profile_validator = Validator({
    'name': {'type': 'string'},
    'address': {
        'type': 'dict',
        'allow_unknown': True
    }
})
Note that allow_unknown is set to True at the address level. So, no matter what we put in the address sub-document, it will be fine.
profile_validator.validate({
    'name': 'Chris',
    'address': {
        'street_no': 1,
        'street_name': 'One St'
    }
})
However, if we add an unknown field age at the root level, it will fail.
profile_validator.validate({
    'name': 'Chris',
    'age': 34,
    'address': {
        'street_no': 1,
        'street_name': 'One St'
    }
})
3. Required Fields
We know that we can handle unknown fields using Cerberus. What if we want to make some fields mandatory?
By default, if the dictionary is missing some fields that are defined in the schema, no error is reported.
profile_validator = Validator({
    'name': {'type': 'string'},
    'age': {'type': 'integer'}
})
profile_validator.validate({'name': 'Chris'})
In the above code, the schema defines an age field but the dictionary doesn’t have it. The validation result will still be OK.
If we want to make all the fields mandatory, we can add the require_all flag and set it to True.
profile_validator = Validator({
    'name': {'type': 'string'},
    'age': {'type': 'integer'}
}, require_all=True)
profile_validator.validate({'name': 'Chris'})
Of course, we can also make only certain fields mandatory. This is done with a specific rule, required, which can be added to the field in the schema definition.
profile_validator = Validator({
    'name': {'type': 'string', 'required': True},
    'age': {'type': 'integer'}
})
profile_validator.validate({'age': 34})
4. Normalizing Dictionary
It was surprising and impressive to me that Cerberus can not only validate dictionaries but also correct them. This is potentially very useful in data quality assurance applications.
For example, we may have user profiles coming from different data sources. When we want to combine them into a single source of truth, we may find that the age is stored as an integer in one database but as a string in another. In this case, Cerberus provides a method called normalized() that can unify the data types to make sure they are consistent.
To achieve this, we need to specify the type we would like to have for the field. For example, we would like to unify the age field to be an integer type. The code is as follows.
profile_validator = Validator({
    'name': {'type': 'string'},
    'age': {'coerce': int}
})
The coerce rule tells the validator which callable to use to convert the value into the data type we want. Note that coerce is a conversion step rather than a type check: this schema has no type rule for age, so a dictionary whose age field is a string such as '34' still passes validation.
my_profile = {'name': 'Chris', 'age': '34'}
profile_validator.validate(my_profile)
If we want to “normalize” the dictionary, we can call the normalized() method of the validator as follows.
my_profile_normalized = profile_validator.normalized(my_profile)
We can see that the age value is converted to an integer after normalizing.
5. Other Rules and Customised Rules
5.1 Supported Rules
So far, I have only introduced a few types of validation rules. This is because there are approximately 30 different types of rules out-of-the-box in Cerberus, far too many to cover here. Here are some examples:
contains: a list must contain a specific item
dependencies: a field is only valid if the other fields it depends on are present
empty: a string value must not be empty
nullable: a value is allowed to be None
forbidden: the value must not be in a pre-defined list
excludes: a field must not exist if another field is present
regex: a string value must match the regex
allof/anyof/noneof/oneof: define multiple rule sets, and the value must satisfy all of them, any of them, none of them or exactly one of them
I won’t be able to introduce every single rule in Cerberus, but you can always check out all of them from the documentation.
https://docs.python-cerberus.org/en/stable/validation-rules.html
5.2 Customised Rules
What if none of these 30 rules provided by Cerberus satisfies our requirements? To maximise flexibility, Cerberus also enables us to define customised rules.
The customised rules need to be defined as a function with three parameters as follows.
def my_rule(field, value, error):
    if value % 2 != 0:
        error(field, "Must be an even number")
The field is the name of the field, the value is the value that we want to validate, and error is a function that records the error message, which will be stored in the validator’s errors attribute.
The my_rule function I have just defined simply checks whether a number is even. Once it is defined, we can use it in the schema with the check_with keyword.
my_validator = Validator({
    'my_number': {'check_with': my_rule}
})
my_validator.validate({'my_number': 10})
Summary
In this article, I have introduced the third-party Python library Cerberus. It provides a neat solution for validating dictionaries. It is very flexible, as can be seen in the examples of handling unknown fields and defining required fields. It also supports about 30 rules out-of-the-box, as well as custom validation rules that we define ourselves.
Unless otherwise noted all images are by the author