Backward vs. Forward Compatibility

When building a client-server application, the client and server need to agree on how to talk to each other. For instance, if they exchange JSON, they have to agree on field names and data types. The concept is similar for databases: without a schema, the only information you could get back would be an ordered list of values, and values are meaningless without context.

It’s easy to build the first version when the application is still in development. Coming up with an agreement on the fields and types of data is relatively straightforward. You can break things without consequence until it works. But the initial version of your application will only last so long. Eventually, you will need to change the data to support new features or remove unsupported ones. This is called evolving a schema.

The naive approach to evolving a schema would be to simply make arbitrary changes and update everything at once. This is impractical for production applications unless you are okay with your product breaking. Let’s say you make an incompatible change like renaming a column in the database from “productNum” to “productId”. After migrating the database to “productId”, the existing application will still be looking for “productNum”. When it doesn’t find a column with that name, it breaks.

A better approach is to make only compatible changes, so that systems that talk to each other can be deployed in any order. In the above example, instead of renaming the column, create an alias for the old name so that either name works.
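One way to get such an alias in a relational database is a view that exposes the same column under both names. This is a minimal sketch, assuming SQLite through Python's built-in `sqlite3` module; the table name `products_table` and view name `products` are hypothetical, and other databases offer their own tools for this.

```python
import sqlite3

# In-memory database standing in for a real one (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products_table (productNum INTEGER, name TEXT)")
conn.execute("INSERT INTO products_table VALUES (42, 'widget')")

# Expose the column under its old and new names, so code that reads
# productNum and code that reads productId can be deployed in any order.
conn.execute(
    "CREATE VIEW products AS "
    "SELECT productNum, productNum AS productId, name FROM products_table"
)

old_reader = conn.execute("SELECT productNum FROM products").fetchone()[0]
new_reader = conn.execute("SELECT productId FROM products").fetchone()[0]
print(old_reader, new_reader)  # 42 42
```

SQLite views are read-only unless you add `INSTEAD OF` triggers, so in this sketch writers would still insert into `products_table` directly.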

It can be challenging to understand which kinds of changes are compatible and which ones are incompatible. Further complicating things is that there are two different categories of compatibility: backward compatibility and forward compatibility.

This post dives into the difference between the two.

Setting the stage

Here are some terms to familiarize yourself with:

Schema: Definition of the types of data and any context needed to understand it. Schemas are independent of how the data is encoded, since multiple serialization options are possible (JSON, binary, etc.). Schemas can also be versioned, which is crucial to understanding backward and forward compatibility.

Reader: The service that parses the data. In a client-server application, this is the client whenever the server sends back data. However, when the client sends data to the server (e.g. input arguments to a function), the roles are reversed: the client becomes the writer.

Writer: The service that creates the data. In a client-server application, this is the server, but, just as before, the roles are sometimes reversed.

For databases, the writer is the service that saved the row to the database initially, whereas the reader is the service that retrieves it.

The one diagram to remember:

This diagram fully explains the difference between backward and forward compatibility.

Backward compatibility means that readers with a newer schema can correctly parse data from writers with an older schema.
Forward compatibility means that readers with an older schema can correctly parse data from writers with a newer schema.
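To make the two definitions concrete, here is a minimal sketch of a JSON reader driven by its own schema. The schemas, field names, and the `read`/`write` helpers are hypothetical: each schema maps a field name to its default value, and the `REQUIRED` sentinel marks fields that have no default.

```python
import json

REQUIRED = object()  # sentinel: the field has no default

# Version 2 adds a "color" field with a default value.
SCHEMA_V1 = {"id": REQUIRED, "name": REQUIRED}
SCHEMA_V2 = {"id": REQUIRED, "name": REQUIRED, "color": "blue"}

def write(record: dict) -> str:
    """Writer: serialize a record to JSON."""
    return json.dumps(record)

def read(payload: str, schema: dict) -> dict:
    """Reader: parse JSON according to the reader's own schema."""
    data = json.loads(payload)
    result = {}
    for field, default in schema.items():
        if field in data:
            result[field] = data[field]
        elif default is REQUIRED:
            raise ValueError(f"missing required field: {field}")
        else:
            result[field] = default
    return result  # fields unknown to the reader's schema are dropped

# Backward compatibility: a newer reader parses data from an older writer.
old_payload = write({"id": 1, "name": "widget"})
print(read(old_payload, SCHEMA_V2))  # {'id': 1, 'name': 'widget', 'color': 'blue'}

# Forward compatibility: an older reader parses data from a newer writer.
new_payload = write({"id": 1, "name": "widget", "color": "red"})
print(read(new_payload, SCHEMA_V1))  # {'id': 1, 'name': 'widget'}
```

In this toy setup, adding `color` with a default happens to work in both directions.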

Backward Compatibility

Backward compatibility is important because:

For the case of input parameters: you can upgrade servers without having to upgrade clients
For return types: you can upgrade clients without having to upgrade servers
For databases: you don’t encounter any data loss (without backward compatibility you wouldn’t be able to read any data written by an older version)

For JSON, here is an incomplete list of backward-compatible changes:

Adding a field with a default value. Older writers will be unaware of that field so the default value will be used instead.
Adding an optional field. Older writers will be unaware of that field so null will be used instead.
Widening a numerical type (e.g. int to float). Older writers will always use ints, which are a subset of floats.
Adding a value to an enum string. Older writers will just use one of the existing enum strings.
Removing a field. Newer readers will ignore whatever was previously written in this field. (Note: this is not true of many binary serialization formats!)
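Here is a sketch of a newer reader handling data from an older writer after several of the changes listed above. The "order" schema, its fields, and `parse_order_v2` are hypothetical, and the reader is hand-written rather than generated from a schema.

```python
import json

# Hypothetical version 2 of an "order" schema. Compared with version 1 it:
#   - widens "quantity" from int to float
#   - adds "discount" with a default value and an optional "note"
#   - adds "REFUNDED" to the allowed status values
#   - removes the "legacy_code" field
STATUSES_V2 = {"PENDING", "PAID", "REFUNDED"}

def parse_order_v2(payload: str) -> dict:
    data = json.loads(payload)
    status = data["status"]
    if status not in STATUSES_V2:
        raise ValueError(f"unexpected status: {status}")
    # "legacy_code" is never read, so old payloads that still carry it are fine.
    return {
        "quantity": float(data["quantity"]),    # an int from an old writer converts cleanly
        "discount": data.get("discount", 0.0),  # default when an old writer omits it
        "note": data.get("note"),               # optional: None when absent
        "status": status,                       # old writers only send old values
    }

# A payload produced by a version 1 writer.
v1_payload = json.dumps({"quantity": 3, "status": "PAID", "legacy_code": "X1"})
print(parse_order_v2(v1_payload))
# {'quantity': 3.0, 'discount': 0.0, 'note': None, 'status': 'PAID'}
```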

Forward Compatibility

Forward compatibility is important because:

For the case of input parameters: you can upgrade clients without having to upgrade servers
For return types: you can upgrade servers without having to upgrade clients
For databases: you can run your schema migrations before deploying the new code to read it
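A sketch of that database workflow, assuming SQLite through Python's `sqlite3` module and a hypothetical `products` table. The migration adds a column with a default, and code written against the old schema keeps working because it names its columns explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")

# Migration for the new code, run *before* the new code is deployed.
# The column has a default, so old writers that don't know about it still work.
conn.execute("ALTER TABLE products ADD COLUMN color TEXT DEFAULT 'blue'")

# Old code, still running, lists its columns explicitly.
conn.execute("INSERT INTO products (id, name) VALUES (1, 'widget')")
old_row = conn.execute("SELECT id, name FROM products").fetchone()
print(old_row)  # (1, 'widget')

# New code, deployed later, can already see the new column.
new_row = conn.execute("SELECT id, name, color FROM products").fetchone()
print(new_row)  # (1, 'widget', 'blue')
```

Old code that used `SELECT *` and unpacked a fixed number of columns would break after this migration, which is one reason to list columns explicitly.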

For JSON, here is an incomplete list of forward-compatible changes:

Adding a new required field. Older readers will simply ignore it.
Narrowing a numerical type (e.g. float to int). Older readers expect floats, and ints are a subset of floats.
Removing a value from an enum string. Older readers can handle the full breadth of enums.
Adding a value to an enum string if and only if the reader has implemented a proper “else” case. (See note on enums)
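Here is the other direction: a sketch of an older reader handling data from a newer writer. The payload fields and `parse_order_v1` are hypothetical; note the explicit "else" branch for status values the reader has never seen.

```python
import json

def parse_order_v1(payload: str) -> dict:
    """Hypothetical version 1 reader, written before any of the changes."""
    data = json.loads(payload)
    status = data["status"]
    if status == "PENDING":
        label = "awaiting payment"
    elif status == "PAID":
        label = "paid"
    else:
        # Proper "else" case: tolerate values added by newer writers.
        label = "unknown status"
    # Fields v1 doesn't know about, such as "currency", are simply ignored.
    return {
        "quantity": float(data["quantity"]),  # v1 expects a float; an int converts cleanly
        "label": label,
    }

# A payload from a newer writer that added a required "currency" field,
# narrowed "quantity" to an int, and introduced a "REFUNDED" status.
v2_payload = json.dumps({"quantity": 2, "currency": "EUR", "status": "REFUNDED"})
print(parse_order_v1(v2_payload))  # {'quantity': 2.0, 'label': 'unknown status'}
```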

Full Compatibility

If a change is both forward and backward compatible, then it is called fully compatible. This means you can run any combination of readers and writers without breaking anything.

For JSON, here is an incomplete list of fully compatible changes (some are repeated from above):

Adding a field with a default value
Adding an optional field
Adding a value to an enum string if and only if the reader has implemented a proper “else” case. (See note on enums)
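The rules so far can be condensed into a toy compatibility checker. This is a sketch under simplifying assumptions, not a real tool: a hypothetical schema maps field names to a `Field` with a kind (`"int"`, `"float"`, or `"str"`) and a flag saying whether a reader can fall back to a default (or null) when the field is absent. Enums are not modeled, readers are assumed to ignore unknown fields as JSON readers typically do, and the code needs Python 3.9+.

```python
from typing import NamedTuple

class Field(NamedTuple):
    kind: str          # "int", "float", or "str"
    has_default: bool  # True if a reader can fill the field in when it is absent

Schema = dict[str, Field]

def can_read(reader: Schema, writer: Schema) -> bool:
    """True if a reader using `reader` can parse data written with `writer`."""
    for name, field in reader.items():
        if name not in writer:
            if not field.has_default:
                return False  # the reader needs a field the writer never sends
            continue
        written = writer[name].kind
        widened = written == "int" and field.kind == "float"
        if written != field.kind and not widened:
            return False
    return True  # extra fields from the writer are ignored

def classify(old: Schema, new: Schema) -> str:
    backward = can_read(reader=new, writer=old)
    forward = can_read(reader=old, writer=new)
    if backward and forward:
        return "full"
    if backward:
        return "backward"
    if forward:
        return "forward"
    return "incompatible"

v1 = {"id": Field("int", False)}
print(classify(v1, {**v1, "color": Field("str", True)}))   # full
print(classify(v1, {**v1, "color": Field("str", False)}))  # forward
print(classify(v1, {"id": Field("float", False)}))         # backward
print(classify(v1, {"productId": Field("int", False)}))    # incompatible
```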

Incompatibility

If a change is neither forward nor backward compatible, then it is an incompatible change.

For JSON, here is an incomplete list of incompatible changes:

Renaming a field
Changing the type of a field (other than the numeric conversions mentioned above)
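Renaming shows why these changes break in both directions. In this sketch the readers are hypothetical functions doing plain dictionary lookups on JSON parsed with Python's standard library; each side looks for a key the other side never writes.

```python
import json

def old_reader(payload: str) -> int:
    return json.loads(payload)["productNum"]

def new_reader(payload: str) -> int:
    return json.loads(payload)["productId"]

def can_parse(reader, payload: str) -> bool:
    try:
        reader(payload)
        return True
    except KeyError:
        return False

old_payload = json.dumps({"productNum": 42})  # from a writer using the old name
new_payload = json.dumps({"productId": 42})   # from a writer using the new name

print("backward:", can_parse(new_reader, old_payload))  # backward: False
print("forward:", can_parse(old_reader, new_payload))   # forward: False
```

Making a reader accept either name is the same aliasing idea described earlier, and it is what turns a rename into a compatible change.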

[Special Note on Enums]

It is important to write code in a way that allows for the introduction of new enum values. For instance, imagine you wrote this code:

if process.status == "STARTED":
    print("The process has started")
else:  # assumes process.status == "FINISHED"
    print("The process has finished")

Now, if a new status “CANCELLED” were added, this code would be incorrect, since it would print that the process has finished even though it hasn’t. Instead, consider this code:

if process.status == "STARTED":
    print("The process has started")
elif process.status == "FINISHED":
    print("The process has finished")
else:
    raise ValueError("Unexpected status: " + process.status)

This code is much more future-proof and allows the writer to add new enum values without breaking the existing code. (Well, technically this code now raises an exception, which is at least better than printing the wrong answer.)
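Raising an exception is one option. Another is to map unrecognized values to an explicit catch-all so the reader can degrade gracefully. This sketch uses Python's standard `enum` module, where `_missing_` is the hook `Enum` calls when a lookup by value fails; the `Status` class, its `UNKNOWN` member, and `describe` are hypothetical names.

```python
from enum import Enum

class Status(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    UNKNOWN = "UNKNOWN"  # catch-all for values added by newer writers

    @classmethod
    def _missing_(cls, value):
        # Called by Status(value) when value matches no member.
        return cls.UNKNOWN

def describe(raw_status: str) -> str:
    status = Status(raw_status)
    if status is Status.STARTED:
        return "The process has started"
    if status is Status.FINISHED:
        return "The process has finished"
    return f"The process is in an unrecognized state: {raw_status}"

print(describe("CANCELLED"))  # The process is in an unrecognized state: CANCELLED
```

Which approach fits better depends on the cost of a wrong guess: raising stops the reader cold, while a catch-all keeps it running but pushes the "unknown" case onto every caller.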

Conclusion

For clients and servers:

Backward compatibility of your input parameters ensures that older clients can still call your methods
Forward compatibility of your return types ensures that older clients can still parse the results you return

For databases:

Backward compatibility is essential if you want to avoid modifying existing data
Forward compatibility is a nice bonus but not necessary if you control all the readers

When making a change to your schema, ask yourself the following questions:

Does this change need to be backward compatible? Are you sure? (The majority of changes that come up in practice need to be at least backward compatible)
Does this change need to be forward compatible? If not, are you sure every reader will be upgraded before any writer starts using the new schema?

And never forget: the safest schema change is no change at all.