Steven Heidel

Summary

The web content discusses the concepts of backward and forward compatibility in client-server applications, emphasizing the importance of schema evolution to accommodate changes without breaking existing systems.

Abstract

In client-server applications, compatibility between the client and server is crucial for seamless communication. The article explains that as applications evolve, their data schemas must also change to support new features or deprecate old ones. This evolution must be managed carefully to maintain backward compatibility (new readers can understand old data) and forward compatibility (old readers can understand new data). The author outlines compatible and incompatible schema changes, the roles of readers and writers, and the implications for databases. Backward compatibility is essential for avoiding data loss and allowing server upgrades without client updates, while forward compatibility facilitates client upgrades without server changes. The article also touches on fully compatible changes that ensure seamless interaction between any combination of readers and writers, as well as the handling of enums to prevent code breakage with new enum values.

Opinions

  • The author suggests that making arbitrary changes to a schema without considering compatibility is naive and impractical for production environments.
  • Creating aliases for renamed columns instead of directly renaming them is a better approach to schema evolution.
  • The addition of new fields with default values or as optional is seen as a backward-compatible change.
  • Narrowing numerical types and removing enum values are considered forward-compatible changes under certain conditions.
  • The author emphasizes the importance of writing future-proof code, especially when dealing with enums, to handle new values gracefully.
  • The article posits that the safest schema change is no change at all, highlighting the risks associated with schema evolution.
  • The author believes that while forward compatibility for databases is a nice bonus, it is not as essential as backward compatibility if one controls all the readers.
  • Compatibility, particularly schema evolution, is acknowledged as a complex topic with many nuances, and the author references further reading materials for a deeper understanding.

Backward vs. Forward Compatibility

When building a client-server application, the client and server need to agree on how to talk to each other. For instance, if sending JSON, then the client and server have to agree on field names and data types. For databases, the concept is similar; without a schema, the only information you could get back would be an ordered bag of values. Values are meaningless without context.

It’s easy to build the first version when the application is still in development. Coming up with an agreement on the fields and types of data is relatively straightforward. You can break things without consequence until it works. But the initial version of your application will only last so long. Eventually, you will need to change the data to support new features or remove unsupported ones. This is called evolving a schema.

The naive approach to evolving a schema would be to simply make arbitrary changes and update everything at once. This is impractical for production applications unless you are okay with your product breaking. Let’s say you make an incompatible change like renaming a column in the database from “productNum” to “productId”. After migrating the database to “productId”, the existing application will still be looking for “productNum”. When it doesn’t find a column with that name, it breaks.

A better way would be to make only compatible changes where different systems that talk to each other can be deployed in any order. In the above example, instead of renaming the column, create an alias to the old name so that either would work.
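The alias idea can be pictured at the application level as a reader that accepts either name. This is only a minimal sketch: `read_product_id` is a hypothetical helper, and in a real database the alias would more likely live in the database itself (for example, in a view).

```python
# Hypothetical helper: accept the column under its old or its new name,
# so readers and writers can be deployed in any order.
OLD_NAME = "productNum"
NEW_NAME = "productId"


def read_product_id(row: dict) -> int:
    """Return the product identifier, whichever name the row was stored under."""
    if NEW_NAME in row:
        return row[NEW_NAME]
    if OLD_NAME in row:
        return row[OLD_NAME]
    raise KeyError(f"row has neither {NEW_NAME!r} nor {OLD_NAME!r}")


old_row = {"productNum": 42, "title": "Widget"}  # written before the rename
new_row = {"productId": 42, "title": "Widget"}   # written after the rename
print(read_product_id(old_row), read_product_id(new_row))
```

Once every writer uses the new name and no old rows remain, the fallback branch can be dropped.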

It can be challenging to understand which kinds of changes are compatible and which ones are incompatible. Further complicating things is that there are two different categories of compatibility: backward compatibility and forward compatibility.

This post dives into the difference between the two.

Setting the stage

Here are some terms to familiarize yourself with:

Schema — Definition of the types of data and any context needed to understand it. Schemas are independent of how the data is encoded as multiple serialization options are possible (JSON, binary, etc.). Schemas can also be versioned, something which is crucial to understanding backward and forward compatibility.

Reader — The service that parses the data. In the case of a client-server application, this is the client whenever the server has sent back some interesting data. However, when talking about what data the client sends the server (e.g. input arguments to a function), the roles are reversed: the client becomes the writer.

Writer — The service that creates the data. In the case of a client-server application, this is the server, but just as before sometimes the roles are reversed.

For databases, the writer is the service that saved the row to the database initially, whereas the reader is the service that retrieves it.

The one diagram to remember:

This diagram fully explains the difference between backward and forward compatibility.

  • Backward compatibility means that readers with a newer schema can correctly parse data from writers with an older schema.
  • Forward compatibility means that readers with an older schema can correctly parse data from writers with a newer schema.

Backward Compatibility

Backward compatibility is important because:

  • For the case of input parameters: you can upgrade servers without having to upgrade clients
  • For return types: you can upgrade clients without having to upgrade servers
  • For databases: you don’t encounter any data loss (without backward compatibility you wouldn’t be able to read any data written by an older version)

For JSON here is an incomplete list of backward-compatible changes:

  • Adding a field with a default value. Older writers will be unaware of that field so the default value will be used instead.
  • Adding an optional field. Older writers will be unaware of that field so null will be used instead.
  • Widening a numerical type (e.g. int to float). Older writers will always use ints, which are a subset of floats.
  • Adding a value to an enum string. Older writers will just use one of the existing enum strings.
  • Removing a field. Newer readers will ignore whatever was previously written in this field. (Note: this is not true of many binary serialization formats!)
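Several of the changes above can be sketched as a newer reader parsing data from an older writer. The field names (`name`, `price`, `currency`, `discount`) and the `parse_product_v2` function are made up for illustration and are not from any particular API.

```python
import json


def parse_product_v2(payload: str) -> dict:
    """Hypothetical v2 reader that also understands data written under v1."""
    data = json.loads(payload)
    return {
        "name": data["name"],                     # present in both versions
        "currency": data.get("currency", "USD"),  # added in v2 with a default value
        "discount": data.get("discount"),         # added in v2 as optional; None if absent
        "price": float(data["price"]),            # widened from int (v1) to float (v2)
    }


# An older writer knows nothing about "currency" or "discount" and sends an int price.
v1_payload = json.dumps({"name": "Widget", "price": 10})
print(parse_product_v2(v1_payload))
```

The reader never assumes a v2-only field is present, which is what lets it consume v1 data unchanged.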

Forward Compatibility

Forward compatibility is important because:

  • For the case of input parameters: you can upgrade clients without having to upgrade servers
  • For return types: you can upgrade servers without having to upgrade clients
  • For databases: you can run your schema migrations before deploying the new code to read it

For JSON here is an incomplete list of forward-compatible changes:

  • Adding a new required field. Older readers will simply ignore it.
  • Narrowing a numerical type (e.g. float to int). Older readers will assume floats, and the ints that newer writers send are a subset of floats.
  • Removing a value from an enum string. Older readers can handle the full breadth of enums.
  • Adding a value to an enum string if and only if the reader has implemented a proper “else” case. (See note on enums)
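The first two items can be illustrated with an older reader that only picks out the fields it knows about. This is a sketch under assumed names: `parse_product_v1` and the `sku` field are hypothetical.

```python
import json


def parse_product_v1(payload: str) -> dict:
    """Hypothetical v1 reader: knows only "name" and a float "price"."""
    data = json.loads(payload)
    # Only the known fields are read; anything else in the payload is ignored.
    return {
        "name": data["name"],
        "price": float(data["price"]),  # v1 expects a float; an int converts cleanly
    }


# A newer writer adds a required "sku" field and has narrowed "price" to an int.
v2_payload = json.dumps({"name": "Widget", "price": 10, "sku": "W-001"})
print(parse_product_v1(v2_payload))
```

This only works because the reader tolerates unknown fields. A reader that validates strictly and rejects unrecognised keys would turn the same addition into a breaking change.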

Full Compatibility

If a change is both forward and backward compatible, then it is called fully compatible. This means you can run any combination of readers and writers without breaking anything.

For JSON here is an incomplete list of fully compatible changes (some are repeated from above):

  • Adding a field with a default value
  • Adding an optional field
  • Adding a value to an enum string if and only if the reader has implemented a proper “else” case. (See note on enums)
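A quick way to convince yourself that a change is fully compatible is to run every writer version against every reader version. The sketch below does this for one assumed change, adding a `nickname` field with a default of the empty string; all function and field names are hypothetical.

```python
import json
from itertools import product


def write_v1(name: str) -> str:
    return json.dumps({"name": name})


def write_v2(name: str) -> str:
    # v2 adds "nickname"
    return json.dumps({"name": name, "nickname": name.lower()})


def read_v1(payload: str) -> dict:
    data = json.loads(payload)
    return {"name": data["name"]}  # ignores fields it does not know


def read_v2(payload: str) -> dict:
    data = json.loads(payload)
    return {"name": data["name"], "nickname": data.get("nickname", "")}  # default if absent


writers = {"v1": write_v1, "v2": write_v2}
readers = {"v1": read_v1, "v2": read_v2}

# Try all four writer/reader combinations; none of them should raise.
results = {}
for (w_name, write), (r_name, read) in product(writers.items(), readers.items()):
    results[(w_name, r_name)] = read(write("Ada"))
    print(f"writer {w_name} -> reader {r_name}: {results[(w_name, r_name)]}")
```

The same loop makes a reasonable regression test: if a later schema change makes any of the combinations raise, the change is not fully compatible.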

Incompatibility

If a change is neither forward nor backward compatible, then it is an incompatible change.

For JSON here is an incomplete list of incompatible changes:

  • Renaming a field
  • Changing the type of a field (other than the numeric conversions mentioned above)

[Special Note on Enums]

It is important to write code in a way that allows for the introduction of new enums. For instance, imagine you wrote this code:

if process.status == "STARTED":
    print("The process has started")
else:  # assumes process.status == "FINISHED"
    print("The process has finished")

Now, if the new status “CANCELLED” was added, this code would be incorrect since it would print that the process has finished even though it hasn’t. Instead, consider this code:

if process.status == "STARTED":
    print("The process has started")
elif process.status == "FINISHED":
    print("The process has finished")
else:
    raise ValueError("Unexpected status: " + process.status)

This code is much more future-proof and allows the writer to add new enum values without the reader silently misinterpreting them. (Well, technically this code now raises an exception, which is at least better than returning the wrong answer.)
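If raising is too disruptive for a particular reader, one possible alternative is to map unrecognised values to an explicit catch-all member and handle that case deliberately. This is a sketch of that variation, not the approach taken above; `Status.UNKNOWN`, `parse_status`, and `describe` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    UNKNOWN = "UNKNOWN"  # hypothetical catch-all for values added by newer writers


def parse_status(raw: str) -> Status:
    """Convert a raw enum string, falling back to UNKNOWN instead of raising."""
    try:
        return Status(raw)
    except ValueError:  # Enum lookup by value raises ValueError for unknown values
        return Status.UNKNOWN


def describe(raw: str) -> str:
    status = parse_status(raw)
    if status is Status.STARTED:
        return "The process has started"
    if status is Status.FINISHED:
        return "The process has finished"
    # The "else" case: admit we do not know rather than guess.
    return f"The process is in a state this reader does not recognise: {raw}"


print(describe("STARTED"))
print(describe("CANCELLED"))
```

Whether to raise or fall back depends on the reader: a display layer can often show "unknown", while code that makes decisions based on the status is usually safer failing loudly.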

Conclusion

For clients and servers:

  • Backward compatibility ensures that older clients can still call your methods after you upgrade the server (the newer server reads input written by older clients)
  • Forward compatibility ensures that older clients can still parse the results a newer server returns

For databases:

  • Backward compatibility is essential if you want to avoid modifying existing data
  • Forward compatibility is a nice bonus but not necessary if you control all the readers

When making a change to your schema, ask yourself the following questions:

  • Does this change need to be backward compatible? Are you sure? (The majority of changes that come up in practice need to be at least backward compatible)
  • Does this change need to be forward compatible? If not, are you sure all your readers will be aware of the change?

And never forget: the safest schema change is no change at all.

Further Reading

Compatibility is a huge topic. There are many details I glossed over to keep this article short. There is also another even larger topic on code compatibility that is even more complicated. (For instance, why is renaming a parameter in Java a compatible change but an incompatible change in Python?)

Most of my understanding of schema evolution was learned several years ago at LinkedIn. Here are two articles written by former LinkedIn engineers:
