Erik Englund

Summary

The web content discusses strategies for handling null values in Protocol Buffers (protobuf) and gRPC, emphasizing the importance of explicit null representation due to protobuf's design choice to not serialize nulls.

Abstract

The article "Protobuf and Null Support" delves into the challenges of representing null values within Protocol Buffers, a serialization format used in conjunction with gRPC for efficient data communication. It explains that protobuf does not support null values by default, which can lead to runtime exceptions like NullPointerException when developers attempt to set fields to null. The author outlines several patterns to handle nulls in protobuf, including the "OneOf NullValue Pattern," the "FieldMask Pattern," and the "Has Pattern," each with its pros and cons. These patterns provide type-safe and explicit ways to represent and communicate null values between clients and servers. Additionally, the article warns against "Null Anti-Patterns," such as relying on default values or checking for null strings, which can lead to ambiguous interpretations. The author also mentions advanced customizations to the protobuf code generation process to better handle null values, providing links to further resources and code examples on GitHub.

Opinions

  • The author suggests that the explicit handling of null values in protobuf, although more verbose, leads to clearer semantics and reduces ambiguity in API interactions.
  • The "OneOf NullValue Pattern" is recommended for scenarios where null is a valid and distinct value that needs to be communicated explicitly.
  • The "FieldMask Pattern" is advocated for cases where the client needs to update only certain parts of an object or when dealing with query parameters that may return partially populated objects.
  • The "Has Pattern" is presented as a straightforward approach for non-primitive types, leveraging the generated "has" methods to determine if a field has been set.
  • The article cautions against treating default values as null, emphasizing that this practice can lead to incorrect assumptions about the data's intent.
  • The author promotes the use of custom protoc plugins for advanced use cases, allowing for more nuanced control over the generated code to support optional return types and null checks on get methods.
  • The article encourages developers to think critically about how they use null values and to choose the most appropriate pattern for their specific use case, rather than relying on the familiar but ambiguous concept of null from other programming languages.

Protobuf and Null Support

Imagine you have the following Protocol Buffer / gRPC definition:

service MyDataService {
  rpc UpateMyData (UpdateMyDataRequest) 
     returns (UpdateMyDataResponse);
}
message MyData {
  int32 id = 1;
  string stringValue = 2;
  SubData subData = 3;
}

message SubData {
 int64 bigValue = 1;
}
message UpdateMyDataRequest {
  MyData update = 1;
}

Now let's say you want to remove the database entry for MyData.stringValue.

Your first approach would probably be something like this:

UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
  .setUpdate(MyData.newBuilder()
    .setId(id)
    .setStringValue(null) // throws NullPointerException
  )
  .build();

serviceFutureStub.update(request);

But as soon as you run it, you will get a NullPointerException.

By default, setting any field to null in the protoc-generated message types throws a NullPointerException. On the flip side, the get methods never return null. If a field is unset, its getter returns a default value.

MyData.newBuilder().build().getStringValue().isEmpty() // true: an unset string reads as ""
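To make that behavior concrete, here is a minimal plain-Java sketch that imitates it. FakeMyData is a hypothetical, hand-written stand-in and not actual protoc output; it only mimics the two rules described above: setters reject null, and getters fall back to a default.

```java
import java.util.Objects;

// Hypothetical stand-in for a protoc-generated message; NOT real generated code.
final class FakeMyData {
  private final String stringValue;

  private FakeMyData(String stringValue) {
    this.stringValue = stringValue;
  }

  // Getters never return null: an unset string reads as "".
  String getStringValue() {
    return stringValue;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static final class Builder {
    private String stringValue = ""; // default for an unset string field

    // Like the generated setters, reject null instead of storing it.
    Builder setStringValue(String value) {
      this.stringValue = Objects.requireNonNull(value);
      return this;
    }

    // Resets the field to its default, not to null.
    Builder clearStringValue() {
      this.stringValue = "";
      return this;
    }

    FakeMyData build() {
      return new FakeMyData(stringValue);
    }
  }
}
```

In real generated code, the closest thing to "removing" a value is the builder's clear method for that field, which resets the field to its default rather than to null.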

How do you send a null value with Protocol Buffers?

Let me answer your question with a question of my own.

What does it mean to be null?

The problem is null can mean different things in different contexts:

  • Null is null
  • Null is unset/optional
  • Null is default
  • Null is confused with other values

To avoid this confusion the Protobuf team decided to not serialize null values. Instead, protobuf forces you to use several explicit strategies, thereby avoiding any semantic confusion in your Protobuf / gRPC API.

In the following sections, we will address each of the null use cases outlined above and how we can represent them with Protobuf.

We are going to focus on proto3. Proto2 has other semantics that we won’t go into here.

First, some basic knowledge of proto3

All Fields are:

  1. Optional
  2. NEVER null
  3. Initialized with default values (0, empty string, etc)

Null is Null: OneOf NullValue Pattern

Sometimes, null is a valid value. For instance, null can be used to remove a value from a database column. In this example let’s say we want to allow the consumer to set MyData.stringValue to null.

JSON equivalent of the MyData object:

{
  "id": 123,
  "stringValue": null
}

As we alluded to earlier, we cannot set a value to null. Therefore, we need to track the null information through other means. We can do this by introducing a nullable type. Those familiar with Kotlin will recognize this pattern.

Proto Definition:

syntax = "proto3";

package io.github.efenglu.protobuf.examples.oneof;

option java_multiple_files = true;

import "google/protobuf/struct.proto";

service MyDataService {
  rpc UpateMyData (UpdateMyDataRequest) 
     returns (UpdateMyDataResponse);
}

message MyData {
  int32 intValue = 1;
  NullableString stringValue = 2;
  NullableSubData subData = 3;
}

message SubData {
  int64 bigValue = 1;
}

message NullableSubData {
  oneof kind {
    google.protobuf.NullValue null = 1;
    SubData data = 2;
  }
}

message NullableString {
  oneof kind {
    google.protobuf.NullValue null = 1;
    string data = 2;
  }
}

message UpdateMyDataRequest {
 MyData data = 1;
}


message UpdateMyDataResponse {

}

You will notice the two new “nullable” types:

  • NullableString
  • NullableSubData

Each type consists of a oneof with two possible values:

  • Null
  • Non null object

The oneof helps us enforce that the data can’t be both null and non null.

Here is how a Java client would use the generated code:

Send Null Value:

UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
  .setData(MyData.newBuilder()
    .setStringValue(NullableString.newBuilder()
      .setNull(NullValue.NULL_VALUE)
      .build()
    )
    .setSubData(NullableSubData.newBuilder()
      .setNull(NullValue.NULL_VALUE)
      .build()
    ).build()
).build();

service.upateMyData(request);

Notice here we call setNull to call out that we are intentionally sending a null value.

Client Send Non-Null Value:

UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
  .setData(MyData.newBuilder()
    .setStringValue(NullableString.newBuilder()
      .setData("hello")
      .build()
    )
    .setSubData(NullableSubData.newBuilder()
      .setData(SubData.newBuilder()
        .setBigValue(1234567)
        .build()
      )
      .build()
    )
    .build()
  )
  .build();

service.upateMyData(request);

Notice how in this client code we call setData to send the actual data.

Server Implementation:

if (request.hasData()) {

  if (request.getData().hasStringValue()) {
    final String nullableString;
    if (request.getData().getStringValue().hasNull()) {
      nullableString = null;
    } else {
      nullableString = request.getData()
        .getStringValue()
        .getData();
    }
  }

  if (request.getData().hasSubData()) {
    final SubData nullableSubData;
    if (request.getData().getSubData().hasNull()) {
      nullableSubData = null;
    } else {
      nullableSubData = request.getData()
       .getSubData()
       .getData();
    }
  }

}

Notice how we can ensure the value is null/non null, and that the client has actually set the value.
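The server code above effectively distinguishes three states: not set, explicitly null, and set to a value. As a rough illustration, the plain-Java sketch below models those three states without any generated code. NullableField and its applyTo method are hypothetical names invented for this example, and the "row" is just a map standing in for a database record.

```java
import java.util.Map;

// Hypothetical model of the three states a nullable wrapper field can carry.
// It mirrors the server-side checks above without depending on generated code.
final class NullableField<T> {
  enum Kind { UNSET, NULL, DATA }

  private final Kind kind;
  private final T data;

  private NullableField(Kind kind, T data) {
    this.kind = kind;
    this.data = data;
  }

  static <T> NullableField<T> unset() {
    return new NullableField<>(Kind.UNSET, null);
  }

  static <T> NullableField<T> ofNull() {
    return new NullableField<>(Kind.NULL, null);
  }

  static <T> NullableField<T> of(T data) {
    if (data == null) {
      throw new NullPointerException("use ofNull() to send null explicitly");
    }
    return new NullableField<>(Kind.DATA, data);
  }

  Kind kind() {
    return kind;
  }

  // Applies the field to a row: leave it alone, null the column, or overwrite it.
  void applyTo(Map<String, Object> row, String column) {
    switch (kind) {
      case UNSET:
        break; // the client said nothing about this column
      case NULL:
        row.put(column, null); // the client explicitly asked for null
        break;
      case DATA:
        row.put(column, data);
        break;
    }
  }
}
```

The point of the sketch is that "explicitly null" and "never mentioned" lead to different outcomes, which is exactly what the oneof wrapper makes possible.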

Pros:

  • Type safety for nullable values: each nullable field gets its own message type
  • Very explicit: null is always a deliberately set value

Cons:

  • Requires null value Message types
  • Not great for lots of types

Null as Optional: FieldMask Pattern

This is useful when the client needs to update only part of an object, or when creating query/search parameters that return partially populated objects.

Here null is being used to signify missing information that should NOT be interpreted. That is, the value is null not because we want it to be null; it’s null because we don’t care. You would typically see this as an omitted field in JSON.

{
 "id": 123
 -- omitted "stringValue" --
}

We will do a similar thing with proto, only we will also be explicit and tell the server which fields we actually omitted.

Proto Definition:

import "google/protobuf/field_mask.proto";

service MyDataService {
  rpc Update (UpdateMyDataRequest) returns (UpdateMyDataResponse);
  rpc List (ListMyDataRequest) returns (ListMyDataResponse);
}

message MyData {
  int32 id = 1;
  string stringValue = 2;
  SubData subData = 3;
}

message SubData {
  int64 bigValue = 1;
}

message UpdateMyDataRequest {
  MyData update = 1;
  google.protobuf.FieldMask field_mask = 2;
}

message UpdateMyDataResponse {
  MyData new_data = 1;
}

message ListMyDataRequest {
  int32 id = 1;
  google.protobuf.FieldMask field_mask = 2;
}

message ListMyDataResponse {
  repeated MyData data = 1;
}

Notice that UpdateMyDataRequest and ListMyDataRequest each have a FieldMask field. This is a special type that tells the server which fields within the data it should act on.

Sample Client Usage:

MyData sendUpdate(int id, String value) {
  UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
    .setUpdate(MyData.newBuilder()
      .setId(id)
      .setStringValue(value)
    )
    .setFieldMask(FieldMaskUtil.fromFieldNumbers(
      MyData.class, 
      MyData.STRINGVALUE_FIELD_NUMBER)
    )
    .build();

  // A blocking stub returns the response directly; a future stub would return a ListenableFuture.
  return serviceBlockingStub.update(request).getNewData();
}

List<MyData> listOnlySubData(int id) {
  ListMyDataRequest request = ListMyDataRequest.newBuilder()
    .setId(id)
    .setFieldMask(FieldMaskUtil.fromFieldNumbers(
      MyData.class, 
      MyData.SUBDATA_FIELD_NUMBER)
    )
    .build();

  // A blocking stub returns the response directly; a future stub would return a ListenableFuture.
  return serviceBlockingStub.list(request).getDataList();
}

Sample Server Implementation:

@Override
public void update(
  UpdateMyDataRequest request,
  StreamObserver<UpdateMyDataResponse> responseObserver
) {

  MyData updateData = request.getUpdate();
  FieldMask fieldMask = request.getFieldMask();

  // Fetch existing values
  MyData existing = repo.readData(updateData.getId());
  MyData.Builder builder = existing.toBuilder();

  // Update only the fields listed in the fieldmask
  FieldMaskUtil.merge(fieldMask, updateData, builder);

  // Store the result
  repo.writeData(builder.build());

  // Send the new state back
  responseObserver.onNext(UpdateMyDataResponse.newBuilder()
    .setNewData(builder)
    .build()
  );
  // Signal that the call is finished
  responseObserver.onCompleted();
}

Notice the steps in the update:

  1. Fetch the existing value of the object we want to update
  2. Transform into builder
  3. Merge the input data onto the builder using the Field Mask Util
  4. Store the new state
  5. Return the new Value

The FieldMaskUtil will only copy the fields listed in the field mask from the input request and leave any other fields intact with their existing values.
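To illustrate the idea of a masked merge without pulling in protobuf, here is a simplified plain-Java sketch that works on flat maps. MaskMergeSketch is a hypothetical name, and the behavior is an assumption made for the example: it ignores nested paths, and it clears a masked field that is missing from the update. It is not a reimplementation of FieldMaskUtil.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of field-mask merging over flat maps.
// The real FieldMaskUtil works on protobuf messages and nested paths; this does not.
final class MaskMergeSketch {
  private MaskMergeSketch() {}

  // Copies only the masked fields from update onto a copy of existing.
  static Map<String, Object> merge(
      List<String> maskPaths, Map<String, Object> update, Map<String, Object> existing) {
    Map<String, Object> result = new LinkedHashMap<>(existing);
    for (String path : maskPaths) {
      if (update.containsKey(path)) {
        result.put(path, update.get(path));
      } else {
        // Assumption of this sketch: a masked field absent from the update is cleared.
        result.remove(path);
      }
    }
    return result;
  }
}
```

Fields that are not named in the mask are carried over from the existing record untouched, even if the update happens to contain different values for them.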

@Override
public void list(
  ListMyDataRequest request,
  StreamObserver<ListMyDataResponse> responseObserver
) {
  int id = request.getId();
  FieldMask fieldMask = request.getFieldMask();
  // Fetch the list
  List<MyData> result = repo.listData(id);

  ListMyDataResponse.Builder response = 
    ListMyDataResponse.newBuilder();
  MyData.Builder builder = MyData.newBuilder();
  for (MyData data : result) {
    builder.clear();

    // Use the field mask to send back ONLY the data requested
    FieldMaskUtil.merge(fieldMask, data, builder);

    response.addData(builder);
  }

  // Send the filtered list back
  responseObserver.onNext(response.build());
  // Signal that the call is finished
  responseObserver.onCompleted();
}

The list handler follows much the same approach, except that it returns filtered values:

  1. Fetch the list
  2. For each of the list elements filter the element to only return the fields requested
  3. Return the filtered list
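The filtering step can be pictured as copying only the requested fields onto an empty target. The plain-Java sketch below shows that idea on flat maps; MaskFilterSketch is a hypothetical name, and nested paths are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of filtering a record down to the requested fields.
final class MaskFilterSketch {
  private MaskFilterSketch() {}

  // Keeps only the requested fields, similar to merging onto an empty builder.
  static Map<String, Object> filter(List<String> maskPaths, Map<String, Object> data) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (String path : maskPaths) {
      if (data.containsKey(path)) {
        result.put(path, data.get(path));
      }
    }
    return result;
  }
}
```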

Pros:

  1. Concise Code
  2. Easier to test

Cons:

  1. FieldMask concept can be hard to understand
  2. Requires client to manually call out fields into field mask, may seem duplicative
  3. Semantic contract of fields can break

Null as Optional: Has Pattern

The last pattern is where most people start when it comes to protobuf. Every non-primitive field in a message type generates a “has” method that returns a boolean. This method returns true if the value “has been set”. We can use this feature to see when a consumer “has set a value” and infer that the unset fields are not important.

By default this only works with non-primitive types, i.e. message types. Protobuf provides wrapper messages for all primitive types if you need this behavior with primitives. (Newer protobuf releases, 3.15 and later, also accept the optional keyword on proto3 primitive fields, which generates a “has” method without a wrapper.)

...

import "google/protobuf/wrappers.proto";

service MyDataService {
  rpc Update (UpdateMyDataRequest) returns (UpdateMyDataResponse);
}
...
message UpdateMyDataRequest {
  int32 id = 1;
  google.protobuf.StringValue stringValue = 2;
  UpdateSubData subData = 3;
}

message UpdateSubData {
  google.protobuf.Int64Value bigValue = 1;
}
...

Note the import of google/protobuf/wrappers.proto and the google.protobuf.StringValue and google.protobuf.Int64Value. These fields are no longer primitives and thus will have a “has” method generated.

Client Usage:

void update() {
  service.update(UpdateMyDataRequest.newBuilder()
    .setStringValue(StringValue.of("customValue"))
    .build()
  );
}

Here the client sets the fields they want to use. The one caveat is the string value field must be populated with a StringValue object as seen above.

Those familiar with Java before autoboxing will recognize this pattern.

Server Implementation:

@Override
public void update(
  UpdateMyDataRequest request,
  StreamObserver<UpdateMyDataResponse> responseObserver) {

  // Fetch existing values
  MyData existing = repo.readData(request.getId());
  MyData.Builder builder = existing.toBuilder();

  // Update Fields as necessary
  if (request.hasStringValue()) {
    builder.setStringValue(request.getStringValue().getValue());
  }

  if (request.hasSubData()) {
    if (request.getSubData().hasBigValue()) {
      builder.setSubData(
        builder.getSubData().toBuilder()
          .setBigValue(request.getSubData()
            .getBigValue()
            .getValue()
          )
      );
    }
  }

  repo.writeData(builder.build());

  responseObserver.onNext(UpdateMyDataResponse.newBuilder()
    .setNewData(builder)
    .build()
  );
  // Signal that the call is finished
  responseObserver.onCompleted();
}

The server implementation is very similar to the FieldMask version. However, as opposed to delegating the field merging to FieldMaskUtil, we must merge the fields manually.

  1. Fetch the existing object
  2. Transform to builder
  3. For each field and recursive field check has and assign as necessary
  4. Store the value
  5. Return the new value
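The manual merge in the steps above can be modeled in plain Java, with java.util.Optional standing in for a wrapper field's has/get pair. This is only a sketch under that assumption: HasPatternSketch, Stored, and Update are hypothetical names, not generated types.

```java
import java.util.Optional;

// Hypothetical plain-Java model of the has pattern: Optional stands in for
// a wrapper field's hasX()/getX() pair. Not generated protobuf code.
final class HasPatternSketch {
  private HasPatternSketch() {}

  static final class Stored {
    final String stringValue;
    final long bigValue;

    Stored(String stringValue, long bigValue) {
      this.stringValue = stringValue;
      this.bigValue = bigValue;
    }
  }

  static final class Update {
    final Optional<String> stringValue;
    final Optional<Long> bigValue;

    Update(Optional<String> stringValue, Optional<Long> bigValue) {
      this.stringValue = stringValue;
      this.bigValue = bigValue;
    }
  }

  // Every field needs its own presence check; forgetting one silently drops updates.
  static Stored merge(Stored existing, Update update) {
    String stringValue = update.stringValue.orElse(existing.stringValue);
    long bigValue = update.bigValue.orElse(existing.bigValue);
    return new Stored(stringValue, bigValue);
  }
}
```

Because presence is tracked separately from the value, an update that explicitly sets the empty string or 0 is still applied, while an absent field leaves the stored value alone.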

Pros:

  1. Conceptually easy to understand
  2. Small Client code

Cons:

  1. Server implementation easy to break: miss a has, or add a field and the merge is broken
  2. Large server code with lots of branches

Null Anti-Pattern: Default Value

We’ve discussed several patterns that you “should” use and their pros and cons. Now let’s discuss a pattern you SHOULD NOT USE!

Checking the value against the default.

You may be tempted to say:

“Oh, if the value is the default, then I know that the value wasn’t set and is therefore null.”

FALSE!

  • The consumer may have set the value to the default value
  • Proto3 doesn’t allow you to define your own default values; the defaults are fixed per type (0, “”) and are therefore ambiguous

Don’t try to be smart, just treat the value as the value.

Do not try to create your own “has” for primitive types from the default values. Use the “has” methods and the primitive wrappers.
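A tiny plain-Java sketch shows why the guess goes wrong. DefaultValueAmbiguity is a hypothetical name, and OptionalInt is used here only as a stand-in for a wrapper field that carries real presence information.

```java
import java.util.OptionalInt;

// Contrasts guessing "unset" from a default with checking real presence.
final class DefaultValueAmbiguity {
  private DefaultValueAmbiguity() {}

  // Anti-pattern: treats the type default as "not set".
  static boolean guessedUnset(int receivedValue) {
    return receivedValue == 0;
  }

  // Explicit presence: OptionalInt stands in for a wrapper's has/get pair.
  static boolean actuallyUnset(OptionalInt field) {
    return !field.isPresent();
  }
}
```

A client that deliberately sends 0 is reported as "unset" by the first method, while the second method still sees that a value was provided.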

Null Anti-Pattern: Null String

Imagine seeing this code:

String value = myData.getStringValue(); // myData is a received MyData message
if (value != null) {
  // insert value into database
}

See anything wrong here?

  • What if the value is the empty string “”?
  • Or what if the value is all whitespace “ ”?

Protobuf treats strings as primitive types, and therefore they cannot be null. Instead of checking whether the string is not null, use a standard library such as Apache Commons Lang to check whether the string is not blank.

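String value = myData.getStringValue(); // myData is a received MyData message
if (StringUtils.isNotBlank(value)) { // org.apache.commons.lang3.StringUtils
  // insert value into database
}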

This makes it clear that the value will be inserted only if it is not blank.
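If you would rather not add a dependency for this, a roughly equivalent check can be written with the JDK alone. The sketch below is one possible version; BlankCheck and shouldInsert are hypothetical names chosen for the example.

```java
// Dependency-free sketch of a "not blank" check for incoming string values.
final class BlankCheck {
  private BlankCheck() {}

  // True only if the value contains at least one non-whitespace character.
  static boolean shouldInsert(String value) {
    return value != null && !value.chars().allMatch(Character::isWhitespace);
  }
}
```

The null check is kept only as a guard for values that did not come from protobuf; a string read from a generated getter will not be null.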

Advanced Use Cases

Our team wanted to go a step further with our “null support”. We created custom protoc code generation plugins to customize the generated code to suit our purposes.

We added support for:

  • Optional return types

We also forked the protoc generator to allow for:

  • Null check on get
  • Allow setting a null to clear the value
  • Created a “has” method for primitive types

Look at my other Medium article for details on how you could create your own protoc plugin.

In summary: yes, it is true, Protocol Buffers does NOT support null values. However, in the grand scheme of things, is that so bad?

Protocol Buffers forces you to ask:

  • How am I using the value?
  • What am I trying to say with null here?
  • Can I express this without the ambiguity of null?

For complete examples, check out my GitHub repo related to this article:
