
Protobuf and Null Support
Imagine you have the following Protocol Buffer / gRPC definition:
service MyDataService {
  rpc UpateMyData (UpdateMyDataRequest)
      returns (UpdateMyDataResponse);
}
message MyData {
  int32 id = 1;
  string stringValue = 2;
  SubData subData = 3;
}
message SubData {
  int64 bigValue = 1;
}
message UpdateMyDataRequest {
  MyData update = 1;
}
Now let's say you want to remove the database entry for MyData.stringValue.
Your first approach would probably be something like this:
UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
    .setUpdate(MyData.newBuilder()
        .setId(id)
        .setStringValue(null) // throws NullPointerException
    )
    .build();
serviceFutureStub.update(request);
But as soon as you run it, you will get a NullPointerException.
By default, setting any field to null in the protoc-generated message types throws a NullPointerException. On the flip side, the get methods never return null. If a field is unset, the get will return a default value.
MyData.newBuilder().build().getStringValue().isEmpty() == true
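To make that accessor contract concrete, here is a rough plain-Java model of it. This is only a sketch: the class name StringFieldModel is made up for illustration, and it is not what protoc actually emits. It just mimics the three behaviors described above: the setter rejects null, the getter falls back to the default, and clearing is how you "unset" a field.

```java
import java.util.Objects;

/** Hypothetical stand-in for a generated builder; NOT actual protoc output. */
final class StringFieldModel {
    private String stringValue = ""; // proto3 default for string fields

    /** Rejects null, mirroring the behavior of the generated setters. */
    StringFieldModel setStringValue(String value) {
        this.stringValue = Objects.requireNonNull(value);
        return this;
    }

    /** Never returns null; an unset field reads as the default value. */
    String getStringValue() {
        return stringValue;
    }

    /** Resets the field to its default, which is how a field is "unset". */
    StringFieldModel clearStringValue() {
        this.stringValue = "";
        return this;
    }
}
```

Note that after a clear, the receiver cannot tell "never set" apart from "set to the empty string", which is exactly the ambiguity the rest of this article deals with.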
How do you send a null value with Protocol Buffers?
Let me answer your question with a question of my own.
What does it mean to be null?
The problem is null can mean different things in different contexts:
- Null is null
- Null is unset/optional
- Null is default
- Null is confused with other values
To avoid this confusion, the Protobuf team decided not to serialize null values. Instead, protobuf forces you to choose one of several explicit strategies, thereby avoiding any semantic confusion in your Protobuf / gRPC API.
In the following sections, we will address each of the null use cases outlined above and how we can represent them with Protobuf.
We are going to focus on proto3. Proto2 has other semantics that we won’t go into here.
First, some basic proto3 knowledge.
All fields are:
- Optional
- NEVER null
- Initialized with default values (0, empty string, etc)
Null is Null: OneOf NullValue Pattern
Sometimes, null is a valid value. For instance, null can be used to remove a value from a database column. In this example let’s say we want to allow the consumer to set MyData.stringValue to null.
JSON equivalent of the MyData object:
{
  "id": 123,
  "stringValue": null
}
As we saw earlier, we cannot set a field to null. Therefore, we need to track the null information through other means. We can do this by introducing a nullable type. Those familiar with Kotlin will recognize this pattern.
Proto Definition:
syntax = "proto3";
package io.github.efenglu.protobuf.examples.oneof;
option java_multiple_files = true;
import "google/protobuf/struct.proto";
service MyDataService {
  rpc UpateMyData (UpdateMyDataRequest)
      returns (UpdateMyDataResponse);
}
message MyData {
  int32 intValue = 1;
  NullableString stringValue = 2;
  NullableSubData subData = 3;
}
message SubData {
  int64 bigValue = 1;
}
message NullableSubData {
  oneof kind {
    google.protobuf.NullValue null = 1;
    SubData data = 2;
  }
}
message NullableString {
  oneof kind {
    google.protobuf.NullValue null = 1;
    string data = 2;
  }
}
message UpdateMyDataRequest {
  MyData data = 1;
}
message UpdateMyDataResponse {
}
You will notice the two new “nullable” types:
- NullableString
- NullableSubData
Each type consists of a oneof with two possible values:
- Null
- Non null object
The oneof helps us enforce that the data can’t be both null and non null.
Here is how a Java client would use the generated code:
Send Null Value:
UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
    .setData(MyData.newBuilder()
        .setStringValue(NullableString.newBuilder()
            .setNull(NullValue.NULL_VALUE)
            .build()
        )
        .setSubData(NullableSubData.newBuilder()
            .setNull(NullValue.NULL_VALUE)
            .build()
        ).build()
    ).build();
service.upateMyData(request);
Notice here we call setNull to call out that we are intentionally sending a null value.
Client Send Non-Null Value:
UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
    .setData(MyData.newBuilder()
        .setStringValue(NullableString.newBuilder()
            .setData("hello")
            .build()
        )
        .setSubData(NullableSubData.newBuilder()
            .setData(SubData.newBuilder()
                .setBigValue(1234567)
                .build()
            ).build()
        ).build()
    ).build();
service.upateMyData(request);
Notice how in this client code we call setData to send the actual data.
Server Implementation:
if (request.hasData()) {
    // hasStringValue() is true only if the client set the nullable wrapper
    if (request.getData().hasStringValue()) {
        final String nullableString;
        if (request.getData().getStringValue().hasNull()) {
            nullableString = null;
        } else {
            nullableString = request.getData()
                .getStringValue()
                .getData();
        }
    }
    // Same three-way check for the nested message
    if (request.getData().hasSubData()) {
        final SubData nullableSubData;
        if (request.getData().getSubData().hasNull()) {
            nullableSubData = null;
        } else {
            nullableSubData = request.getData()
                .getSubData()
                .getData();
        }
    }
}
Notice how we can ensure the value is null/non null, and that the client has actually set the value.
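The server code above is really distinguishing three states: not set, explicitly null, and an actual value. As a rough illustration, here is a plain-Java sketch of that three-way decision. Everything in it is an assumption made for the example: NullableStringModel is a hand-written stand-in for the generated NullableString message, and ColumnUpdate.describe is a made-up helper that just reports what a database layer might do with each state.

```java
import java.util.Objects;

/** Hypothetical stand-in for the generated NullableString; NOT protoc output. */
final class NullableStringModel {
    enum KindCase { KIND_NOT_SET, NULL, DATA }

    private final KindCase kind;
    private final String data;

    private NullableStringModel(KindCase kind, String data) {
        this.kind = kind;
        this.data = data;
    }

    static NullableStringModel unset() {
        return new NullableStringModel(KindCase.KIND_NOT_SET, "");
    }

    static NullableStringModel ofNull() {
        return new NullableStringModel(KindCase.NULL, "");
    }

    static NullableStringModel ofData(String data) {
        return new NullableStringModel(KindCase.DATA, Objects.requireNonNull(data));
    }

    KindCase getKindCase() {
        return kind;
    }

    /** Like a protobuf getter, returns the default ("") unless data is set. */
    String getData() {
        return data;
    }
}

/** Made-up helper: maps each of the three states to an action. */
final class ColumnUpdate {
    static String describe(NullableStringModel field) {
        switch (field.getKindCase()) {
            case NULL:
                return "set column to NULL";
            case DATA:
                return "set column to \"" + field.getData() + "\"";
            default:
                return "leave column unchanged";
        }
    }
}
```

The point of the sketch is that "null" and "not set" end up on different branches, which is what the oneof buys you over a plain string field.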
Pros:
- Type safety of nullable values, creates a different MessageType for values that are nullable
- Very explicit: ensures null is a value that was deliberately set
Cons:
- Requires null value Message types
- Not great for lots of types
Null as Optional: FieldMask Pattern
This is useful where the client needs to update only part of an object, or when creating query/search parameters that return partially populated objects.
Here null is being used to signify missing information that should NOT be interpreted. That is, the value is null not because we want it to be null; it’s null because we don’t care. You would typically see this as the omission of a JSON field.
{
  "id": 123
  -- omitted "stringValue" --
}
We will do a similar thing with proto, only we will also be explicit and tell the server which fields we actually omitted.
Proto Definition:
import "google/protobuf/field_mask.proto";
service MyDataService {
  rpc Update (UpdateMyDataRequest) returns (UpdateMyDataResponse);
  rpc List (ListMyDataRequest) returns (ListMyDataResponse);
}
message MyData {
  int32 id = 1;
  string stringValue = 2;
  SubData subData = 3;
}
message SubData {
  int64 bigValue = 1;
}
message UpdateMyDataRequest {
  MyData update = 1;
  google.protobuf.FieldMask field_mask = 2;
}
message UpdateMyDataResponse {
  MyData new_data = 1;
}
message ListMyDataRequest {
  int32 id = 1;
  google.protobuf.FieldMask field_mask = 2;
}
message ListMyDataResponse {
  repeated MyData data = 1;
}
Notice that UpdateMyDataRequest and ListMyDataRequest have a FieldMask field. This is a special type that will convey which of the fields within the data should be of concern.
Sample Client Usage:
// Uses a blocking stub so each call returns the response directly
MyData sendUpdate(int id, String value) {
    UpdateMyDataRequest request = UpdateMyDataRequest.newBuilder()
        .setUpdate(MyData.newBuilder()
            .setId(id)
            .setStringValue(value)
        )
        // Tell the server that only stringValue should be updated
        .setFieldMask(FieldMaskUtil.fromFieldNumbers(
            MyData.class,
            MyData.STRINGVALUE_FIELD_NUMBER)
        )
        .build();
    return serviceBlockingStub.update(request).getNewData();
}
List<MyData> listOnlySubData(int id) {
    ListMyDataRequest request = ListMyDataRequest.newBuilder()
        .setId(id)
        // Ask the server to return only subData
        .setFieldMask(FieldMaskUtil.fromFieldNumbers(
            MyData.class,
            MyData.SUBDATA_FIELD_NUMBER)
        )
        .build();
    return serviceBlockingStub.list(request).getDataList();
}
Sample Server Implementation:
@Override
public void update(
        UpdateMyDataRequest request,
        StreamObserver<UpdateMyDataResponse> responseObserver
) {
    MyData updateData = request.getUpdate();
    FieldMask fieldMask = request.getFieldMask();
    // Fetch existing values
    MyData existing = repo.readData(updateData.getId());
    MyData.Builder builder = existing.toBuilder();
    // Update only the fields listed in the field mask
    FieldMaskUtil.merge(fieldMask, updateData, builder);
    // Store the result
    repo.writeData(builder.build());
    // Send the new state back
    responseObserver.onNext(UpdateMyDataResponse.newBuilder()
        .setNewData(builder)
        .build()
    );
    // Complete the unary call
    responseObserver.onCompleted();
}
Notice in the update:
- Fetch the existing value of the object we want to update
- Transform into builder
- Merge the input data onto the builder using the Field Mask Util
- Store the new state
- Return the new Value
The FieldMaskUtil will only copy the fields listed in the field mask from the input request and leave any other fields intact with their existing values.
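If the merge semantics feel abstract, the idea can be modeled in a few lines of plain Java. This is a deliberately simplified sketch and not how FieldMaskUtil is implemented: it treats a message as a flat map from field name to value, ignores nested paths, and the class name FieldMaskSketch is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Simplified, hypothetical model of a field-mask merge over flat field names. */
final class FieldMaskSketch {
    static Map<String, Object> merge(
            Set<String> mask,
            Map<String, Object> source,
            Map<String, Object> destination) {
        // Start from the existing state so unmasked fields are preserved
        Map<String, Object> result = new HashMap<>(destination);
        for (String path : mask) {
            if (source.containsKey(path)) {
                // Masked field: take the value from the update
                result.put(path, source.get(path));
            }
        }
        return result;
    }
}
```

For example, merging an update of {stringValue: "new", bigValue: 0} onto an existing {stringValue: "old", bigValue: 5} with a mask containing only stringValue keeps bigValue at 5. The 0 in the update is ignored because the mask never mentions it, so the server does not have to guess whether 0 was intentional.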
@Override
public void list(
        ListMyDataRequest request,
        StreamObserver<ListMyDataResponse> responseObserver
) {
    int id = request.getId();
    FieldMask fieldMask = request.getFieldMask();
    // Fetch the list
    List<MyData> result = repo.listData(id);
    ListMyDataResponse.Builder response =
        ListMyDataResponse.newBuilder();
    MyData.Builder builder = MyData.newBuilder();
    for (MyData data : result) {
        builder.clear();
        // Use the field mask to send back ONLY the data requested
        FieldMaskUtil.merge(fieldMask, data, builder);
        response.addData(builder);
    }
    // Send the filtered list back
    responseObserver.onNext(response.build());
    // Complete the unary call
    responseObserver.onCompleted();
}
This is much the same, only here we return a filtered value:
- Fetch the list
- For each of the list elements filter the element to only return the fields requested
- Return the filtered list
Pros:
- Concise Code
- Easier to test
Cons:
- FieldMask concept can be hard to understand
- Requires client to manually call out fields into field mask, may seem duplicative
- Semantic contract of fields can break
Null as Optional: Has Pattern
The last pattern is where most people start when it comes to protobuf. Every non-primitive field in a message type generates a “has” method that returns a boolean. This method returns true if the value “has been set”. We can use this to see when a consumer “has set a value”, and then infer that the unset fields are not important.
Now this only works with non-primitive types, i.e. message types. Proto3 provides wrappers for all primitive types if you need this behavior with primitives. (Newer protobuf releases, 3.15 and later, also let you mark a proto3 primitive field as optional to get a generated “has” method.)
...
import "google/protobuf/wrappers.proto";
service MyDataService {
  rpc Update (UpdateMyDataRequest) returns (UpdateMyDataResponse);
}
...
message UpdateMyDataRequest {
  int32 id = 1;
  google.protobuf.StringValue stringValue = 2;
  UpdateSubData subData = 3;
}
message UpdateSubData {
  google.protobuf.Int64Value bigValue = 1;
}
...
Note the import of google/protobuf/wrappers.proto and the google.protobuf.StringValue and google.protobuf.Int64Value. These fields are no longer primitives and thus will have a “has” method generated.
Client Usage:
void update() {
    service.update(UpdateMyDataRequest.newBuilder()
        .setStringValue(StringValue.of("customValue"))
        .build()
    );
}
Here the client sets the fields they want to use. The one caveat is the string value field must be populated with a StringValue object as seen above.
Those familiar with Java before autoboxing will recognize this pattern.
Server Implementation:
@Override
public void update(
        UpdateMyDataRequest request,
        StreamObserver<UpdateMyDataResponse> responseObserver) {
    // Fetch existing values
    MyData existing = repo.readData(request.getId());
    MyData.Builder builder = existing.toBuilder();
    // Update fields as necessary
    if (request.hasStringValue()) {
        builder.setStringValue(request.getStringValue().getValue());
    }
    if (request.hasSubData()) {
        if (request.getSubData().hasBigValue()) {
            builder.setSubData(
                builder.getSubData().toBuilder()
                    .setBigValue(request.getSubData()
                        .getBigValue()
                        .getValue()
                    )
            );
        }
    }
    // Store the result
    repo.writeData(builder.build());
    // Send the new state back
    responseObserver.onNext(UpdateMyDataResponse.newBuilder()
        .setNewData(builder)
        .build()
    );
    // Complete the unary call
    responseObserver.onCompleted();
}
The server implementation is also very similar. However, as opposed to delegating the field merging to the FieldMaskUtil, we must merge the fields manually.
- Fetch the existing object
- Transform to builder
- For each field and recursive field check has and assign as necessary
- Store the value
- Return the new value
Pros:
- Conceptually easy to understand
- Small Client code
Cons:
- Server implementation easy to break: miss a has, or add a field and the merge is broken
- Large server code with lots of branches
Null Anti-Pattern: Default Value
We’ve discussed several patterns that you “should” use and their pros and cons. Now let’s discuss a pattern you SHOULD NOT USE!
Checking the value against the default.
You may be tempted to say:
“Oh, if the value is the default, then I know that the value wasn’t set and is therefore null.”
FALSE!
- The consumer may have set the value to the default value
- Proto3 doesn’t allow you to define your own default values; the defaults are fixed (0, “”) and are therefore ambiguous
Don’t try to be smart, just treat the value as the value.
Do not try to create your own “has” for primitive types from the default values. Use the “has” methods and the primitive wrappers.
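To see why the default check is unreliable, consider this small plain-Java sketch. It is only an illustration under stated assumptions: the class DefaultValueAmbiguity and both method names are invented for this example, and OptionalLong stands in for a wrapper field (such as Int64Value) together with its generated “has” method.

```java
import java.util.OptionalLong;

/** Hypothetical illustration; OptionalLong stands in for a wrapper type plus "has". */
final class DefaultValueAmbiguity {
    /** Anti-pattern: guesses that a field was unset because it equals the default. */
    static boolean looksUnset(long bigValue) {
        return bigValue == 0L;
    }

    /** Explicit presence: the caller states whether a value was provided. */
    static boolean isSet(OptionalLong bigValue) {
        return bigValue.isPresent();
    }
}
```

A client that deliberately sends 0 makes looksUnset(0L) return true, so the update is silently dropped. With explicit presence, isSet(OptionalLong.of(0L)) is true and isSet(OptionalLong.empty()) is false, so a real 0 and a missing value stay distinguishable.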
Null Anti-Pattern: Null String
Imagine seeing this code:
String value;
if (value != null) {
    // insert value into database
}
See anything wrong here?
- What if the value is the empty string “”?
- Or what if the value is all whitespace “ ”?
Protobuf treats strings as primitive types, and therefore they cannot be null. Instead of checking whether the string is not null, use a standard library, like Apache Commons Lang, to check that the string is not blank.
String value;
if (StringUtils.isNotBlank(value)) {
    // insert value into database
}
This makes it clear that the value will be inserted only if it is not blank.
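If you would rather not pull in a library for this, a roughly equivalent check can be written with the JDK alone. This sketch assumes Java 11 or later for String.isBlank(); the class name BlankCheck is made up for the example, and the helper is meant as an approximation of StringUtils.isNotBlank rather than a drop-in replacement.

```java
/** Hypothetical JDK-only helper approximating StringUtils.isNotBlank (Java 11+). */
final class BlankCheck {
    static boolean isNotBlank(String value) {
        // Protobuf getters never return null, but the guard keeps the helper safe elsewhere
        return value != null && !value.isBlank();
    }
}
```

With this helper, the empty string and a whitespace-only string are both rejected, while any string with visible content passes.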
Advanced Use Cases
Our team wanted to go a step further with our “null support”. We created custom protoc code generation plugins to customize the generated code to suit our purposes.
We added support for:
- Optional return types
We also forked the protoc generator to allow for:
- Null check on get
- Allow setting a null to clear the value
- Created a “has” method for primitive types
Look at my other Medium article for details on how you could create your own protoc plugin.
In summary: yes, it is true, Protocol Buffers does NOT support null values. However, in the grand scheme of things, is that so bad?
Protocol Buffers forces you to ask:
- How am I using the value?
- What am I trying to say with null here?
- Can I express this without the ambiguity of null?
For complete examples, check out my GitHub repo related to this article: