How do you validate that a Protobuf message does not contain enum fields with a zero value? It turns out that Protobuf does not support this directly! We need to look into how the protojson package is implemented.
More and more companies are adopting gRPC with Protobuf for communication between internal services. It offers high performance, supports multiple programming languages, and is backed by Google with a great ecosystem around it.
For communication with front-ends and external services, Protobuf can be marshaled to JSON. Browsers work most naturally with JSON, and we cannot expect other companies to consume Protobuf directly from us. (Of course, you can, if you are big enough!)
Sample code is written in Go.
According to the Protobuf style guide, the zero-value enum should have the suffix UNSPECIFIED. That is because an enum is implemented as an int32, and the value 0 is considered, well, unspecified. It’s similar to nil for a message, or an empty string. When encoding Protobuf as JSON, a nil message, an UNSPECIFIED enum, or an empty string is omitted by default.
We were following that convention, until one day, we did not.
When sending external webhook messages, we decided not to use 0 as UNSPECIFIED. One reason is that we use EmitUnpopulated: true to ensure that all fields are included in the JSON representation sent to external parties. And we don’t want an UNSPECIFIED value to appear in the webhook messages if we somehow forget to set an enum field and it stays at 0. Unit tests cannot catch all the mistakes; we engineers know that.
This caused a lot of trouble, so we had to revert and make the value 0 mean UNSPECIFIED again. One problem is that it forces the use of EmitUnpopulated: true everywhere! And there are places where we don’t want to emit all unpopulated fields, like calling some third-party APIs. Some messages mix enums whose zero value is UNSPECIFIED with enums whose zero value is a real value; there is no way to send the correct format for those. With EmitUnpopulated: true, the third-party APIs don’t understand UNSPECIFIED; with EmitUnpopulated: false, some required fields holding a non-UNSPECIFIED zero value are omitted. Of course, all of that can be refactored away, but it would have been simpler to just enforce the use of UNSPECIFIED from the beginning.
It turns out there is no simple way to do that in Protobuf 3!
In Protobuf 2, there is a required option that prevents a field from being unset. This option was removed in Protobuf 3 because it gets in the way of refactoring when fields are removed. If we forget to update every service when removing a no-longer-used required field, especially in a company with multiple teams working together, messages will be dropped unintentionally. It is better not to require it upfront.
In the older Go API (github.com/golang/protobuf/jsonpb), there was the jsonpb.JSONPBMarshaler interface. We could simply implement that interface for all enums to return an error upon seeing a zero value. But again, it was removed in the newer protojson package! As a protocol, we should minimize customization as much as possible. Otherwise, that customization has to be implemented and maintained in every language across different teams!
We’ll have to reach for the reflection package. The protoreflect.Message interface has a Range() method for iterating over every populated field. We can use that method to verify that there are no enum fields with zero… Oh, wait. It only iterates over populated fields. So it won’t detect the zero value in an enum!
But the function protojson.Marshal() can still emit unpopulated fields with the EmitUnpopulated option. How does it implement that? Diving into encoding/protojson, there is a code snippet for iterating over unpopulated fields. Let’s steal it:
// pref is protojson's import alias for google.golang.org/protobuf/reflect/protoreflect.

// unpopulatedFieldRanger wraps a protoreflect.Message and modifies its Range
// method to additionally iterate over unpopulated fields.
type unpopulatedFieldRanger struct{ pref.Message }

func (m unpopulatedFieldRanger) Range(f func(pref.FieldDescriptor, pref.Value) bool) {
	fds := m.Descriptor().Fields()
	for i := 0; i < fds.Len(); i++ {
		fd := fds.Get(i)
		if m.Has(fd) || fd.ContainingOneof() != nil {
			continue // ignore populated fields and fields within a oneofs
		}

		v := m.Get(fd)
		isProto2Scalar := fd.Syntax() == pref.Proto2 && fd.Default().IsValid()
		isSingularMessage := fd.Cardinality() != pref.Repeated && fd.Message() != nil
		if isProto2Scalar || isSingularMessage {
			v = pref.Value{} // use invalid value to emit null
		}
		if !f(fd, v) {
			return
		}
	}
	m.Message.Range(f)
}
What the above code does is iterate over the additional, unpopulated fields by looping over protoreflect.Message.Descriptor().Fields(). Fields within a oneof are skipped. Unpopulated singular message fields are set to an invalid value (think of it as null in the generated JSON) before being passed to the input function. The populated fields are then visited by the original Range().
There is still a bit more code to write, like implementing a traversal method for walking all the different Protobuf types: message, array (repeated), dynamic Struct, and of course, enum. But it’s solvable. And I can take a rest now.
Thanks for reading! If you have a better way to do that, please let me know by connecting on Twitter 👋
Also published at my blog.