On my current project, I work with protobuf not only for gRPC but also as the message format for RabbitMQ. While protobuf's advantages are not limited to speed, I wondered whether it is really that fast compared to JSON libraries, especially in the Ruby world. I decided to run some benchmarks to find out, but first, a brief introduction to each format.
Protocol Buffers (protobuf) is a fast, compact, cross-platform message format designed with forward and backward compatibility in mind. It consists of a definition language and language-specific compilers.
It works perfectly for small, object-like data, has great backward and forward compatibility, is fast (we are not sure about that yet), and is more compact than JSON. However, it has some limitations, such as not supporting direct comparison (you need to deserialize objects to compare them).
Its output is not compressed, and domain-specific formats can work better for their own data (JPEG for images, for example). It's also not self-describing.
See the official docs for more details.
JSON is an abbreviation for JavaScript Object Notation. It is a text-based data format that originated in JavaScript but later spread widely as a communication format, not only between JS apps and backends but also between microservices, among many other uses.
It uses strings as keys and supports string, number, boolean, object, array, and null as value types. Its main advantages are that it is human-readable and easy to serialize and parse in nearly any programming language.
See the official site for more details.
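For example, a round trip with Ruby's standard library:

```ruby
require 'json'

data = { name: "John", age: 30, active: true }

# Serialize a Hash to a JSON string
payload = JSON.generate(data)
# => '{"name":"John","age":30,"active":true}'

# Parse it back; note that keys come back as strings, not symbols
parsed = JSON.parse(payload)
# => {"name"=>"John", "age"=>30, "active"=>true}
```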
I picked three popular Ruby JSON libraries: Oj, Yajl, and the standard JSON library. For protobuf, I used Google's standard protoc compiler with the google-protobuf Ruby gem.
I will benchmark several specific payload types to see which data types show the biggest difference, as well as a complex payload with a mix of field types.
You can see all the code here https://github.com/alexstaro/proto-vs-json.
For hardware, I used a laptop with an AMD Ryzen 3 PRO 5450U and 16 GB of DDR4 RAM.
The operating system is Ubuntu 22.10 Kinetic.
Ruby 3.2.1 was installed via asdf.
For benchmarking, I used the benchmark-ips gem (https://github.com/evanphx/benchmark-ips).
The setup looks like this:
Benchmark.ips do |x|
  x.config(time: 20, warmup: 5)

  x.report('Yajl encoding') do
    Yajl::Encoder.encode(data)
  end

  ...

  x.compare!
end
We will start with integers only. Numbers are pretty hard for JSON to handle, so we expect protobuf to be far ahead of the competition.
The test data:
data = {
  field1: 2312345434234,
  field2: 31415926,
  field3: 43161592,
  field4: 23141596,
  field5: 61415923,
  field6: 323423434343443,
  field7: 53141926,
  field8: 13145926,
  field9: 323423434343443,
  field10: 43161592
}
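On the protobuf side, this payload needs a schema. A definition covering these fields might look like this (the message name is illustrative; the one in the benchmark repo may differ):

```proto
syntax = "proto3";

message IntegersMessage {
  int64 field1 = 1;
  int64 field2 = 2;
  int64 field3 = 3;
  int64 field4 = 4;
  int64 field5 = 5;
  int64 field6 = 6;
  int64 field7 = 7;
  int64 field8 = 8;
  int64 field9 = 9;
  int64 field10 = 10;
}
```

Note that int64 is required here, since values like 323423434343443 do not fit into 32 bits.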
Benchmark results:
protobuf encoding: 4146929.7 i/s
Oj encoding: 1885092.0 i/s - 2.20x slower
standard JSON encoding: 505697.5 i/s - 8.20x slower
Yajl encoding: 496121.7 i/s - 8.36x slower
There is no doubt that protobuf is the absolute winner here. But what if we make the test closer to a real-world scenario? In practice, we almost always create a proto message object just before serialization, so let's include message initialization in the benchmark.
Here are the results:
protobuf encoding: 4146929.7 i/s
Oj encoding: 1885092.0 i/s - 2.20x slower
standard JSON encoding: 505697.5 i/s - 8.20x slower
Yajl encoding: 496121.7 i/s - 8.36x slower
protobuf with model init: 489658.0 i/s - 8.47x slower
The result is less clear-cut. I expected encoding with message initialization to be slower, but not the slowest of all.
Let's check deserialization:
protobuf parsing: 737979.5 i/s
Oj parsing: 448833.9 i/s - 1.64x slower
standard JSON parsing: 297127.2 i/s - 2.48x slower
Yajl parsing: 184361.1 i/s - 4.00x slower
There are no surprises here.
In terms of payload size, protobuf is almost 4 times more compact than JSON:
JSON payload bytesize 201
Protobuf payload bytesize 58
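Most of the size gap comes from the wire format: protobuf writes each integer as a varint (7 payload bits per byte, with the high bit marking continuation) after a 1-byte field tag, while JSON spells out field names and every digit. A minimal pure-Ruby sketch of varint encoding reproduces the 58-byte figure:

```ruby
# Encode a non-negative integer as a protobuf varint:
# 7 bits of payload per byte, high bit set on all but the last byte.
def varint(n)
  bytes = []
  loop do
    byte = n & 0x7F
    n >>= 7
    byte |= 0x80 if n > 0
    bytes << byte
    break if n.zero?
  end
  bytes.pack('C*')
end

varint(300).bytes # => [172, 2], i.e. 0xAC 0x02

values = [2312345434234, 31415926, 43161592, 23141596, 61415923,
          323423434343443, 53141926, 13145926, 323423434343443, 43161592]

# Each field costs a 1-byte tag plus the varint itself:
10 + values.sum { |v| varint(v).bytesize } # => 58
```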
Doubles are expected to be the hardest payload for JSON, so let's check that.
Our payload:
data = {
  field1: 2312.345434234,
  field2: 31415.926,
  field3: 4316.1592,
  field4: 23141.596,
  field5: 614159.23,
  field6: 3234234.34343443,
  field7: 53141.926,
  field8: 13145.926,
  field9: 323423.434343443,
  field10: 43161.592
}
Result:
protobuf encoding: 4814662.9 i/s
protobuf with model init: 444424.1 i/s - 10.83x slower
Oj encoding: 297152.0 i/s - 16.20x slower
Yajl encoding: 160251.9 i/s - 30.04x slower
standard JSON encoding: 158724.3 i/s - 30.33x slower
Protobuf is much faster even with model initialization. Let's check the deserialization:
Comparison:
protobuf parsing: 822226.6 i/s
Oj parsing: 395411.3 i/s - 2.08x slower
standard JSON parsing: 241438.7 i/s - 3.41x slower
Yajl parsing: 157235.7 i/s - 5.23x slower
Still no surprises here.
And the payload size:
JSON payload bytesize 211
Protobuf payload bytesize 90
Not a fourfold difference this time, but still noticeable.
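The smaller gap follows from the wire format: protobuf stores each double as a fixed 8-byte IEEE 754 value behind a 1-byte field tag, so ten fields take exactly 90 bytes, while JSON spells out the digits. A quick sanity check in plain Ruby (pack's 'E' directive produces the same little-endian 8-byte layout):

```ruby
# A protobuf double occupies 8 bytes on the wire (IEEE 754, little-endian),
# the same layout Ruby's Array#pack produces with the 'E' directive.
[31415.926].pack('E').bytesize # => 8

# 10 fields x (1-byte tag + 8-byte double) = 90 bytes
10 * (1 + 8) # => 90
```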
Strings are expected to be easier for JSON, so let's check that.
The payload:
data = {
  field1: "2312.345434234",
  field2: "31415.926",
  field3: "4316.1592",
  field4: "23141.596",
  field5: "614159.23",
  field6: "3234234.34343443",
  field7: "53141.926",
  field8: "13145.926",
  field9: "323423.434343443",
  field10: "43161.592"
}
Bench results:
Comparison:
protobuf encoding: 3990298.3 i/s
oj encoder: 1848941.3 i/s - 2.16x slower
yajl encoder: 455222.0 i/s - 8.77x slower
standard JSON encoding: 444245.6 i/s - 8.98x slower
protobuf with model init: 368818.3 i/s - 10.82x slower
Deserialization:
Comparison:
protobuf parser: 631262.5 i/s
oj parser: 378697.6 i/s - 1.67x slower
standard JSON parser: 322923.5 i/s - 1.95x slower
yajl parser: 187593.4 i/s - 3.37x slower
The payload size:
JSON payload bytesize 231
Protobuf payload bytesize 129
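The sizes are easy to account for. JSON pays for quoted keys, quotes around values, colons, commas, and braces; protobuf pays only a 1-byte tag and a 1-byte length prefix per string field. A quick check, reusing the same ten string fields:

```ruby
require 'json'

data = {
  field1: "2312.345434234", field2: "31415.926", field3: "4316.1592",
  field4: "23141.596", field5: "614159.23", field6: "3234234.34343443",
  field7: "53141.926", field8: "13145.926", field9: "323423.434343443",
  field10: "43161.592"
}

# JSON: keys, quotes, and punctuation add up
JSON.generate(data).bytesize           # => 231

# protobuf: per field, 1-byte tag + 1-byte length + raw string bytes
data.values.sum { |v| 2 + v.bytesize } # => 129
```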
Although we already benchmarked integers separately, it's interesting to see how protobuf handles collections of them.
Here is the data:
data = {
  field1: [
    2312345434234, 31415926, 43161592, 23141596, 61415923, 323423434343443,
    53141926, 13145926, 323423434343443, 43161592
  ]
}
Serialization bench:
Comparison:
protobuf encoding: 4639726.6 i/s
oj encoder: 2929662.1 i/s - 1.58x slower
standard JSON encoding: 699299.2 i/s - 6.63x slower
yajl encoder: 610215.5 i/s - 7.60x slower
protobuf with model init: 463057.9 i/s - 10.02x slower
Deserialization bench:
Comparison:
oj parser: 1190763.1 i/s
protobuf parser: 760307.3 i/s - 1.57x slower
standard JSON parser: 619360.4 i/s - 1.92x slower
yajl parser: 414352.4 i/s - 2.87x slower
To be honest, the deserialization results are pretty unexpected here.
Let's check payload size:
JSON payload bytesize 121
Protobuf payload bytesize 50
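The 50 bytes are explained by proto3's packed encoding: a repeated numeric field is serialized as a single length-delimited record, one tag byte and one length byte followed by the varints back to back. A quick pure-Ruby check of the arithmetic:

```ruby
# Byte length of a protobuf varint: 7 payload bits per byte.
def varint_len(n)
  n.zero? ? 1 : (n.bit_length + 6) / 7
end

values = [2312345434234, 31415926, 43161592, 23141596, 61415923,
          323423434343443, 53141926, 13145926, 323423434343443, 43161592]

body = values.sum { |v| varint_len(v) } # => 48
1 + 1 + body                            # tag + length prefix => 50
```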
I decided to check whether an array of doubles shows the same behavior.
The data:
data = {
  field1: [
    2312.345434234, 31415.926, 4316.1592, 23141.596, 614159.23, 3234234.34343443,
    53141.926, 13145.926, 323423.434343443, 43161.592
  ]
}
Serialization:
Comparison:
protobuf encoding: 7667558.9 i/s
protobuf with model init: 572563.4 i/s - 13.39x slower
Oj encoding: 323818.1 i/s - 23.68x slower
Yajl encoding: 183763.3 i/s - 41.73x slower
standard JSON encoding: 182332.3 i/s - 42.05x slower
Deserialization:
Comparison:
Oj parsing: 953384.6 i/s
protobuf parsing: 883899.0 i/s - 1.08x slower
standard JSON parsing: 452799.0 i/s - 2.11x slower
Yajl parsing: 356091.2 i/s - 2.68x slower
We got similar results here. It seems that protobuf struggles with arrays, at least when deserializing them.
Payload size:
JSON payload bytesize 131
Protobuf payload bytesize 82
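Again the protobuf size is exactly what packed encoding predicts: one tag byte, one length byte, and ten fixed 8-byte doubles.

```ruby
# Packed repeated double: 10 IEEE 754 values, 8 bytes each ('E*' = little-endian doubles)
doubles = [2312.345434234, 31415.926, 4316.1592, 23141.596, 614159.23,
           3234234.34343443, 53141.926, 13145.926, 323423.434343443, 43161.592]

1 + 1 + doubles.pack('E*').bytesize # tag + length prefix + 80 bytes => 82
```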
As a "complex" payload, I mocked some user data with posts and comments on those posts, to make it more like a real-life application.
data = {
  user_id: 12345,
  username: 'johndoe',
  email: '[email protected]',
  date_joined: '2023-04-01T12:30:00Z',
  is_active: true,
  profile: {
    full_name: 'John Doe',
    age: 30,
    address: '123 Main St, Anytown, USA',
    phone_number: '+1-555-123-4567'
  },
  posts: [
    {
      post_id: 1,
      title: 'My first blog post',
      content: 'This is the content of my first blog post.',
      date_created: '2023-04-01T14:00:00Z',
      likes: 10,
      tags: ['blog', 'first_post', 'welcome'],
      comments: [
        {
          comment_id: 101,
          author: 'Jane',
          content: 'Great first post!',
          date_created: '2023-04-01T15:00:00Z',
          likes: 3
        },
        ...
      ]
    },
    ...
  ]
}
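A protobuf schema for this payload needs nested and repeated messages, something like the following (the names are illustrative, not necessarily those used in the benchmark repo):

```proto
syntax = "proto3";

message User {
  int64 user_id = 1;
  string username = 2;
  string email = 3;
  string date_joined = 4;
  bool is_active = 5;
  Profile profile = 6;
  repeated Post posts = 7;
}

message Profile {
  string full_name = 1;
  int32 age = 2;
  string address = 3;
  string phone_number = 4;
}

message Post {
  int64 post_id = 1;
  string title = 2;
  string content = 3;
  string date_created = 4;
  int32 likes = 5;
  repeated string tags = 6;
  repeated Comment comments = 7;
}

message Comment {
  int64 comment_id = 1;
  string author = 2;
  string content = 3;
  string date_created = 4;
  int32 likes = 5;
}
```

Timestamps are kept as strings here to match the JSON payload; in a real schema you might prefer google.protobuf.Timestamp.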
The results:
Comparison:
protobuf encoding: 1038246.0 i/s
Oj encoding: 296018.6 i/s - 3.51x slower
Yajl encoding: 125909.6 i/s - 8.25x slower
protobuf with model init: 119673.2 i/s - 8.68x slower
standard JSON encoding: 115773.4 i/s - 8.97x slower
Comparison:
protobuf parsing: 291605.9 i/s
Oj parsing: 76994.7 i/s - 3.79x slower
standard JSON parsing: 64823.6 i/s - 4.50x slower
Yajl parsing: 34936.4 i/s - 8.35x slower
And payload size:
JSON payload bytesize 1700
Protobuf payload bytesize 876
Here we see the expected behavior, with pure protobuf encoding in first place. However, if we look at our "real-world" example with message initialization, it is only marginally faster than the standard JSON encoding and noticeably slower than Oj.
If you are switching from JSON to protobuf just for speed, it may not be worth it.
The reason to use protobuf should be its excellent cross-language schema definition for data exchange, not a performance boost.
The lead image for this article was generated by HackerNoon's AI Image Generator via the prompt "programming language".