On my current project, I work with protobuf not only for gRPC but also as the message format for RabbitMQ. While protobuf's advantages are not limited to speed, I wondered whether it is really that fast compared to JSON libraries, especially in the Ruby world. I decided to run some benchmarks to find out, but first, a brief introduction to each format.
Protocol Buffers (protobuf) is a fast, compact, cross-platform message format designed with forward and backward compatibility in mind. It consists of a definition language and language-specific compilers.
It works perfectly for small, object-like data, has great backward and forward compatibility, is fast (we are not sure about that yet), and is more compact than JSON. However, it has some limitations, such as not supporting direct comparison (you need to deserialize objects to compare them).
Its output is not compressed, and domain-specific formats can work better for their own data (JPEG for images, for example). It's also not self-describing.
See the official docs for more details.
JSON is an abbreviation for JavaScript Object Notation. It is a text-based data format that originated in JavaScript but later spread widely as a communication format, not only between JS apps and backends but also between microservices, among many other uses.
It uses strings as keys and supports string, number, boolean, object, array, and null as value types. Its main advantages are that it is human-readable and easy to serialize and parse in nearly any programming language.
See the official site for more details.
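For example, a round trip with Ruby's standard library:

```ruby
require 'json'

data = { name: "John", age: 30, active: true }

# Serialize a Hash to a JSON string
payload = JSON.generate(data)
# => '{"name":"John","age":30,"active":true}'

# Parse it back; note that keys come back as strings, not symbols
parsed = JSON.parse(payload)
# => {"name"=>"John", "age"=>30, "active"=>true}
```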
I picked three popular Ruby JSON libraries: Oj, Yajl, and the standard JSON library. For protobuf, I used Google's standard protoc compiler with the google-protobuf Ruby gem.
I will benchmark several specific payload types to see which data types show the biggest difference, as well as a complex payload with a mix of field types.
You can see all the code here https://github.com/alexstaro/proto-vs-json.
For hardware, I used a laptop with an AMD Ryzen 3 PRO 5450U and 16 GB of DDR4 RAM.
The operating system is Ubuntu 22.10 Kinetic.
Ruby 3.2.1 was installed via asdf.
For benchmarking, I used the benchmark-ips gem (https://github.com/evanphx/benchmark-ips).
The setup looks like this:
Benchmark.ips do |x|
  x.config(time: 20, warmup: 5)

  x.report('Yajl encoding') do
    Yajl::Encoder.encode(data)
  end

  ...

  x.compare!
end
We will start with integers only. Numbers are pretty hard for JSON to handle, so we expect protobuf to be far ahead of the competition.
The test data:
data = {
  field1: 2312345434234,
  field2: 31415926,
  field3: 43161592,
  field4: 23141596,
  field5: 61415923,
  field6: 323423434343443,
  field7: 53141926,
  field8: 13145926,
  field9: 323423434343443,
  field10: 43161592
}
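On the protobuf side, this payload needs a schema. A definition covering these fields might look like this (the message name is illustrative; the one in the benchmark repo may differ):

```proto
syntax = "proto3";

message IntegersMessage {
  int64 field1 = 1;
  int64 field2 = 2;
  int64 field3 = 3;
  int64 field4 = 4;
  int64 field5 = 5;
  int64 field6 = 6;
  int64 field7 = 7;
  int64 field8 = 8;
  int64 field9 = 9;
  int64 field10 = 10;
}
```

Note that int64 is required here, since values like 323423434343443 do not fit into 32 bits.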
Benchmark results:
protobuf encoding: 4146929.7 i/s
Oj encoding: 1885092.0 i/s - 2.20x slower
standard JSON encoding: 505697.5 i/s - 8.20x slower
Yajl encoding: 496121.7 i/s - 8.36x slower
There is no doubt that protobuf is the absolute winner here. But what if we make the test closer to a real-world scenario? In practice, we almost always create a proto message object just before serialization, so let's include message initialization in the benchmark.
Here are the results:
protobuf encoding: 4146929.7 i/s
Oj encoding: 1885092.0 i/s - 2.20x slower
standard JSON encoding: 505697.5 i/s - 8.20x slower
Yajl encoding: 496121.7 i/s - 8.36x slower
protobuf with model init: 489658.0 i/s - 8.47x slower
The result is less clear-cut. I expected encoding with message initialization to be slower, but not the slowest of all.
Let's check deserialization:
protobuf parsing: 737979.5 i/s
Oj parsing: 448833.9 i/s - 1.64x slower
standard JSON parsing: 297127.2 i/s - 2.48x slower
Yajl parsing: 184361.1 i/s - 4.00x slower
There are no surprises here.
In terms of payload size, protobuf is almost 4 times more compact than JSON:
JSON payload bytesize 201
Protobuf payload bytesize 58
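Most of the size gap comes from the wire format: protobuf writes each integer as a varint (7 payload bits per byte, with the high bit marking continuation) after a 1-byte field tag, while JSON spells out field names and every digit. A minimal pure-Ruby sketch of varint encoding reproduces the 58-byte figure:

```ruby
# Encode a non-negative integer as a protobuf varint:
# 7 bits of payload per byte, high bit set on all but the last byte.
def varint(n)
  bytes = []
  loop do
    byte = n & 0x7F
    n >>= 7
    byte |= 0x80 if n > 0
    bytes << byte
    break if n.zero?
  end
  bytes.pack('C*')
end

varint(300).bytes # => [172, 2], i.e. 0xAC 0x02

values = [2312345434234, 31415926, 43161592, 23141596, 61415923,
          323423434343443, 53141926, 13145926, 323423434343443, 43161592]

# Each field costs a 1-byte tag plus the varint itself:
10 + values.sum { |v| varint(v).bytesize } # => 58
```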
Doubles are expected to be the hardest payload for JSON, so let's check that.
Our payload:
data = {
  field1: 2312.345434234,
  field2: 31415.926,
  field3: 4316.1592,
  field4: 23141.596,
  field5: 614159.23,
  field6: 3234234.34343443,
  field7: 53141.926,
  field8: 13145.926,
  field9: 323423.434343443,
  field10: 43161.592
}
Result:
protobuf encoding: 4814662.9 i/s
protobuf with model init: 444424.1 i/s - 10.83x slower
Oj encoding: 297152.0 i/s - 16.20x slower
Yajl encoding: 160251.9 i/s - 30.04x slower
standard JSON encoding: 158724.3 i/s - 30.33x slower
Protobuf is much faster even with model initialization. Let's check the deserialization:
Comparison:
protobuf parsing: 822226.6 i/s
Oj parsing: 395411.3 i/s - 2.08x slower
standard JSON parsing: 241438.7 i/s - 3.41x slower
Yajl parsing: 157235.7 i/s - 5.23x slower
Still no surprises here.
And the payload size:
JSON payload bytesize 211
Protobuf payload bytesize 90
Not a fourfold difference this time, but still noticeable.
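The smaller gap follows from the wire format: protobuf stores each double as a fixed 8-byte IEEE 754 value behind a 1-byte field tag, so ten fields take exactly 90 bytes, while JSON spells out the digits. A quick sanity check in plain Ruby (pack's 'E' directive produces the same little-endian 8-byte layout):

```ruby
# A protobuf double occupies 8 bytes on the wire (IEEE 754, little-endian),
# the same layout Ruby's Array#pack produces with the 'E' directive.
[31415.926].pack('E').bytesize # => 8

# 10 fields x (1-byte tag + 8-byte double) = 90 bytes
10 * (1 + 8) # => 90
```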
Strings are expected to be easier for JSON, so let's check that.
The payload:
data = {
  field1: "2312.345434234",
  field2: "31415.926",
  field3: "4316.1592",
  field4: "23141.596",
  field5: "614159.23",
  field6: "3234234.34343443",
  field7: "53141.926",
  field8: "13145.926",
  field9: "323423.434343443",
  field10: "43161.592"
}
Bench results:
Comparison:
protobuf encoding: 3990298.3 i/s
oj encoder: 1848941.3 i/s - 2.16x slower
yajl encoder: 455222.0 i/s - 8.77x slower
standard JSON encoding: 444245.6 i/s - 8.98x slower
protobuf with model init: 368818.3 i/s - 10.82x slower
Deserialization:
Comparison:
protobuf parser: 631262.5 i/s
oj parser: 378697.6 i/s - 1.67x slower
standard JSON parser: 322923.5 i/s - 1.95x slower
yajl parser: 187593.4 i/s - 3.37x slower
The payload size:
JSON payload bytesize 231
Protobuf payload bytesize 129
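The sizes are easy to account for. JSON pays for quoted keys, quotes around values, colons, commas, and braces; protobuf pays only a 1-byte tag and a 1-byte length prefix per string field. A quick check, reusing the same ten string fields:

```ruby
require 'json'

data = {
  field1: "2312.345434234", field2: "31415.926", field3: "4316.1592",
  field4: "23141.596", field5: "614159.23", field6: "3234234.34343443",
  field7: "53141.926", field8: "13145.926", field9: "323423.434343443",
  field10: "43161.592"
}

# JSON: keys, quotes, and punctuation add up
JSON.generate(data).bytesize           # => 231

# protobuf: per field, 1-byte tag + 1-byte length + raw string bytes
data.values.sum { |v| 2 + v.bytesize } # => 129
```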
Although we already benchmarked integers separately, it's interesting to see how protobuf handles collections of them.
Here is the data:
data = {
  field1: [
    2312345434234, 31415926, 43161592, 23141596, 61415923, 323423434343443,
    53141926, 13145926, 323423434343443, 43161592
  ]
}
Serialization bench:
Comparison:
protobuf encoding: 4639726.6 i/s
oj encoder: 2929662.1 i/s - 1.58x slower
standard JSON encoding: 699299.2 i/s - 6.63x slower
yajl encoder: 610215.5 i/s - 7.60x slower
protobuf with model init: 463057.9 i/s - 10.02x slower
Deserialization bench:
Comparison:
oj parser: 1190763.1 i/s
protobuf parser: 760307.3 i/s - 1.57x slower
standard JSON parser: 619360.4 i/s - 1.92x slower
yajl parser: 414352.4 i/s - 2.87x slower
To be honest, the deserialization results are pretty unexpected here.
Let's check payload size:
JSON payload bytesize 121
Protobuf payload bytesize 50
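The 50 bytes are explained by proto3's packed encoding: a repeated numeric field is serialized as a single length-delimited record, one tag byte and one length byte followed by the varints back to back. A quick pure-Ruby check of the arithmetic:

```ruby
# Byte length of a protobuf varint: 7 payload bits per byte.
def varint_len(n)
  n.zero? ? 1 : (n.bit_length + 6) / 7
end

values = [2312345434234, 31415926, 43161592, 23141596, 61415923,
          323423434343443, 53141926, 13145926, 323423434343443, 43161592]

body = values.sum { |v| varint_len(v) } # => 48
1 + 1 + body                            # tag + length prefix => 50
```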
I decided to check whether an array of doubles shows the same behavior.
The data:
data = {
  field1: [
    2312.345434234, 31415.926, 4316.1592, 23141.596, 614159.23, 3234234.34343443,
    53141.926, 13145.926, 323423.434343443, 43161.592
  ]
}
Serialization:
Comparison:
protobuf encoding: 7667558.9 i/s
protobuf with model init: 572563.4 i/s - 13.39x slower
Oj encoding: 323818.1 i/s - 23.68x slower
Yajl encoding: 183763.3 i/s - 41.73x slower
standard JSON encoding: 182332.3 i/s - 42.05x slower
Deserialization:
Comparison:
Oj parsing: 953384.6 i/s
protobuf parsing: 883899.0 i/s - 1.08x slower
standard JSON parsing: 452799.0 i/s - 2.11x slower
Yajl parsing: 356091.2 i/s - 2.68x slower
We got similar results here. It seems that protobuf struggles with arrays, at least when deserializing them.
Payload size:
JSON payload bytesize 131
Protobuf payload bytesize 82
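Again the protobuf size is exactly what packed encoding predicts: one tag byte, one length byte, and ten fixed 8-byte doubles.

```ruby
# Packed repeated double: 10 IEEE 754 values, 8 bytes each ('E*' = little-endian doubles)
doubles = [2312.345434234, 31415.926, 4316.1592, 23141.596, 614159.23,
           3234234.34343443, 53141.926, 13145.926, 323423.434343443, 43161.592]

1 + 1 + doubles.pack('E*').bytesize # tag + length prefix + 80 bytes => 82
```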
As a "complex" payload, I mocked some user data with posts and comments on those posts, to make it more like a real-life application.
data = {
  user_id: 12345,
  username: 'johndoe',
  email: '[email protected]',
  date_joined: '2023-04-01T12:30:00Z',
  is_active: true,
  profile: {
    full_name: 'John Doe',
    age: 30,
    address: '123 Main St, Anytown, USA',
    phone_number: '+1-555-123-4567'
  },
  posts: [
    {
      post_id: 1,
      title: 'My first blog post',
      content: 'This is the content of my first blog post.',
      date_created: '2023-04-01T14:00:00Z',
      likes: 10,
      tags: ['blog', 'first_post', 'welcome'],
      comments: [
        {
          comment_id: 101,
          author: 'Jane',
          content: 'Great first post!',
          date_created: '2023-04-01T15:00:00Z',
          likes: 3
        },
        ...
      ]
    },
    ...
  ]
}
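A protobuf schema for this payload needs nested and repeated messages, something like the following (the names are illustrative, not necessarily those used in the benchmark repo):

```proto
syntax = "proto3";

message User {
  int64 user_id = 1;
  string username = 2;
  string email = 3;
  string date_joined = 4;
  bool is_active = 5;
  Profile profile = 6;
  repeated Post posts = 7;
}

message Profile {
  string full_name = 1;
  int32 age = 2;
  string address = 3;
  string phone_number = 4;
}

message Post {
  int64 post_id = 1;
  string title = 2;
  string content = 3;
  string date_created = 4;
  int32 likes = 5;
  repeated string tags = 6;
  repeated Comment comments = 7;
}

message Comment {
  int64 comment_id = 1;
  string author = 2;
  string content = 3;
  string date_created = 4;
  int32 likes = 5;
}
```

Timestamps are kept as strings here to match the JSON payload; in a real schema you might prefer google.protobuf.Timestamp.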
The results:
Comparison:
protobuf encoding: 1038246.0 i/s
Oj encoding: 296018.6 i/s - 3.51x slower
Yajl encoding: 125909.6 i/s - 8.25x slower
protobuf with model init: 119673.2 i/s - 8.68x slower
standard JSON encoding: 115773.4 i/s - 8.97x slower
Comparison:
protobuf parsing: 291605.9 i/s
Oj parsing: 76994.7 i/s - 3.79x slower
standard JSON parsing: 64823.6 i/s - 4.50x slower
Yajl parsing: 34936.4 i/s - 8.35x slower
And payload size:
JSON payload bytesize 1700
Protobuf payload bytesize 876
Here we see the expected behavior, with pure protobuf encoding in first place. However, if we look at our "real-world" example with message initialization, it is only marginally faster than the standard JSON encoding and noticeably slower than Oj.
If you are switching from JSON to protobuf just for speed, it may not be worth it.
The reason to use protobuf should be its excellent cross-language schema definition for data exchange, not a performance boost.
The lead image for this article was generated by HackerNoon's AI Image Generator via the prompt "programming language".