Improve Network API Performance For Your Apps

Written by pzinovkin | Published 2022/11/07
Tech Story Tags: api-development | optimization | easier-said-than-done | metrics-are-important | network | protobuf | json | software-development

TL;DR: There is a difference between "working on my machine" and working everywhere. This is especially true for APIs used by applications. Here are some notes on how to make an API work better in the real world.

Network APIs are everywhere in the modern world. If data is the lifeblood of modern tech, then APIs are essentially the vessels that carry it around distributed and client-server systems. And knowing how to control and improve API response time is becoming increasingly important.

But often, optimization of API response time does not go beyond benchmarking endpoints and optimizing SQL queries. Of course, it’s beneficial to have a fast code base, but it’s also important to understand that the latency perceived by the user depends on how, and in which network conditions, the API is used.

Let’s start with network conditions. Application developers build and test their apps on laptops or desktop machines, usually with a good network connection. That’s understandable; having good internet is vital for a development environment. But sometimes it leads to wrong assumptions about what users will experience while using the app. Here is a simple illustration of the difference between a good and a moderate internet connection:

Here I open google.com on my Wi-Fi. The following picture is taken with the connection degraded to 3G (780/330 Kbps, 100 ms delay), using Network Link Conditioner from Apple’s Additional Tools for Xcode package.

It’s twice as slow. And here, we add 1% packet loss to the mix:

As you can see, life quickly becomes quite miserable. And I even skipped connection establishment and the SSL handshake here. You should definitely try this with your own app!

Of course, we can’t change the network conditions for our users, but we can design and adapt our API to those conditions. The first step is to be aware of the network conditions your API is used in. That means collecting metrics from clients, analyzing them, and setting performance targets.
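
As a minimal sketch of client-side timing in Python (the endpoint URL is hypothetical; a real client would ship the measurement to a metrics backend rather than print it):

```python
import time
import urllib.request

API_URL = "https://api.example.com/messages"  # hypothetical endpoint

def timed_get(url: str) -> tuple[bytes, float]:
    """Fetch a URL and return the body plus wall-clock latency in milliseconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    elapsed_ms = (time.monotonic() - start) * 1000
    return body, elapsed_ms

body, latency = timed_get(API_URL)
# In a real app, report this as a histogram to your metrics pipeline
# so you can see what your users actually experience, not your office Wi-Fi.
print(f"fetched {len(body)} bytes in {latency:.0f} ms")
```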

The next thing I want to mention is adapting the API to the application’s requirements. Imagine a messaging application (or feature) on the client side and a REST API on the server. The client needs to render a list of recent messages, and here is the typical request flow:

With a 100 ms delay each way, three sequential round trips add up to 600 ms before the client can show something meaningful to the user. But what if the API could return all the needed data in one response? A single round trip takes 200 ms, which is 400 ms faster!

I’m exaggerating here, but I think my point is clear: design the API so it doesn’t make unnecessary network requests when all the required data can be sent in one response.
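
Here is a rough sketch of the difference, assuming a hypothetical messaging API (the endpoints and field names are made up for illustration):

```python
import json
import urllib.request

BASE = "https://api.example.com"  # hypothetical API host

def get_json(path: str) -> dict:
    with urllib.request.urlopen(BASE + path, timeout=10) as resp:
        return json.load(resp)

# Chatty version: three sequential round trips, each paying the full network delay.
def load_chat_screen_chatty(chat_id: str) -> dict:
    chat = get_json(f"/chats/{chat_id}")
    messages = get_json(f"/chats/{chat_id}/messages?limit=50")
    author_ids = ",".join(str(m["author_id"]) for m in messages["items"])
    users = get_json("/users?ids=" + author_ids)
    return {"chat": chat, "messages": messages, "users": users}

# Batched version: one endpoint that returns everything the screen needs.
def load_chat_screen_batched(chat_id: str) -> dict:
    return get_json(f"/chats/{chat_id}/screen?messages=50&include=users")
```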

While doing this sort of optimization by combining or batching response data, it’s important to keep in mind that the network has its own constraints. Don’t forget about the MTU and packet fragmentation. For an API, it means there is a limit on response size, after which the response is split across multiple TCP segments. Even so, it will still be faster than performing separate HTTP requests, each carrying its own headers.
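
As a back-of-the-envelope sketch, assuming a typical 1500-byte Ethernet MTU and roughly 40 bytes of IP and TCP headers per segment (real values vary with TCP options, TLS framing, and so on):

```python
MTU = 1500              # typical Ethernet MTU, bytes
IP_TCP_HEADERS = 40     # rough IP + TCP header overhead per segment
MSS = MTU - IP_TCP_HEADERS  # ~1460 bytes of payload per segment

def segments_needed(body_size: int) -> int:
    """Estimate how many TCP segments a response body occupies."""
    return -(-body_size // MSS)  # ceiling division

print(segments_needed(2512))    # a ~2.5 KB response fits in 2 segments
print(segments_needed(50_000))  # a 50 KB batched response spans ~35 segments
```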

Let’s move on to individual requests. An HTTP request consists of two parts: headers and body. Headers are often neglected, yet a huge chunk of data hides there, and its name is Cookies. The problem with Cookies is that they are everywhere, everyone needs them, and nobody pays attention to them. So it’s easy to end up in a situation where the Cookies are bigger than all the other headers and the response body combined.

An API usually doesn’t need Cookies, so it is worth serving your API from a cookie-less domain. Also, take a look at the other headers your API server is sending: are they all required?
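
A quick way to see the problem is simply to measure it. The header values below are made up, but the shape is typical:

```python
# Illustrative only: compare the size of the Cookie header to all other
# request headers. These values are invented for the example.
headers = {
    "Host": "api.example.com",
    "Accept": "application/json",
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiJ9.payload.signature",
    "User-Agent": "ExampleApp/1.2 (iOS 16)",
    "Cookie": "_ga=GA1.2.123456789.1667000000; session=abcdef0123456789; "
              "_fbp=fb.1.1667000000000.123456789; consent=accepted",
}

cookie_bytes = len(headers["Cookie"])
# "name: value\r\n" adds 4 bytes of separators per header line.
other_bytes = sum(len(k) + len(v) + 4 for k, v in headers.items() if k != "Cookie")
print(f"cookies: {cookie_bytes} B, all other headers: {other_bytes} B")
```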

Now let’s talk about response body size. The first thing worth checking is response compression. Make sure it is enabled, even for dynamic API responses. You will pay a little in processing time, but it will reduce the amount of data transferred, sometimes significantly, and improve latency for clients.
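
How you enable it depends on your framework or reverse proxy (nginx’s gzip directive, for example), but the effect is easy to demonstrate on a synthetic payload:

```python
import gzip
import json

# Synthetic payload: 50 repetitive records, roughly what a list endpoint returns.
payload = json.dumps([
    {"id": i, "status": "delivered", "text": "hello world",
     "created_at": "2022-11-07T10:00:00Z"}
    for i in range(50)
]).encode()

compressed = gzip.compress(payload)
print(f"raw: {len(payload)} B, gzip: {len(compressed)} B "
      f"({len(compressed) / len(payload):.0%} of original)")
```

Repetitive JSON compresses very well, and repetitive JSON is exactly what most list endpoints produce.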

The next thing to check is that your API is not sending data the client doesn’t require. Sometimes API responses are assembled from existing response structures, and we end up with bloated responses out of convenience. GraphQL does much better here, since the client can query only the fields it needs.
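
If GraphQL is not an option, the same idea can be approximated with a sparse-fieldsets style parameter. A minimal sketch (the fields parameter and the field names are assumptions, not a standard):

```python
# The client asks for specific fields and the server strips everything else.
def select_fields(record: dict, fields: list[str]) -> dict:
    return {k: v for k, v in record.items() if k in fields}

message = {
    "id": 42,
    "chat_id": 7,
    "author": {"id": 1, "name": "Alice", "avatar_url": "https://..."},
    "text": "Are we still on for lunch?",
    "created_at": "2022-11-07T10:00:00Z",
    "read_receipts": [...],     # heavy fields the list view never shows
    "attachments_meta": {...},
}

# GET /messages?fields=id,text,created_at
print(select_fields(message, ["id", "text", "created_at"]))
```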

And the last thing I want to mention is the data serialization format. Most modern APIs are built around JSON. JSON is a convenient and ubiquitous format; it is human-readable and performs very well in browsers. The problem with it is verbosity. Here is an example:

The full size of the response here is 2512 bytes, of which only 1586 bytes are data. The rest is field names and JSON syntax.

But even that can be reduced if we strip repeatable data that can be derived from the context of the query, leaving us with 626 bytes.
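
Here is an illustrative sketch of the idea (with made-up data, so the byte counts differ from the example above): field names are sent once, and values go as rows that the client reconstructs from the known column order.

```python
import json

# Verbose form: every record repeats the same field names.
verbose = [
    {"id": 1, "author_id": 10, "text": "hi", "created_at": 1667814000},
    {"id": 2, "author_id": 11, "text": "hello", "created_at": 1667814005},
    {"id": 3, "author_id": 10, "text": "lunch?", "created_at": 1667814010},
]

# Compact form: field names appear once, values as rows.
compact = {
    "columns": ["id", "author_id", "text", "created_at"],
    "rows": [
        [1, 10, "hi", 1667814000],
        [2, 11, "hello", 1667814005],
        [3, 10, "lunch?", 1667814010],
    ],
}

print(len(json.dumps(verbose)), "vs", len(json.dumps(compact)), "bytes")
```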

That’s a fourfold reduction in size. It won’t work for every situation or API, but it is relatively easy to do. Another possible approach is switching to binary serialization, for example Protobuf. It is a well-supported format with minimal overhead in how it packs the data. But you lose human readability, and it takes more effort to implement than JSON.
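
Protobuf itself needs a .proto schema and generated code, so as a stand-in, here is a sketch using Python’s struct module to show how much a fixed binary layout saves compared to JSON (the message fields are made up):

```python
import json
import struct

message = {"id": 42, "author_id": 10, "created_at": 1667814000, "unread": True}

# JSON: field names and syntax travel with every message.
as_json = json.dumps(message).encode()

# Fixed binary layout: two unsigned 32-bit ints, one unsigned 64-bit int, one bool.
# The agreed field order is the implicit "schema", much like numbered fields
# in a .proto file.
as_binary = struct.pack("<IIQ?", message["id"], message["author_id"],
                        message["created_at"], message["unread"])

print(len(as_json), "bytes as JSON vs", len(as_binary), "bytes packed")
```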

And one last thing: it’s essential to test how what you are building performs in the real world, on a mobile network, on the subway, in a dense city. Don’t forget about dogfooding: if you can’t stand how slow your app is, neither will your users.

To summarize:

  • Be aware of the network conditions of your users. Measure latency perceived by clients.
  • Design your API to make fewer requests.
  • Design your API to send less data.
  • Test how your app interacts with the API in the real world.

