When a service matures, a single server may no longer be able to handle the entire workload. This comes down to three main considerations: performance, availability, and economy.
To handle heavy traffic well, we need to increase the capacity for processing requests, which can be achieved by either scaling up or scaling out.
However, machine failures happen, and top-specification machines may not be affordable. Spreading the workload evenly across multiple ordinary workers sounds like a more feasible solution in most cases.
This seems pretty natural… right?
Yes, but actually not.
The architecture works for most stateless APIs, but this newly added “spread” behavior introduces some uncertainty into stateful interactions.
Any technique that makes the servers store “state”, and thus behave logically differently from one another, may conflict with the general load-balancing architecture above.
Using sessions on a web service is a classic case.
Limited by the stateless design of the HTTP protocol, context information such as login status and the shopping cart is stored on the server side for a certain period of time, so that users can have continuity across operations.
It is just like picking up a takeout meal in the real world. Customers get a number plate after ordering at the counter, and they can pick up the meal with that plate when it is ready.
Ordering a meal and picking it up are two independent steps for the customer, yet logically continuous, because the counter already stores who they are and what they ordered, and that data can be retrieved with the number plate they provide.
Of course, most of the time the number plate can only be used in the same shop; we cannot take the plate we got at store A and ask store B for our meal.
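Here is a minimal sketch of that idea in code, assuming an in-memory store keyed by a session cookie (the store and endpoint are hypothetical, for illustration only):

```typescript
import { createServer } from "node:http";
import { randomUUID } from "node:crypto";

// In-memory session store: the "counter" that remembers each number plate.
const sessions = new Map<string, { cart: string[] }>();

const server = createServer((req, res) => {
  // Read the session id (the "number plate") from the cookie, or issue a new one.
  const sid = /sid=([\w-]+)/.exec(req.headers.cookie ?? "")?.[1];
  let sessionId = sid && sessions.has(sid) ? sid : undefined;
  if (!sessionId) {
    sessionId = randomUUID();
    sessions.set(sessionId, { cart: [] });
    res.setHeader("Set-Cookie", `sid=${sessionId}; HttpOnly`);
  }
  // The state lives on THIS server only; a sibling server cannot serve it.
  res.end(JSON.stringify(sessions.get(sessionId)));
});

server.listen(8080);
```

The last line of the handler is exactly where load balancing bites: if the next request lands on a different server, the “number plate” points at a counter that has never seen this customer.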
This principle is the same for web services, because the session data may not be shared between servers. There are two main directions to solve this problem: share the session data across all servers (for example, through an external session store), or keep forwarding each client to the server that already holds its session.
The first one is a little more complex and out of the scope of this story; I will focus on the second solution here.
Maintaining the mapping between clients and web servers lets us forward each client to the same server it connected to last time, which is the one currently holding its session context.
This can be achieved with various client-side identifiers, such as IP addresses and cookies. Many well-known load-balancing solutions provide this option with different approaches, such as AWS NLB/ALB, GCP Cloud Load Balancer, and Envoy from the CNCF.
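To make the idea concrete, here is a minimal sketch of IP-based stickiness; this is not how any of those products work internally, and the backend pool is hypothetical:

```typescript
import { createHash } from "node:crypto";

// Hypothetical backend pool, for illustration only.
const backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"];

// IP-based stickiness: hash the client address so the same client
// always maps to the same backend while the pool stays unchanged.
function pickBackend(clientIp: string): string {
  const digest = createHash("sha256").update(clientIp).digest();
  return backends[digest.readUInt32BE(0) % backends.length];
}

console.log(pickBackend("203.0.113.7")); // some backend
console.log(pickBackend("203.0.113.7")); // the same backend again
```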
However, enabling the sticky sessions option is equivalent to adding a hard rule that may conflict with traffic balancing. For example, when a vast number of client requests arrive from the same IP address, an IP-based sticky session may not be a good choice, since it can put some of the servers under a heavy workload.
In modern software services, there are many situations that require real-time updates of information, such as stock market transactions, online games, and chat rooms.
A polling strategy, driven by periodic requests from the client side, does not sound economical: we pay the cost of a TCP connection for every request, and it is very likely that there is no new information to fetch.
WebSocket, with its full-duplex communication, is a solution worth trying. It allows the server side to actively push messages to the client side, effectively avoiding meaningless requests.
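As a quick illustration, here is a client-side sketch using the browser WebSocket API; the endpoint and subscription message are hypothetical:

```typescript
// Instead of polling every few seconds, open one connection and let the
// server push updates only when there is actually something new.
const socket = new WebSocket("wss://example.com/quotes"); // hypothetical endpoint

socket.onopen = () => {
  // One subscription message replaces an endless stream of GET requests.
  socket.send(JSON.stringify({ subscribe: ["AAPL", "GOOG"] }));
};

socket.onmessage = (event) => {
  const update = JSON.parse(event.data);
  console.log("price update:", update);
};
```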
However, a long-lived connection between the client and the server is maintained after the first request…
We are actually balancing the connections, not the workloads.
Any problem with this?
The main risk is that the workload cannot be properly distributed among multiple machines.
To avoid a single server carrying too much workload, there are two main directions: reserve enough capacity headroom to absorb the peaks, or reshape the traffic by making the connections re-establish.
Suppose that, after a period of observation and statistics, we find that the workload of the same group of WebSocket connections during rush hour is about twice the usual level; then we can do some simple calculations:
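Here is a sketch of that kind of calculation, with the numbers as pure assumptions: if the peak is roughly 2x the usual load and we want 20% headroom even at the peak, then usual-hour utilization must stay below 40%.

```typescript
// All numbers are assumptions for illustration, not recommendations.
const peakMultiplier = 2;    // observed: rush hour ≈ 2x the usual workload
const peakCeiling = 0.8;     // keep 20% headroom even during rush hour

// Usual-hour utilization must stay below peakCeiling / peakMultiplier = 0.4,
// otherwise the 2x rush-hour spike would push the servers past the ceiling.
const scaleOutThreshold = peakCeiling / peakMultiplier;

function shouldScaleOut(currentUtilization: number): boolean {
  return currentUtilization > scaleOutThreshold;
}

console.log(shouldScaleOut(0.35)); // false: the peak still fits
console.log(shouldScaleOut(0.45)); // true: add capacity before rush hour
```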
This is just a simple case for the sake of explanation; the strategy depends heavily on the traffic shape and the business nature of your services.
If the service only encounters traffic peaks during holidays, we can set 70% resource usage as the upper bound during weekdays, and only increase the sensitivity of the trigger before holidays.
Honestly, I don’t recommend this practice as a long-term solution. Although it is relatively simple and takes effect immediately, it does not fundamentally solve the issue of the unbalanced workload; it just makes hardware costs grow more rapidly.
Another, more solid practice is to reshape all traffic through reconnecting. To a certain extent, it overcomes the balancing failure of long-lived connections, but it also brings new challenges to the user experience and service resilience.
Every reconnection is not only an opportunity but also a risk.
The timing of the reconnection is a critical issue; it has a very direct impact on the effectiveness of load balancing and on the user experience. Below are several commonly used strategies:
Reconnect Periodically
This is one of the most intuitive methods; with an appropriate time interval, it can almost guarantee effective workload balancing. Unfortunately, the brute force of this hard rule will probably devastate the client’s experience of the service.
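A minimal client-side sketch of this strategy might look like the following, assuming a fixed interval with random jitter so that clients do not all reconnect at the same instant (the interval and endpoint are arbitrary assumptions):

```typescript
const RECONNECT_INTERVAL_MS = 30 * 60 * 1000; // e.g. every 30 minutes (assumption)

function connect(url: string): void {
  const socket = new WebSocket(url);

  socket.onopen = () => {
    // Jitter spreads the reconnects out and avoids a thundering herd.
    const jitter = Math.random() * 60_000;
    setTimeout(() => socket.close(), RECONNECT_INTERVAL_MS + jitter);
  };

  // Whether closed by the timer or by a failure, dial again.
  // (A production client would also add backoff on repeated failures.)
  socket.onclose = () => connect(url);
}

connect("wss://example.com/stream"); // hypothetical endpoint
```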
Product managers can easily make a list of situations that should not be interrupted: filling in a form, watching a video, completing a payment, and so on.
There is no doubt that disturbing users while they expect fluent operations is terrible.
Choose the right occasion to reconnect
Since we can sort out many situations that are not suitable for reconnecting, there may, on the other hand, also exist some that are suitable. To be more precise, we can make good use of the moments when users can accept, and even expect, to wait, such as an initial loading screen or a page transition.
When users are already prepared to wait, they are less concerned about waiting a little longer, and they will not even notice that you are quietly reconnecting. Even if the reconnection unfortunately fails, it will not interrupt a continuous operation or cause too much negative impact.
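A sketch of this idea, assuming a hypothetical loading-screen hook, is to establish the new connection first and drop the old one only once the replacement is ready:

```typescript
let socket = new WebSocket("wss://example.com/stream"); // hypothetical endpoint

// Hypothetical hook: called whenever the user is already staring at a loading screen.
function onLoadingScreenShown(): void {
  const fresh = new WebSocket("wss://example.com/stream");
  fresh.onopen = () => {
    // Make-before-break: close the old connection only after the new one is up.
    socket.close();
    socket = fresh;
  };
  // If the fresh connection fails, we simply keep the old one;
  // the user never notices that anything was attempted.
}
```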
A good user experience is often regarded as a holy grail; in order to realize various product visions, the techniques and strategies behind them are always astonishing.
I hope that this sharing of my experience with load-balancing architectures and strategies can help you handle future challenges in software engineering well :)
Previously published at https://rain-wu.medium.com/the-advanced-challenge-of-load-balancing-6f6ef5f36ec4