These days, many developers use plain WebSockets as a starting point when building real-time applications and services. Now that WebSockets are widely supported across all major browsers, long-polling fallback solutions have lost much of their appeal. In line with this trend, many developers are abandoning libraries altogether in favour of using the native WebSocket API directly.
Before you decide to build your entire app or service on raw WebSockets, though, it’s important that you fully understand your use case and are aware of the common hurdles that you are likely to encounter along the way. This should help you decide whether plain WebSockets are suitable for you or whether you need something more substantial.
Regardless of your use case, handling lost connections and reconnecting is almost certainly an issue that you will encounter when using WebSockets, and solving it is harder than it looks.
For example, assume that you want a user to connect and exchange messages with your server via WebSockets. On the front-end you might have this code:
let socket = new WebSocket(someURL);
// Just listen to and send messages here.
// ...Happily ever after.
Unfortunately, that’s just the beginning of a long, painful journey. In the real world, you have to account for network failures and server crashes; you can’t assume that a socket will stay open forever or even that it will close itself properly. If the internet connection goes down, the WebSocket on the other side of the connection becomes unreachable, and the network may not be able to recover from it (let alone detect the issue quickly enough).
Mobile devices introduce a new category of connection issues; if a mobile device is locked, goes to sleep or the application is moved to the background, an active WebSocket connection may become unresponsive and not close itself properly.
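One common way to notice a connection that has gone silent without closing is an application-level heartbeat. The sketch below is only an illustration under stated assumptions: the names `createHeartbeat`, `intervalMs` and `timeoutMs` are hypothetical, and it assumes that your server replies to every `"ping"` text message with some message of its own. The browser WebSocket API does not let you send protocol-level ping frames, which is why the ping here is an ordinary message.

```javascript
// Sketch: application-level heartbeat for a browser-style WebSocket.
// Assumption: the server answers each "ping" text message with some message.
// Call start() once the socket is open, and stop() when it closes.
function createHeartbeat(socket, { intervalMs = 15000, timeoutMs = 5000, timers = globalThis } = {}) {
  let pingTimer = null;
  let deadlineTimer = null;

  function clearDeadline() {
    if (deadlineTimer !== null) {
      timers.clearTimeout(deadlineTimer);
      deadlineTimer = null;
    }
  }

  function sendPing() {
    socket.send('ping');
    clearDeadline();
    // If nothing arrives before the deadline, treat the connection as dead.
    deadlineTimer = timers.setTimeout(() => {
      stop();
      socket.close();
    }, timeoutMs);
  }

  // Any incoming message proves that the connection is still alive.
  function onMessage() {
    clearDeadline();
  }

  function start() {
    socket.addEventListener('message', onMessage);
    pingTimer = timers.setInterval(sendPing, intervalMs);
  }

  function stop() {
    socket.removeEventListener('message', onMessage);
    if (pingTimer !== null) {
      timers.clearInterval(pingTimer);
      pingTimer = null;
    }
    clearDeadline();
  }

  return { start, stop };
}
```

Once the heartbeat gives up on a socket, the calling code would typically discard it and open a replacement.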
Another issue to consider is what happens when your server crashes; your client sockets will hang up all at once.
The most common strategy for dealing with all these issues is to just throw away sockets and create new ones. For a popular production system, though, it may not be so simple. If your server crashes while you have 10K concurrent users/sockets and they all try to reconnect at once, this could overload your server and crash it again (thereby repeating the vicious cycle). To solve this problem, you need to use an exponential back-off strategy, so that each client waits progressively longer between reconnect attempts.
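As a rough sketch of what exponential back-off can look like on the client, the code below doubles the maximum wait after every failed attempt and picks a random delay below that ceiling, so that thousands of clients do not all retry at the same instant. The names `backoffDelay`, `connectWithBackoff`, `baseMs` and `maxMs` are hypothetical, the default values are arbitrary, and the randomization ("jitter") is an addition that the text above does not mention.

```javascript
// Sketch: exponential back-off with random jitter for reconnect attempts.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  // The ceiling doubles with each failed attempt, capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // A random delay in [0, ceiling) spreads clients out over time.
  return Math.floor(random() * ceiling);
}

// createSocket is a caller-supplied factory, e.g. () => new WebSocket(someURL).
// onSocket is called with every new socket so that handlers can be attached.
function connectWithBackoff(createSocket, onSocket, options = {}) {
  let attempt = 0;
  let stopped = false;
  let timer = null;

  function open() {
    const socket = createSocket();
    // A successful connection resets the back-off.
    socket.addEventListener('open', () => { attempt = 0; });
    // In browsers a failed connection also ends with a 'close' event,
    // so listening for 'close' covers both failure and disconnection.
    socket.addEventListener('close', () => {
      if (stopped) return;
      timer = setTimeout(open, backoffDelay(attempt, options));
      attempt += 1;
    });
    onSocket(socket);
  }

  open();
  return {
    stop() {
      stopped = true;
      clearTimeout(timer);
    },
  };
}
```

In a browser this might be used as `connectWithBackoff(() => new WebSocket(someURL), (socket) => { /* attach message handlers */ })`.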
There are use cases where you just want to broadcast real-time messages and data to everyone indiscriminately. Other times, you want fine-grained access control so that your server can decide whether or not specific users can perform certain actions and receive certain messages. To achieve this, you need a way to associate each socket with a user account within your system; then you need to prevent each socket from performing actions and receiving real-time data that fall outside of its user’s permissions.
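A minimal server-side sketch of that idea follows. Everything in it is hypothetical: the `attachUser`, `isAllowed` and `handleAction` helpers, the shape of the user object, and the `{ type, payload }` message format are assumptions made for illustration, and the step that actually authenticates the user is left out.

```javascript
// Sketch: tie each socket to a user and check permissions before acting.
// socket -> { id, permissions: Set<string> }, set after authentication.
const socketUsers = new WeakMap();

function attachUser(socket, user) {
  socketUsers.set(socket, user);
}

function isAllowed(socket, actionType) {
  const user = socketUsers.get(socket);
  return user !== undefined && user.permissions.has(actionType);
}

// handlers is a Map from action type to a function (socket, payload) => void.
function handleAction(socket, action, handlers) {
  const handler = handlers.get(action.type);
  if (handler === undefined || !isAllowed(socket, action.type)) {
    // Refuse anything unknown or outside the user's permissions.
    socket.send(JSON.stringify({ error: 'forbidden', type: action.type }));
    return false;
  }
  handler(socket, action.payload);
  return true;
}
```

The same kind of check is needed on the outgoing side, before the server pushes any real-time data to a socket.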
If you want your system to work under poor network conditions, you may need to use some kind of token on the front end to allow your server to track a user across multiple physical connections. If a connection fails, you need a way to create and re-authenticate a new connection without asking the user to re-enter their password. You may also need to account for users who access your app through multiple browser tabs; if they sign in on one tab, you need a way to authenticate the sockets in all of their open tabs. JSON Web Token (JWT) is a popular strategy for authenticating users over WebSockets, but getting the implementation right can be a lot of work, and getting it wrong can create security vulnerabilities within your system.
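On the front end, the token flow described above might look roughly like the sketch below. It assumes that the server issued a token at login and stored under the hypothetical key `authToken`, and that the server accepts a first message of the form `{ type: 'auth', token }`; neither is a standard. It relies on the browser `storage` event, which fires in the other tabs of the same origin when `localStorage` changes. Issuing and verifying the token on the server is not shown.

```javascript
// Sketch: re-authenticate each new connection with a stored token.
// Assumptions: the server issued the token at login and accepts
// { type: 'auth', token } as a message. Both are hypothetical.
const TOKEN_KEY = 'authToken';

// storage is any object with getItem(), e.g. window.localStorage.
function authenticateOnOpen(socket, storage) {
  socket.addEventListener('open', () => {
    const token = storage.getItem(TOKEN_KEY);
    if (token !== null) {
      socket.send(JSON.stringify({ type: 'auth', token }));
    }
  });
}

// target is the object that emits 'storage' events, e.g. window.
// When another tab stores a new token, authenticate this tab's socket too.
function syncAcrossTabs(target, socket) {
  target.addEventListener('storage', (event) => {
    if (event.key !== TOKEN_KEY || event.newValue === null) return;
    if (socket.readyState === 1) { // 1 is WebSocket.OPEN
      socket.send(JSON.stringify({ type: 'auth', token: event.newValue }));
    }
  });
}
```

In a browser this might be wired up as `authenticateOnOpen(socket, localStorage)` and `syncAcrossTabs(window, socket)`. Keep in mind that anything in `localStorage` is readable by any script running on the page, which is one of the ways a token scheme can go wrong.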
When building a system that allows two or more users to chat with each other in real time, a common, naive strategy that many developers use is to create an object/hashmap on the server side to keep track of which sockets belong to which users. Then, when a client socket belonging to UserA sends your server a message that is meant for UserB, your server code can look up the sockets that belong to UserB in the object/hashmap and send the message to those sockets.
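For reference, the naive strategy can be sketched in a few lines. The function names are hypothetical, and the sockets are assumed to expose a `send()` method, as most server-side WebSocket implementations do.

```javascript
// Sketch: in-memory registry of sockets per user (the naive strategy).
const socketsByUser = new Map(); // userId -> Set of sockets

function registerSocket(userId, socket) {
  if (!socketsByUser.has(userId)) {
    socketsByUser.set(userId, new Set());
  }
  socketsByUser.get(userId).add(socket);
}

function unregisterSocket(userId, socket) {
  const sockets = socketsByUser.get(userId);
  if (sockets === undefined) return;
  sockets.delete(socket);
  // Drop empty entries so that the map does not grow forever.
  if (sockets.size === 0) socketsByUser.delete(userId);
}

// Returns the number of sockets that the message was sent to.
function sendToUser(userId, message) {
  const sockets = socketsByUser.get(userId);
  if (sockets === undefined) return 0;
  for (const socket of sockets) {
    socket.send(message);
  }
  return sockets.size;
}
```

Note that `socketsByUser` lives in the memory of a single process: a return value of 0 from `sendToUser` only means that this particular process holds no sockets for that user.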
This approach works very well if your back-end code runs in a single process on a single server, but if you need to scale out, or even scale up to multiple processes, you will find that it adds a lot of extra complexity. The root problem is that you might have one user hosted on one process and another user hosted on a different process; if so, how can you access a user’s sockets when the current process holds no reference to them?
This drawing illustrates the problem:
… and the naive solution:
… and this shows why the naive solution doesn’t scale:
A better, more scalable approach to the problem of sending messages between users is to use pub/sub channels on the front end (and let the pub/sub service do the heavy lifting). A lot of developers and companies who start out with plain WebSockets and later need to scale beyond a single process find it difficult, and they often end up switching to an expensive third-party pub/sub service (and losing control over their real-time back end). That’s why I started SocketCluster: it’s open source, it supports standard pub/sub and it’s designed to scale to any number of users across (almost) any number of machines.
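To make the pub/sub idea concrete, here is a minimal in-memory sketch of the pattern. It is not SocketCluster’s API, and the `PubSub` class and the channel name used afterwards are hypothetical. In a real multi-process deployment the broker would be a separate service that every process can reach; the in-memory version only shows how the sender and the receiver are decoupled.

```javascript
// Sketch: minimal in-memory pub/sub broker (a stand-in for a real service).
class PubSub {
  constructor() {
    this.channels = new Map(); // channel name -> Set of listener functions
  }

  // Returns a function that removes the subscription again.
  subscribe(channel, listener) {
    if (!this.channels.has(channel)) {
      this.channels.set(channel, new Set());
    }
    this.channels.get(channel).add(listener);
    return () => {
      const listeners = this.channels.get(channel);
      if (listeners === undefined) return;
      listeners.delete(listener);
      if (listeners.size === 0) this.channels.delete(channel);
    };
  }

  // Delivers data to every subscriber and returns how many there were.
  publish(channel, data) {
    const listeners = this.channels.get(channel);
    if (listeners === undefined) return 0;
    for (const listener of listeners) {
      listener(data);
    }
    return listeners.size;
  }
}
```

With this pattern, UserB’s client subscribes to a channel such as `user/UserB`, and anyone with permission publishes to that channel. The publisher never needs to know which process, or which socket, UserB is connected to.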
When it comes to choosing the right tools, it’s important to investigate some of the heavier solutions before deciding to go straight for the lightest, most basic ones. Sometimes, if you have very specific requirements, you will actually want the basic solution but often, choosing a slightly heavier solution will save you a lot of time and frustration in the medium and long term.