A team of engineers writing about web & mobile applications
Chatting is the most popular way to communicate and get things done today. Seriously, does anyone still make calls or send emails in the era of messengers?
We enjoy chatting for several reasons. It’s natural because we use language to type or record messages. It’s simple because chats typically have plain and clear interfaces that we can grasp within minutes. Finally, it’s instant because messages get delivered immediately. That’s it. As we can get things done just like that, it’s no wonder that the number of people using WhatsApp, the world’s most popular messaging app, is close to 1.5 billion.
However, although chatting is incredibly easy for users, few people stop to think about how complex chats are on the inside.
You probably imagine something like this when you send a GIF of a funny cat to your best friend:
However, if you’ve ever wondered what hides behind the Black Hole, seize your chance to find out here. Let’s imagine that we’re sending a GIF of a cat to a friend and follow it step by step to see how the chat architecture is organized. Here we go.
Every chat consists of two parts: Chat App and Chat Server Engine.
The Chat App is a desktop, web, or smartphone chat application. It is responsible for receiving user data, such as messages and files, storing it locally, transferring it to the backend for further processing, and displaying it to users. In other words, the Chat App acts as a mediator between the user and the backend.
Users interact with the Chat App via the Chat User Interface (Chat UI). It includes various components and widgets that are used to send text and voice messages, share files, display dialogs, contact lists, etc.
The Chat Server Engine is a pool of external servers responsible for the chat performance. It does all of the dirty work that is not visible to the users: it obtains user messages from the Chat App, processes them, and delivers them to the recipient. The Chat Server Engine is divided into several components, each with its own responsibility: user file storage, message processing and delivery, etc. We’ll be covering all of them a bit later in this article.
This is a general view of the chat architecture: client ↔ chat engine ↔ client
To get deeper into the chat architecture, let’s send our cat out on an adventure. Click the Attach button in the Chat UI, select the GIF with the cat from the device Media Gallery, and voila! The cat’s journey has begun.
To make sure that the cat is not captured by some other cat lovers along the way, it gets encrypted with a security key so that no one but the recipient can decrypt it.
Next, the encrypted cat is transferred to the Chat Media Storage, a server where static data, such as user files, is stored (Amazon S3, Cloudinary, etc).
The cat gets there via the Chat Media Storage Client, a Chat Client Engine component responsible for uploading data to the Chat Media Storage.
After the cat finds itself a cozy place in the Media Storage, a link to its location is generated, to be sent on to the recipient.
SPOILER: You might have noticed that when you get a message with a picture attached, the picture sometimes has a Download indicator in the middle (like in Telegram and Viber) or is blurred (like in WhatsApp). That’s because you didn’t receive the picture itself but rather a message with the link to it inside. To be viewed, the picture first has to be downloaded from the Chat Media Storage via the Chat Media Storage Client and decrypted.
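To make the upload step a bit more concrete, here is a minimal sketch of what a Media Storage Client might do. Everything in it is an assumption made for illustration: the in-memory storage stands in for a real service such as Amazon S3, the `media.example.com` address is made up, and using a content hash as the object key is just one possible choice. The bytes are assumed to have been encrypted already, so the sketch treats them as opaque.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryMediaStorage:
    """Hypothetical stand-in for a media store such as Amazon S3."""
    base_url: str = "https://media.example.com"  # made-up address
    _blobs: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class MediaStorageClient:
    """Uploads already-encrypted bytes and returns a link to them."""

    def __init__(self, storage: InMemoryMediaStorage) -> None:
        self._storage = storage

    def upload(self, encrypted: bytes) -> str:
        # Assumption: the object key is the SHA-256 hash of the content.
        key = hashlib.sha256(encrypted).hexdigest()
        self._storage.put(key, encrypted)
        return f"{self._storage.base_url}/{key}"

    def download(self, link: str) -> bytes:
        # The key is the last path segment of the link.
        key = link.rsplit("/", 1)[-1]
        return self._storage.get(key)


storage = InMemoryMediaStorage()
client = MediaStorageClient(storage)
link = client.upload(b"<encrypted cat.gif bytes>")
```

The important part is that `upload` returns a link rather than the file: that link, not the GIF itself, is what travels inside the message.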
At this step, a message is created and the link to the cat’s location is attached to it. The message is transferred to the Chat Device Storage, which is simply the user’s device storage where all messages are kept. It allows users to access them anytime, even if there is no Internet connection.
At the same time, the message gets dispatched to the Chat Server Engine via the Chat WebSocket Client.
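The two things that happen to the message here (a local copy and a dispatch to the server) can be sketched as follows. This is only an illustration under several assumptions: the `Message` fields, the SQLite table layout, and the `send` helper are hypothetical, and the `transport` callable stands in for a real WebSocket connection. The sketch also does the two steps one after the other for simplicity, whereas in a real app they may well run in parallel.

```python
import json
import sqlite3
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass
class Message:
    id: str
    sender: str
    recipient: str
    media_link: str  # link to the encrypted file, not the file itself
    created_at: float


class DeviceStorage:
    """Hypothetical local message store backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS messages "
            "(id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def save(self, message: Message) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO messages (id, body) VALUES (?, ?)",
            (message.id, json.dumps(asdict(message))),
        )
        self._db.commit()

    def load(self, message_id: str) -> Message:
        row = self._db.execute(
            "SELECT body FROM messages WHERE id = ?", (message_id,)
        ).fetchone()
        return Message(**json.loads(row[0]))


def send(message: Message, storage: DeviceStorage, transport) -> None:
    # Keep a local copy so the message is readable without a connection.
    storage.save(message)
    # `transport` stands in for the send method of a WebSocket client.
    transport(json.dumps(asdict(message)))


device_storage = DeviceStorage()
sent_frames = []  # collects what would go over the wire
msg = Message(
    id=str(uuid.uuid4()),
    sender="alice",
    recipient="bob",
    media_link="https://media.example.com/cat",  # made-up link
    created_at=time.time(),
)
send(msg, device_storage, sent_frames.append)
```

Because the message is saved before it is sent, it stays visible in the chat history even if the connection drops halfway through.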
After the message reaches the Chat Server Engine, you can see the first check mark appear next to your message. This is a system message notifying you that the cat has been successfully transferred to the server, but not yet delivered.
As soon as the message with the cat reaches the Chat Server Engine, it appears in the Chat Message Queue — a queue consisting of ALL of the messages waiting to be delivered at the moment. The messages are handled by Chat Workers, which are the server threads that take messages from the Queue and try to deliver them to the right recipients.
After the worker takes the message from the Queue, two scenarios are possible.
Scenario 1. The recipient is offline
If the recipient is offline, the message can’t be delivered immediately. In this case, the message just stays in the Queue until the user goes online.
Scenario 2. The recipient is online
If the recipient is online, the message is delivered through the Chat WebSocket or a VoIP push message and removed from the Queue. Then, the message gets decrypted and goes to the internal data storage on the recipient’s device. At that point, the second check mark appears next to the message, notifying you that the message has been successfully delivered, but not yet read.
Meanwhile, a notification pops up on your friend’s device, stating that there is an unread message. As soon as your friend opens the Chat App, the message stored in the Chat Device Storage is displayed in the Chat UI. At that moment, both check marks on your device change color, thereby notifying you that the message has been read.
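The two scenarios can be sketched with a simple queue. Please treat this as a toy model rather than how any real messenger does it: the `ChatServer` class and its method names are hypothetical, a plain list plays the role of an open WebSocket connection, and a single-threaded `run_worker_pass` replaces the pool of worker threads described above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    id: str
    recipient: str
    payload: str


class ChatServer:
    """Toy model of the Message Queue and a single worker."""

    def __init__(self) -> None:
        self.queue = deque()
        # recipient -> list standing in for an open connection
        self.connections = {}

    def enqueue(self, message: QueuedMessage) -> None:
        self.queue.append(message)

    def connect(self, user: str) -> list:
        self.connections[user] = []
        return self.connections[user]

    def disconnect(self, user: str) -> None:
        self.connections.pop(user, None)

    def run_worker_pass(self) -> list:
        """Try each queued message once; return the ids delivered."""
        delivered = []
        for _ in range(len(self.queue)):
            message = self.queue.popleft()
            inbox = self.connections.get(message.recipient)
            if inbox is None:
                # Scenario 1: recipient offline, message stays queued.
                self.queue.append(message)
            else:
                # Scenario 2: recipient online, deliver and drop it.
                inbox.append(message)
                delivered.append(message.id)
        return delivered


server = ChatServer()
server.enqueue(QueuedMessage("m1", "bob", "link-to-cat"))
first_pass = server.run_worker_pass()   # bob is offline
bob_inbox = server.connect("bob")       # bob comes online
second_pass = server.run_worker_pass()  # now the cat gets through
```

In the first pass nothing is delivered and the message goes back into the queue; after the recipient connects, the second pass delivers it and the queue is empty again.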
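The check marks can be thought of as a small state machine on the sender’s side. The sketch below is one possible way to model it, not a description of any particular app: the `Status` names, the `advance` helper, and the rule that a status never moves backwards (so a late or duplicate acknowledgement is ignored) are all assumptions.

```python
from enum import IntEnum


class Status(IntEnum):
    PENDING = 0    # not yet on the server, no check marks
    SENT = 1       # reached the server, first check mark
    DELIVERED = 2  # reached the recipient's device, second check mark
    READ = 3       # opened by the recipient, both marks change color


# How each status might be rendered next to the message.
MARKS = {
    Status.PENDING: "",
    Status.SENT: "✓",
    Status.DELIVERED: "✓✓",
    Status.READ: "✓✓ (colored)",
}


def advance(current: Status, incoming: Status) -> Status:
    """Assumption: status only moves forward, stale acks are ignored."""
    return max(current, incoming)


status = Status.PENDING
# Acknowledgements may arrive out of order; READ must not be undone.
for ack in (Status.SENT, Status.READ, Status.DELIVERED):
    status = advance(status, ack)
```

Ordering the states as integers keeps the rule simple: whatever acknowledgement arrives, the message shows the furthest stage it has reached so far.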
Voila! You’ve just delivered the cat through the chat!
Now let’s take a look at what the chat architecture looks like in its entirety.
Well, that was a brief (yeah, truly brief :)) and simplified description of the chat architecture and its components.
Further on in the series, we will get into more detail about how each of the components works. Stay tuned!