Node.js is known as a blazingly fast server platform with its revolutionary single-thread architecture, utilizing server resources more efficiently. But is it possible to achieve that amazing performance using only one thread? The answer might surprise you.
In this article, we will reveal all the secrets and magic behind Node.js in a very simple manner.
Before we begin, we have to understand what a process and a thread are and discover their differences and similarities.
A process is an instance of a program that is currently being executed. Each process runs independently of others and owns its resources, such as its executable code, data, and heap.
A thread is a single unit of execution within a process. There might be multiple threads within the process performing different operations simultaneously. The process shares execution code, data, and heap with threads, but stack and registers are allocated separately for each thread.
To avoid misunderstanding terms, it's important to note that JavaScript itself is neither single-threaded nor multi-threaded. The language has nothing to do with threading. It's just a set of instructions for the execution platform to handle. The platform handles these instructions in its own way - whether in a single-threaded or multi-threaded manner.
I/O (input/output) operations are generally considered slow compared to other computer operations, for example reading a file from disk or waiting for data from the network.
You might be wondering why reading data from disk is considered slow? The answer lies in the physical implementation of hardware components.
Accessing RAM takes on the order of nanoseconds, while accessing data on disk or over the network takes on the order of milliseconds.
The same applies to bandwidth. RAM has a transfer rate consistently on the order of GB/s, while disk or network throughput varies from MB/s to, optimistically, GB/s.
On top of that, we have to consider the human factor. In many circumstances, the input of an application comes from a real person (for example, a key press). So the speed and frequency of I/O don't depend only on technical aspects.
I/O can significantly slow down a program. With blocking I/O, the thread remains blocked, and no further operations are executed until the I/O is completed.
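To make the difference between a blocked and a free thread concrete, here is a minimal sketch using Node's built-in `fs` module. The scratch file and the `order` array are illustrative assumptions, not part of any real application.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical scratch file, created only for this demo.
const file = path.join(os.tmpdir(), `blocking-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from disk');

// Records the order in which things actually happen.
const order = [];

// Blocking: the thread waits on this line until the whole file is read.
const text = fs.readFileSync(file, 'utf8');
order.push('sync read finished');

// Non-blocking: the call returns immediately and the callback runs later.
fs.readFile(file, 'utf8', (err, data) => {
  if (err) throw err;
  order.push('async read finished');
  console.log(order);
  fs.unlinkSync(file); // clean up the scratch file
});

// This line runs before the asynchronous read has completed.
order.push('code after async call');
```

When run, the logged array should list 'code after async call' before 'async read finished': the thread kept working while the second read was in flight.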
Okay, why not just spawn more threads inside the program and handle each request separately? Well, it seems like a good idea. Now, each client request has its own thread, and the server can handle multiple requests simultaneously.
The program needs to allocate additional memory and CPU resources for each thread. This sounds reasonable. However, a significant issue arises when threads perform I/O operations: they become idle and spend most of their time doing no work while still holding the memory allocated to them, waiting for the operation to complete. The more threads there are, the more resources are tied up inefficiently.
On top of that, managing threads is a challenging task leading to potential issues such as race conditions, deadlocks, and livelocks. The operating system needs to switch between threads, which can add overhead and reduce the efficiency gains from multithreading.
Luckily, humanity has already invented smart mechanisms to perform these kinds of operations efficiently.
Welcome to the Event Demultiplexer. It involves a process called Multiplexing - a method by which signals are combined into one signal over a shared resource. The aim is to share a scarce resource (in our case it's CPU and RAM). For example, in telecommunications, several telephone calls may be carried out using one wire.
The responsibilities of the Event Demultiplexer are divided into the following steps:
Important! The Event Demultiplexer is not a component or device that exists in the real world. It's more like a theoretical model used to explain how to handle numerous simultaneous events efficiently.
To understand this complex process, let's go back to the past. Imagine an old phone switchboard: it identifies and registers sources of events (phones) and waits for new events (calls). Once there is a new event (a phone call), the switchboard delivers a notification (lights up a bulb). Then, the switchboard operator reacts to the notification by checking the target phone number and forwarding the call to its desired destination.
For computers, the principle is the same. However, the role of sources is played by things such as file descriptors, network sockets, timers, or user input devices. Each source can generate events like data available to read, space available to write, or connection requests.
Each operating system has already implemented the Event Demultiplexer mechanism: epoll (Linux), kqueue (macOS), event ports (Solaris), IOCP (Windows).
But Node.js is cross-platform. To govern this entire process while supporting cross-platform I/O, there is an abstraction layer that encapsulates these inter-platform and intra-platform complexities and exposes a generalized API for the upper layers of Node.
Welcome libuv - a cross-platform library (written in C) originally developed for Node.js to provide a consistent interface for non-blocking I/O across various operating systems. Libuv not only interfaces with the system's Event Demultiplexer but also incorporates two important components: the Event Queue and the Event Loop. These components work together to efficiently handle concurrent non-blocking operations.
The Event Queue is a data structure where all events are placed by the Event Demultiplexer, ready to be dequeued and processed sequentially by the Event Loop until the queue is empty.
The Event Loop is a continuously running loop that waits for messages in the Event Queue and then dispatches them to the appropriate handlers.
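The interplay between the queue and the loop can be sketched as a toy model in plain JavaScript. This is not how libuv is implemented (the real loop has several phases and queues); the event names, `handlers`, and `eventQueue` below are hypothetical and exist only to show the dispatch idea.

```javascript
// Toy model only: real libuv has several phases and queues.
const eventQueue = [];
const handled = [];

// Handlers registered for each kind of event (hypothetical event names).
const handlers = {
  'file:read': (payload) => handled.push(`file has ${payload.bytes} bytes`),
  'socket:data': (payload) => handled.push(`socket sent "${payload.text}"`),
};

// The "demultiplexer" places completed events onto the queue.
eventQueue.push({ type: 'socket:data', payload: { text: 'ping' } });
eventQueue.push({ type: 'file:read', payload: { bytes: 1024 } });

// The "event loop": take events one at a time, in order, and dispatch them.
while (eventQueue.length > 0) {
  const event = eventQueue.shift();
  handlers[event.type](event.payload);
}

console.log(handled);
```

Each handler runs to completion before the next event is taken from the queue, which is why a single thread is enough to process them.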
This is what happens when we call an I/O operation:
While one request is waiting on I/O, Node.js can handle another. It does not wait for one request to complete before processing the others. By default, all requests you make in Node.js are concurrent - they do not wait for other requests to finish before executing.
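A small sketch can illustrate this concurrency. Here `fakeRequest` is a hypothetical stand-in for a real I/O call, with a timer playing the role of the waiting time; three such "requests" are started together on one thread.

```javascript
// Hypothetical stand-in for an I/O call: resolves after `ms` milliseconds.
function fakeRequest(name, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(name), ms));
}

async function main() {
  const started = Date.now();
  // All three requests are in flight at the same time.
  const results = await Promise.all([
    fakeRequest('a', 100),
    fakeRequest('b', 100),
    fakeRequest('c', 100),
  ]);
  return { results, elapsedMs: Date.now() - started };
}

const finished = main();
finished.then(({ results, elapsedMs }) => {
  console.log(results, `${elapsedMs} ms`);
});
```

The total time should be close to the duration of one request rather than the sum of all three, because the waits overlap instead of running one after another.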
Hooray! It seems like the problem is solved. Node.js can run efficiently on a single thread since most of the complexities of blocking I/O operations have been solved by OS developers. Thank you!
But if we take a closer look at the libuv structure, we find an interesting aspect:
Wait, Thread Pool? What? Yes, now we've delved deep enough to answer the main question - why is Node.js not (entirely) single-threaded?
Okay, we have a powerful tool and OS utilities that allow us to run asynchronous code in a single thread.
But there is a problem with the Event Demultiplexer. Since its implementation differs on each OS, some kinds of I/O are not fully supported asynchronously. It is difficult to support every type of I/O on every OS platform. These issues mostly concern file I/O, and they also affect some of Node.js's DNS functions.
Not only that. There are other kinds of operations that cannot be completed asynchronously on the main thread, such as:
DNS operations, like dns.lookup, which can block because they might need to query a remote server;
CPU-bound tasks, like cryptography;
ZIP compression.
For these cases, the thread pool is used to perform the operations in separate threads (4 threads by default). So, the complete Node.js architecture diagram would look like this:
Yes, Node.js itself is single-threaded, but the libraries it uses internally, such as libuv with its thread pool for some I/O operations, are not.
The Thread Pool, in conjunction with the Tasks Queue, is used to handle blocking I/O operations. By default, the Thread Pool includes 4 threads, but this can be changed by setting the UV_THREADPOOL_SIZE environment variable:
UV_THREADPOOL_SIZE=8 node my_script.js
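The thread pool can be observed with a minimal sketch based on `crypto.pbkdf2`, whose asynchronous form is documented to run on libuv's thread pool. The password, salt, iteration count, and the `TASKS` constant are arbitrary values chosen for illustration.

```javascript
const crypto = require('node:crypto');

const TASKS = 4; // equal to the default pool size mentioned above
const started = Date.now();
const finishedAt = [];

const allDone = new Promise((resolve) => {
  for (let i = 0; i < TASKS; i++) {
    // The asynchronous form of pbkdf2 runs on a libuv worker thread,
    // so the main thread is free while the keys are derived.
    crypto.pbkdf2('secret', 'salt', 50000, 64, 'sha512', (err, key) => {
      if (err) throw err;
      finishedAt.push(Date.now() - started);
      if (finishedAt.length === TASKS) resolve(finishedAt);
    });
  }
});

allDone.then((times) => console.log('finished after (ms):', times));
```

On a machine with at least four cores, the four tasks would be expected to finish at roughly the same time. Starting the script with a smaller pool, for example `UV_THREADPOOL_SIZE=2`, should make them finish in two waves instead, although exact timings depend on the hardware.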
When an I/O operation cannot be performed asynchronously, the flow is similar, but with these key differences:
There is no magic here. I/O cannot be truly non-blocking, and there is no way to achieve that (at least for now). Data cannot be transferred faster than physical constraints allow. Nothing is perfect, so until we find ways to increase data transfer speeds at the hardware level, we use a set of optimised algorithms to perform asynchronous operations in the most efficient way possible.
Thank you for reading and have a wonderful day :)