Recently, I have been trying to expand my knowledge as a backend developer. I want to understand how to solve problems at scale and how to break big tasks down into smaller chunks.
Before I bore you with gibberish, here is what we will be trying to achieve today. It's simple:

> Find all the prime numbers from 1 to n
It's a very classic interview question. To be honest, I didn't really sit down to find the best solution, not because I didn't want to, but because I had a different end goal in mind: take the simplest code I could find and make it run in less time using multithreading in NodeJS.
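Before we dive into the code, we need to understand the problem. When you run a typical NodeJS program, your JavaScript runs in a single process, on a single thread, driven by a single event loop. (Strictly speaking, Node also keeps a small pool of background threads for things like file system access, but your own code, including any heavy computation, runs on that one main thread.)

Now imagine finding the prime numbers between 1 and 250 million, when that's just one function within your application... Yeah, you guessed right: while that function runs, the main thread is busy, so nothing else in your app (not even a simple timer or an incoming request) gets a chance to run.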
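To see what that means in practice, here is a small sketch of my own (the 10 ms timer and the limit of 1 million are arbitrary choices). It schedules a timer, then keeps the main thread busy counting primes with the same `checkPrime` function used later in this post. The timer can only fire after the loop has finished, however short its delay.

```javascript
// Same trial-division check used later in this post
const checkPrime = num => {
  for (let i = 2, s = Math.sqrt(num); i <= s; i++) if (num % i === 0) return false;
  return num > 1;
};

const scheduledAt = Date.now();
let timerDelay = null; // stays null until the timer callback actually runs

// Ask for a callback in 10 ms...
setTimeout(() => {
  timerDelay = Date.now() - scheduledAt;
  console.log(`Timer asked for 10 ms but fired after ${timerDelay} ms`);
}, 10);

// ...then keep the main thread busy with CPU-bound work
let count = 0;
for (let n = 0; n < 1_000_000; n++) {
  if (checkPrime(n)) count++;
}
console.log('Primes below 1 million:', count); // 78498
console.log('Timer has fired yet?', timerDelay !== null); // false
```

The delay the timer reports will be at least as long as the loop took, not 10 ms, because the callback has to wait for the synchronous loop to give the thread back.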
To achieve multithreading in NodeJS, you need the `worker_threads` module. It is available from Node v10 (as an experimental feature) and became fully available in Node v12.

Like I said earlier, our aim is to find the prime numbers between 1 and n. We could use the Sieve of Eratosthenes, which is a very efficient way to solve the problem. But I wanted something that would really overwork my CPU, so that I could appreciate multithreading for what it is.

To start with, create two files: `threaded.js` and `non-threaded.js`.
Now paste this code into the `non-threaded.js` file:

```javascript
// Function to check if a number is a prime number or not
const checkPrime = num => {
  for (let i = 2, s = Math.sqrt(num); i <= s; i++) if (num % i === 0) return false;
  return num > 1;
};

const maxNumber = 100_000; // Where the loop should stop
const primeNumbers = [];

const startTime = Date.now();
for (let index = 0; index < maxNumber; index++) {
  if (checkPrime(index)) {
    primeNumbers.push(index);
  }
}
const endTime = Date.now();

console.log('JOB TOOK:: ', (endTime - startTime) / 1000, 'seconds to complete');
console.log('PRIMES:: ', primeNumbers);
```
So this is a basic NodeJS program that loops from 0 up to (but not including) `maxNumber` and stores the prime numbers in an array called `primeNumbers`. I am also measuring how long the code takes to run.

If `const maxNumber = 100_000;` confuses you, please don't let it. It's totally valid in NodeJS: the underscores are numeric separators, a newer JavaScript feature, and the engine ignores them, so `100_000` is just `100000`. I use them to make very long integers easier to read.
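A quick way to convince yourself that the underscores change nothing (a small sketch; note that separators only work in number literals in your source code, not in strings you convert at runtime):

```javascript
// Numeric separators are only for readability; the engine ignores the underscores
const maxNumber = 100_000;

console.log(maxNumber === 100000); // true
console.log(250_000_000 === 250000000); // true

// They are NOT understood when parsing strings at runtime
console.log(Number('100_000')); // NaN
console.log(parseInt('100_000', 10)); // 100 (parsing stops at the underscore)
```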
Here are the benchmark timings I got on my current laptop for different values of `maxNumber`:

- 0-100: ~0s
- 0-1_000: ~0.001s
- 0-10_000: ~0.003s
- 0-100_000: ~0.021s
- 0-1_000_000: ~0.351s
- 0-10_000_000: ~8.3s
Let's try 100 million... Lol...
- 0-100_000_000: ~220s
Wheeewww... My laptop fan almost blew out running this.
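For contrast, here is a minimal sketch of the Sieve of Eratosthenes that I mentioned and deliberately skipped. It is a different algorithm from the trial division above: instead of testing each number on its own, it crosses out the multiples of every prime it finds, which takes roughly O(n log log n) time. I haven't benchmarked it on this laptop, so there are no timings here; the limit of 1 million simply matches one of the rows above.

```javascript
// Sieve of Eratosthenes: instead of testing each number, cross out multiples of each prime.
// Returns every prime p with 2 <= p < maxNumber (the same range as the loop above).
const sieve = maxNumber => {
  // 1 = "still possibly prime", 0 = "known composite"
  const isPrime = new Uint8Array(maxNumber).fill(1);
  const primes = [];
  for (let i = 2; i < maxNumber; i++) {
    if (isPrime[i] === 0) continue;
    primes.push(i);
    // Smaller multiples of i were already crossed out by smaller primes
    for (let j = i * i; j < maxNumber; j += i) isPrime[j] = 0;
  }
  return primes;
};

const startTime = Date.now();
const primeNumbers = sieve(1_000_000);
const endTime = Date.now();

console.log('SIEVE TOOK:: ', (endTime - startTime) / 1000, 'seconds to complete');
console.log('PRIMES FOUND:: ', primeNumbers.length); // 78498
```

The trade-off is memory: the sieve needs one byte per number up front, while the trial-division loop needs almost nothing beyond the result array.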
Before going deeper: what is multithreading? According to Wikipedia:

> In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system.
The keyword here is *concurrently*. How do we break the problem down into sub-problems and have them execute concurrently, NOT sequentially? Remember, we are trying to save time while using as few resources as possible. On a multi-core machine, worker threads can genuinely run in parallel, each on its own core, and that is where the speed-up comes from.

Before we go any further, I will post the code and then explain it section by section. Grab this block of code and put it in the `threaded.js` file you created earlier:
```javascript
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const os = require('os');

// Function to check if a number is a prime number or not
const checkPrime = num => {
  for (let i = 2, s = Math.sqrt(num); i <= s; i++) if (num % i === 0) return false;
  return num > 1;
};

if (isMainThread) {
  const startTime = Date.now();
  const coresCount = os.cpus().length;
  const numberOfElements = 100_000_000;
  const workers = [];

  // Shared memory every worker writes its primes into (every element starts at 0)
  const sharedBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * numberOfElements);
  const arr = new Int32Array(sharedBuffer);

  console.log('MAIN THREAD');
  const numElementsPerThread = Math.ceil(numberOfElements / coresCount);
  let workerIndex = 0;
  let completed = 0;

  while (workers.length < coresCount) {
    workerIndex++;
    // Each worker checks the half-open range [start, end); the last one is clamped
    const start = workers.length * numElementsPerThread;
    const end = Math.min(start + numElementsPerThread, numberOfElements);
    const worker = new Worker(__filename, {
      workerData: {
        start,
        end,
        index: workerIndex,
        arr,
      },
    });

    worker.on('message', message => {
      if (message.completed) {
        completed++;
      }
      // Once every worker has reported back, the shared array is complete
      if (completed === coresCount) {
        console.log('Totally done!');
        const endTime = Date.now();
        console.log((endTime - startTime) / 1000, 'seconds to complete');
        console.log('FINAL ARR:: ', arr);
        console.log('FINAL ARRAY:: ', Array.from(arr).filter(Boolean));
      }
      console.log('Worker finished:: ', message.index);
    });

    workers.push(worker);
  }
} else {
  console.log({
    start: workerData.start,
    end: workerData.end,
    index: workerData.index,
  });
  // Store each prime in this worker's range at its own index in the shared array
  for (let i = workerData.start; i < workerData.end; i++) {
    const check = checkPrime(i);
    if (check) {
      workerData.arr[i] = i;
    }
  }
  parentPort.postMessage({ completed: true, index: workerData.index });
}
```
Like I said earlier, to perform multithreaded operations in NodeJS, we need the `worker_threads` module. It enables us to write programs that take advantage of multithreading.

```javascript
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const os = require('os');
```

These are the standard imports this program needs: `worker_threads` for the threading primitives and `os` to find out how many CPU cores we have.
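```javascript
const checkPrime = num => {
  for (let i = 2, s = Math.sqrt(num); i <= s; i++) if (num % i === 0) return false;
  return num > 1;
};
```

Remember this function from the non-threaded version of our app? All it does is take a number and tell us whether it's a prime number. It only tries divisors from 2 up to the square root of the number: if a number `n` had a divisor `d` larger than its square root, then `n / d` would be a divisor smaller than the square root, and the loop would already have found it.

Before we dive into the main code, I want to show you its basic structure, so that the main code is easier to follow.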
```javascript
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const os = require('os');

if (isMainThread) {
  console.log('Main THREAD');
  let worker_count = os.cpus().length;
  console.log('THREADS::: ', worker_count);
  let workers = 0;
  while (workers < worker_count) {
    new Worker(__filename);
    workers++;
  }
} else {
  console.log('Worker THREAD');
}
```
Copy the above piece of code, save it in a `.js` file and run it. You should see a result like the one below:

```
Main THREAD
THREADS::: 4
Worker THREAD
Worker THREAD
Worker THREAD
Worker THREAD
```
From the result above, you can see I use a crappy workstation.
The `isMainThread` flag from `worker_threads` tells us whether we are running on the main thread or in one of the worker threads. In the main thread, I spin up one worker per CPU core my computer has, and each worker runs this same file (`__filename`) with `isMainThread` set to `false`. That's as basic as a multithreaded program can look in NodeJS. Now let's get back to the main program.

```javascript
const sharedBuffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * numberOfElements);
const arr = new Int32Array(sharedBuffer);
```

To understand the above code, you need to know that for multithreading to work here, we need a piece of shared memory that every worker thread writes its results into when it is done.
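To make the idea of shared memory concrete, here is a tiny standalone sketch of my own (not part of the prime program; the 4-slot array and the values written are arbitrary). The main thread creates a `SharedArrayBuffer`, a worker writes into it, and after the worker reports back, the main thread reads the worker's writes directly, without anything being copied back. To keep everything in one snippet, the worker's code is passed as a string using the `eval: true` option of `Worker` instead of `__filename`.

```javascript
const { Worker } = require('worker_threads');

// Shared memory: 4 Int32 slots, all initialized to 0
const shared = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT * 4));

// Worker source as a string (eval: true), so this sketch fits in a single file
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Writes go into the SAME memory the main thread sees; nothing is copied back
  for (let i = 0; i < workerData.arr.length; i++) workerData.arr[i] = (i + 1) * 10;
  parentPort.postMessage('done');
`;

// Resolves with the main thread's view of the shared memory once the worker is done
const result = new Promise((resolve, reject) => {
  const worker = new Worker(workerSource, { eval: true, workerData: { arr: shared } });
  worker.once('message', () => resolve(Array.from(shared)));
  worker.once('error', reject);
});

result.then(values => console.log('Main thread sees:', values)); // [ 10, 20, 30, 40 ]
```

The worker's only message is a "done" signal; the actual data travels through the shared memory, which is the same pattern the prime program uses.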
In our program, that shared memory is a `SharedArrayBuffer`. Every byte of a freshly created `SharedArrayBuffer` is initialized to zero, so every element of our array starts out as 0. We view the buffer as an array of `Int32` elements whose length is the upper limit of our prime number search.

```javascript
const numElementsPerThread = Math.ceil(numberOfElements / coresCount);
```
Remember, multithreading means sub-tasks running concurrently. So here we are calculating the size of the chunk we will give each worker. For example, if I want to search for prime numbers from 0 to 100 and my `coresCount` is 4, then each worker gets 100 / 4 = 25 numbers to work on. `Math.ceil` rounds up when the division isn't exact, so no numbers are left over.
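Here is that chunking arithmetic pulled out into a small function so you can play with it. `getChunks` is a hypothetical helper, not part of the original program; it computes each worker's half-open range `[start, end)` the same way the main thread does, and additionally clamps both ends to `numberOfElements` so the last range never runs past the end of the array.

```javascript
// Hypothetical helper: the [start, end) range each worker would receive
const getChunks = (numberOfElements, coresCount) => {
  const numElementsPerThread = Math.ceil(numberOfElements / coresCount);
  const chunks = [];
  for (let w = 0; w < coresCount; w++) {
    // Clamp so ranges never extend past numberOfElements
    const start = Math.min(w * numElementsPerThread, numberOfElements);
    const end = Math.min(start + numElementsPerThread, numberOfElements);
    chunks.push({ start, end });
  }
  return chunks;
};

console.log(getChunks(100, 4)); // ranges: [0, 25), [25, 50), [50, 75), [75, 100)
console.log(getChunks(10, 4)); // ceil(10 / 4) = 3, ranges: [0, 3), [3, 6), [6, 9), [9, 10)
```

The second call shows why the clamp matters: without it, the last worker would be told to go up to 12 even though there are only 10 numbers.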
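Now, into the while loop:

```javascript
const start = workers.length * numElementsPerThread;
// Clamp so the last worker never runs past the end of the array
const end = Math.min(start + numElementsPerThread, numberOfElements);
const worker = new Worker(__filename, {
  workerData: {
    start,
    end,
    index: workerIndex,
    arr,
  },
});
```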
We already know the size of the chunk each worker will get, but how do we set the boundaries for each worker? The `start` and `end` variables calculate and hold those boundaries for each worker we are about to spin up: a worker checks the numbers from `start` up to, but not including, `end`. We initialize each worker with these parameters: `start`, `end`, `index` (used to track the workers individually) and `arr` (the shared buffer array each worker writes its results to).

```javascript
worker.on('message', message => {
  if (message.completed) {
    completed++;
  }
  if (completed === coresCount) {
    console.log('Totally done!');
    const endTime = Date.now();
    console.log((endTime - startTime) / 1000, 'seconds to complete');
    console.log('FINAL ARR:: ', arr);
    console.log('FINAL ARRAY:: ', Array.from(arr).filter(Boolean));
  }
  console.log('Worker finished:: ', message.index);
});
```
When each worker finishes its execution, it sends a message back to the main thread with whatever data it wants to send. The block of code above listens for that event. When the last worker reports back, we calculate how long the whole process took and convert the `Int32Array` into the normal JavaScript array we are used to. The problem is that most indexes still contain zero (they are not prime numbers), so we use the `.filter(Boolean)` trick to remove all falsy values from the array.

```javascript
for (let i = workerData.start; i < workerData.end; i++) {
  const check = checkPrime(i);
  if (check) {
    workerData.arr[i] = i;
  }
}
```
In each worker, we just loop between its boundaries and check whether each number is prime. If it is, we store that number at its own index in the shared array. For example, if `i` is 7 and 7 is a prime number, we go to the shared array and put 7 at index 7. Since every worker only writes to indexes inside its own range, no two workers ever write to the same slot, so we don't need any locking.
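Shrunk down to a single thread and the numbers below 20 (an arbitrary limit for illustration), the "store each prime at its own index" scheme and the `.filter(Boolean)` cleanup look like this:

```javascript
// Single-threaded miniature of what each worker does
const checkPrime = num => {
  for (let i = 2, s = Math.sqrt(num); i <= s; i++) if (num % i === 0) return false;
  return num > 1;
};

const arr = new Int32Array(20); // all zeros, like a fresh SharedArrayBuffer view
for (let i = 0; i < arr.length; i++) {
  if (checkPrime(i)) arr[i] = i; // a prime is stored at its own index
}

// Zeros (non-primes) are falsy, so filter(Boolean) drops them
const primes = Array.from(arr).filter(Boolean);
console.log(primes); // primes below 20: 2, 3, 5, 7, 11, 13, 17, 19

// Typed arrays have their own filter(), which skips the intermediate plain array
const primesTyped = arr.filter(Boolean);
console.log(primesTyped.length); // 8
```

The scheme relies on 0 never being prime, so a 0 left in the array can safely mean "not prime". With 100 million elements, calling `filter` directly on the `Int32Array` may be worth considering, since `Array.from` first builds a plain array of the same length.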
```javascript
parentPort.postMessage({ completed: true, index: workerData.index });
```

Once the loop terminates, the worker sends a completion message back to the main thread.

And that, my friends, is how I multi-threaded my way through prime numbers. Why don't you run it and tell me the results you got in the comment section?
Please don't break your computers while at it, I won't be held liable... lol
Please share your results in the comment section. I hope you learned something new.
Previously published at https://umaradam.xyz/prime-numbers-and-multi-threading