paint-brush
Memory Efficient Multithreaded Incremental Segmented Sieve Algorithm: Mathematical Optimizationby@multithreading
132 reads

Memory Efficient Multithreaded Incremental Segmented Sieve Algorithm: Mathematical Optimization

tldt arrow

Too Long; Didn't Read

With the growing computational needs associated with prime enumeration tasks, traditional implementations of the Sieve algorithm remain a bottleneck.
featured image - Memory Efficient Multithreaded Incremental Segmented Sieve Algorithm: Mathematical Optimization
MultiThreading.Tech: #1 Publication on Concurrent Programming HackerNoon profile picture

This paper is available on arxiv under CC 4.0 license.

Authors:

(1) Evan Ning, Milton Academy & [email protected];

(2) David R. Kaeli, Department of Electrical and Computer Engineering & Northeastern University [email protected].

3 Mathematical Optimization

One of the novel contributions of our work is to create optimized mathematical operations, where each thread can work efficiently on its designated segment, ensuring that the sieve is executed in a parallelized and optimized manner.


For any thread T, let the current starting index in the array be N and the starting value be M, assume we want to sieve for prime number P.


• First, we find the greatest multiple of P less than M, denoted as Q.


• Then, we find Q (mod6) = q, P (mod6) = p, and ⌊P/6⌋ = R.


• Finally, we find the approximate index that a value of Q would be relative to N (this will be less than N).


• From there, we can go through every possible value for p and q (note that p can only be 1 or 5).


Let id_1 and id_5 be the indices to start sieving in the mod 1 and mod 5 arrays. Table 1 illustrates the operation performed for each p and q condition.


Table 1. Optimized Mathematical Sieve Operation


Once we know the starting index, we can iterate by adding P to the index (essentially adding 6*P to the number). This guarantees that the new index will be the same remainder mod 6 and that it will be a multiple of P.


Example: Assume that we are using a chunk size of 6,000 values and using 5 threads (0-4). Also assume that we are currently iterating through the group 12,000-18,000, which is mapped onto thread 2 of 5. The index range would be 400-599, and we would sieve from 14,400 – 15,599. Let us sieve for the prime 13. The value of R would be 2. The smallest multiple of 13 less than 14,400 is 14,391. 14,391 mod 6 = q = 3, and 13 mod 6 = p = 1. id_1 and id_5 would be 400-2 = 398, as 14,391 would be between those two values (14,389 and 14,393). id_1 = 398 + 4*2 + 1 = 407, and id_5 = 398 + 2*2 = 402. Converting these numbers back to integers, we see that we start sieving from 407*6 + 1 + 12,000 = 14,443 and 402*6 + 5 + 12,000 = 14,417, which we can confirm are multiples of 13 and are 1 and 5 mod 6, respectively.


This method reduces redundant calculations, simplifies each step’s mathematical operation, and ensures that each thread is not just parallel but also efficiently parallel, making the best use of the computational resources.