In this article, I will show you how to use the , which hasn’t received significant dissemination among Java developers. ForkJoinPool is one of the ’s implementations. It is used in and Stream API. It was designed to simplify parallelism for recursive tasks by breaking the single task into independent ones until they are small enough to be executed asynchronously. With this class, you can perform a significantly large amount of tasks in a small number of threads. ForkJoinPool ExecutorService CompletableFuture Java has a common implementation that can be created through the static method: ForkJoinPool commonPool() ForkJoinPool forkJoinPool = ForkJoinPool.commonPool(); ForkJoinTask has its own and implementations. These implementations are called and respectively. Each of them has an abstract method, which must be implemented during implementation. returns generic <T> value, returns void. Both of these implementations are inherited from the abstract class. ForkJoinTask Runnable Callable RecursiveAction RecursiveTask compute() RecursiveTask<T>#compute() RecursiveAction#compute() ForkJoinTask To start a task in , use the method : ForkJoinPool T invoke(ForkJoinTask<T> task) ForkJoinTask forkJoinTask = new ForkJoinTaskImpl(...); forkJoinPool.invoke(forkJoinTask); In addition to the method, has the following methods: and . In terms of usage, the is similar to the . But in the case of a fork-join, the thread may not actually fall asleep, it will rather switch to another task. This strategy is called “work stealing”, it allows more efficient use of a limited number of threads. compute() ForkJoinTask fork() join() ForkJoinTask#join() Thread#join() Fibonacci example Let's look at the simple example of the implementation for calculating the Fibonacci numbers: ForkJoinTask public class FibonacciTask extends RecursiveTask<Integer> { private final int n; public FibonacciTask(int n) { this.n = n; } @Override public Integer compute() { if (n < 2) return n; FibonacciTask f1 = new FibonacciTask(n - 1); FibonacciTask f2 = new FibonacciTask(n - 2); f1.fork(); return f2.compute() + f1.join(); } } In this example, implements the method, which creates additional instances and forks them. The method asks the current thread to wait until results are returned by the forked methods. FibonacciTask compute() FibonacciTask join() Merge Sort Let's take a look at the more complex example of - the Merge Sort. It is based on the "Divide and Conquer" principle, old as the world. We need to divide source problems into subtasks, solve them recursively and combine the results: ForkJoinTask For this task we will use as the implementation, since we don't need a return value. We will add as a parameter to the constructor and instantiate the class's field: RecursiveAction ForkJoinTask int[] arr public class MergeSortAction extends RecursiveAction { private final int[] arr; public MergeSortAction(int[] arr) { this.arr = arr; } @Override public void compute() { ... } private void merge(int[] left, int[] right) { ... } } Next, we implement the method. It will accept 2 arrays - and parts. We need to merge these unsorted arrays by comparing elements and assigning the smallest element to the appropriate arr index. When one of the arrays will be empty, the loop will break, but the data from another non-empty array part still needs to be merged. To do this we have two additional while loops to get all the data from both arrays: merge() left right while private void merge(int[] left, int[] right) { int i = 0, j = 0, k = 0; while (i < left.length && j < right.length) { if (left[i] < right[j]) arr[k++] = left[i++]; else arr[k++] = right[j++]; } while (i < left.length) { arr[k++] = left[i++]; } while (j < right.length) { arr[k++] = right[j++]; } } Next, we will write the method to recursively divide the original array and pass the results to the method. To do this we need to calculate the middle index of the array. Then we divide the original array into two parts: and . To fill them we copy original data by calling : compute() merge() left right System.arraycopy(Object src, int srcPos, Object dest, int destPos, int length) @Override public void compute() { if (arr.length < 2) return; int mid = arr.length / 2; int[] left = new int[mid]; System.arraycopy(arr, 0, left, 0, mid); int[] right = new int[arr.length - mid]; System.arraycopy(arr, mid, right, 0, arr.length - mid); ... } Now, we have two separate arrays. Let’s divide them recursively by creating new tasks and run them asynchronously by passing them into method. To sort and combine two arrays we use method: MergeSortAction invokeAll() merge() @Override public void compute() { if (arr.length < 2) return; int mid = arr.length / 2; int[] left = new int[mid]; System.arraycopy(arr, 0, left, 0, mid); int[] right = new int[arr.length - mid]; System.arraycopy(arr, mid, right, 0, arr.length - mid); invokeAll(new MergeSortAction(left), new MergeSortAction(right)); merge(left, right); } Speed testing We’ve finished our parallel Merge Sort. Let’s test it and compare performance with the non-parallel version. I used to generate random numbers and to calculate the execution time: ThreadLocalRandom ZonedDateTime class MergeSortTest { private final List<MergeSort> mergeSortImpls = Arrays.asList(new MergeSortImpl(), new ParallelMergeSortImpl()); @Test void sort() { for (MergeSort mergeSort : mergeSortImpls) { int[] arr = IntStream .range(0, 100_000_000) .map(i -> ThreadLocalRandom.current().nextInt()) .toArray(); ZonedDateTime now = ZonedDateTime.now(); mergeSort.sort(arr); System.out.printf("%s exec time: %dms\n", mergeSort.getClass().getSimpleName(), ChronoUnit.MILLIS.between(now, ZonedDateTime.now())); assertTrue(isSorted(arr)); } } private boolean isSorted(int[] arr) { for (int i = 0; i < arr.length - 1; i++) { if (arr[i] > arr[i + 1]) return false; } return true; } } There are the execution results for 100 million of numbers on 8-core CPU: Conclusion In this article, I showed you an example of how to use Fork/Join Framework. I hope you now have a basic idea of how to speed up your applications. The source code is available . over on GitHub