An Intro to Spring WebFlux Threading Model

Written by vladimirf | Published 2023/04/30

TL;DR: Spring WebFlux is a reactive, non-blocking web framework that uses the Reactor library for implementing reactive programming in Java. The threading model of WebFlux is different from the traditional thread-per-request model used in many synchronous web frameworks. WebFlux uses a non-blocking, event-driven model, where a small number of threads can handle a large number of requests. This allows a thread to move on and handle other requests while tasks are executed in the background. Using a parallel scheduler can improve performance and scalability by allowing multiple tasks to be executed simultaneously on different threads.

Spring WebFlux is a reactive, non-blocking web framework for building modern, scalable web applications in Java. It is a part of the Spring Framework, and it uses the Reactor library for implementing reactive programming in Java.

With WebFlux, you can build high-performance, scalable web applications that can handle a large number of concurrent requests and streams of data. It supports a wide range of use cases, from simple REST APIs to real-time data streaming and server-sent events.
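For instance, a server-sent events endpoint is essentially a one-liner in WebFlux. Here is a minimal sketch (the /ticks endpoint and its payload are made up for illustration):

import java.time.Duration;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import reactor.core.publisher.Flux;

@GetMapping(value = "/ticks", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> ticks() {
    // Emit one event per second, indefinitely, without blocking any thread
    return Flux.interval(Duration.ofSeconds(1))
            .map(i -> "tick " + i);
}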

Spring WebFlux provides a programming model based on reactive streams, which allows you to compose asynchronous and non-blocking operations into a pipeline of data processing stages. It also provides a rich set of features and tools for building reactive web applications, including support for reactive data access, reactive security, and reactive testing.
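As a taste of that composition style, here is a tiny standalone Reactor pipeline (no web layer involved, just the Reactor library):

import reactor.core.publisher.Flux;

Flux.range(1, 10)                 // emit 1..10
        .filter(n -> n % 2 == 0)  // keep only even numbers
        .map(n -> n * n)          // square them
        .subscribe(System.out::println);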

From the official Spring doc:

The term, “reactive,” refers to programming models that are built around reacting to change — network components reacting to I/O events, UI controllers reacting to mouse events, and others. In that sense, non-blocking is reactive, because, instead of being blocked, we are now in the mode of reacting to notifications as operations complete or data becomes available.

Threading model

One of the core features of reactive programming is its threading model, which is different from the traditional thread-per-request model used in many synchronous web frameworks.

In the traditional model, a new thread is created to handle each incoming request, and that thread is blocked until the request has been processed. This can lead to scalability issues when dealing with high volumes of requests, as the number of threads required to handle the requests can become very large, and thread context switching can become a bottleneck.

In contrast, WebFlux uses a non-blocking, event-driven model, where a small number of threads can handle a large number of requests. When a request comes in, it is handled by one of the available threads, which then delegates the actual processing to a set of asynchronous tasks. These tasks are executed in a non-blocking manner, allowing the thread to move on to handle other requests while the tasks are executed in the background.

In Spring WebFlux (and non-blocking servers in general), it is assumed that applications do not block. Therefore, non-blocking servers use a small, fixed-size thread pool (event loop workers) to handle requests.
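If you do need to call blocking code (a legacy JDBC repository, for example), the usual Reactor idiom is to wrap it and move it off the event loop. A minimal sketch, assuming a hypothetical User type and a hypothetical blocking findUserBlocking method:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public Mono<User> findUser(String id) {
    // Defer the blocking call so it runs lazily, on subscription...
    return Mono.fromCallable(() -> findUserBlocking(id))
            // ...and subscribe it on a pool meant for blocking work,
            // so the event loop threads are never held up
            .subscribeOn(Schedulers.boundedElastic());
}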

The simplified threading model of a classic Servlet container looks like this:

WebFlux request processing, meanwhile, looks slightly different:

Under the hood

Let’s go ahead and see what’s behind the shiny theory.

We need a pretty minimalistic app generated by Spring Initializr. The code is available in the GitHub repo.

All thread-related topics are very CPU-dependent. Usually, the number of processing threads that handle requests is related to the number of CPU cores. For educational purposes, you can easily manipulate the thread count in a pool by limiting the CPUs available when running the Docker container:

docker run --cpus=1 -d --rm --name webflux-threading -p 8081:8080 local/webflux-threading

If you still see more than one thread in a pool, that’s OK: WebFlux may set its own defaults.

Our app is a simple fortune teller. By calling the /karma endpoint, you will get 5 records with a balanceAdjustment. Each adjustment is an integer that represents the karma given to you. Yes, we are very generous, because the app generates only positive numbers. No bad luck anymore!
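The article doesn’t show the model class, but judging by the balanceAdjustment() accessor used later, Karma is presumably a simple Java record along these lines:

public record Karma(int balanceAdjustment) { }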

Default processing

Let’s start with a very basic example. The following controller method returns a Flux containing 5 Karma elements.

@GetMapping("/karma")
public Flux<Karma> karma() {
    return prepareKarma()
            .map(Karma::new)
            .log();
}

private Flux<Integer> prepareKarma() {
    Random random = new Random();

    return Flux.fromStream(
            Stream.generate(() -> random.nextInt(10))
                    .limit(5));
}

The log method is crucial here. It observes all Reactive Streams signals and traces them to the log at INFO level.

The log output for curl localhost:8081/karma is the following:

As we can see, processing happens on the IO thread pool. The thread name ctor-http-nio-2 is the truncated form of reactor-http-nio-2. Tasks were executed immediately on the thread that submitted them; Reactor didn’t see any instruction to schedule them on another pool.

Delaying and parallel processing

The next operation delays the emission of each element by 100 ms (emulating a database call):

@GetMapping("/delayedKarma")
public Flux<Karma> delayedKarma() {
    return karma()
            .delayElements(Duration.ofMillis(100));
}

We don’t need to add the log method here because it was already declared in the original karma() call.

In the logs, we see the following picture:

This time, only the very first element was received on the IO thread reactor-http-nio-4. Processing of the remaining four was handed off to the parallel thread pool.

The Javadoc of delayElements confirms this:

Signals are delayed and continue on the parallel default Scheduler

You can achieve the same effect without the delay by specifying .subscribeOn(Schedulers.parallel()) anywhere in the call chain.
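For example, a variant of the original endpoint that hops to the parallel scheduler explicitly might look like this (a sketch; the /parallelKarma endpoint is not part of the original repo):

@GetMapping("/parallelKarma")
public Flux<Karma> parallelKarma() {
    return karma()
            // subscribeOn moves the whole subscription, and therefore
            // the emission of elements, onto the parallel pool
            .subscribeOn(Schedulers.parallel());
}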

Using the parallel scheduler can improve performance and scalability by allowing multiple tasks to be executed simultaneously on different threads, which can better utilize CPU resources and handle a large number of concurrent requests.

However, it can also increase code complexity and memory usage, and it can potentially lead to thread pool exhaustion if the maximum number of worker threads is exceeded. Therefore, the decision to use the parallel thread pool should be based on the specific requirements and trade-offs of the application.

Subchain

Now let’s take a look at a more complex example. Code is still pretty simple and straightforward, but the output is way more interesting.

We are going to use flatMap and make the fortune teller fairer. For each Karma instance, it will multiply the original adjustment by 10 and generate an opposite adjustment, effectively creating a balanced transaction that compensates for the original one.

@GetMapping("/fairKarma")
public Flux<Karma> fairKarma() {
    return delayedKarma()
            .flatMap(this::makeFair);
}

private Flux<Karma> makeFair(Karma original) {
    return Flux.just(new Karma(original.balanceAdjustment() * 10),
                    new Karma(original.balanceAdjustment() * -10))
            .subscribeOn(Schedulers.boundedElastic())
            .log();
}

As you can see, makeFair’s Flux is subscribed on the boundedElastic scheduler. Let’s check what we have in the logs for the first two Karmas:

  1. Reactor subscribes to the first element with balanceAdjustment=9 on the IO thread

  2. Then the boundedElastic pool works on Karma fairness by emitting the 90 and -90 adjustments on the boundedElastic-1 thread

  3. Elements after the first one are subscribed on the parallel thread pool (because we still have delayElements in the chain)

What is a boundedElastic scheduler?

It is a thread pool that dynamically adjusts the number of worker threads based on the workload. It is optimized for I/O-bound tasks, such as database queries and network requests, and is designed to handle a large number of short-lived tasks without creating too many threads or wasting resources.

By default, the boundedElastic thread pool has a maximum size of the number of available processors multiplied by 10, but you can configure it to use a different maximum size if needed.
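If the defaults don’t fit, Reactor lets you create your own bounded elastic scheduler. A minimal sketch (the caps and the "io-pool" name here are arbitrary):

import reactor.core.publisher.Flux;
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Cap at 50 worker threads and 1000 queued tasks; "io-pool" is the thread name prefix
Scheduler ioPool = Schedulers.newBoundedElastic(50, 1_000, "io-pool");

// Use it just like the built-in schedulers
Flux.just(1, 2, 3)
        .subscribeOn(ioPool)
        .subscribe();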

By using an asynchronous thread pool like boundedElastic, you can offload tasks to separate threads and free up the main thread to handle other requests. The bounded nature of the thread pool can prevent thread starvation and excessive resource usage, while the elasticity of the pool allows it to adjust the number of worker threads dynamically based on the workload.

Other types of thread pools

There are two more types of pools provided by the out-of-the-box Schedulers class:

  • single: This is a single-threaded, serialized execution context that is designed for synchronous execution. It is useful when you need to ensure that a task is executed in order and that no two tasks are executed concurrently.

  • immediate: This is a trivial, no-op implementation of a scheduler that immediately executes tasks on the calling thread without any thread switching.
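Both are used the same way as the other schedulers. A quick sketch:

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

// All work for this chain runs on a single, reused thread
Mono.fromCallable(() -> "ordered work")
        .subscribeOn(Schedulers.single())
        .subscribe(System.out::println);

// No thread switch at all: the chain stays on the calling thread
Mono.just("inline work")
        .subscribeOn(Schedulers.immediate())
        .subscribe(System.out::println);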

Conclusion

The threading model in Spring WebFlux is designed to be non-blocking and asynchronous, allowing for efficient handling of a large number of requests with minimal resource usage. Instead of relying on dedicated threads per connection, WebFlux uses a small number of event-loop threads to handle incoming requests and distribute work to worker threads from various thread pools.

However, it's important to choose the right thread pool for your use case to avoid thread starvation and ensure efficient use of system resources.

