Definitive Guide to Multi-Threaded Rendering on the Web

A typical software engineer deals with threads, processes, synchronization, race conditions, and context sharing every day. A typical frontend engineer does not, but anyone building modern, scalable, interactive apps should.

Why multi-threaded rendering?

The DOM is single-threaded, and may stay that way forever. But we want to do more with it. Here are some cases where a single thread becomes a bottleneck:

Heavy data visualizations, dashboards with multiple visualizations
Apps with complex and sophisticated interaction patterns
Interactive infographics
Physics simulations
Low-powered devices

What are my options?

Multi-threading on the web can be classified into four broad categories:

Compute only
Prioritized scheduling
Parallelized create DOM
Parallelized create and mutate
- Canvas
- DOM

Compute Only

This is the traditional Web Worker model: compute on the client is distributed across one or more Web Workers.


Implementations:

Web Workers and friends (Shared Worker, Service Worker)
AudioWorklet: Run audio processing in a separate thread
React Worker DOM (Virtual DOM computes in a separate thread)
WebGPU Compute Shaders

❤️	😭
Supported by the platform OOTB	Limited to compute, no access to DOM
Worker threads are lightweight compared to separate processes, though each one still has a startup and memory cost	API is a bit clunky in some cases
Workers can make HTTP calls	Passing data between threads can be expensive because messages are serialized; functions cannot be sent at all
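
As a minimal sketch of the compute-only model, the TypeScript below keeps the number crunching in a pure function and wires it to a dedicated worker. The names `histogram`, `HistogramRequest`, and `requestHistogram`, and the worker script URL that the caller passes in, are assumptions made for illustration and do not come from any library. The sketch assumes the same script is loaded both on the main thread and as a classic worker.

```typescript
// Hypothetical message shape for this sketch.
interface HistogramRequest {
  values: Float64Array;
  bins: number;
  min: number;
  max: number;
}

// Pure compute kernel: no DOM access needed, so it can run on any thread.
function histogram(req: HistogramRequest): Uint32Array {
  const counts = new Uint32Array(req.bins);
  const span = req.max - req.min;
  for (let i = 0; i < req.values.length; i++) {
    const v = req.values[i];
    if (v < req.min || v > req.max) continue; // ignore out-of-range samples
    const raw = Math.floor(((v - req.min) / span) * req.bins);
    counts[Math.min(req.bins - 1, raw)]++; // v === max lands in the last bin
  }
  return counts;
}

// `any` keeps the sketch independent of the DOM/WebWorker type libraries.
const scope = globalThis as any;

// Worker side: importScripts is only defined inside a worker scope.
if (typeof scope.importScripts === "function") {
  scope.onmessage = (event: { data: HistogramRequest }) => {
    const counts = histogram(event.data);
    // Transfer the result buffer instead of copying it.
    scope.postMessage(counts, [counts.buffer]);
  };
}

// Main-thread side: send one request to a worker and await the answer.
function requestHistogram(workerUrl: string, req: HistogramRequest): Promise<Uint32Array> {
  return new Promise((resolve, reject) => {
    const worker = new scope.Worker(workerUrl);
    worker.onmessage = (event: { data: Uint32Array }) => {
      resolve(event.data);
      worker.terminate();
    };
    worker.onerror = reject;
    // Transferring req.values.buffer detaches it on this side.
    worker.postMessage(req, [req.values.buffer]);
  });
}
```

Keeping the kernel separate from the messaging code means it can be unit-tested on the main thread and moved into a worker only when profiling shows it blocks rendering.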

Bonus: SharedArrayBuffer and Atomics

The Web Worker message-passing model has a fundamental limitation: data must be copied (structured clone) or transferred (ownership moves, so the sender loses access) between threads. For large datasets, the copying overhead can negate the benefits of offloading work.

SharedArrayBuffer solves this by allowing multiple threads to read and write the same memory region. Combined with Atomics for synchronization, you get primitives similar to those used for threading in C++ or Java.

❤️	😭
Zero-copy data sharing between threads	Only works with typed arrays, not arbitrary objects
Significant performance gains for large datasets	Requires cross-origin isolation (COOP/COEP headers), which can break embedding scenarios
Enables true shared-memory parallelism	Still no access to DOM
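
A minimal sketch of the shared-memory idea follows. To stay self-contained it runs on a single thread, with two typed-array views standing in for two workers; in a real page the `SharedArrayBuffer` would be sent to each worker with `postMessage`, and the page would need to be cross-origin isolated. The helper name `addHits` is hypothetical.

```typescript
// One shared 32-bit counter. Posting a SharedArrayBuffer to a worker shares
// the memory rather than copying it.
const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
const counter = new Int32Array(shared);

// Hypothetical helper: what each worker would run on its own view.
function addHits(view: Int32Array, hits: number): void {
  for (let i = 0; i < hits; i++) {
    Atomics.add(view, 0, 1); // atomic read-modify-write, so no lost updates
  }
}

// Two views over the same memory stand in for two threads here.
const viewA = new Int32Array(shared);
const viewB = new Int32Array(shared);
addHits(viewA, 1000);
addHits(viewB, 500);

// Every view observes the same value because the memory is shared.
const total = Atomics.load(counter, 0);
console.log(total); // 1500
```

With real workers, a plain `view[0]++` could lose updates when two threads interleave; `Atomics.add` is what makes the increment safe.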

Prioritized scheduling

Work is reordered by priority so that urgent updates (such as input handling) run first, which makes the application feel responsive. Everything still runs on a single thread.

Implementations:

❤️	😭
Simple to use if you already use the latest React versions	No benefit to the initial render performance
	Dependency on React as a framework for everything
	Single-threaded, so devices with low-end CPUs do not benefit
	Reprioritizes existing work, so the strategy fails when there is simply more work to do, as with data visualizations
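
To make the idea concrete, here is a minimal sketch of a priority-ordered, time-sliced task queue. It is not how React schedules work internally; `SliceScheduler`, `Task`, and the 5 ms default budget are assumptions for illustration. The priority names mirror those of the browser's Prioritized Task Scheduling API (`scheduler.postTask`), which is not used here.

```typescript
type Priority = "user-blocking" | "user-visible" | "background";

interface Task {
  name: string;
  priority: Priority;
  run: () => void;
}

// Lower rank runs first.
const RANK: Record<Priority, number> = {
  "user-blocking": 0,
  "user-visible": 1,
  "background": 2,
};

class SliceScheduler {
  private queue: Task[] = [];

  add(task: Task): void {
    this.queue.push(task);
    // Array.prototype.sort is stable, so equal priorities keep insertion order.
    this.queue.sort((a, b) => RANK[a.priority] - RANK[b.priority]);
  }

  // Run tasks in priority order until the time budget is spent.
  // Always runs at least one task so the queue makes progress.
  // Returns how many tasks are still waiting.
  runSlice(budgetMs: number): number {
    const start = Date.now();
    while (this.queue.length > 0) {
      const task = this.queue.shift() as Task;
      task.run();
      if (Date.now() - start >= budgetMs) break;
    }
    return this.queue.length;
  }

  // Drain the queue, yielding to the event loop between slices so input
  // handlers and paints can run on the same thread.
  async flush(budgetMs = 5): Promise<void> {
    while (this.runSlice(budgetMs) > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
    }
  }
}
```

Note that the total amount of work is unchanged: the scheduler only decides what runs first and when to yield, which is why this approach does not help when the sheer volume of work is the problem.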

Parallelized create DOM

Single compute thread on the client (the main thread).
The initial render load is shared by multiple workers.
Note: the worker is generally a server-side process.


Implementations:

Facebook: BigPipe
eBay: Async Fragments

❤️	😭
Fast initial render performance, as DOM can be precreated on the server	Hydration on the client might be complex/expensive
	No perf benefits beyond the first render
	Need to maintain a server side DOM implementation
	Not all features are supported in server side rendering

Parallelized create and mutate (Canvas)

With the OffscreenCanvas API (widely available since March 2023), you can create and control a canvas from a Worker. This brings us within striking distance of our goal: true multi-threaded rendering.

Implementations:

OffscreenCanvas with transferControlToOffscreen()
Chart.js parallel rendering

❤️	😭
Create and mutate visual elements from a Worker thread	Canvas is a very low-level API, so you need an abstraction layer
Simple API	Fewer feature-rich Canvas libraries exist than for SVG/HTML rendering (React, D3, Highcharts, etc.)
WebGL/WebGPU support	Canvas is not responsive, need to redraw when resized
	Need to handle DOM events from the Main thread, as workers do not have DOM access
	Canvas is immediate-mode and retains no scene state, so state updates and interactivity require redrawing the affected region (often the whole canvas) rather than surgical updates
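
The hand-off with `transferControlToOffscreen()` can be sketched as follows. The layout step is a pure function so it can be tested without a canvas; `layoutBars`, `attachCanvasWorker`, the message shape, and the worker URL are hypothetical names chosen for this example. The sketch assumes the same script is loaded both on the main thread and as a classic worker.

```typescript
interface Bar { x: number; y: number; width: number; height: number }

// Pure layout step: turn data into rectangles. Needs no canvas.
function layoutBars(values: number[], width: number, height: number): Bar[] {
  const max = values.reduce((m, v) => Math.max(m, v), 0) || 1; // avoid divide-by-zero
  const barWidth = width / values.length;
  return values.map((v, i) => {
    const h = (v / max) * height;
    return { x: i * barWidth, y: height - h, width: barWidth, height: h };
  });
}

// `any` keeps the sketch independent of the DOM/WebWorker type libraries.
const scope = globalThis as any;

// Worker side: receive the OffscreenCanvas once, then redraw on each data message.
if (typeof scope.importScripts === "function") {
  let canvas: any = null;
  scope.onmessage = (event: { data: { canvas?: unknown; values?: number[] } }) => {
    if (event.data.canvas) canvas = event.data.canvas;
    if (!canvas || !event.data.values) return;
    const ctx = canvas.getContext("2d");
    // Canvas keeps no scene state, so clear and redraw everything.
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    for (const bar of layoutBars(event.data.values, canvas.width, canvas.height)) {
      ctx.fillRect(bar.x, bar.y, bar.width, bar.height);
    }
  };
}

// Main-thread side: hand control of a <canvas> element to the worker.
function attachCanvasWorker(canvasEl: any, workerUrl: string, values: number[]): any {
  const offscreen = canvasEl.transferControlToOffscreen();
  const worker = new scope.Worker(workerUrl);
  // An OffscreenCanvas is transferable, so it goes in the transfer list.
  worker.postMessage({ canvas: offscreen, values }, [offscreen]);
  return worker;
}
```

Pointer and resize events still arrive on the main thread, so a real application would forward them to the worker as additional messages.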

Parallelized create and mutate (DOM)

The DOM is both created and mutated by separate workers. There are two approaches which make this possible, and we will talk about the current implementations for each.

Web Worker w/ DOM

Worker DOM: The DOM API in a web worker

The Worker DOM library implements the DOM API inside a Web Worker. All mutations are applied within the worker and then periodically synced with the main-thread DOM. Check out these slides for more details on how this works under the hood.

❤️	😭
Performance benefits both for the first render and subsequent mutations	The complexity of maintaining a parallel DOM implementation, which will lag behind the browser's implementation
Uses the familiar Web Worker API	Some APIs need a workaround to work. Some APIs cannot be supported
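
The "mutate in the worker, sync periodically" idea can be sketched without a real DOM. This is not Worker DOM's actual implementation or wire format; `Mutation`, `MutationBatcher`, `FakeNode`, and `applyBatch` are hypothetical names, and plain objects stand in for DOM nodes.

```typescript
// Hypothetical mutation record; the real library's format may differ.
type Mutation =
  | { type: "setAttribute"; nodeId: number; name: string; value: string }
  | { type: "setText"; nodeId: number; text: string };

// Worker side: record mutations instead of touching the real DOM.
class MutationBatcher {
  private pending: Mutation[] = [];

  record(m: Mutation): void {
    this.pending.push(m);
  }

  // Hand over everything recorded since the last flush.
  // In a worker, this batch would be sent with postMessage.
  flush(): Mutation[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

// Main-thread side: a plain-object stand-in for real DOM nodes.
interface FakeNode {
  attributes: Record<string, string>;
  text: string;
}

// Replay a batch in order, so later mutations win over earlier ones.
function applyBatch(nodes: Map<number, FakeNode>, batch: Mutation[]): void {
  for (const m of batch) {
    const node = nodes.get(m.nodeId);
    if (!node) continue; // unknown node: skipped in this sketch
    if (m.type === "setAttribute") {
      node.attributes[m.name] = m.value;
    } else {
      node.text = m.text;
    }
  }
}
```

Batching is what keeps the main thread cheap: it replays a compact list of changes once per sync instead of running application logic.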

Parallel DOM via cross-origin SubFrames

PDom: Multiprocess DOM via cross-origin Iframes

With the release of performance isolation in Chrome 88, it’s now possible to have multiple subframes on a webpage, each of which may run in a separate process. The PDoM library tries to exploit this capability by providing an ergonomic abstraction for web developers to use.

❤️	😭
Uses the web platform, with a thin abstraction layer. No new DOM implementation.	Need to set up a separate web server with specialized DNS config
All DOM APIs are supported; no need to change the code	Only supported in Chromium-based browsers (Chrome/Edge) as of today.
First-class support to parallelize any React component

That’s all, folks!

What we didn’t talk about today is that you can also combine the above techniques. For example, you could use “Compute only” worker threads together with “Parallelized create DOM” to get performance benefits beyond just the initial render.