Image description: Think of a thread pool like a busy restaurant kitchen. Incoming orders (tasks) are placed on the ticket rail (task queue). A fixed number of chefs (worker threads) continuously pick up orders and cook them using limited kitchen stations (CPU cores). The head chef acts like the scheduler, ensuring work flows smoothly. Locks prevent multiple chefs from grabbing the same pan at once (mutual exclusion), and a sign that says "3 burners only" represents a semaphore limiting how many chefs can use a resource at the same time. Just as a well-run kitchen avoids chaos during rush hour, a well-designed thread pool keeps concurrent programs efficient and organized.

I started my MacBook Pro (13-inch, 2020). I was curious: how does this machine actually run multiple threads at once? I looked at the specs. Eight CPU cores total: four performance cores, built to chew through heavy workloads like compiling programs or rendering videos, and four efficiency cores, quietly handling lighter tasks like web browsing and background apps. macOS decides dynamically which core does what.

I asked myself: if I create 20 threads, how many will actually run simultaneously?

CPU cores = 8
Threads = 20

I realized: only eight threads run truly in parallel. The remaining twelve wait. The CPU scheduler rotates them using context switches, based on time slices, priorities, waiting events, or thread completion. Threads are cheap to create, but managing them is expensive: memory overhead, scheduling, and constant switching add up.

I wrote a naive loop to test:

```java
for (int i = 0; i < 1000; i++) {
    new Thread(task).start();
}
```

I watched as the CPU struggled. A thousand threads started, and most of the time was spent juggling rather than executing. I understood: I needed thread reuse. I needed a pool of threads that stay alive and pull work as it arrives.
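That overhead is easy to measure for yourself. A minimal sketch (the class name and counts are mine, and timings vary by machine): start and join a thousand do-nothing threads, and the elapsed time is pure thread-management cost.

```java
public class ThreadCostDemo {
    // Measures wall-clock time to start and join `count` do-nothing threads.
    static long costMillis(int count) throws InterruptedException {
        long start = System.nanoTime();
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(() -> {}); // empty task: all cost is overhead
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(costMillis(1000) + " ms of pure thread management");
    }
}
```

Every millisecond reported here is work the program did without executing a single useful instruction.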
Solving I/O-bound and CPU-bound bottlenecks requires different approaches. I/O-bound tasks (waiting on network or disk) are addressed by increasing concurrency (async or threading) to cover the wait times, while CPU-bound tasks (heavy computation) require parallelism (multiprocessing, more cores) or algorithmic optimization to share the processing load.

Step 1: Building the Task Queue

I asked: how do I feed work to threads efficiently? I needed a task queue, a place where tasks wait for execution.

Java gave me BlockingQueue. Perfect. Workers can pull tasks when available or wait efficiently when none exist:

```java
BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
```

I watched the pattern form in my head. Tasks wait quietly, threads sleep when there's nothing to do, and wake instantly when work arrives. Everything is efficient and predictable.

Step 2: Worker Threads

I was curious: how should the workers behave? They should continuously check the queue, take tasks, and execute them. Simple.
I wrote:

```java
class Worker extends Thread {
    private BlockingQueue<Runnable> queue;

    public Worker(BlockingQueue<Runnable> queue) {
        this.queue = queue;
    }

    public void run() {
        while (true) {
            try {
                Runnable task = queue.take(); // blocks if empty
                task.run();
            } catch (InterruptedException e) {
                break; // exit cleanly
            }
        }
    }
}
```

- Worker extends Thread → each worker is a thread and can run concurrently with other threads.
- private BlockingQueue<Runnable> queue → each worker has access to the shared task queue.
- queue.take() → the worker pulls a task from the queue. If the queue is empty, the thread waits (blocks) efficiently instead of looping and wasting CPU.
- task.run() → the worker executes the task it just pulled.
- InterruptedException → if the thread is asked to stop (via thread.interrupt()), it exits the loop cleanly instead of hanging.
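A quick sketch to see that loop in action. The CountDownLatch and the names here are mine, added only so the demo can tell when the work is finished; the worker body is the same take-and-run loop, inlined as a lambda.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class WorkerDemo {
    // One reused thread executes every task pushed onto the queue.
    static int runTasks(int taskCount) throws InterruptedException {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        CountDownLatch done = new CountDownLatch(taskCount);

        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    queue.take().run(); // blocks while the queue is empty
                } catch (InterruptedException e) {
                    break;              // exit cleanly on interrupt
                }
            }
        });
        worker.start();

        for (int i = 0; i < taskCount; i++) {
            queue.offer(done::countDown); // each task just signals completion
        }
        done.await();       // wait until the single worker has run them all
        worker.interrupt(); // then shut it down cooperatively
        worker.join();
        return taskCount;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(5) + " tasks executed by one reused thread");
    }
}
```

Five tasks, one thread, zero thread creations after startup: that is the reuse the naive loop was missing.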
Why this matters:

- Workers reuse threads instead of creating a new thread for every task.
- They wait efficiently when there's no work → no wasted CPU cycles.
- They exit gracefully when interrupted → easy to shut down a thread pool.

I imagined them: looping silently, waiting for tasks, springing into action only when needed. If I interrupt them, they exit gracefully. No wasted CPU, no chaos.

Step 3: The Thread Pool

I asked: how do I put it all together? I created the thread pool:

```java
class SimpleThreadPool {
    private BlockingQueue<Runnable> queue;

    public SimpleThreadPool(int workers) {
        queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < workers; i++) {
            new Worker(queue).start();
        }
    }

    public void submit(Runnable task) {
        queue.offer(task);
    }
}
```

- queue = new LinkedBlockingQueue<>() → creates a thread-safe waiting room for tasks.
- The constructor loop starts `workers` worker threads, all running in parallel and ready to pick up tasks.
- submit(Runnable task) → adds a task to the queue. Workers automatically take it and execute it.
I submitted a task:

```java
SimpleThreadPool pool = new SimpleThreadPool(4);
pool.submit(() ->
    System.out.println("Running on " + Thread.currentThread().getName())
);
```

I created 4 worker threads and submitted a task. One of the four threads instantly picks it up and executes it. The other threads remain alive, waiting for the next task.

I saw it in my mind: four threads already alive, waiting on the queue. The task lands, one thread picks it up, executes it immediately. Efficiency. Predictability. No wasted cycles.

A Question I Couldn't Ignore: What Happens Over Time?

What if the workload changes? Four threads might be perfect now, but what if traffic spikes? This led me to something I hadn't considered initially: dynamic resizing.

A real thread pool must:

- Scale up under load
- Scale down when idle

That requires:

- Tracking queue size
- Monitoring worker utilization
- Adding/removing workers safely

A simple extension might look like:

```java
public synchronized void addWorker() {
    Worker w = new Worker(queue);
    workers.add(w);
    w.start();
}
```

But removing workers is harder: you need cooperative shutdown via interrupts. This is exactly where production systems get complicated.
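Here is a runnable sketch of that idea, with the worker loop inlined. The class and method names are mine, and a production pool would also watch queue depth and worker utilization before deciding to resize; this only shows the add/remove mechanics.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical resizable pool: addWorker()/removeWorker() are illustrative names.
public class ResizablePool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Deque<Thread> workers = new ArrayDeque<>();

    public synchronized void addWorker() {
        Thread w = new Thread(() -> {
            while (true) {
                try {
                    queue.take().run();
                } catch (InterruptedException e) {
                    return; // cooperative exit
                }
            }
        });
        w.setDaemon(true); // don't keep the JVM alive for idle workers
        workers.push(w);
        w.start();
    }

    public synchronized void removeWorker() {
        Thread w = workers.poll();
        if (w != null) {
            w.interrupt(); // take() unblocks with InterruptedException, loop exits
        }
    }

    public synchronized int size() {
        return workers.size();
    }

    public void submit(Runnable task) {
        queue.offer(task);
    }

    public static void main(String[] args) {
        ResizablePool pool = new ResizablePool();
        pool.addWorker();
        pool.addWorker();
        System.out.println(pool.size()); // 2
        pool.removeWorker();
        System.out.println(pool.size()); // 1
    }
}
```

Notice that removal never forcibly kills a thread; it only asks, and the worker's own loop decides to exit. That cooperation is the whole trick.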
Shared Memory: The Core Danger

I declared a simple counter in my program:

```java
counter++;
```

At first glance, it seems harmless. But I asked myself: what happens when multiple threads try this at the same time?

Under the hood, this operation isn't atomic. It actually happens in three separate steps:

1. Read the value of counter from memory
2. Modify it (+1)
3. Write the new value back to memory

If two threads perform these steps simultaneously:

- Both read the same initial value
- Both increment independently
- Both write back → one increment gets lost

This is a race condition. It's subtle, invisible in small programs, but it can destroy correctness in concurrent systems.

Example (Not Thread-Safe)

Imagine a shared counter, incremented by two threads at once:

```java
public class CounterExample {
    static int counter = 0;

    public static void increment() {
        counter += 1; // read, modify, write: not atomic
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(CounterExample::increment);
        Thread b = new Thread(CounterExample::increment);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter); // usually 2, but 1 is possible
    }
}
```

If the two threads interleave badly:

- Thread A reads counter = 0
- Thread B reads counter = 0
- Both add 1
- Both write back 1

Final result → 1 instead of 2. This is called a race condition.

Mutual Exclusion (Locks)

I realized the solution: only allow one thread to modify shared data at a time.
In Java, I can do this with synchronized:

```java
public synchronized void increment() {
    counter++;
}
```

The keyword synchronized uses a monitor internally. Every Java object can act as a monitor. Only one thread can enter a synchronized block at a time; the others wait patiently until the lock is released. This guarantees correctness. No thread interference. No lost updates.

Thread-Safe Version Using a Mutex (Lock)

```java
ReentrantLock lock = new ReentrantLock(true);

lock.lock();
try {
    counter++;
} finally {
    lock.unlock();
}
```

Now only one thread at a time can update the counter. A lock forces sequential (one-at-a-time) access to a shared resource, even if the program has many threads running.

Atomic Variables

Then I wondered: can I avoid locks entirely? Locks are safe, but they introduce overhead and can block threads. That's where atomic variables come in.

```java
AtomicInteger counter = new AtomicInteger(0);
counter.incrementAndGet();
```

AtomicInteger ensures the entire increment happens atomically, all at once. The CPU uses compare-and-swap (CAS) instructions behind the scenes to guarantee no other thread interferes. Hundreds of threads can safely increment this counter simultaneously without explicit locking. Race conditions disappear, and performance is often higher because threads aren't blocked.
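That claim is easy to verify. A minimal sketch (the thread and iteration counts are arbitrary): four threads each increment 100,000 times, and because incrementAndGet() is a CAS loop, not a single update is lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    // 4 threads x 100,000 increments, no lock, no lost updates.
    static int run() throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] threads = new Thread[4];
        for (int t = 0; t < 4; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write via CAS
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // always 400000
    }
}
```

Run the same experiment with a plain `int counter` and `counter++` and the total comes up short almost every time.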
Semaphores

Next, I wondered: what if multiple threads share a limited resource, like database connections? A lock admits only one thread at a time, but here several threads may legitimately proceed at once. I needed a Semaphore.

A semaphore maintains a count of permits. Threads acquire a permit before proceeding and release it when done. If no permits are available, threads wait, sleeping efficiently instead of spinning.

```java
Semaphore sem = new Semaphore(3);

sem.acquire();   // wait if all permits are taken
try {
    // access the resource safely
} finally {
    sem.release(); // free the permit
}
```

I visualized it: three database connections exist. Three threads take them. A fourth arrives and sleeps until a permit is released. CPU usage stays efficient, and resources aren't oversubscribed. Semaphores let me control concurrency explicitly.

Multithreading Models

I asked myself: how do systems organize threads to get work done efficiently? Two models immediately stood out.

1. Boss–Worker Model

In this model:

- Boss thread → assigns tasks
- Worker threads → pick up tasks and execute them

This is exactly what a thread pool does.
Imagine I submit 10 tasks to a pool of 4 workers:

```
Boss -> Task 1 -> Worker 1 executes
     -> Task 2 -> Worker 2 executes
     -> Task 3 -> Worker 3 executes
     -> Task 4 -> Worker 4 executes
     -> Task 5 -> Worker 1 executes (after finishing Task 1)
```

Tasks are assigned dynamically. Workers reuse threads, waiting for new tasks after completing previous ones. Efficiency is high because threads aren't constantly created or destroyed, and no task is left unattended. This model is simple, predictable, and the backbone of almost every thread pool implementation.

2. Fork–Join Model

I wondered: what if a single task is huge and can be split? Fork–Join divides a large task into smaller subtasks. Subtasks are executed in parallel by worker threads. When all subtasks finish, their results are joined to produce the final outcome.
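The fork-then-join shape can be sketched with Java's RecursiveTask. This is a minimal example, not a tuned implementation: the threshold and array size are arbitrary, and it sums an array by forking on each half.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] nums;
    private final int lo, hi;

    SumTask(long[] nums, int lo, int hi) {
        this.nums = nums;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {           // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += nums[i];
            }
            return sum;
        }
        int mid = (lo + hi) / 2;
        SumTask left = new SumTask(nums, lo, mid);
        SumTask right = new SumTask(nums, mid, hi);
        left.fork();                          // run the left half asynchronously
        return right.compute() + left.join(); // compute right here, then join
    }

    static long parallelSum(long[] nums) {
        return ForkJoinPool.commonPool().invoke(new SumTask(nums, 0, nums.length));
    }

    public static void main(String[] args) {
        long[] nums = new long[10_000];
        for (int i = 0; i < nums.length; i++) {
            nums[i] = i + 1;
        }
        System.out.println(parallelSum(nums)); // 50005000
    }
}
```

The recursive split-fork-join structure here is exactly the pattern the sorting illustration below follows.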
Example: parallel sorting of an array.

```
Original array: [7, 2, 9, 4]

Fork -> divide into [7,2] and [9,4]
Workers sort [7,2] -> [2,7] and [9,4] -> [4,9]
Join -> merge sorted arrays -> [2,4,7,9]
```

Java's Fork/Join Framework handles the splitting, scheduling, and joining automatically. It is used for divide-and-conquer algorithms, parallel streams, and any task that can be recursively split. Efficiency comes from maximizing CPU core usage while keeping tasks balanced.

Thread Coordination

Threads sometimes need to cooperate. Thread.yield() lets a thread voluntarily give up its CPU slice so others can run.

Example:

```java
for (int i = 0; i < 1000; i++) {
    // some computation
    Thread.yield(); // hint to the scheduler: let others run
}
```

I imagined threads in my head:

- Thread A is computing
- Thread B is waiting
- Thread A calls yield() → the scheduler may give Thread B a turn immediately

This allows the system to balance CPU usage naturally without forcing threads to sleep or busy-wait.

Thread.interrupt() stops threads cooperatively.
My worker threads rely on interrupts to exit cleanly. I wondered: how can I stop a thread cleanly without killing the JVM? Thread.interrupt() lets a thread know it should stop. The thread can check its interrupted status, or catch InterruptedException if it's blocked on something (like a queue).

In my thread pool:

```java
try {
    Runnable task = queue.take(); // blocks if no task
} catch (InterruptedException e) {
    break; // exit loop cleanly
}
```

I can signal the worker threads to stop waiting and exit. Threads don't just vanish mid-task: they cooperate, finish what they're doing, and shut down safely.

Concurrency Failures

I saw the dangers. Deadlock: two threads waiting for each other.

```
Thread A → holds lock1 → waiting for lock2
Thread B → holds lock2 → waiting for lock1
```

Both are stuck forever. Livelock: threads react to each other but never make progress. They keep moving but remain stuck.

Limits of Parallelism

Even perfect concurrency has limits. I remembered Amdahl's Law:

```
Speedup = 1 / (S + P/N)
```

where S is the serial portion, P the parallel portion, and N the number of CPU cores. Even if only 10% of a program is serial, the maximum speedup is 10×, no matter how many cores I add. The ceiling exists.

Virtual Threads vs Regular Java Threads

I thought I understood Java concurrency now. Threads pull tasks from queues. The OS scheduler assigns them to cores. The pool keeps the thread count close to the CPU count. Eight cores, eight workers: elegant.
Then I started thinking about servers.

The Problem I Hadn't Considered

Imagine a typical web request. It arrives, queries a database, waits for the response, calls another microservice, waits again, reads from disk, waits again. Most of its life is just... waiting. And the entire time it waits, the OS thread it's sitting on is reserved. Blocked. Doing nothing. Burning memory. Counting against my pool.

I asked myself: what if my server gets 10,000 simultaneous requests, each one waiting on I/O? That means 10,000 OS threads. Each one consumes roughly 1 MB of stack memory. That's about 10 GB just to sit there and wait. The CPU is barely involved; the threads are blocked, not running. But the OS is tracking every single one of them, scheduling them, context-switching between them. The overhead is enormous.

For years, developers worked around this with callbacks, event loops, and reactive frameworks. Beautiful but complex. And all of it existed for exactly one reason: OS threads are too expensive to block.

I finally understood the real constraint. It wasn't the CPU. It was threads waiting.

What the JVM Does Differently

Java 21 introduced virtual threads, and once I understood the problem above, the solution made immediate sense. A regular Java thread, what Java now calls a platform thread, maps directly to an OS thread. One-to-one. My thread pool creates 8 Java threads, the OS creates 8 OS threads. I'm capped by OS cost.

A virtual thread doesn't work that way. It's managed entirely by the JVM, not the OS. The JVM maintains a small pool of real OS threads, called carrier threads, and it schedules virtual threads onto them dynamically.
```
Application Task
      ↓
Java Virtual Thread      ← lightweight, managed by the JVM
      ↓
JVM Scheduler            ← multiplexes thousands of virtual threads
      ↓
Small pool of OS Threads ← just a handful
      ↓
CPU Cores
```

I imagined it like a theater. There are thousands of actors (virtual threads) waiting backstage, but only eight spots on the stage (OS threads). Only actors actively performing get on stage. The moment one actor pauses to wait for something, they step off, and another actor immediately steps on. The stage is never idle.

This is the key insight: when a virtual thread blocks on I/O, the JVM detaches it from the OS thread, parks it, and gives that OS thread to another virtual thread. The CPU stays busy. The OS thread is never wasted.

Why Virtual Threads Are So Cheap

I asked: if this is so clever, why did it take until Java 21? Because virtual threads required the JVM itself to intercept every blocking call (file reads, database queries, network waits) and convert them into parking/resuming operations under the hood. That's deep plumbing. Project Loom spent years building it.

The result: creating a virtual thread is nearly free. Their stacks grow dynamically instead of pre-allocating roughly 1 MB. I can spin up a million of them without breaking a sweat.

```java
// This is now reasonable: one virtual thread per request
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    executor.submit(task);
}
```

With platform threads, I'd never write this.
Ten thousand tasks would mean ten thousand OS threads, a disaster. With virtual threads, the JVM handles the multiplexing. Ten thousand tasks, a handful of OS threads, full CPU utilization. No special async code needed. No callbacks. Just write blocking code normally, and the JVM makes it efficient.

When to Use Which

This is where it gets subtle. I had assumed virtual threads were strictly better. They're not.

Virtual threads shine when tasks spend most of their time waiting: web servers, database calls, microservices, file pipelines. The JVM keeps the CPU busy by shuffling waiting threads off their carriers and running something else. The more waiting, the more virtual threads help.

But for CPU-intensive work (video encoding, cryptography, scientific computation, compiling code) threads never wait. They run continuously. Even with a million virtual threads, I still only have eight cores, so only eight threads execute at once. Creating thousands of virtual threads that all want CPU time just means the JVM scheduler has more work to do with no benefit. For CPU work, the rule I already knew still holds:

Thread count ≈ CPU cores

For I/O work, that rule disappears entirely. I can create as many virtual threads as I have tasks.

One Subtle Trap: Thread Pinning

I was almost done forming my mental model when I found the one rough edge. If a virtual thread enters a `synchronized` block and then blocks inside it (waiting on I/O, for example), the JVM cannot detach it from its carrier OS thread. The virtual thread gets pinned. The OS thread is stuck until the synchronized block finishes. The whole advantage evaporates for that thread.

This is called thread pinning, and it's why modern concurrent code under virtual threads tends to prefer ReentrantLock over synchronized for anything that might block. ReentrantLock cooperates with the JVM's parking mechanism; synchronized doesn't.
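The pinning-safe shape of a guarded blocking call looks like this. A sketch only: `fetchFromDb` is a hypothetical stand-in for real blocking I/O, simulated here with a sleep.

```java
import java.util.concurrent.locks.ReentrantLock;

public class PinSafe {
    private final ReentrantLock lock = new ReentrantLock();

    // Hypothetical blocking call; a sleep stands in for database I/O.
    String fetchFromDb() throws InterruptedException {
        Thread.sleep(10);
        return "row";
    }

    // ReentrantLock lets the JVM park a blocked virtual thread and free
    // its carrier; a synchronized block around the same call would pin it.
    public String guardedFetch() throws InterruptedException {
        lock.lock();
        try {
            return fetchFromDb();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new PinSafe().guardedFetch());
    }
}
```

The lock/try/finally shape is the same one from the mutex section earlier; under virtual threads it gains a second benefit beyond correctness.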
Nothing breaks if I keep using synchronized. But if I'm writing new code for virtual threads, I keep my `synchronized` blocks short and out of any I/O path.

Everything Else Still Applies

One thing surprised me: all the synchronization tools I'd learned (synchronized, AtomicInteger, Semaphore, ReentrantLock) still work exactly the same with virtual threads. Virtual threads change how threads are scheduled, not how shared memory works.

Race conditions are still race conditions. Deadlocks are still deadlocks. The primitives are unchanged. Virtual threads are not a replacement for understanding concurrency. They're a release valve for the one specific problem that had been making concurrent servers so expensive to write: the cost of blocking.

My mental model finally felt complete:

- Platform threads → maximize CPU usage → for CPU-bound work
- Virtual threads → maximize concurrency → for I/O-bound work
- Synchronization → protect shared memory → always required, either way

ExecutorService: The Production Version of a Thread Pool

I had just built my own thread pool. Workers looping on a BlockingQueue. Tasks submitted, picked up, executed. It worked.

Then I asked myself: what am I missing? I thought through it carefully. What happens when a task throws an exception? My worker thread dies silently, and now my pool has three workers instead of four, and I have no idea. What if I need to wait for a task to finish and get its result back? Runnable returns nothing.
What about shutting down gracefully: draining the queue, finishing in-flight tasks, and stopping the workers cleanly? My implementation had none of this. I had built the skeleton. Java ships the whole body.

The Abstraction

ExecutorService is exactly what I built, done properly. Instead of creating threads directly, I hand tasks to an executor and it decides when and where they run.

```java
ExecutorService executor = Executors.newFixedThreadPool(4);

executor.submit(() -> {
    System.out.println("Running on " + Thread.currentThread().getName());
});
```

Internally: four worker threads, a BlockingQueue, workers looping on queue.take(), executing whatever they pull. My SimpleThreadPool, battle-tested and production-hardened. The difference is everything I didn't think to build.

Getting Results Back

With Runnable, I could fire tasks and forget them. But I often need answers. Java solves this with Callable, a task that returns a value, and Future, which represents the result of a computation that hasn't finished yet.

```java
Callable<Integer> task = () -> {
    return 42;
};

Future<Integer> future = executor.submit(task);
Integer result = future.get(); // blocks until the result is ready
```

I submit the task and get a Future back immediately. My thread continues doing other work. When I actually need the result, I call future.get(); it blocks until the computation finishes and hands me the value. This clicked for me: a Future is a placeholder for a value that doesn't exist yet.
I can submit ten tasks, collect ten futures, do other work, then retrieve all ten results. That's real concurrency: not just firing and forgetting, but coordinating across tasks.

Handling Failure

Here's the part my simple pool completely ignored. What if a task throws an exception? With Runnable, exceptions disappear silently into the void. The worker thread catches them, and unless I've written special handling, nothing happens: no log, no retry, no signal to the caller.

With Future, exceptions are captured and held. When I call future.get(), if the task threw an exception, it's re-thrown wrapped in an ExecutionException. The failure doesn't vanish. It surfaces at the point where I asked for the result.

```java
try {
    Integer result = future.get();
} catch (ExecutionException e) {
    Throwable cause = e.getCause(); // the original exception
}
```

And crucially: the worker thread doesn't die. ExecutorService catches the exception internally and keeps the thread alive, ready for the next task. My naive implementation had no such protection: one bad task could silently kill a worker and slowly starve the pool.

Choosing the Right Pool

I noticed Java gives me several executor types, each for a different workload pattern.

newFixedThreadPool(n): what I built. A fixed number of workers; tasks queue up when all workers are busy. Right for CPU-bound work where I want exactly as many threads as cores.

newCachedThreadPool(): no fixed limit. It creates threads as needed and reuses idle ones. Right for short-lived tasks that arrive in bursts. Dangerous if tasks never stop arriving: threads accumulate without bound.
newSingleThreadExecutor(): exactly one worker. Tasks execute one at a time, in order. Right for when sequence matters, or when I need to serialize access to something without using synchronized.

newScheduledThreadPool(n): runs tasks after a delay, or repeatedly on a fixed schedule:

```java
scheduler.scheduleAtFixedRate(task, 0, 1, TimeUnit.SECONDS);
```

Right for background jobs, polling, and cleanup work that needs to happen periodically.

Each one wraps the same core architecture (a BlockingQueue, a worker loop, a pool of worker threads) but tuned for a different shape of workload. Knowing which to reach for is knowing what my tasks actually do.

Shutting Down

My original pool had no shutdown. Workers looped forever. The only way out was killing the JVM. ExecutorService gives me two options, and the difference matters.

```java
executor.shutdown();    // finish what's queued, then stop
executor.shutdownNow(); // interrupt everything immediately
```

shutdown() is graceful. It stops accepting new tasks, lets the queue drain, waits for in-flight work to finish, then terminates the workers cleanly. This is what I almost always want.

shutdownNow() is urgent. It interrupts running tasks and returns the ones still waiting in the queue, unexecuted, as a list. Right for when I need to stop immediately and I'm willing to abandon queued work.

Workers exit cleanly from either path because they catch InterruptedException and break out of their loop, exactly the pattern I'd written in my own Worker class.
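That exit pattern is worth seeing in one place. Here's a condensed sketch of it (illustrative names, not real library code): workers park on take(), and shutdown interrupts them so they fall out of the loop cleanly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SimplePoolSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();

    public SimplePoolSketch(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = queue.take(); // parks until work arrives
                        try {
                            task.run();
                        } catch (RuntimeException e) {
                            // one bad task must not kill the worker
                        }
                    }
                } catch (InterruptedException e) {
                    // interrupt is the shutdown signal: exit the loop
                }
            });
            workers.add(worker);
            worker.start();
        }
    }

    public void submit(Runnable task) {
        queue.add(task);
    }

    public void shutdownNow() {
        // Wakes any worker blocked in take(), ending its loop
        workers.forEach(Thread::interrupt);
    }
}
```

The real ThreadPoolExecutor does far more bookkeeping, but the interrupt-to-exit mechanism is the same idea.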
Plugging in Virtual Threads

Everything I just described (submit(), Future, Callable, shutdown()) works identically whether my workers are platform threads or virtual threads. The interface doesn't change. Only the executor factory does.

```java
// Platform threads: sized to CPU cores
ExecutorService platformPool = Executors.newFixedThreadPool(8);

// Virtual threads: one per task, the JVM handles the rest
ExecutorService virtualPool = Executors.newVirtualThreadPerTaskExecutor();
```

With the virtual thread executor, I stop thinking about pool size entirely. Every submitted task gets its own virtual thread. The JVM multiplexes them onto a handful of OS threads. For I/O-heavy workloads, the ones that spend most of their time waiting, this is dramatically more efficient than any fixed pool I could size manually.

The same ExecutorService interface. The same submit(), get(), shutdown() calls. Different execution model underneath.

The Full Picture

I started by building a thread pool from scratch because I wanted to understand what was happening inside. Now I do.
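Before the summary, one concrete sketch of the virtual-thread swap in action (requires Java 21+; the class and method names are mine, and the per-task work is a trivial stand-in for an I/O-bound call):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualDemo {
    // One virtual thread per task, no pool sizing. try-with-resources
    // works because ExecutorService is AutoCloseable in recent Java:
    // close() waits for submitted tasks to finish.
    static int runAll(int n) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final int id = i;
                futures.add(executor.submit(() -> id)); // stand-in for a blocking I/O call
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(VirtualDemo.runAll(10)); // prints 55
    }
}
```

Nothing about the submit/get choreography changed from the fixed-pool version; only the factory call did.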
And now I can see exactly what ExecutorService adds on top:

- Task results, via Callable and Future
- Failure capture, so exceptions surface instead of disappearing
- Worker resilience, so a bad task doesn't kill the pool
- Graceful and immediate shutdown paths
- Pool shape variants for different workload profiles
- A clean seam where I can swap platform threads for virtual threads

My SimpleThreadPool was the mechanism. ExecutorService is the mechanism plus everything you learn you need once you try to run it in production.

A few problems to help solidify your understanding:

- https://leetcode.com/problems/web-crawler-multithreaded
- https://leetcode.com/problems/design-bounded-blocking-queue
- https://leetcode.com/problems/fizz-buzz-multithreaded