Inside Java Virtual Threads: Architecture, Scheduling, and Performance
Virtual threads (Project Loom) represent a fundamental shift in how Java handles concurrent workloads. Rather than a simple API change, they introduce a new execution model built on continuations and cooperative scheduling. This article explores the internals: what virtual threads are, how they interact with the JVM scheduler, their performance implications, and where they fit alongside reactive and async patterns.
1. Platform Threads vs. Virtual Threads
Platform Threads (1:1 Model)
Traditionally, Java threads map directly to OS threads (1:1 model). Each java.lang.Thread:
- Allocates 1–2 MB of stack memory (OS-dependent)
- Has its own kernel context and registers
- Delegates scheduling entirely to the OS kernel
- Limits practical concurrency to thousands (before memory/scheduler exhaustion)
For I/O-bound workloads (network requests, database queries), most platform threads spend time blocked, wasting OS resources.
Virtual Threads (M:N Model)
Virtual threads are lightweight, user-space threads managed by the JVM. Key properties:
- Minimal memory footprint (a few hundred bytes of metadata; stacks live on the heap and grow on demand, vs. 1+ MB reserved per platform thread)
- Scheduled on a limited pool of carrier threads (OS threads)
- Automatically suspended when blocking on I/O, locks, sleeps, and other blocking operations
- Can scale to millions of concurrent instances on modest hardware
Virtual threads don’t replace platform threads; they’re a higher-level abstraction sitting on top of them.
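To make the distinction concrete, here is a minimal sketch (the class name `ThreadKinds` is illustrative) that creates one thread of each kind and checks its identity:

```java
public class ThreadKinds {
    public static void main(String[] args) throws InterruptedException {
        // Platform thread: maps 1:1 to an OS thread
        Thread platform = Thread.ofPlatform().start(() ->
                System.out.println("platform, isVirtual=" + Thread.currentThread().isVirtual()));

        // Virtual thread: scheduled by the JVM onto a carrier thread
        Thread virtual = Thread.ofVirtual().start(() ->
                System.out.println("virtual, isVirtual=" + Thread.currentThread().isVirtual()));

        platform.join();
        virtual.join();
    }
}
```

Both builders accept the same `Runnable`; from the application's point of view the API surface is identical.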
2. Carrier Threads and the JVM Scheduler
Carrier Thread Architecture
A carrier thread is a platform thread that executes virtual thread code. The JVM maintains:
- Default carrier pool: a dedicated ForkJoinPool whose parallelism defaults to the number of available processors
- Virtual thread scheduler: decides which virtual thread runs on which carrier thread
Virtual Thread 1 ─┐
Virtual Thread 2 ─┤── Carrier Thread A
Virtual Thread 3 ─┘
Virtual Thread 4 ─┐
Virtual Thread 5 ─┤── Carrier Thread B
Virtual Thread 6 ─┘
Scheduling Model
The JVM scheduler:
- Mounts a virtual thread onto a carrier thread (runs it)
- Suspends it when it hits a blocking point (via continuations)
- Unmounts it and schedules another virtual thread on that carrier
- Later, resumes the virtual thread when the blocking operation completes
This is non-preemptive, cooperative scheduling: a virtual thread yields control voluntarily, not forced by the OS.
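The effect of cooperative scheduling is easy to observe. In the sketch below (class name illustrative), 1,000 virtual threads each sleep for 200 ms, yet the whole run finishes in roughly one sleep period rather than 1,000 × 200 ms, because each sleeping thread unmounts and frees its carrier:

```java
import java.util.concurrent.Executors;

public class CooperativeScheduling {
    public static void main(String[] args) {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(200); // suspension point: thread unmounts, carrier is reused
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("elapsed ~" + ms + " ms");
    }
}
```

With only ~N_cores carriers, all 1,000 blocking sleeps still overlap; the same workload on platform threads would need 1,000 OS threads to achieve this.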
3. Continuations: The Mechanical Heart
Virtual threads rely on continuations — a mechanism to pause and resume execution mid-method without unwinding the stack.
What is a Continuation?
A continuation captures the entire execution state:
- Local variables and method parameters
- Call stack frames
- Program counter (instruction pointer)
When a blocking operation occurs (e.g., Socket.read()), the JVM:
- Saves the continuation state
- Suspends the virtual thread
- Releases the carrier thread to run another virtual thread
- Later, when I/O completes, resumes the saved continuation
Example: Under the Hood
var executor = Executors.newVirtualThreadPerTaskExecutor();
executor.submit(() -> {
    System.out.println("Start");   // Virtual thread mounted
    var data = socket.read();      // BLOCKING CALL
    System.out.println(data);      // Resumed later
});
Timeline:
- Virtual thread starts; mounted on carrier thread A
- socket.read() blocks waiting for network data
- JVM captures the continuation and unmounts the virtual thread
- Carrier thread A is freed; another virtual thread mounts
- Network data arrives; the OS wakes the JVM's I/O handler
- JVM resumes the first virtual thread's continuation on any available carrier thread
- Execution continues from socket.read() (transparently to the application)
4. Pinning: The Hidden Gotcha
What is Pinning?
Pinning occurs when a virtual thread cannot be suspended and unmounted from its carrier thread, effectively blocking the carrier. This ruins the scalability benefit.
When Does Pinning Happen?
- Synchronized blocks or methods:

      synchronized (lock) {
          socket.read(); // Virtual thread PINNED; cannot unmount
      }

  The JVM cannot unmount a virtual thread while it holds a monitor lock (Java's synchronized).
- Blocking native code called via JNI:

      nativeBlockingCall(); // Pinned while native code runs

Note that thread-local variables do not themselves cause pinning; with millions of virtual threads, however, their per-thread copies can become a significant memory cost.
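The synchronized case can be demonstrated directly. The sketch below (class name illustrative) blocks inside a monitor; on JDK 21–23, running it with -Djdk.tracePinnedThreads=full makes the JVM print a pinning report to stderr at the moment the virtual thread parks while holding the lock:

```java
public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {          // monitor held...
                try {
                    Thread.sleep(100);     // ...so this park pins the carrier thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            System.out.println("done");
        });
        vt.join();
    }
}
```

The program still behaves correctly; pinning is a throughput problem, not a correctness problem, which is why the diagnostic flag exists.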
Why is Pinning Bad?
If many virtual threads pin simultaneously, the carrier threads are exhausted, and queued virtual threads stall.
Virtual Thread 1 (pinned) ─┐
Virtual Thread 2 (pinned) ─┤── Only 2 carriers total
Virtual Thread 3 (blocked) ─ waiting for a carrier!
Virtual Thread 4 (blocked) ─ waiting for a carrier!
Avoiding Pinning
- Replace synchronized with ReentrantLock (does not pin)
- Use StampedLock or ReadWriteLock for fine-grained control
- Keep native code execution short, or avoid blocking in JNI
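As a sketch of the first recommendation (class name illustrative), the same critical section guarded by a ReentrantLock lets a virtual thread unmount if it ever blocks while waiting for or holding the lock, because the lock is implemented in Java rather than as a monitor:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockVsSynchronized {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static int counter = 0;

    static void increment() {
        LOCK.lock();
        try {
            counter++;        // critical section; blocking here would not pin the carrier
        } finally {
            LOCK.unlock();    // always release in finally
        }
    }

    public static void main(String[] args) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(LockVsSynchronized::increment);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(counter); // all 10,000 increments applied
    }
}
```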
5. Blocking Calls, I/O, and Monitor Interaction
Blocking Operations That Suspend (Don't Pin)
Virtual threads are suspended (unmounted) on:
- Socket I/O: Socket.read(), write()
- File I/O: FileInputStream.read(), FileOutputStream.write() (most OS file APIs are synchronous, so the JVM may compensate by temporarily growing the carrier pool)
- Thread.sleep()
- Lock.lock() (ReentrantLock, not synchronized)
- Coordination primitives: CountDownLatch.await(), Semaphore.acquire()
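A short sketch (class name illustrative) of the last item: 1,000 virtual threads each sleep and count down a CountDownLatch; both the workers' sleep() and a virtual thread's await() act as suspension points rather than pinning carriers:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        int tasks = 1_000;
        CountDownLatch latch = new CountDownLatch(tasks);
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(50); // suspension point: frees the carrier
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    latch.countDown();
                });
            }
            latch.await(); // wait until all workers have counted down
        }
        System.out.println("all " + tasks + " tasks finished");
    }
}
```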
Interaction with java.lang.Thread.currentThread()
var vt = Thread.ofVirtual().start(() -> {
    System.out.println(Thread.currentThread().isVirtual()); // true
});
Existing APIs that call currentThread() work unchanged; the virtual thread identity is preserved across suspensions.
Interaction with Exception Handling
Stack traces and exception handling remain unchanged from the application’s perspective:
try {
    socket.read(); // Virtual thread suspended here
} catch (IOException e) {
    e.printStackTrace(); // Shows correct virtual thread stack
}
6. Performance Characteristics and Limits
Memory and CPU Overhead
| Metric | Platform Thread | Virtual Thread |
|---|---|---|
| Memory per thread | ~1–2 MB | ~200 bytes |
| Max scalable threads | ~10K–50K | 1M+ |
| Creation overhead | High (kernel call, tens of µs) | Very low (heap allocation, sub-µs) |
| Context-switch cost | High (kernel) | Low (JVM) |
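The creation-overhead row can be sanity-checked with a sketch (class name illustrative) that starts and joins 100,000 virtual threads; on modest hardware this usually completes in well under a second, which would be impossible with OS threads:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MassCreation {
    public static void main(String[] args) throws InterruptedException {
        int n = 100_000;
        AtomicInteger done = new AtomicInteger();
        Thread[] threads = new Thread[n];
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // Each virtual thread gets a small heap-based stack; no OS thread per task
            threads[i] = Thread.ofVirtual().start(done::incrementAndGet);
        }
        for (Thread t : threads) t.join();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done.get() + " virtual threads in " + ms + " ms");
    }
}
```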
Throughput Example: Echo Server
Platform Threads (thread-per-request):
while (true) {
    Socket client = server.accept();
    new Thread(() -> handleClient(client)).start(); // New thread per request
}
Scales to ~1,000 concurrent connections before resource exhaustion.
Virtual Threads (virtual thread per request):
var executor = Executors.newVirtualThreadPerTaskExecutor();
while (true) {
    Socket client = server.accept();
    executor.submit(() -> handleClient(client)); // Virtual thread per request
}
Scales to 100,000+ concurrent connections on the same hardware.
Latency and Tail Latency
Virtual threads reduce tail latency for I/O-bound applications:
- No head-of-line blocking in request queues (the scheduler moves on to the next runnable virtual thread)
- Lower context-switch overhead
- Better cache locality (virtual threads on same carrier share CPU cache)
7. Comparison with Async/Reactive Models
Reactive Approach (Vert.x, Project Reactor, RxJava)
httpClient.get("/api")
    .thenCompose(resp -> resp.bodyAsStream())
    .thenAccept(body -> process(body))
    .exceptionally(e -> {
        log.error("Failed", e);
        return null;
    });
Pros:
- No OS thread allocation per request
- Efficient for high-concurrency scenarios
- Forced non-blocking discipline
Cons:
- Complex, callback-heavy code
- Hard to debug (stack traces fragmented)
- Difficult error handling
- Requires async-aware libraries
Virtual Threads Approach
try {
    var resp = httpClient.get("/api"); // plain blocking call; the virtual thread simply suspends
    process(resp.bodyAsStream());
} catch (IOException e) {
    log.error("Failed", e);
}
Pros:
- Sequential, imperative code (easier to reason about)
- Standard exception handling
- Works with blocking libraries (no rewrites needed)
- Better debuggability
Cons:
- Still requires pinning-aware design
- Overhead vs. raw reactive (though small)
- Not suitable for CPU-bound workloads
When to Use Each
| Scenario | Platform Threads | Virtual Threads | Reactive |
|---|---|---|---|
| Small I/O-heavy service (~100 req/s) | ✅ | ✅ | Maybe overkill |
| High-concurrency I/O (~1M+ open connections) | ❌ | ✅✅ | ✅ |
| CPU-bound or batch processing | ✅ | ✅ (in thread pool) | ❌ |
| Complex logic with multiple async stages | ❌ | ✅✅ | ✅ (with care) |
| Legacy code migration | ✅ | ✅✅ | ❌ |
8. Current Limitations and Future Evolution
Current Limitations (Java 21–23)
- Pinning with synchronized
  - Virtual threads pin while holding monitors
  - Workaround: use ReentrantLock (future JDK releases aim to remove this limitation)
- Debugging and tooling
  - IDE and profiler support is improving but not yet complete
  - Large numbers of virtual threads can overwhelm traditional debuggers
- Native code integration
  - JNI code that blocks causes pinning
  - Needs careful design for C/C++ interop
- Virtual-thread-aware libraries
  - Not all libraries are optimized for virtual threads yet
  - Thread pool sizes may not adapt automatically
- Kernel support variance
  - Different OS I/O models (epoll, kqueue, io_uring) vary in efficiency
Future Evolution
Planned improvements:
- Monitor-lock handling: letting virtual threads unmount while holding synchronized monitors (JEP 491, targeted at JDK 24), removing the main pinning case
- Scoped values: an immutable, cheaper alternative to thread-locals that scales to millions of threads
- Structured concurrency: APIs to manage task hierarchies (in preview)
- Foreign Function & Memory API: safer replacement for JNI
- Thread-local optimizations: reducing per-thread memory overhead
9. Practical Example: A High-Concurrency HTTP Server
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class HighConcurrencyServer {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 128);

        // Virtual thread per request
        var executor = Executors.newVirtualThreadPerTaskExecutor();
        server.setExecutor(executor);

        server.createContext("/api/data", exchange -> {
            try {
                // Simulate I/O (database query, API call)
                Thread.sleep(100);
                byte[] response = "{ \"status\": \"ok\" }".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, response.length); // byte length, not char count
                exchange.getResponseBody().write(response);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.sendResponseHeaders(500, -1); // -1: no response body
            } finally {
                exchange.close();
            }
        });

        server.start();
        System.out.println("Server running on http://localhost:8080");
        System.out.println("Handling requests with virtual threads...");
    }
}
Why this scales:
- Thousands of concurrent requests → thousands of virtual threads
- Each request’s I/O suspension unmounts the virtual thread
- Small pool of carrier threads handles all virtual threads
- No memory explosion, clean code
10. Conclusion
Virtual threads represent Java’s answer to the scalability challenges of the blocking model without sacrificing code readability. By leveraging continuations and cooperative scheduling, they enable millions of lightweight, concurrent tasks on modest hardware.
Key takeaways:
- Virtual threads are suspended (not blocked) on I/O, freeing carrier threads
- Pinning is the primary gotcha; use ReentrantLock instead of synchronized
- They excel at I/O-bound, high-concurrency workloads
- Existing blocking libraries work unchanged; no callback rewrite needed
- They're complementary to, not replacements for, reactive approaches
Virtual threads ensure that Java remains relevant and competitive in the era of high-concurrency, distributed systems. Java will not only survive — it will evolve.
Further reading: