Inside Java Virtual Threads: Architecture, Scheduling, and Performance
Virtual threads (Project Loom) represent a fundamental shift in how Java handles concurrent workloads. Rather than a simple API change, they introduce a new execution model built on continuations and cooperative scheduling. This article explores the internals: what virtual threads are, how they interact with the JVM scheduler, their performance implications, and where they fit alongside reactive and async patterns.
1. Platform Threads vs. Virtual Threads
Platform Threads (1:1 Model)
Traditionally, Java threads map directly to OS threads (1:1 model). Each java.lang.Thread:
- Allocates 1–2 MB of stack memory (OS-dependent)
- Has its own kernel context and registers
- Delegates scheduling entirely to the OS kernel
- Limits practical concurrency to thousands (before memory/scheduler exhaustion)
For I/O-bound workloads (network requests, database queries), most platform threads spend time blocked, wasting OS resources.
Virtual Threads (M:N Model)
Virtual threads are lightweight, user-space threads managed by the JVM. Key properties:
- Minimal memory footprint (a few hundred bytes of metadata; stacks live on the heap and grow on demand, vs. 1+ MB reserved per platform thread)
- Scheduled on a limited pool of carrier threads (OS threads)
- Automatically suspended when blocking on I/O, locks, sleeps, and other blocking operations
- Can scale to millions of concurrent instances on modest hardware
Virtual threads don’t replace platform threads; they’re a higher-level abstraction sitting on top of them.
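To make the distinction concrete, here is a minimal sketch (the class name `ThreadKinds` is illustrative) that creates one thread of each kind and checks its identity:

```java
public class ThreadKinds {
    public static void main(String[] args) throws InterruptedException {
        // Platform thread: maps 1:1 to an OS thread
        Thread platform = Thread.ofPlatform().start(() ->
                System.out.println("platform, isVirtual=" + Thread.currentThread().isVirtual()));

        // Virtual thread: scheduled by the JVM onto a carrier thread
        Thread virtual = Thread.ofVirtual().start(() ->
                System.out.println("virtual, isVirtual=" + Thread.currentThread().isVirtual()));

        platform.join();
        virtual.join();
    }
}
```

Both builders accept the same `Runnable`; from the application's point of view the API surface is identical.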
2. Carrier Threads and the JVM Scheduler
Carrier Thread Architecture
A carrier thread is a platform thread that executes virtual thread code. The JVM maintains:
- Default carrier pool: a dedicated ForkJoinPool whose parallelism defaults to the number of available processors
- Virtual thread scheduler: decides which virtual thread runs on which carrier thread
Virtual Thread 1 ─┐
Virtual Thread 2 ─┤── Carrier Thread A
Virtual Thread 3 ─┘
Virtual Thread 4 ─┐
Virtual Thread 5 ─┤── Carrier Thread B
Virtual Thread 6 ─┘
Scheduling Model
The JVM scheduler:
- Mounts a virtual thread onto a carrier thread (runs it)
- Suspends it when it hits a blocking point (via continuations)
- Unmounts it and schedules another virtual thread on that carrier
- Later, resumes the virtual thread when the blocking operation completes
This is non-preemptive, cooperative scheduling: a virtual thread yields control voluntarily, not forced by the OS.
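The effect of cooperative scheduling is easy to observe. In the sketch below (class name illustrative), 1,000 virtual threads each sleep for 200 ms, yet the whole run finishes in roughly one sleep period rather than 1,000 × 200 ms, because each sleeping thread unmounts and frees its carrier:

```java
import java.util.concurrent.Executors;

public class CooperativeScheduling {
    public static void main(String[] args) {
        long start = System.nanoTime();
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(200); // suspension point: thread unmounts, carrier is reused
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to finish
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("elapsed ~" + ms + " ms");
    }
}
```

With only ~N_cores carriers, all 1,000 blocking sleeps still overlap; the same workload on platform threads would need 1,000 OS threads to achieve this.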
3. Continuations: The Mechanical Heart
Virtual threads rely on continuations — a mechanism to pause and resume execution mid-method without unwinding the stack.
What is a Continuation?
A continuation captures the entire execution state:
- Local variables and method parameters
- Call stack frames
- Program counter (instruction pointer)
When a blocking operation occurs (e.g., Socket.read()), the JVM:
- Saves the continuation state
- Suspends the virtual thread
- Releases the carrier thread to run another virtual thread
- Later, when I/O completes, resumes the saved continuation
Example: Under the Hood
var executor = Executors.newVirtualThreadPerTaskExecutor();
executor.submit(() -> {
    System.out.println("Start");   // Virtual thread mounted
    var data = socket.read();      // BLOCKING CALL
    System.out.println(data);      // Resumed later
});
Timeline:
- Virtual thread starts; mounted on carrier thread A
- socket.read() blocks waiting for network data
- JVM captures the continuation and unmounts the virtual thread
- Carrier thread A is freed; another virtual thread mounts
- Network data arrives; the OS wakes the JVM's I/O handler
- JVM resumes the first virtual thread's continuation on any available carrier thread
- Execution continues from socket.read() (transparently to the application)
4. Pinning: The Hidden Gotcha
What is Pinning?
Pinning occurs when a virtual thread cannot be suspended and unmounted from its carrier thread, effectively blocking the carrier. This ruins the scalability benefit.
When Does Pinning Happen?
- Synchronized blocks or methods:

      synchronized (lock) {
          socket.read(); // Virtual thread PINNED; cannot unmount
      }

  The JVM cannot unmount a virtual thread while it holds a monitor lock (Java's synchronized).
- Blocking native code called via JNI:

      nativeBlockingCall(); // Pinned while native code runs

Note that thread-local variables do not themselves cause pinning; with millions of virtual threads, however, their per-thread copies can become a significant memory cost.
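The synchronized case can be demonstrated directly. The sketch below (class name illustrative) blocks inside a monitor; on JDK 21–23, running it with -Djdk.tracePinnedThreads=full makes the JVM print a pinning report to stderr at the moment the virtual thread parks while holding the lock:

```java
public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {          // monitor held...
                try {
                    Thread.sleep(100);     // ...so this park pins the carrier thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            System.out.println("done");
        });
        vt.join();
    }
}
```

The program still behaves correctly; pinning is a throughput problem, not a correctness problem, which is why the diagnostic flag exists.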
Why is Pinning Bad?
If many virtual threads pin simultaneously, the carrier threads are exhausted, and queued virtual threads stall.
Virtual Thread 1 (pinned) ─┐
Virtual Thread 2 (pinned) ─┤── Only 2 carriers total
Virtual Thread 3 (blocked) ─ waiting for a carrier!
Virtual Thread 4 (blocked) ─ waiting for a carrier!
Avoiding Pinning
- Replace synchronized with ReentrantLock (does not pin)
- Use StampedLock or ReadWriteLock for fine-grained control
- Keep native code execution short, or avoid blocking in JNI
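As a sketch of the first recommendation (class name illustrative), the same critical section guarded by a ReentrantLock lets a virtual thread unmount if it ever blocks while waiting for or holding the lock, because the lock is implemented in Java rather than as a monitor:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockVsSynchronized {
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static int counter = 0;

    static void increment() {
        LOCK.lock();
        try {
            counter++;        // critical section; blocking here would not pin the carrier
        } finally {
            LOCK.unlock();    // always release in finally
        }
    }

    public static void main(String[] args) {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(LockVsSynchronized::increment);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println(counter); // all 10,000 increments applied
    }
}
```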
5. Blocking Calls, I/O, and Monitor Interaction
Blocking Operations That Suspend (Don't Pin)
Virtual threads are suspended (unmounted) on:
- Socket I/O: Socket.read(), write()
- File I/O: FileInputStream.read(), FileOutputStream.write() (most OS file APIs are synchronous, so the JVM may compensate by temporarily growing the carrier pool)
- Thread.sleep()
- Lock.lock() (ReentrantLock, not synchronized)
- Coordination primitives: CountDownLatch.await(), Semaphore.acquire()
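A short sketch (class name illustrative) of the last item: 1,000 virtual threads each sleep and count down a CountDownLatch; both the workers' sleep() and a virtual thread's await() act as suspension points rather than pinning carriers:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;

public class LatchDemo {
    public static void main(String[] args) throws InterruptedException {
        int tasks = 1_000;
        CountDownLatch latch = new CountDownLatch(tasks);
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(50); // suspension point: frees the carrier
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    latch.countDown();
                });
            }
            latch.await(); // wait until all workers have counted down
        }
        System.out.println("all " + tasks + " tasks finished");
    }
}
```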
Interaction with java.lang.Thread.currentThread()
var vt = Thread.ofVirtual().start(() -> {
    System.out.println(Thread.currentThread().isVirtual()); // true
});
Existing APIs that call currentThread() work unchanged; the virtual thread identity is preserved across suspensions.
Interaction with Exception Handling
Stack traces and exception handling remain unchanged from the application’s perspective:
try {
    socket.read(); // Virtual thread suspended here
} catch (IOException e) {
    e.printStackTrace(); // Shows correct virtual thread stack
}
6. Performance Characteristics and Limits
Memory and CPU Overhead
| Metric | Platform Thread | Virtual Thread |
|---|---|---|
| Memory per thread | ~1–2 MB | ~200 bytes |
| Max scalable threads | ~10K–50K | 1M+ |
| Creation overhead | High (kernel call, tens of µs) | Very low (heap allocation, sub-µs) |
| Context-switch cost | High (kernel) | Low (JVM) |
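The creation-overhead row can be sanity-checked with a sketch (class name illustrative) that starts and joins 100,000 virtual threads; on modest hardware this usually completes in well under a second, which would be impossible with OS threads:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MassCreation {
    public static void main(String[] args) throws InterruptedException {
        int n = 100_000;
        AtomicInteger done = new AtomicInteger();
        Thread[] threads = new Thread[n];
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // Each virtual thread gets a small heap-based stack; no OS thread per task
            threads[i] = Thread.ofVirtual().start(done::incrementAndGet);
        }
        for (Thread t : threads) t.join();
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done.get() + " virtual threads in " + ms + " ms");
    }
}
```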
Throughput Example: Echo Server
Platform Threads (thread-per-request):
while (true) {
    Socket client = server.accept();
    new Thread(() -> handleClient(client)).start(); // New thread per request
}
Scales to ~1,000 concurrent connections before resource exhaustion.
Virtual Threads (virtual thread per request):
var executor = Executors.newVirtualThreadPerTaskExecutor();
while (true) {
    Socket client = server.accept();
    executor.submit(() -> handleClient(client)); // Virtual thread per request
}
Scales to 100,000+ concurrent connections on the same hardware.
Latency and Tail Latency
Virtual threads reduce tail latency for I/O-bound applications:
- No head-of-line blocking in request queues (the scheduler moves on to the next runnable virtual thread)
- Lower context-switch overhead
- Better cache locality (virtual threads on same carrier share CPU cache)
7. Comparison with Async/Reactive Models
Reactive Approach (Vert.x, Project Reactor, RxJava)
httpClient.get("/api")
    .thenCompose(resp -> resp.bodyAsStream())
    .thenAccept(body -> process(body))
    .exceptionally(e -> {
        log.error("Failed", e);
        return null;
    });
Pros:
- No OS thread allocation per request
- Efficient for high-concurrency scenarios
- Forced non-blocking discipline
Cons:
- Complex, callback-heavy code
- Hard to debug (stack traces fragmented)
- Difficult error handling
- Requires async-aware libraries
Virtual Threads Approach
try {
    var resp = httpClient.get("/api"); // plain blocking call; the virtual thread simply suspends
    process(resp.bodyAsStream());
} catch (IOException e) {
    log.error("Failed", e);
}
Pros:
- Sequential, imperative code (easier to reason about)
- Standard exception handling
- Works with blocking libraries (no rewrites needed)
- Better debuggability
Cons:
- Still requires pinning-aware design
- Overhead vs. raw reactive (though small)
- Not suitable for CPU-bound workloads
When to Use Each
| Scenario | Platform Threads | Virtual Threads | Reactive |
|---|---|---|---|
| Small I/O-heavy service (~100 req/s) | ✅ | ✅ | Maybe overkill |
| High-concurrency I/O (~1M+ open connections) | ❌ | ✅✅ | ✅ |
| CPU-bound or batch processing | ✅ | ✅ (in thread pool) | ❌ |
| Complex logic with multiple async stages | ❌ | ✅✅ | ✅ (with care) |
| Legacy code migration | ✅ | ✅✅ | ❌ |
8. Current Limitations and Future Evolution
Current Limitations (Java 21–23)
- Pinning with synchronized
  - Virtual threads pin while holding monitors
  - Workaround: use ReentrantLock (future JDK releases aim to remove this limitation)
- Debugging and tooling
  - IDE and profiler support is improving but not yet complete
  - Large numbers of virtual threads can overwhelm traditional debuggers
- Native code integration
  - JNI code that blocks causes pinning
  - Needs careful design for C/C++ interop
- Virtual-thread-aware libraries
  - Not all libraries are optimized for virtual threads yet
  - Thread pool sizes may not adapt automatically
- Kernel support variance
  - Different OS I/O models (epoll, kqueue, io_uring) vary in efficiency
Future Evolution
Planned improvements:
- Monitor-lock handling: letting virtual threads unmount while holding synchronized monitors (JEP 491, targeted at JDK 24), removing the main pinning case
- Scoped values: an immutable, cheaper alternative to thread-locals that scales to millions of threads
- Structured concurrency: APIs to manage task hierarchies (in preview)
- Foreign Function & Memory API: safer replacement for JNI
- Thread-local optimizations: reducing per-thread memory overhead
9. Practical Example: A High-Concurrency HTTP Server
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class HighConcurrencyServer {
    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 128);

        // Virtual thread per request
        var executor = Executors.newVirtualThreadPerTaskExecutor();
        server.setExecutor(executor);

        server.createContext("/api/data", exchange -> {
            try {
                // Simulate I/O (database query, API call)
                Thread.sleep(100);
                byte[] response = "{ \"status\": \"ok\" }".getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, response.length); // byte length, not char count
                exchange.getResponseBody().write(response);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                exchange.sendResponseHeaders(500, -1); // -1: no response body
            } finally {
                exchange.close();
            }
        });

        server.start();
        System.out.println("Server running on http://localhost:8080");
        System.out.println("Handling requests with virtual threads...");
    }
}
Why this scales:
- Thousands of concurrent requests → thousands of virtual threads
- Each request’s I/O suspension unmounts the virtual thread
- Small pool of carrier threads handles all virtual threads
- No memory explosion, clean code
10. Conclusion
Virtual threads represent Java’s answer to the scalability challenges of the blocking model without sacrificing code readability. By leveraging continuations and cooperative scheduling, they enable millions of lightweight, concurrent tasks on modest hardware.
Key takeaways:
- Virtual threads are suspended (not blocked) on I/O, freeing carrier threads
- Pinning is the primary gotcha; use ReentrantLock instead of synchronized
- They excel at I/O-bound, high-concurrency workloads
- Existing blocking libraries work unchanged; no callback rewrite needed
- They're complementary to, not replacements for, reactive approaches
Virtual threads ensure that Java remains relevant and competitive in the era of high-concurrency, distributed systems. Java will not only survive — it will evolve.
Further reading: