Getting real with virtual threads
By Vadim Filanovsky, Mike Huang, Danny Thomas and Martin Chalupa
Netflix has an extensive history of using Java as our primary programming language across our vast fleet of microservices. As we pick up newer versions of Java, our JVM Ecosystem team seeks out new language features that can improve the ergonomics and performance of our systems. In a recent article, we detailed how our workloads benefited from switching to generational ZGC as our default garbage collector when we migrated to Java 21. Virtual threads is another feature we are excited to adopt as part of this migration.
For those new to virtual threads, they are described as "lightweight threads that dramatically reduce the effort of writing, maintaining, and observing high-throughput concurrent applications." Their power comes from their ability to be suspended and resumed automatically via continuations when blocking operations occur, thus freeing the underlying operating system threads to be reused for other operations. Leveraging virtual threads can unlock higher performance when applied in the appropriate context.
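On Java 21, creating a virtual thread is a one-liner. This minimal, self-contained sketch (ours, for illustration only) shows the basic API:

```java
import java.time.Duration;

public class VirtualThreadHello {
    public static void main(String[] args) throws InterruptedException {
        // Start a virtual thread directly; requires Java 21+.
        Thread vt = Thread.ofVirtual().name("hello-vt").start(() -> {
            try {
                // Blocking calls such as sleep() suspend the virtual thread
                // and free its carrier OS thread for other work.
                Thread.sleep(Duration.ofMillis(10));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        vt.join();
        System.out.println(vt.isVirtual()); // prints "true"
    }
}
```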
In this article we discuss one of the peculiar cases that we encountered along our path to deploying virtual threads on Java 21.
Netflix engineers raised several independent reports of intermittent timeouts and hung instances to the Performance Engineering and JVM Ecosystem teams. Upon closer examination, we noticed a set of common traits and symptoms. In all cases, the affected apps ran on Java 21 with SpringBoot 3 and embedded Tomcat serving traffic on REST endpoints. The instances that experienced the issue simply stopped serving traffic even though the JVM on those instances remained up and running. One clear symptom characterizing the onset of this issue is a persistent increase in the number of sockets in closeWait state, as illustrated by the graph below:
Sockets remaining in closeWait state indicate that the remote peer closed the socket, but it was never closed on the local instance, presumably because the application failed to do so. This can often indicate that the application is hanging in an abnormal state, in which case application thread dumps may reveal additional insight.
In order to troubleshoot this issue, we first leveraged our alerts system to catch an instance in this state. Since we periodically collect and persist thread dumps for all JVM workloads, we can often retroactively piece together the behavior by examining these thread dumps from an instance. However, we were surprised to find that all our thread dumps show a perfectly idle JVM with no clear activity. Reviewing recent changes revealed that the impacted services had enabled virtual threads, and we knew that virtual thread call stacks do not show up in jstack-generated thread dumps. To obtain a more complete thread dump containing the state of the virtual threads, we used the "jcmd Thread.dump_to_file" command instead. As a last-ditch effort to introspect the state of the JVM, we also collected a heap dump from the instance.
Thread dumps revealed thousands of "blank" virtual threads:
#119821 "" virtual
#119820 "" virtual
#119823 "" virtual
#120847 "" virtual
#119822 "" virtual
...
These are the VTs (virtual threads) for which a thread object is created, but which have not started running, and as such, have no stack trace. In fact, there were roughly as many blank VTs as there were sockets in closeWait state. To make sense of what we were seeing, we need to first understand how VTs operate.
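A thread object that has been created but never started reports an empty stack, which matches these dump entries. A minimal sketch of our own (not taken from the incident) demonstrates this:

```java
public class BlankThreadDemo {
    public static void main(String[] args) {
        // Create a virtual Thread object without starting it,
        // analogous to the "blank" threads in the dump.
        Thread unstarted = Thread.ofVirtual().unstarted(() -> {});
        // A thread that has not started has no frames to report.
        System.out.println(unstarted.getStackTrace().length); // prints "0"
        System.out.println(unstarted.getState());             // prints "NEW"
    }
}
```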
A virtual thread is not mapped 1:1 to a dedicated OS-level thread. Rather, we can think of it as a task that is scheduled onto a fork-join thread pool. When a virtual thread enters a blocking call, like waiting for a Future, it relinquishes the OS thread it occupies and simply remains in memory until it is ready to resume. In the meantime, the OS thread can be reassigned to execute other VTs in the same fork-join pool. This allows us to multiplex a large number of VTs onto just a handful of underlying OS threads. In JVM terminology, the underlying OS thread is referred to as the "carrier thread" to which a virtual thread can be "mounted" while it executes and "unmounted" while it waits. A great in-depth description of virtual threads is available in JEP 444.
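To make mounting and unmounting concrete, here is a small illustrative sketch (ours, not Netflix production code): thousands of virtual threads all blocking in sleep complete quickly on a carrier pool of only a few OS threads, because each blocked thread unmounts its carrier:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class Multiplexing {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // One virtual thread per task; all of them are multiplexed
        // over the small fork-join pool of carrier threads.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    // Sleeping unmounts this VT from its carrier, so all
                    // 10,000 sleeps overlap rather than queueing up.
                    Thread.sleep(Duration.ofMillis(100));
                    done.incrementAndGet();
                    return null;
                });
            }
        } // try-with-resources close() waits for all tasks to finish
        System.out.println(done.get()); // prints "10000"
    }
}
```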
In our environment, we utilize a blocking model for Tomcat, which in effect holds a worker thread for the lifespan of a request. By enabling virtual threads, Tomcat switches to virtual execution. Each incoming request creates a new virtual thread that is simply scheduled as a task on a Virtual Thread Executor. We can see Tomcat creates a VirtualThreadExecutor here.
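For reference, in Spring Boot 3.2 and later this switch is exposed as a single configuration property (a hedged example, assuming Spring Boot 3.2+ on Java 21; earlier versions require manually wiring a Tomcat protocol-handler customizer):

```properties
# application.properties
# Ask Spring Boot to run Tomcat request processing on virtual threads.
spring.threads.virtual.enabled=true
```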
Tying this information back to our problem, the symptoms correspond to a state where Tomcat keeps creating a new web worker VT for each incoming request, but there are no available OS threads to mount them onto.
What happened to our OS threads, and what are they busy with? As described here, a VT will be pinned to the underlying OS thread if it performs a blocking operation while inside a synchronized block or method. This is exactly what is happening here. Here is a relevant snippet from a thread dump obtained from the stuck instance:
#119515 "" virtual
java.base/jdk.internal.misc.Unsafe.park(Native Method)
java.base/java.lang.VirtualThread.parkOnCarrierThread(VirtualThread.java:661)
java.base/java.lang.VirtualThread.park(VirtualThread.java:593)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
zipkin2.reporter.internal.CountBoundedQueue.offer(CountBoundedQueue.java:54)
zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.report(AsyncReporter.java:230)
zipkin2.reporter.brave.AsyncZipkinSpanHandler.end(AsyncZipkinSpanHandler.java:214)
brave.internal.handler.NoopAwareSpanHandler$CompositeSpanHandler.end(NoopAwareSpanHandler.java:98)
brave.internal.handler.NoopAwareSpanHandler.end(NoopAwareSpanHandler.java:48)
brave.internal.recorder.PendingSpans.finish(PendingSpans.java:116)
brave.RealSpan.finish(RealSpan.java:134)
brave.RealSpan.finish(RealSpan.java:129)
io.micrometer.tracing.brave.bridge.BraveSpan.end(BraveSpan.java:117)
io.micrometer.tracing.annotation.AbstractMethodInvocationProcessor.after(AbstractMethodInvocationProcessor.java:67)
io.micrometer.tracing.annotation.ImperativeMethodInvocationProcessor.proceedUnderSynchronousSpan(ImperativeMethodInvocationProcessor.java:98)
io.micrometer.tracing.annotation.ImperativeMethodInvocationProcessor.process(ImperativeMethodInvocationProcessor.java:73)
io.micrometer.tracing.annotation.SpanAspect.newSpanMethod(SpanAspect.java:59)
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
java.base/java.lang.reflect.Method.invoke(Method.java:580)
org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:637)
...

In this stack trace, we enter the synchronization in brave.RealSpan.finish(RealSpan.java:134). This virtual thread is effectively pinned: it is mounted to an actual OS thread even while it waits to acquire a reentrant lock. There are 3 VTs in this exact state and another VT identified as "<redacted> @DefaultExecutor - 46542" that also follows the same code path. These 4 virtual threads are pinned while waiting to acquire a lock. Because the app is deployed on an instance with 4 vCPUs, the fork-join pool that underpins VT execution also contains 4 OS threads. Now that we have exhausted all of them, no other virtual thread can make any progress. This explains why Tomcat stopped processing the requests and why the number of sockets in closeWait state keeps climbing. Indeed, Tomcat accepts a connection on a socket, creates a request along with a virtual thread, and passes this request/thread to the executor for processing. However, the newly created VT cannot be scheduled because all of the OS threads in the fork-join pool are pinned and never released. So these newly created VTs are stuck in the queue, while still holding the socket.
Now that we know VTs are waiting to acquire a lock, the next question is: who holds the lock? Answering this question is key to understanding what triggered this condition in the first place. Usually a thread dump indicates who holds the lock with either "- locked <0x…> (at …)" or "Locked ownable synchronizers," but neither of these show up in our thread dumps. As a matter of fact, no locking/parking/waiting information is included in the jcmd-generated thread dumps. This is a limitation in Java 21 and will be addressed in future releases. Carefully combing through the thread dump reveals that there are a total of 6 threads contending for the same ReentrantLock and associated Condition. Four of these six threads are detailed in the previous section. Here is another thread:
#119516 "" virtual
java.base/java.lang.VirtualThread.park(VirtualThread.java:582)
java.base/java.lang.System$2.parkVirtualThread(System.java:2643)
java.base/jdk.internal.misc.VirtualThreads.park(VirtualThreads.java:54)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:219)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:990)
java.base/java.util.concurrent.locks.ReentrantLock$Sync.lock(ReentrantLock.java:153)
java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:322)
zipkin2.reporter.internal.CountBoundedQueue.offer(CountBoundedQueue.java:54)
zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.report(AsyncReporter.java:230)
zipkin2.reporter.brave.AsyncZipkinSpanHandler.end(AsyncZipkinSpanHandler.java:214)
brave.internal.handler.NoopAwareSpanHandler$CompositeSpanHandler.end(NoopAwareSpanHandler.java:98)
brave.internal.handler.NoopAwareSpanHandler.end(NoopAwareSpanHandler.java:48)
brave.internal.recorder.PendingSpans.finish(PendingSpans.java:116)
brave.RealScopedSpan.finish(RealScopedSpan.java:64)
...

Note that while this thread seemingly goes through the same code path for finishing a span, it does not go through a synchronized block. Finally, here is the sixth thread:
#107 "AsyncReporter <redacted>"
java.base/jdk.internal.misc.Unsafe.park(Native Method)
java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:221)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:754)
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1761)
zipkin2.reporter.internal.CountBoundedQueue.drainTo(CountBoundedQueue.java:81)
zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.flush(AsyncReporter.java:241)
zipkin2.reporter.internal.AsyncReporter$Flusher.run(AsyncReporter.java:352)
java.base/java.lang.Thread.run(Thread.java:1583)

This is actually a normal platform thread, not a virtual thread. Paying particular attention to the line numbers in this stack trace, it is peculiar that the thread appears to be blocked within the internal acquire() method after completing the wait. In other words, this calling thread owned the lock upon entering awaitNanos(). We know the lock was explicitly acquired here. However, by the time the wait completed, it could not reacquire the lock. Summarizing our thread dump analysis:
- There are 5 virtual threads and 1 regular platform thread waiting for the lock.
- Of those 5 VTs, 4 are pinned to the OS threads in the fork-join pool.
- There is still no information on who owns the lock.

As there is nothing more we can glean from the thread dump, our next logical step is to peek into the heap dump and introspect the state of the lock.
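The awaitNanos() behavior that makes those line numbers surprising can be checked in isolation: it releases the associated lock for the duration of the wait and must reacquire it before returning, which is why acquire() appears above awaitNanos() in the AsyncReporter stack. The following is our own minimal sketch, not code from the affected services:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class AwaitNanosDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        Condition condition = lock.newCondition();
        CountDownLatch waiting = new CountDownLatch(1);

        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                waiting.countDown();
                // Releases the lock for the duration of the wait...
                condition.awaitNanos(TimeUnit.MILLISECONDS.toNanos(200));
                // ...and must reacquire it before returning here; that
                // reacquisition is the acquire() frame seen in the dump.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        waiter.start();
        waiting.await();
        Thread.sleep(50);  // give the waiter time to enter awaitNanos()
        lock.lock();       // succeeds: the waiter released the lock to wait
        System.out.println("main acquired lock while waiter waits");
        lock.unlock();
        waiter.join();
    }
}
```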
Finding the lock in the heap dump was relatively straightforward. Using the excellent Eclipse MAT tool, we examined the objects on the stack of the AsyncReporter non-virtual thread to identify the lock object. Reasoning about the current state of the lock was perhaps the trickiest part of our investigation. Most of the relevant code can be found in AbstractQueuedSynchronizer.java. While we do not claim to fully understand its inner workings, we reverse-engineered enough of it to match against what we see in the heap dump. This diagram illustrates our findings:
