We switched to Java 21 virtual threads and got a deadlock in TPC-C for Postgres (blog.ydb.tech)

280 points by magden 2 years ago | 243 comments

This problem is not going to go away so easily. Numerous core Java classes (like BufferedInputStream) use synchronized. I count 1600+ usages in java.base. The blocking issue means it's _much_ easier to accidentally run into this, rather than waving it away as an unlikely edge case.

I personally ran into this using the built-in com.sun webserver with a virtual thread executor. My VPS only has two CPUs, which means the FJP that virtual threads run on only has 2 active threads at a time. I ran into this hang when some of the connections hung, blocking any further requests from being processed.

pron 2 years ago | |

As the JEP states, pinning due to synchronized is a temporary issue. We didn't want to hold off releasing virtual threads until that matter is resolved (because users can resolve it themselves with additional work), but a fix already exists in the Loom repository, EA builds will be offered shortly for testing, and it will be delivered in a GA release soon.

Those who run into this issue and are unable or unwilling to do the work to avoid it (replacing synchronized with j.u.c locks) as explained in the adoption guide [1] may want to wait until the issue is resolved in the JDK.

I would strongly recommend that anyone adopting virtual threads read the adoption guide.

[1]: https://docs.oracle.com/en/java/javase/21/core/virtual-threa...
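
For readers unsure what "replacing synchronized with j.u.c locks" involves, here is a minimal sketch. The class and method names are made up for illustration and are not taken from the adoption guide or from any library.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; the names are illustrative only.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long count;

    // Before: synchronized void increment() { count++; }
    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    long get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }

    // Runs `threads` threads, each incrementing `perThread` times, and returns the total.
    static long demo(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

The try/finally is the part that is easy to forget, which is the robustness trade-off raised in the reply below.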

cesarb 2 years ago | | |

> unable or unwilling to do the work to avoid it

The problem is that it's rare to write code which uses no third-party libraries, and these third-party libraries (most written before Java virtual threads ever existed) have a good chance of using "synchronized" instead of other kinds of locks; and "synchronized" can be more robust than other kinds of locks (no risk of forgetting to release the lock, and on older JVMs, no risk of an out-of-memory while within the lock implementation breaking things), so people can prefer to use it whenever possible.

To me, this is a deal breaker; it makes it too risky to use virtual threads in most cases. It's better to wait for a newer Java LTS which can unmount virtual threads on "synchronized" blocks before starting to use it.

masoudprv 2 years ago | | |

Hi Ron. Thanks a lot for the amazing work you are doing on Loom and the whole JVM platform. Can the EA builds and GA release you mentioned make it into 22, or did you mean EA builds for 23?

i000 2 years ago | | |

Wow, I would love to be in the meeting where this decision was made.

Let's ship this with a foot gun, but let's not mention in the JEP that it may hang - let them figure it out.

jillesvangurp 2 years ago | |

People always forget that things that only happen every few million times, can happen fairly frequently on a busy server. This has bitten me numerous times. The nature of a lot of these types of issues is that they are hard to detect and hard to reproduce.

Virtual threads are nice for unblocking legacy code but they aren't without issues. There are better options for new code with fewer trade-offs on the JVM as well. I've recently been experimenting with jasync-postgresql (there's a MySQL variant as well) as an alternative to JDBC in Kotlin. It's a nice library. It does have some limitations and is a bit on the primitive side. But it appears to be somewhat widely used in various database frameworks for Scala, Java, and Kotlin.

Databases and database frameworks are an area on the JVM where there just is a huge amount of legacy code built on threads and blocking IO. It's probably one of the reasons Oracle worked on virtual threads, as migrating away from these frameworks is unlikely to ever happen in a lot of code bases. So, waving a magic wand and making all that code non-blocking is very attractive. But of course that magic has some hard limitations, and synchronized blocks are one of those. I imagine they are working on improving that further.

simiones 2 years ago | | |

> Virtual threads are nice for unblocking legacy code but they aren't without issues. There are better options for new code with less trade offs on the jvm as well.

The designers of Project Loom would say the exact opposite. The whole push behind Project Loom and similar models (Go's oft-praised goroutine runtime being another one) is motivated by threads being a much better fit for async behavior in a fundamentally procedural language like Java or Go than promise-based frameworks like async/await.

The whole motivation of Project Loom is to make the simple thing (spawning threads to handle blocking IO) the fast thing as well (by actually replacing the blocking IO with efficient async IO OS calls and managing the threads internally). Project Loom will be considered a full success if the next generation Java web server does something akin to "new Thread(() -> { executeHandlerFunc(conn); }).start();" for each incoming connection, just like the Go built-in web server.
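
A rough sketch of that thread-per-task style on Java 21, using the standard virtual-thread-per-task executor. The `handle` method is a hypothetical stand-in for a blocking request handler; no real server or socket code is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadPerTaskSketch {
    // Hypothetical stand-in for a blocking request handler.
    static String handle(int requestId) throws InterruptedException {
        Thread.sleep(10); // simulated blocking I/O
        return "handled-" + requestId;
    }

    // Submits one task per request; each task gets its own virtual thread (Java 21+).
    static List<String> serve(int requests) {
        List<String> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                futures.add(executor.submit(() -> handle(id)));
            }
            for (Future<String> future : futures) {
                results.add(future.get()); // waits for that task to finish
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

The handler is written as plain blocking code; whether the blocking call parks cheaply or pins a carrier is exactly what the rest of this thread is about.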

mike_hearn 2 years ago | | |

The lack of support for synchronized isn't a fundamental or hard limit, it's just that the HotSpot implementation is complicated for performance reasons and they put off rewriting that code until later. They're indeed working on that now and in some future version I guess wait/notify and synchronized blocks will start to work. After all, you can easily transform such code into an equivalent that does work.

tveita 2 years ago | | |

There are ways to find problem sections without having to trigger a full deadlock: https://openjdk.org/jeps/444

  The system property jdk.tracePinnedThreads triggers a stack trace when a thread blocks while pinned. Running with -Djdk.tracePinnedThreads=full prints a complete stack trace when a thread blocks while pinned, highlighting native frames and frames holding monitors. Running with -Djdk.tracePinnedThreads=short limits the output to just the problematic frames.
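
A small sketch that should be enough to trigger that trace on JDK 21: one virtual thread sleeping while it holds a monitor. The class name is made up, and behaviour on later JDKs may differ once pinning on synchronized is addressed.

```java
class PinnedSleepDemo {
    private static final Object MONITOR = new Object();

    // Starts one virtual thread that blocks while holding a monitor, then waits for it.
    static boolean runOnce() {
        Thread virtualThread = Thread.ofVirtual().start(() -> {
            synchronized (MONITOR) {
                try {
                    Thread.sleep(50); // blocking call inside synchronized
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        try {
            virtualThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !virtualThread.isAlive();
    }
}
```

Call `PinnedSleepDemo.runOnce()` from a main method started with -Djdk.tracePinnedThreads=full and, on JDK 21, a stack trace pointing at the synchronized frame should be printed.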

vincnetas 2 years ago | | |

Was curious what "jasync" is. And man, it hurts me to see documentation like this (when compared to classic javadocs)

https://github.com/jasync-sql/jasync-sql/wiki/API-Overview

From project WIKI (https://github.com/jasync-sql/jasync-sql/wiki)

kaba0 2 years ago | | |

Synchronized blocks are not a problem. Synchronized blocks that later don’t unblock the thread may sometimes be.

he0001 2 years ago | |

BufferedInputStream has been rewritten and only uses synchronized if subclassed. In fact, there has been a lot of work on removing the synchronized keyword.

anthony88 2 years ago | |

I've written an open source library to easily replace synchronized with something more virtual thread friendly: https://github.com/japplis/Virtually

mrintegrity 2 years ago |

Totally off topic but I am getting tired of the AI generated images used on nearly all blog posts nowadays. They are instantly recognisable, it just seems low effort and lowers the feeling of quality one might otherwise have

NovaX 2 years ago |

It is a known caveat that virtual threads do not work well with long running synchronization by pinning the thread. That unfortunately means that for many applications it may be premature to adopt them, but it is mature enough for broader evaluation by the libraries and frameworks. The Java team provided a status of their efforts recently [1].

[1]: https://www.youtube.com/watch?v=WoQJnnMIlFY&t=421s

papercrane 2 years ago |

Curious if you considered switching to a different connection pooling library. These days I usually use HikariCP, which is fast and actively maintained. c3p0 hasn't had any activity for years; I'm not sure if it's still maintained.

samus 2 years ago | |

Crucially, c3p0 will probably never see the `synchronized` blocks being replaced by reentrant locks. Since LTS offers exist for Java 21, many libraries might actually do that. But I actually hope that the ecosystem resists, which would force virtual thread users suffering from this problem to upgrade soon.

rickette 2 years ago | |

Indeed, Hikari is the go to connection pool for some years now. It's even the default when running Spring Boot.

eivanov89 2 years ago | |

Perhaps we'll give HikariCP a chance. However, please keep in mind that the goal of the YDB team is to enhance database performance. We needed virtual threads to make TPC-C efficient enough to generate a reasonable load on a modest amount of hardware.

xmcqdpt2 2 years ago | | |

Is that true?

Virtual threads aren't necessarily faster, you still have just as many sockets and network connections as before. You can easily spawn 5000 platform threads, and if that's not enough, there are quite a few user-space implementations of fibers/coroutines/async etc on the JVM that can deal with many outlying requests (Cats/ZIO in Scala, Kotlin coroutines, the Play framework, concurrent.Future, etc.)

tveita 2 years ago | |

Looks like HikariCP is also awaiting fixes for this https://github.com/brettwooldridge/HikariCP/pull/2055

whalesalad 2 years ago | |

Been ages since I’ve touched it but back in 2017-2018 I had some fun integrating HikariCP in place of c3p0 in some Clojure projects and it was more performant.

vvern 2 years ago |

Go has a mechanism to spawn a new thread (M in Go runtime parlance) if it thinks one of its threads might be blocked in a cgo call (Go’s “native function” equivalent). That prevents stuff like this.

spintin 2 years ago |

The warning shots across the bow were heard with this statement from the devs:

"Don't replace platform/native threads with virtual ones, replace tasks (without further explanation) instead"?!

Combine that with the fact that they chose to implement the scheduler in Java instead of C(++) and you're set for performance problems.

Remember that NIO took from 1.5 to 1.7 to be usable/performant and that was native!

Edit: Finally figured out why: https://news.ycombinator.com/item?id=39010648

MrBuddyCasino 2 years ago | |

> they chose to implement the scheduler in Java instead of C(++) and you're set for performance problems

The JDK has historically used some native implementations in its stdlib (zip, imageio and others), back when the runtime wasn't as fast as it is today. But today's runtime would often be faster in Java than those native implementations.

bitcharmer 2 years ago | |

> Combine that with the fact that they chose to implement the scheduler in Java instead of C(++) and you're set for performance problems.

Ah yes, the argument from the 1990s. It would make sense to understand where the JVM and its compiler are these days before making incorrect statements about performance.

From your link:

> Blocking network TCP IO needs a sychronized block to work

This is utterly false.

spintin 2 years ago | | |

So how do you implement a TCP socket?

I have always had to do synchronized(something) { socketInputStream.read(); }

And the dude himself says that reading from a socket is a problem if you listen to the interview.

saagarjha 2 years ago |

This is a common problem when migrating a system from threads to virtual threads. In general, using primitives which block the current thread and prevent forward progress can quickly lead to deadlocks. It’s a hard issue to catch because in the past usually this would get “solved” by spawning a new thread to complete the task but in a world with virtual threads the runtime is usually reluctant to spawn more threads, so there’s nothing that can service more work if you’ve blocked all the threads.

moonchild 2 years ago |

I am extremely confused.

> There are two scenarios in which a virtual thread cannot be unmounted during blocking operations because it is pinned to its carrier:

> When it executes code inside a synchronized block or method

Isn't 'synchronized' effectively sugar for taking a kind of lock? Why can't it be treated uniformly by the scheduler?

avodonosov 2 years ago | |

The `synchronized` by itself does not cause any problems for virtual threads.

Only when one calls a blocking operation from synchronized, the thread is not unmounted. E.g. `synchronized (...) {blockingQueue.take()}`. Note that this is not a sane coding practice. (Calling a potentially long operation from within synchronized. The blockingQueue.take() does not need to be wrapped into synchronized. It has synchronization inside and plays well with virtual threads. Only when wrapped into the synchronized, the current implementation can not unmount the virtual thread.).
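
As a sketch of the difference, the blocking call can be moved outside the monitor so that only a short, non-blocking update runs under synchronized. The class below is hypothetical and only illustrates the shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class TakeOutsideMonitor {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> processed = new ArrayList<>();

    void offer(String item) {
        queue.add(item);
    }

    // Pattern to avoid: synchronized (this) { processed.add(queue.take()); }
    void processOne() throws InterruptedException {
        String item = queue.take(); // may block; no monitor is held here
        synchronized (this) {       // short, non-blocking critical section
            processed.add(item);
        }
    }

    synchronized int processedCount() {
        return processed.size();
    }

    // Queues two items, processes both, and returns how many were recorded.
    static int demo() {
        TakeOutsideMonitor worker = new TakeOutsideMonitor();
        worker.offer("a");
        worker.offer("b");
        try {
            worker.processOne();
            worker.processOne();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.processedCount();
    }
}
```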

The JDK team works to remove quirks like pinning in the future versions.

dikei 2 years ago | |

No, synchronized is a very primitive lock implementation compared to what's available in java.util.concurrent.locks.

However, it's built directly into the JVM specification, so it's difficult to change while keeping compatibility, while j.u.c.locks is just a library. In other words, they can't change the semantics of synchronized, so they created j.u.c.locks as a replacement.

altfredd 2 years ago | | |

> it's difficult to change while keeping compatibility

Actually, it is trivial to change. Just embed a ReentrantLock into every object and rewrite all calls to "synchronized"/"Object.wait" to use that lock.

Unfortunately, this would result in a bit of performance regression (increasing per-object memory footprint). To solve that would require turning ReentrantLock into a magical intrinsic, fully integrated with lock bytes in the object header. Which is actually not that hard either — other runtimes like Golang or Android VM solve problems like this on daily basis. Oracle, however…

samus 2 years ago | | |

The j.u.c locks have existed for a very long time already.

taspeotis 2 years ago |

Did they get a deadlock again? https://news.ycombinator.com/item?id=38939165

eivanov89 2 years ago | |

Haha, but quite frankly we had one more in TPC-C for YDB. But unrelated to the virtual threads.

dikei 2 years ago |

I wonder if HikariCP, currently the best Java DB connection pooling library, suffers from the same issue as c3p0.

jshowalter 2 years ago |

Why not treat this as a bug, and fix it in Java 21? For compliance reasons we can only use LTS versions, and the next one isn't until September 2025, according to https://www.oracle.com/java/technologies/java-se-support-roa....

torrent 2 years ago |

Seems to be a similar problem area to writing blocking functions that call async functions in C#, and the co-existence of synchronous and asynchronous code.

There are numerous recommendations such as

https://learn.microsoft.com/en-us/archive/msdn-magazine/2015...

The final phrase is "I hope these techniques will help you adopt async into your existing applications in a way that works best for you."

pierrebai 2 years ago |

I thought the blog was great but the "in summary" conclusion was bad.

The summary merely stated that Java virtual threads are great. I expected a summary of the problem and solution, for example something like:

When using Java 21 virtual threads, you can end up starved of carrier threads because all carrier threads are waiting on an exhausted resource pool, with no thread available to free those resources. The solution is to wrap those resources in a virtual-thread-aware object. In our case, we solved our problem by wrapping connections in semaphores.
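
A minimal sketch of that kind of semaphore wrapper, assuming a pool with a fixed number of connections. `GatedPool`, `checkout` and `checkin` are hypothetical names standing in for whatever the real pool exposes; this is not the blog's actual code.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper: callers wait on the semaphore, never inside the pool itself.
class GatedPool<C> {
    private final Semaphore permits;
    private final Supplier<C> checkout; // stands in for the pool's "get connection"
    private final Consumer<C> checkin;  // stands in for the pool's "return connection"

    GatedPool(int size, Supplier<C> checkout, Consumer<C> checkin) {
        this.permits = new Semaphore(size);
        this.checkout = checkout;
        this.checkin = checkin;
    }

    <R> R withConnection(Function<C, R> work) throws InterruptedException {
        permits.acquire(); // a waiting caller blocks here, outside any monitor
        try {
            C conn = checkout.get(); // a permit is held, so a connection is free
            try {
                return work.apply(conn);
            } finally {
                checkin.accept(conn);
            }
        } finally {
            permits.release();
        }
    }

    // Runs `workers` threads against a fake pool and returns the peak connections in use.
    static int demo(int poolSize, int workers) {
        ConcurrentLinkedDeque<String> idle = new ConcurrentLinkedDeque<>();
        for (int i = 0; i < poolSize; i++) {
            idle.add("conn-" + i);
        }
        GatedPool<String> pool = new GatedPool<>(poolSize, idle::pop, idle::push);
        AtomicInteger inUse = new AtomicInteger();
        AtomicInteger maxInUse = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    pool.withConnection(conn -> {
                        maxInUse.accumulateAndGet(inUse.incrementAndGet(), Math::max);
                        try {
                            Thread.sleep(5); // simulated query
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        inUse.decrementAndGet();
                        return conn;
                    });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return maxInUse.get();
    }
}
```

The demo uses platform threads so it runs on older JDKs; the point is only that no caller enters the pool without a permit.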

xyst 2 years ago |

Concurrency, parallelism. These are among the most misunderstood concepts in programming/software development.

TLS (especially mutual TLS) and Oauth also join this club.

motoboi 2 years ago | |

Interestingly enough, I love both mutual-TLS and OAUTH (especially OIDC).

Alifatisk 2 years ago | | |

Why is that interesting?

skyde 2 years ago |

Why are synchronized blocks not preemptible? When compiling

public void syncMethod() {
    synchronized (lock) {
        // some code
    }
}

they could translate to

public void syncMethod() {
    await reentrantLockAsync.lockAsync();
    try {
        await somecodeAsync();
    } finally {
        await reentrantLockAsync.unlockAsync();
    }
}

dikei 2 years ago | |

The first issue is that your second snippet is not Java (there is no await/async in Java yet).

The second issue is they're not completely equivalent. In the second case, you'd need extra memory for the `reentrantLock`, while `synchronized` works with any object. Furthermore, if you need to use `wait/notify`, then there needs to be an extra `Condition` object to use in combination with the `ReentrantLock`. For sure, developers can rewrite most `synchronized` to use `ReentrantLock` and `Condition`, but javac won't do it automatically for you.
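
To make the manual rewrite concrete, here is a sketch of a one-slot hand-off that would classically use `synchronized` with `wait/notifyAll`, written with `ReentrantLock` and `Condition` instead. The class is hypothetical and kept deliberately small.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class OneSlotBox<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition(); // takes the place of the object's wait set
    private T value; // null means the slot is empty

    void put(T newValue) throws InterruptedException {
        lock.lock();
        try {
            while (value != null) {
                changed.await(); // was: wait()
            }
            value = newValue;
            changed.signalAll(); // was: notifyAll()
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (value == null) {
                changed.await();
            }
            T taken = value;
            value = null;
            changed.signalAll();
            return taken;
        } finally {
            lock.unlock();
        }
    }

    // A producer puts 1, 2, 3; the caller takes three values and returns their sum.
    static int demo() {
        OneSlotBox<Integer> box = new OneSlotBox<>();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) {
                    box.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 3; i++) {
                sum += box.take();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }
}
```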

skyde 2 years ago | | |

They could at least introduce a new language construct like await synchronizedAsync(lock) { // some code }

C# introduced:

    await foreach (int item in RangeAsync(10, 3))
        Console.Write(item + " "); // Prints 10 11 12

So you don't have to type:

    IAsyncEnumerator<int> e = RangeAsync(10, 3).GetAsyncEnumerator();
    try {
        while (await e.MoveNextAsync()) Console.Write(e.Current + " ");
    } finally {
        if (e != null) await e.DisposeAsync();
    }

dxxvi 2 years ago |

Why was c3p0 used (its latest version was released in Dec 2019)? Those tests existed for a while and people were too lazy to replace c3p0 with something newer? I guess that they spent all their time to use virtual threads in those tests and had no time left to look at c3p0.

bheadmaster 2 years ago |

Why couldn't the JVM detect when all carrier threads are blocked, and just spawn more of them?

davidtos 2 years ago | |

That already exists, luckily. You can even change the maximum number of carrier threads with:

-Djdk.virtualThreadScheduler.maxPoolSize=10

dikei 2 years ago | | |

The default is 256, way higher than 10.

But of course, when you have thousands of Virtual Threads all deliberately pinning the carrier thread, you quickly run out.

bob1029 2 years ago |

Sounds similar to the quirks you get with TPL in .NET under some circumstances. For library code, a ConfigureAwait(false) invoke should be considered to signify that the execution does not need to resume on the original thread.

nottorp 2 years ago |

Can someone explain what these virtual threads are in 2 sentences please?

xmcqdpt2 2 years ago | |

Why two sentences? Maybe you should ask ChatGPT if you want explanations with specific length requirements.

Anyway, it's pretty simple really. A generic thread is a bunch of stack frames (with their associated local variables). A standard OS thread is under the control of the kernel scheduler, which decides whether the thread runs and makes progress or not. The VirtualThread in Java is just a thread which is not directly mapped to the OS thread scheduler but exists as a user space object that can be scheduled by a (Java implemented) scheduler. It's basically just a call stack with its local variables, but one that only steps forward when an OS thread of the scheduler decides to step it.

ganeshkrishnan 2 years ago | |

They are not real threads. That is, the CPU is not context switching them; the JVM runs them asynchronously.

Usually threads were also used for long-running IO but not CPU-intensive tasks. It's recommended to use virtual threads for such scenarios now.

nottorp 2 years ago | | |

Thanks!

> Why two sentences? Maybe you should ask ChatGPT if you want explanations with specific length requirements.

As you can see, people can do it better. I put a limit on it because I didn't want an explanation of what threads are, just of the difference.

DeathArrow 2 years ago |

Why do the dining philosophers in the image have more than two hands?

isoprophlex 2 years ago | |

Because you're reading a low-effort Medium webshit with AI generated images

pierrebai 2 years ago | |

Probably the author thought that these typical AI-generation quirks were a funny wink to the concept of virtual threads. The elephants, of course, represent Postgres.

hoseja 2 years ago |

BTW, dining philosophers is an extremely clunky example.

mrkeen 2 years ago | |

I was looking to see how it would be relevant, and then:

> we present a case study on how we encountered a deadlock with virtual threads in TPC-C for PostgreSQL, even without the dining philosophers problem.

I guess it was a clunky non-example!

(I was hoping to see a virtual thread solution to compare to:

  https://www.adit.io/posts/2013-05-15-Locks,-Actors,-And-STM-In-Pictures.html
  https://www.youtube.com/watch?v=aQXgW55f7cg
  https://hackage.haskell.org/package/stm
)

oldgradstudent 2 years ago |

Java virtual threads did not cause a deadlock here.

The deadlock was a usage error.

A better title would be: Naively switching to Java virtual threads caused a deadlock in TPC-C for PostgreSQL.

otterley 2 years ago |

"Our PostgreSQL TPC-C implementation utilizes c3p0 for connection pooling...The problem is that...synchronized code might be deeply embedded within the libraries you use. In our case, it was within the c3p0 library. So, the fix is straightforward: we simply wrapped the connection with a java.util.concurrent.Semaphore. With this change, virtual threads are blocked on the semaphore and, crucially, release the carrier thread instead of delving inside c3p0. Thus, we never block inside c3p0 because we enter c3p0 code only when there is a free session available."

charleslmunger 2 years ago |

"Why not to use java virtual threads" -> Fundamental features of the language treated as implementation details in libraries for more than two decades cause deadlock.

chucke1992 2 years ago |

Personally I am curious how these features like virtual threads are tested when developed.

kaba0 2 years ago | |

The model has been extensively tested in TLA+, which can reason about all possible timing combinations, among other things.

chucke1992 2 years ago | | |

Oh. Interesting. Never heard of it.

schizofunky 2 years ago |

I think I have a more elegant solution for this deadlock:

"Switch to Haskell".

shermantanktop 2 years ago | |

If serious: that’s dismissive, superior, and a low-effort appeal to Haskell fans.

If not serious: it’s still low effort, but while it is framed as a zinger, it’s not funny at all. I don’t even understand what the humor might be, maybe it’s serious after all.

mrkeen 2 years ago | | |

It would have been better to write:

Switch to a language/runtime that's not only had virtual threads for decades, but also a saner synchronisation model (transactions) rather than synchronized blocks.

kaba0 2 years ago | |

Every language with sufficient concurrency and parallelism primitives is prone to deadlocks, livelocks, and other kinds of race conditions.

mrkeen 2 years ago | | |

Exactly! It's the same thing with memory management primitives. If you expose malloc and free directly to an application programmer, things will eventually get buggy.

It's better to hide the locking primitives and let the runtime handle it for you safely.

jake_morrison 2 years ago |

Erlang is based on virtual threads (confusingly called processes). The Erlang virtual machine schedules them on OS threads. Erlang processes communicate using message passing, preventing deadlocks. You can use millions of Erlang processes without problems, e.g., to handle millions of Elixir LiveView sessions.

toast0 2 years ago | |

Erlang has the advantage that it was built around processes and is effectively preemptive. Processes can be descheduled anytime they make a function call or use receive to get or wait for messages and being a functional language, you have a finite amount of instructions before calling a function.

Other languages adding virtual threads later in life don't have the same ability to feel preemptive. Although I think someone said Java has a nice trick or two?

Anyway, if all the virtual threads seem preemptive, you won't have the case that your limited number of actual threads are waiting on locks and not yielding --- all Erlang processes yield eventually; usually in a fairly short time frame.

Jtsummers 2 years ago | |

You can have deadlock in Erlang, it's just a bit harder. It happens when two processes are both waiting on the other to send them a message which is analogous to two threads each waiting for a mutex the other holds. The same thing can happen in Go with its channels, another message passing based concurrency control mechanism.
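
The same shape can be sketched in Java with two mailboxes, as an analogy rather than actual Erlang or Go. Each side waits for a message before it would send one, so neither ever sends. The timeout is an assumption added only so the demo terminates; a plain blocking receive would wait forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class MailboxDeadlockSketch {
    // Waits for a message in `mine`; replies into `theirs` only if one arrives.
    private static void waitThenReply(BlockingQueue<String> mine,
                                      BlockingQueue<String> theirs,
                                      long timeoutMillis,
                                      AtomicInteger timedOut) {
        try {
            String message = mine.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (message == null) {
                timedOut.incrementAndGet(); // nobody ever sent first
            } else {
                theirs.add("reply to " + message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns how many of the two "processes" gave up waiting.
    static int demo(long timeoutMillis) {
        BlockingQueue<String> inboxA = new LinkedBlockingQueue<>();
        BlockingQueue<String> inboxB = new LinkedBlockingQueue<>();
        AtomicInteger timedOut = new AtomicInteger();
        Thread a = new Thread(() -> waitThenReply(inboxA, inboxB, timeoutMillis, timedOut));
        Thread b = new Thread(() -> waitThenReply(inboxB, inboxA, timeoutMillis, timedOut));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return timedOut.get();
    }
}
```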

orthoxerox 2 years ago | | |

Two deadlocked processes won't exhaust the thread pool in Erlang, they will simply never wake up.

jake_morrison 2 years ago | | |

Sure, you can make deadlocks in any language, but it's uncommon in Erlang. Shared state is the exception, and message passing means that things that manage state, such as gen_servers, only process one message at a time from their inbox.

Contrast this with languages like Java where every object is a potential concurrency problem. Or the 10+ years of trying to make Python async (see Twisted).

Alifatisk 2 years ago | |

This feels like something I could read on Wikipedia about Erlang, how does this add anything to the topic?

kaba0 2 years ago | |

It all has the cost of a significantly lower throughput in case of Erlang.

Also, this is more of a user error, than a fundamental issue.

A virtual thread cannot be unmounted during blocking operations when it is pinned to its carrier. A virtual thread is pinned in the following situations:

- The virtual thread runs code inside a synchronized block or method
- The virtual thread runs a native method or a foreign function (see Foreign Function and Memory API)