Potential deadlock with Vaadin #12272
I don't think this is a Jetty issue. Vaadin should not grab a lock and then perform a blocking operation like reading the request content. The other 2 stack traces are now waiting on the VaadinSession lock. Did you report this issue to Vaadin?
Yes, I did. I referenced the issue in the beginning of the description.
That was my impression as well. I cannot say for sure that this is what causes the deadlock, but it might be a good idea to avoid this regardless. Do you think the same is true for writing back the response?
Yes, if a blocking API is used.
Alright, I will forward these suggestions to Vaadin, thanks! I am still interested, though: do you think that this is really the cause of the deadlock? Specifically, if you look at the threads waiting on the VaadinSession, are those the threads that would eventually release the BlockingContentProducer? Because to me, it looks like when content isn't immediately available, as seems to be the case here, a read interest is registered in the selector, which should notify HttpInput when content becomes available. Can that really be affected by some request handler being blocked on a lock? I can see some threads are waiting in
A blocking read will eventually be woken up, either when data is available, or when a timeout occurs. The problem is that in both cases the wait can be really long (10s of seconds, minutes or even more), so if something else happens in the system that requires access to the Vaadin session, then these threads will be blocked for a long time. The proper solution is to not perform blocking API calls with locks held.
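As a concrete illustration of that suggestion, here is a minimal sketch (made-up class and method names, not Vaadin's actual code): read the request body before taking the per-session lock, and write the response after releasing it, so the lock is never held across blocking I/O.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.locks.ReentrantLock;

import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

public class ReadThenLockServlet extends HttpServlet {

    // Hypothetical per-session lock, standing in for the VaadinSession lock.
    private final ReentrantLock sessionLock = new ReentrantLock();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // 1. Blocking read of the request body with no lock held.
        String body = new String(req.getInputStream().readAllBytes(), StandardCharsets.UTF_8);

        // 2. Only quick, in-memory work while the lock is held.
        String result;
        sessionLock.lock();
        try {
            result = handleRequest(body);
        } finally {
            sessionLock.unlock();
        }

        // 3. Blocking write of the response, again with no lock held.
        resp.getWriter().write(result);
    }

    private String handleRequest(String body) {
        return body; // placeholder for the real session-bound processing
    }
}
```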
Does this mean that you don't see any reason why either of the two threads waiting for the Vaadin session would prevent the first thread from eventually proceeding? This in turn would imply that shutdown would still be blocked by that first thread regardless of whether that thread holds the Vaadin session lock while reading the request body (except if the client for some reason delays sending more bytes until one of the other threads has made progress)?
...or is this a form of head-of-line blocking, with a TCP-level buffer full of bytes for one of the threads waiting for the lock, which prevents the demultiplexer from reaching the bytes that would allow the first thread to proceed?
@Legioth I am not sure I understand your comments. A blocking operation performed with locks held is typically a mistake. In this case, there is nothing that Jetty can do; it's a Vaadin issue. Note that this will happen with any Servlet Container, not just Jetty, and for that matter with any
I understand the theoretical issues with dining philosophers and so on. Holding a lock while doing a blocking operation (e.g. acquiring another lock or blocking I/O) is generally fine when it comes to deadlocks as long as that blocking operation is independent from the lock. That's usually the case when a high-level abstraction calls down to a lower-level abstraction, since the lower-level abstraction doesn't have any direct code path that leads to that lock. It's a bit surprising that an HTTP request wouldn't be independent, but I guess that's what TCP head-of-line blocking is all about. Could you confirm that this would be the likely underlying reason also from your perspective, or is there some other factor at play here as well? If TCP head-of-line blocking is the expected cause, then this issue could be closed on my (and thus also Vaadin's) behalf. I'm not saying that Vaadin's implementation might not be a mistake, but it's then at least a mistake that has taken more than 10 years for anyone to discover 😄. I would just prefer to have a proper understanding of the underlying mechanisms before I start digging into whether anything could be changed on Vaadin's side.
@Legioth I think you have the locking understanding the other way around.
They are.
TCP HoL blocking is just one of the multiple possible cases of blocking, but any blocking would show the problem. I would not concentrate on TCP HoL blocking, as this is a problem at a different level. The APIs called by Vaadin are blocking, so if the remote client is not sending data then the API will block, no matter whether the data is transported by a protocol that suffers from TCP HoL blocking. I would not close the issue on the Vaadin side without a fix, no matter if it took 10 years to discover.
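For contrast, here is a sketch of the Servlet non-blocking read API (Servlet 3.1+), where no thread blocks waiting for content at all; this is not what Vaadin does, just an illustration of the alternative to a blocking read.

```java
import java.io.IOException;

import jakarta.servlet.AsyncContext;
import jakarta.servlet.ReadListener;
import jakarta.servlet.ServletInputStream;
import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

public class NonBlockingReadServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        AsyncContext asyncContext = req.startAsync();
        ServletInputStream input = req.getInputStream();
        input.setReadListener(new ReadListener() {
            private final byte[] buffer = new byte[4096];

            @Override
            public void onDataAvailable() throws IOException {
                // Read only while data can be read without blocking.
                while (input.isReady() && !input.isFinished()) {
                    int read = input.read(buffer);
                    if (read < 0)
                        break;
                    // Process the chunk; take any application lock only for
                    // quick, in-memory work.
                }
            }

            @Override
            public void onAllDataRead() {
                asyncContext.complete();
            }

            @Override
            public void onError(Throwable failure) {
                asyncContext.complete();
            }
        });
    }
}
```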
I see the possibility of TCP HoL blocking as "proof" that there's nothing Jetty could do in this case regardless of whether that's actually what goes on in this specific case. This also means that this issue could be closed (which can only be done by the original reporter or project maintainers) while keeping the Vaadin issue open. But... I'm also curious to gain a more in-depth understanding even though I also realize that it's not your duty to educate me. 😄 My statement about holding locks while calling a lower abstraction is maybe not accurate - it goes for calling any code where you are not sure about what shared resources it might try to use. A typical deadlock has two separate shared resources that are acquired in an inconsistent order. Vaadin's session lock is one of those resources but there also has to be some other shared resource for a deadlock to happen. Treating blocking I/O in general as a shared resource might be a good heuristic but I would like to have something more specific for my mental model. In the case of TCP HoL blocking, that specific other resource is the read buffer of the shared TCP connection which causes seemingly independent HTTP/2 requests to actually depend on each other. What other shared resources could there be that might cause a deadlock between concurrent HTTP requests or responses?
@Legioth this is technically not a deadlock because there is no circular wait, but instead it is just a "hold and wait" situation, which would not happen if I/O reads/writes never block. For HTTP/1.1, the wait in reads/writes is caused by TCP HoL blocking and TCP congestion, respectively. As for your mental model, I/O operations and blocking access to bounded resources (e.g. a connection pool towards a JDBC database, but also adding items to a bounded queue) are the "wait" part of the "hold and wait" situation, and you should avoid "holding" while performing those "wait" operations.
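To make the "hold and wait" shape concrete for a non-I/O bounded resource, here is a small hypothetical sketch (all names invented for illustration) with a bounded queue standing in for the resource.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class HoldAndWaitExample {
    private final ReentrantLock lock = new ReentrantLock();
    private final BlockingQueue<String> boundedQueue = new ArrayBlockingQueue<>(16);

    // Anti-pattern: "hold" the lock while "waiting" on a bounded resource.
    // If the queue stays full, every other thread needing this lock stalls too.
    void enqueueWhileHolding(String item) throws InterruptedException {
        lock.lock();
        try {
            boundedQueue.put(item); // may block indefinitely with the lock held
        } finally {
            lock.unlock();
        }
    }

    // Better: do the state change under the lock, then wait without holding it.
    void enqueueWithoutHolding(String item) throws InterruptedException {
        String prepared;
        lock.lock();
        try {
            prepared = item.trim(); // only quick, in-memory work under the lock
        } finally {
            lock.unlock();
        }
        // Bounded wait outside the lock; the caller decides what a timeout means.
        if (!boundedQueue.offer(prepared, 5, TimeUnit.SECONDS))
            throw new IllegalStateException("queue full");
    }
}
```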
While it is possible that a few requests cause HTTP/2 session flow control blocking (for reads and writes), or TCP HoL blocking, or TCP flow control blocking, so that other requests on the same connection would depend on those few, it does not seem to be what happens here. This is just one request failing to provide the request content, and other requests from the same client waiting on the Vaadin session lock (which I assume to be per-client). Other requests from other clients should be able to proceed without problems, so there is no dependency.
Thanks for the clarification. I think we have partially talked across each other based on slightly different initial assumptions. I have been assuming that the first thread was stalled because the bytes it needed were held up by something from one of the two other threads, i.e. that the whole situation could be avoided if either of the two other threads could acquire the session lock and proceed to a point where they would indirectly release the bytes needed by the first thread. It seems like your assumption is that the first thread was stalled because the needed bytes never even reached the server, i.e. that the first stalled thread is inevitable but a better locking strategy in Vaadin could prevent the two other threads from also being stalled.
I doubt this is the case, looking at the stack traces.
Correct, this seems likely to be the case from the stack traces.
I realized the same now when looking more closely at the stack traces. Thread 2 is handling a websocket close frame, which shouldn't have any payload that remains to be read, while thread 3 is a timer rather than any request handling.
I just realized that we also use virtual threads in some cases, and those will not show up in our handmade thread dump. This might only be remotely relevant for this particular case, since as you said, one of the threads is probably stuck inevitably. Still, it's quite possible that those virtual threads also wait on the VaadinSession, since we use them for background threads that need to access Vaadin components.
@mperktold if you use Jetty 12, you can use
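The exact Jetty facility is elided above; as a hedged sketch (assuming only the generally available `Server.dump()` inherited from Jetty's `ContainerLifeCycle`), the server's component tree, including its thread pool, can be printed like this. Whether virtual threads appear in the output depends on how the thread pool is configured.

```java
import org.eclipse.jetty.server.Server;

public class ServerDump {

    // Prints Jetty's component tree (connectors, thread pool, etc.).
    // Whether virtual threads show up depends on the thread pool configuration;
    // the specific Jetty 12 option referenced in the comment above is elided.
    static void dump(Server server) {
        System.err.println(server.dump());
    }
}
```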
A WebSocket CLOSE frame has a payload.
Some more insights that we gained in the meantime: I wrote that this happens during shutdown, but that's not true. It's just that once this has happened, it prevents our shutdown logic from completing, because some users are still logged in. The request that never provides the content happens much earlier, though. In our own installation, this happens about once every few days, but only for some users who typically work on notebooks with multiple tabs open. So while I agree that it looks like the client never sends the request content in these cases, I don't think that this would happen so frequently. Could there be any other circumstances that lead to the same behavior, maybe related to Wi-Fi or power save mode? Could a bug in our or Vaadin's code potentially prevent the content from being read in this manner?
In a new case, I have examined the logs and found the following exception:
I understand that this error is unavoidable, as this is just what happens when the client closes the connection. But a few minutes afterwards, it appears that we are again in the situation where a thread is stuck reading the content while holding the lock on the VaadinSession. I don't have a thread dump of this situation, so I don't know for sure, but the logs show a symptom that leads me to believe this is the case here (we start a timer task that locks the VaadinSession, but the logs neither show that the task is executed nor that it is cancelled, so it probably waits for the lock forever). Is it possible that this error isn't handled correctly by Jetty, in the sense that some threads might still try to read something from the connection that is already closed? Or maybe this only arises when a lock like the one on the VaadinSession is held like in our case?
The connection is not closed, this is just a stream failure due to the fact that the client sent a reset to the server, just for that particular stream. Jetty does not have threads reading or otherwise interacting with this canceled stream ever again: the failure is notified to the application, and if the application blocks you should see a clear stack trace.
I see. Sorry, I am not very familiar with low level HTTP/2 stuff, thanks for the clarification.
What if a thread was already reading in a blocking way from it, will it wake up in this case? I am talking about the first stack trace in my original post, where a thread is blocked while waiting on the request content. I don't know whether the thread was already blocked before the stream reset or afterwards, or whether it is actually the same stream or not. I just want to investigate whether these two things, i.e. a thread blocked waiting for request content, and a stream reset sent by the client, could lead to the behavior we are seeing. On a related note: if the request was really blocked because the client never sends the content, what do you think would happen when the user closes the browser and maybe even shuts down the device? Would the server notice that and give up on the request content, or would it still block forever?
Yes, it will be woken up with an exception (your first stack trace). Upon receiving a request, a stream is always subject to the idle timeout.
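For reference, a sketch of where those timeouts are configured on the server side (the values are illustrative only, not recommendations):

```java
import org.eclipse.jetty.http2.server.HTTP2ServerConnectionFactory;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class TimeoutConfig {

    static Server configure() {
        Server server = new Server();

        var h2 = new HTTP2ServerConnectionFactory();
        h2.setStreamIdleTimeout(30_000);   // per-stream idle timeout, in milliseconds

        var connector = new ServerConnector(server, h2);
        connector.setIdleTimeout(30_000);  // connection idle timeout, in milliseconds
        connector.setPort(8080);
        server.addConnector(connector);
        return server;
    }
}
```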
That stack trace is from a thread dump; there is no exception. On the contrary, it shows that the request is still blocked after a very long time.
I tried to replicate the scenario with a simple Jetty client and server, and indeed, when I shut down the client before it sends the request content, the server wakes up from blocking on the content and continues normally. On the other hand, we have reports where the thread remains blocked over night, so I wonder how that is possible.
Yes, the discussion in this issue has been that there is locking that prevents the read from being woken up. Can you reproduce the issue with Jetty DEBUG logs enabled?
I think I managed to reproduce the issue with Jetty's HTTP2Client. The server part is an echo servlet that simply writes the request content back to the response:

```java
var server = new Server();
var connector = new ServerConnector(server, new HTTP2ServerConnectionFactory());
connector.setPort(8080);
server.addConnector(connector);

var servletContextHandler = new ServletContextHandler("/");
server.setHandler(servletContextHandler);
servletContextHandler.addServlet(new HttpServlet() {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        System.out.println("POST begin");
        try (var reader = req.getReader();
             var writer = resp.getWriter()) {
            reader.transferTo(writer);
        }
        finally {
            System.out.println("POST end");
        }
    }
}, "/");

server.start();
server.join();
```

The client part uses the HTTP2Client to send a data frame:

```java
try (var client = new HTTP2Client(new ClientConnector())) {
    client.start();
    Session session = client.connect(
        new InetSocketAddress("localhost", 8080),
        new ServerSessionListener() {}
    ).join();

    var request = new MetaData.Request(
        "POST",
        HttpURI.from("http://localhost:8080/"),
        HttpVersion.HTTP_2,
        HttpFields.EMPTY
    );
    var headersFrame = new HeadersFrame(request, null, false);
    var responseListener = new Stream.Listener() {
        @Override
        public void onDataAvailable(Stream stream) {
            while (true) {
                Stream.Data data = stream.readData();
                if (data == null) {
                    stream.demand();
                    return;
                }
                System.out.println(data);
                data.release();
                if (data.frame().isEndStream())
                    return;
            }
        }
    };
    Stream stream = session.newStream(headersFrame, responseListener).join();

    ByteBuffer content = StandardCharsets.UTF_8.encode("Hello Jetty");
    var dataFrame = new DataFrame(stream.getId(), content, false);
    System.out.println(stream.data(dataFrame).join());
    stream.demand();

    session.shutdown().join();
}
```

Note that the data frame is constructed with `false` as the last argument, so the stream is never ended from the client side. I'm certainly misusing the HTTP2Client here, but I think there should be some way for the server to notice that the client is gone. It could be an exception that is thrown when the stream is closed, or some kind of idle timeout, or maybe something else. If this is not a valid/helpful reproducer, I can also try to enable DEBUG logs.
Added test case. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Updated test case to use Input/OutputStream as well as Reader/Writer. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@mperktold I'm looking into this. |
Fixed test case expectations. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
More tests. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@mperktold there are multiple issues at play here.
I have ported your test (and more) into #12506. Your test is calling `session.shutdown()` without ever ending the stream it opened. This is expected behavior when using the low-level HTTP/2 APIs: you have more control, but you have to be more precise in what your application is doing. In case the client does not finish the pending streams, either by reset or by sending the last frame, the server will eventually idle timeout, and wake up the blocked read by throwing an exception. In summary, Jetty behaves correctly. It might be something that we did not find yet, but you need to prove it is a Jetty fault 😄
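For reference, a sketch of the two ways a client could finish the stream opened in the reproducer above; the reset variant assumes Jetty's `Stream.reset(ResetFrame, Callback)` and `ErrorCode` API.

```java
import java.nio.ByteBuffer;

import org.eclipse.jetty.http2.ErrorCode;
import org.eclipse.jetty.http2.api.Stream;
import org.eclipse.jetty.http2.frames.DataFrame;
import org.eclipse.jetty.http2.frames.ResetFrame;
import org.eclipse.jetty.util.Callback;

public class FinishStream {

    // Option 1: send the last DATA frame with endStream = true,
    // so the server-side read sees the end of the request content.
    static void sendLastFrame(Stream stream, ByteBuffer content) {
        stream.data(new DataFrame(stream.getId(), content, true)).join();
    }

    // Option 2: reset (cancel) the stream; the server fails the pending read.
    static void cancelStream(Stream stream) {
        stream.reset(new ResetFrame(stream.getId(), ErrorCode.CANCEL_STREAM_ERROR.code), Callback.NOOP);
    }
}
```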
Thanks for your investigation and clarification. It's my first time using the HTTP2Client, so I'm probably not using it correctly. But in part, I do that on purpose here, to see how the server reacts when the client fails to shut down gracefully.
This would be perfect, but I was not seeing this in my tests. I tried again and made the following observations:
So apparently, the idle timeout works on the client, but not on the server in this case. When it occurs on the client, it probably notifies the server about it via a GOAWAY frame or something similar, but the server doesn't seem to notice the idle timeout itself. Can you maybe replicate this in your tests?
@mperktold since you can replicate, can you please detail the exact steps you are doing, post the exact code, and DEBUG logs? Thanks!
Alright. I basically took the same example, I just modified the client a bit such that it kills itself after one second, along with some simplifications:

```java
public static void main(String[] args) throws Exception {
    var client = new HTTP2Client(new ClientConnector());
    client.start();
    client.setIdleTimeout(60_000);
    client.setStreamIdleTimeout(60_000);

    Session session = client.connect(
        new InetSocketAddress("localhost", 8080),
        new ServerSessionListener() {}
    ).join();

    var request = new MetaData.Request(
        "POST",
        HttpURI.from("http://localhost:8080/"),
        HttpVersion.HTTP_2,
        HttpFields.EMPTY
    );
    var headersFrame = new HeadersFrame(request, null, false);
    var responseListener = new Stream.Listener() {};
    Stream stream = session.newStream(headersFrame, responseListener).join();

    ByteBuffer content = StandardCharsets.UTF_8.encode("Hello Jetty");
    var dataFrame = new DataFrame(stream.getId(), content, false);
    System.out.println(stream.data(dataFrame).join());
    stream.demand();

    try {
        session.shutdown().get(1, TimeUnit.SECONDS);
    }
    finally {
        System.exit(0);
    }
}
```

Note that client and server have their own main method, so we can kill the whole client process. Here are the debug logs: Somewhere near the bottom of
I should add that I can't always reproduce this, but in about 50% of the cases. I'm also not sure how to turn this into a unit test, because you can't have two processes there.
@mperktold this is what I am seeing:
Can you please post the server-side code too, wrapping the server-side read into a try/catch? All I see seems like normal behavior, except perhaps the idle timeout. In particular, I see the server-side stream being failed on the server, and that should wake up the server-side blocked read.
Here is the server-side code with an additional catch block and the idle timeout reduced to 2 seconds:

```java
public static void main(String[] args) throws Exception {
    var server = new Server();
    var connector = new ServerConnector(server, new HTTP2ServerConnectionFactory());
    connector.setPort(8080);
    connector.setIdleTimeout(2_000);
    server.addConnector(connector);

    var servletContextHandler = new ServletContextHandler("/");
    server.setHandler(servletContextHandler);
    servletContextHandler.addServlet(new HttpServlet() {
        @Override
        protected void doPost(HttpServletRequest req, HttpServletResponse resp) {
            System.out.println("-------------------------------");
            System.out.println("POST begin");
            System.out.println("-------------------------------");
            try {
                req.getReader().transferTo(resp.getWriter());
            }
            catch (Exception exc) {
                System.err.println("-------------------------------");
                exc.printStackTrace();
                System.err.println("-------------------------------");
            }
            finally {
                System.out.println("-------------------------------");
                System.out.println("POST end");
                System.out.println("-------------------------------");
            }
        }
    }, "/");

    server.start();
    server.join();
}
```

When I wait 10 seconds for shutdown on the client, everything works as expected:

When I switch that back to 1 second (i.e. less than the server idle timeout), I still get the same problem: the thread doesn't wake up, neither with an exception nor otherwise.
Fixed the case where a GOAWAY followed by a TCP FIN was causing a race between closing the `EndPoint` and running the failure `Runnable` task. The TCP FIN after the GOAWAY causes the streams to be failed on the server; in turn, failing the streams generates failure `Runnable` tasks that are submitted to the HTTP/2 execution strategy; however, the streams were destroyed before the failure `Runnable` tasks actually ran, so the `EndPoint` was closed; closing the `EndPoint` would close the `HTTP2Connection`, which in turn would stop the execution strategy; this led to the fact that the failure `Runnable` tasks were never run. Now, the failure `Runnable` tasks are invoked via `ThreadPool.executeImmediately()` rather than being submitted to the execution strategy. This ensures that they would be run and not queued, even in case of lack of threads, so that they could unblock blocked reads or writes, freeing up blocked threads. Additionally, improved `HTTP2Stream.onFailure()` to destroy the stream only after the failure tasks have completed. Smaller other fixes to improve the code. Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Okay, we got to the bottom of this. It turned out to be a combination of close/shutdown + TCP FIN that caused a race condition on the server, so that the task that was supposed to unblock the read was queued but never run. @mperktold can you please try the fix in #12506 and report back whether it fixes the problem for you?
OK, this is probably a dumb question, but how can I do that? 😅
I did that now and I cannot reproduce anymore, so the problem seems to be fixed. 👍 Thank you very much!
@mperktold thanks for the feedback and for insisting about this. I owe you a beer! 😄
Jetty version(s)
Jetty 12.0.13
Jetty Environment
core, ee10
Java version/vendor
(use: java -version)
openjdk version "21.0.3" 2024-04-16 LTS
OpenJDK Runtime Environment Temurin-21.0.3+9 (build 21.0.3+9-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.3+9 (build 21.0.3+9-LTS, mixed mode, sharing)
OS type/version
Windows 11
Description
This is a repost of vaadin/flow#19938
We have several reports of our application not being able to shut down. Apparently, some VaadinSessions stay alive and cannot be destroyed. Here are the thread dumps of two such cases:
StackTraces1.txt
StackTraces2.txt
I found some common patterns in these dumps:
One of the threads blocks on a Jetty semaphore while reading the request content of an UIDL request. Note that this thread holds the lock on the VaadinSession while blocking.
A second thread blocks on the VaadinSession while trying to close the websocket:
A third thread also blocks on the VaadinSession while handling a connection loss, but this one comes from the HeartbeatInterception:
I'm not sure where things go wrong, but this does look a bit suspicious to me.
The blocked thread is waiting in BlockingContentProducer.nextChunk until content is available.
From what I see, the semaphore should eventually be released in BlockingContentProducer.onContentAvailable when new content is available, which should be called from HttpInput.run.
I'm not actually seeing a deadlock here, but I find it suspicious that some threads wait on the VaadinSession and another holds it while blocking for some other reason.
How to reproduce?
Unfortunately, I don't have a reproducer. However, we have several reports of this happening on shutdown, so that might have something to do with it. I hope you can do something with the thread dumps.