-
Notifications
You must be signed in to change notification settings - Fork 41.2k
NPE when shutting down long running web request #21557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
A simple fix could be to check for |
Ahh, I remember now. We've seen this before in #16407. |
I'm getting the same stacktrace during normal operation as well. No shutdown involved. |
@OrangeDog I've been working with @aheritier but we haven't yet figured out what's clearing the request attributes when using Undertow. If you can share a sample that reproduces it, I'd love to take a look at it too. |
@OrangeDog I can confirm now that it's not related to a shutdown. I will try to plug a debugger to our staging environment to get more details for @wilkinsona I sadly cannot reproduce locally the issue on dev env :( |
As far as I can tell it has nothing to do with the request being slow either.
|
@OrangeDog That's a different problem to the one being tracked here. This issue is about a |
@wilkinsona I caught the NPE in |
Sorry, I misunderstood. That's really useful, thank you. So that's the stack trace of the exception that's passed into |
I've managed to reproduce the problem by debugging and throwing an IOException from within the depths of Undertow's HTTP layer. Thanks again for the pointer, @OrangeDog. |
Ok, it looks like a bug in Undertow to me. It's clearing out the request attributes too soon. Here's where it happens:
It closes its view of the connection when the write fails. As part of this close processing, the HTTP server exchange is ended which clears the attributes on the request. The exception is then rethrown. It's caught by Spring MVC and it drives after completion so that things can be tidied up at the end of the request handling. At this point the attribute used to track long-task timing has gone so it doesn't get cleaned up properly. |
So workaround is to downgrade Undertow? Any idea which is the latest that doesn't have the bug? |
Awesome finding @wilkinsona let's hope to have now a good reactivity in the Undertow side. |
Nicely found, @aheritier. That change in Undertow 2.1.2 looks like the cause of the regression. |
@wilkinsona should we open a ticket on Undertow side ? |
@aheritier I've finally managed to reduce things down to something fairly minimal that reproduces the failure. I've opened https://issues.redhat.com/browse/UNDERTOW-1743. |
awesome @wilkinsona Thanks a lot for your help |
We had this issue also in Spring Boot 2.2.8 with Tomcat (at least 9.0.36). What I don't know is what triggered this call, only that this was printed in our log right before: |
CVE-2020-10687 requires an update, meaning this problem can no longer be avoided. Is there an easy way to disable this interceptor? Edit: excluding |
Given the lack of traction on Undertow issues that we have raised such as UNDERTOW-1743 and the Undertow team choosing not to fix CVEs in 2.1 which GAed only 5 months ago, I think it's worth anyone using Undertow re-evaluating the benefits that attracted them to it. Bugs not being fixed and requiring minor upgrades to avoid security vulnerabilities would weigh very heavily for me against any other benefits. I'd want to be using a container with a maintenance approach that better met my needs. |
I agree @wilkinsona :( |
Does anyone know if any changes in Spring Boot We are on Undertow Our I also note the thoughts above on concerns around maintenance of Undertow. Concerning to me too. Not sure about other containers, but the features that earlier attracted me were its fast start-up time inside test automation; relatively modern codebase, small footprint, top level support for NIO via XNIO, relative lack of bloat, performance etc. But this was a few years ago, and perhaps this may be an outdated view of the world c.f Tomcat or Jetty as they currently stand. |
I'm not aware of any specific changes that would have affected this behaviour, particularly in Boot. @rstoyanchev @bclozel, have there been any changes in the latest Framework releases around flushing responses and the like that would affect the timing of when a response is actually written to the client?
I wouldn't expect a client to close the connection until it has received the entire response. At that point, I wouldn't expect Undertow to complain of a broken pipe if the connection was closed. What client are you using? My understanding of the Undertow bug is that it will occur whenever an IOException is thrown while writing the response. If anything is making the problem occur more frequently, I would suspect that it's most likely to be on the client side. |
Thanks @wilkinsona .
Sorry, probably should have been a bit clearer. Since on our side these are long-poll types of endpoints intended for browser/AJAX/fetch consumption, a browser could give up on them at any time - or in this specific case, some automated Locust performance tests which run for 'n' minutes, and then close all the outstanding connections. To add some additional complexity, we also have Istio service mesh proxies in front of the actual Spring Boot service networking, so the proxy has its own pool of internal HTTP connection to Undertow which is really the "true" client in this case. In theory these connections should actually be kept open almost indefinitely, so the broken pipe is a bit of a mystery in that regard - but we didn't change anything here, and we have alerts on all I was theorising that there is possibly a race because we also have a server-side timeout on the long polls, upon which we return a |
We have a regression in 5.2.10 related to the closing of the response after use of Jackson. This could long running requests that write periodically but I don't know if that's applicable. |
We do use Jackson, and the broken pipe is experienced when trying to take an object representing an error to serialize it. So this does sound possibly related. spring-projects/spring-framework#25987 Will try specifically downgrading Spring FW to |
I'm really pleased to report that there's been some movement on UNDERTOW-1743 with fixes made in the yet-to-be-released 2.0.33.Final, 2.1.5.Final, and 2.2.3.Final. We'll keep this issue open until those releases have happened and we have upgraded. |
Still seeing this issue, although UNDERTOW-1743 is fixed.
Stacktrace:
|
@tderoock Oh dear. In that case it would appear that the fix in Undertow is faulty or incomplete. Please open a new Undertow issue so that they are aware. If you comment here with a link to it, I'll do my best to help with the problem diagnosis. |
I've just tried my reproducer from UNDERTOW-1743 with Undertow 2.2.3 and the null pointer exception does not occur so it would appear that there's another code path within Undertow that may cause it to prematurely clear the request attributes. @tderoock If you open an Undertow issue, please take the time to describe to them in detail the situation in which the NPE occurs and, ideally, provide a sample similar to mine from UNDERTOW-1743 that reproduces the problem. I'm going to close this issue now as I don't think there's anything we can do in Boot to mitigate a container that clears request attributes incorrectly. Checking for null would just hide the problem, leading to inaccurate metrics with no indication why. For the Tomcat side of things, please refer to #16407. |
To reproduce add a
Thread.sleep
to a MVC+Actuator app then force close it from the command line.The text was updated successfully, but these errors were encountered: