Fix crash in H2 priority tree cleanup. #2781
|
This will resolve the issue, and removing the line from |
|
This fix made things a lot better, but not perfect: one crash overnight instead of a crash every 10 minutes. The core shows a "bad" node in the dependency tree, so we probably need a combination of the two fixes. I just compiled a binary with this fix and your fixes. I simplified that a bit to just always call delete_stream and then release_stream in Http2Stream::destroy(). I don't see the value of calling delete_stream, release_stream, delete_stream. The second delete_stream will always be an immediate return. |
|
We want this for 7.1.2 I assume? I cherry-picked #2774 to 7.1.x already. |
|
@shinrich I’m fine with this, but should we wait for the combined and simplified update? |
218f990 to
534338a
Compare
|
Pushed another set of changes. I spent yesterday iterating over a variety of changes, but each fix would continue to crash with what appeared to be stale Nodes in one of the Http2DependencyTree queues. On @SolidWallOfCode's advice, I added "in" methods to the PriorityQueue and Http2DependencyTree and ran last night with asserts that in was false at the end of Http2DependencyTree::remove. It failed two times last night, both on the same stack, see below. I added another lock within Http2ConnectionState::delete_stream. That code has been running on my prod box for a couple hours. Will let it keep cooking today as prime time east coast comes on. But so far so good. The changes at this point include
My test version also had event tracking on the XMIT and FINI events, but that didn't seem to help much, so I didn't push forward those changes. |
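The "in" check described above can be sketched roughly as follows. This is a minimal illustration, not the actual ATS PriorityQueue or Http2DependencyTree code: `Entry`, `MinQueue`, and `in_is_false_after_erase` are hypothetical names, and the sketch assumes each entry records its own heap index so that membership can be tested in constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical entry type: remembers where it sits in the heap.
struct Entry {
  int weight;
  std::size_t index;
  explicit Entry(int w) : weight(w), index(0) {}
};

// Hypothetical min-heap with an in() membership test.
class MinQueue {
public:
  void push(Entry *e) {
    assert(e != nullptr);
    e->index = heap_.size();
    heap_.push_back(e);
    sift_up(e->index);
  }

  // True only if the slot the entry claims to occupy really holds it.
  bool in(const Entry *e) const {
    return e != nullptr && e->index < heap_.size() && heap_[e->index] == e;
  }

  void erase(Entry *e) {
    if (!in(e)) return;                 // stale or foreign entry: nothing to do
    std::size_t i = e->index;
    swap_at(i, heap_.size() - 1);       // move the victim to the last slot
    heap_.pop_back();
    if (i < heap_.size()) {             // restore heap order at the vacated slot
      sift_up(i);
      sift_down(i);
    }
  }

private:
  std::vector<Entry *> heap_;

  void swap_at(std::size_t a, std::size_t b) {
    std::swap(heap_[a], heap_[b]);
    heap_[a]->index = a;
    heap_[b]->index = b;
  }

  void sift_up(std::size_t i) {
    while (i > 0) {
      std::size_t parent = (i - 1) / 2;
      if (heap_[i]->weight >= heap_[parent]->weight) break;
      swap_at(i, parent);
      i = parent;
    }
  }

  void sift_down(std::size_t i) {
    for (;;) {
      std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
      if (l < heap_.size() && heap_[l]->weight < heap_[smallest]->weight) smallest = l;
      if (r < heap_.size() && heap_[r]->weight < heap_[smallest]->weight) smallest = r;
      if (smallest == i) break;
      swap_at(i, smallest);
      i = smallest;
    }
  }
};

// Mirrors the kind of assert described above: after removal, in() must be false
// for the removed entry and still true for the entries that remain.
bool in_is_false_after_erase() {
  Entry a(3), b(1), c(2);
  MinQueue q;
  q.push(&a);
  q.push(&b);
  q.push(&c);
  q.erase(&b);
  return !q.in(&b) && q.in(&a) && q.in(&c);
}
```

A check like this is cheap enough to leave in a debug build, which is presumably why it was useful for catching nodes that were still queued at the end of a remove.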
f990139 to
e25837b
Compare
e25837b to
850feec
Compare
|
@masaori335 It's priority tree stuff :) |
  ink_assert(client_streams_out_count > 0);
  --client_streams_out_count;
}
stream_list.remove(stream);
Now release_stream() doesn't release the stream ;) We need some cleanup here: a) move the counter updates outside of this function, b) remove the argument, and c) rename this function.
But fixing the crash is urgent, so I'm fine with these changes.
masaori335
left a comment
Tree<T>::in() looks useful for debugging, and the other changes look reasonable. Let's land this if @shinrich's test on production goes well.
|
A couple crashes overnight after I removed my outstanding event tracking. Adding them back in my test build. |
|
Fighting possibly unrelated issues on my test machine today, so I am not confident in my enhancements to this PR. I think what is here is definitely much better than what we started with. I think there are still crashes, but once or twice a day instead of once every 10 minutes. I will continue to work on this tomorrow. But we may want to take this and get it moved over to 7.1.x if we are getting ready to make a drop there. |
|
@maskit We ok with landing this as-is, and cherry-pick to 7.1.x ? |
|
@zwoop Yeh, this should be cherry-picked. |
|
Cherry-picked to 7.1.2 |
This change removes the code in Http2ConnectionState::release_stream that takes the stream off the ConnectionState stream list. Most of the time Http2ConnectionState::delete_stream is called before Http2ConnectionState::release_stream. But if the two are called in the opposite order, delete_stream exits right away and the logic to remove the stream from the dependency tree is not executed. That leaves references to deleted streams.
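The ordering problem can be illustrated with a small model. This is only a sketch under stated assumptions, not the real ATS code: `Stream`, `ConnectionState`, `add_stream`, `in_list`, and `stale_reference_after_reverse_order` are hypothetical stand-ins, a `std::set` stands in for the priority tree, and the assumption is that delete_stream returns early whenever the stream is no longer on the stream list.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins; none of these are the real ATS classes.
struct Stream {
  int id;
};

struct ConnectionState {
  std::vector<Stream *> stream_list;   // streams the connection still tracks
  std::set<Stream *> dependency_tree;  // stand-in for the H2 priority tree
  bool release_removes_from_list;      // true models the behaviour before this change

  explicit ConnectionState(bool old_behaviour) : release_removes_from_list(old_behaviour) {}

  void add_stream(Stream *s) {
    assert(s != nullptr);
    stream_list.push_back(s);
    dependency_tree.insert(s);
  }

  bool in_list(Stream *s) const {
    return std::find(stream_list.begin(), stream_list.end(), s) != stream_list.end();
  }

  void remove_from_list(Stream *s) {
    stream_list.erase(std::remove(stream_list.begin(), stream_list.end(), s), stream_list.end());
  }

  // Assumed behaviour: an early return when the stream is not on the list,
  // which also skips the tree cleanup.
  void delete_stream(Stream *s) {
    if (!in_list(s)) return;
    dependency_tree.erase(s);
    remove_from_list(s);
  }

  void release_stream(Stream *s) {
    if (release_removes_from_list) {
      remove_from_list(s);  // the list removal that this change takes out
    }
    // Counter updates would go here in a fuller model.
  }
};

// Calls release_stream before delete_stream and reports whether the tree
// still holds a pointer to the stream afterwards.
bool stale_reference_after_reverse_order(bool old_behaviour) {
  Stream s{1};
  ConnectionState cs(old_behaviour);
  cs.add_stream(&s);
  cs.release_stream(&s);
  cs.delete_stream(&s);
  return cs.dependency_tree.count(&s) != 0;
}
```

In this model, `stale_reference_after_reverse_order(true)` leaves the pointer in the tree, while `stale_reference_after_reverse_order(false)` does not, because delete_stream still finds the stream on the list and runs its tree cleanup.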