Fix crash in Http2ClientSession::release_netvc #4147
Merged
We've seen this crash every few days for a while. I finally think I've figured out the issue.
The stack is confusing when you look at the code, because it appears that we are calling through a null client_vc, yet the line above checks that client_vc is not null. I think the real problem is that client_vc has already been closed and freed in the ProxyClientSession::handle_api_return() call further down the stack.
If the original event had come in on client_vc, this would have been fine: the vc's recursion count would be > 0, so the actual free would not occur until the stack unwound. But in this case the EOS event came in on the server VC, and the client VC is accessed from the HttpSM shutdown.
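For illustration, here is a hedged sketch of the failing sequence. The names mirror the ATS code, but the bodies are simplified approximations, not the literal source:

```cpp
// Sketch (approximate, not the exact ATS source) of the use-after-free.
// ProxyClientSession::handle_api_return(), on the SSN_CLOSE path, closes the
// netvc and then asks the session to clear its handle to it:
NetVConnection *vc = this->get_netvc();
if (vc) {
  vc->do_io_close();      // EOS arrived on the *server* vc, so the client
                          // vc's recursion count is 0 and it is freed here
  this->release_netvc();  // virtual call lands in Http2ClientSession
}

// Http2ClientSession::release_netvc() then dereferences the stale pointer.
// The null check passes because the pointer was never cleared; it just
// points at freed memory now:
void
Http2ClientSession::release_netvc()
{
  if (client_vc) {                               // non-null, but dangling
    client_vc->do_io_read(nullptr, 0, nullptr);  // crash: use-after-free
    client_vc->do_io_write(nullptr, 0, nullptr);
    client_vc->set_action(nullptr);
  }
}
```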
Just reordering the calls to vc->do_io_close() and this->release_netvc() in ProxyClientSession::handle_api_return() should fix the issue. Looking at the bigger picture, though, I don't think release_netvc is needed at all at this point: all it does is make sure there is no lingering client session data on the netvc, but it is always called right next to a call that closes the netvc, so it adds no value there.
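A hedged sketch of the reordered SSN_CLOSE path (again simplified, not the literal diff):

```cpp
// Sketch of the fix: clear the session's vios/handles while the vc is still
// alive, then let do_io_close() free it.
NetVConnection *vc = this->get_netvc();
if (vc) {
  this->release_netvc();  // touch client_vc before it can be freed
  vc->do_io_close();      // safe to free now; nothing touches it afterwards
}
// Since do_io_close() tears the vc down anyway, the release_netvc() call
// here is arguably redundant and could simply be dropped, per the note above.
```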
I'm putting up this PR now, but I'll be running this change internally for a while to verify that it does address the issue.