Delay cleanup when cache write continues after early client response #6469
We saw a number of crashes in one of our 9.0 deployments last week.
Most of the crashes suggest that the NetVC associated with the ProxySession has already been freed, and that we then try to free it again or access it from the HttpSM::kill_this() path.
Here are a couple stacks.
The client_vc in Http1ClientSession seems to have already been freed.
Here the ssl_vc in frame 7 is associated with a different thread's net handler, so the mutex assert triggers. I'm assuming that the ssl_vc has been freed and reallocated.
Looking more closely at the history of the HttpSM in the first stack: the HttpSM sends a 304 back to the client but then sets SM_ACTION_INTERNAL_CACHE_WRITE to write the body of the 200 response it received from the origin to cache.
The issue is that the code calls ProxyTransaction::release after sending the 304, which sets a different, session-specific handler to handle future network I/O on the client_vc. If an EOS or timeout occurs, the client_vc object is cleaned up, but the ProxySession objects still hold a stale pointer to it. This PR clears the write_vio and the client_vc when the ProxySession object receives an EOS or a session-ending timeout.
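A minimal C++ sketch of the idea behind the fix, assuming simplified stand-in types: `NetVC`, `Vio`, `SessionSketch`, and the `Event` enum are hypothetical and are not the real ATS `NetVConnection`/`VIO`/`ProxySession` API. It shows a session nulling its `write_vio` and `client_vc` pointers when it sees an EOS or timeout, so a later cleanup path (standing in for `HttpSM::kill_this()`) finds a null pointer instead of a stale one.

```cpp
#include <memory>

// Hypothetical stand-ins for ATS types; NOT the real NetVConnection/VIO API.
enum class Event { ReadReady, Eos, InactivityTimeout, ActiveTimeout };

struct Vio {
  long nbytes = 0; // bytes the write operation is expected to move
};

struct NetVC {
  bool closed = false;
  void do_io_close() { closed = true; }
};

class SessionSketch {
public:
  explicit SessionSketch(std::unique_ptr<NetVC> vc)
      : owned_vc_(std::move(vc)), client_vc_(owned_vc_.get()) {}

  // Handler that takes over the client connection after the transaction
  // is released (e.g. after the 304 has been sent).
  void handle_event(Event e) {
    switch (e) {
    case Event::Eos:
    case Event::InactivityTimeout:
    case Event::ActiveTimeout:
      // The connection is going away: close it, then drop every raw
      // pointer into it before it is freed, so later cleanup paths
      // cannot dereference or close it a second time.
      if (client_vc_ != nullptr) {
        client_vc_->do_io_close();
      }
      write_vio_ = nullptr;
      client_vc_ = nullptr;
      owned_vc_.reset(); // simulate the network layer freeing the VC
      break;
    case Event::ReadReady:
      break;
    }
  }

  // Later cleanup path (hypothetical analogue of HttpSM::kill_this()).
  // Returns true only if it actually closed a live connection.
  bool release_from_kill_path() {
    if (client_vc_ == nullptr) {
      return false; // already cleaned up; nothing to close twice
    }
    client_vc_->do_io_close();
    client_vc_ = nullptr;
    return true;
  }

  void start_write(Vio *vio) { write_vio_ = vio; }
  NetVC *client_vc() const { return client_vc_; }
  Vio *write_vio() const { return write_vio_; }

private:
  std::unique_ptr<NetVC> owned_vc_; // declared first so it is initialized first
  NetVC *client_vc_ = nullptr;
  Vio *write_vio_ = nullptr;
};
```

For example, after `session.start_write(&vio)` and `session.handle_event(Event::Eos)`, `release_from_kill_path()` returns `false` and `client_vc()` is `nullptr`, rather than closing an already-freed connection again.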
Testing this patch on a prod box.