-
Notifications
You must be signed in to change notification settings - Fork 844
fix race condition with session thread migration #10897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a2e0be9 to
a44d752
Compare
src/proxy/http/HttpSessionManager.cc
Outdated
| // If thread must be migrated, clear out the VC's | ||
| // data and event handling on the original thread. | ||
| if (ethread != server_vc->get_thread()) { | ||
| SCOPED_MUTEX_LOCK(vclock, server_vc->mutex, ethread); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to do a try-lock instead? If the other thread handles a close before you get the lock here, things could disappear from underneath?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just changing that to try-lock ends up with leaked connection messages in diags.log
3506d58 to
c6a8b8f
Compare
c6a8b8f to
55b848e
Compare
src/proxy/http/HttpSessionManager.cc
Outdated
| (!(match_style & TS_SERVER_SESSION_SHARING_MATCH_MASK_HOSTSNISYNC) || validate_host_sni(sm, first->get_netvc())) && | ||
| (!(match_style & TS_SERVER_SESSION_SHARING_MATCH_MASK_CERT) || validate_cert(sm, first->get_netvc()))) { | ||
| zret = HSM_DONE; | ||
| auto iter = m_fqdn_pool.find(hostname_hash); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this iter variable is re-declared below. This is a nitpick, but it makes it harder to reason about the scope of this variable with the other one introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 different scopes, the iterators are different types (perhaps kill the auto and make it more explicit?). Generally I only renamed "first" to "iter", and then moved the confusing zret logic down to the bottom of the function.
The changes to this function don't contribute to the PR itself, it's just cleanup.
0def5c6 to
dd5d30d
Compare
shinrich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic looks clean to me.
* fix race condition with session thread migration * document why event_loop could be null * do less work if per thread session pool * remove extra unecessasry do_io_write call in critical section
This reverts commit 94f366d.
This is a fix for a race condition (found and debugged in ats92) when a session is taken from the session pool and migrated to another thread. This is with GLOBAL session pools. There are still rare cases where events are still leaking through to the VC in the pre-migrated thread causing various different asserts.
This PR moves thread migration out of the session manager acquire critical section and locks and stops events on the original thread netvc during migration. Global (and likely hybrid) pool performance is improved in this case.
Initial testing using test case verifies that ats master crashes (quickly and a lot) without the patch and runs fine the patch.
Will test in prod for stability.
should close #9690
Also related to: #9689