[fix][c++ client] avoid race condition causing double callback on close #15508
The problem is that the decrement and the check of the counter are not done atomically:

    if (*numberOfOpenHandlers > 0) {
        --(*numberOfOpenHandlers);
    }
    if (*numberOfOpenHandlers == 0) {
        // .....
    }

We should instead convert this into:

    if (--(*numberOfOpenHandlers) == 0) {
        // .....
    }
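To illustrate the suggestion above, here is a minimal sketch of the decrement-and-test pattern. It assumes the counter is a `std::atomic<int>` held in a shared pointer, which may not match the actual type in `ClientImpl.cc`; with a plain `int`, the one-line form would still be racy. The function name `countFinalCallbacks` and the `finalCallbacks` counter are hypothetical and exist only for the demonstration.

```cpp
#include <atomic>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for the close path: each handler's close callback
// decrements a shared counter, and the thread that reaches zero runs the
// final "all handlers closed" step.
// Assumption: the counter is a std::atomic<int>, so --(*counter) is a single
// atomic read-modify-write that returns the new value.
int countFinalCallbacks(int numHandlers) {
    auto numberOfOpenHandlers = std::make_shared<std::atomic<int>>(numHandlers);
    std::atomic<int> finalCallbacks{0};

    std::vector<std::thread> threads;
    threads.reserve(numHandlers);
    for (int i = 0; i < numHandlers; ++i) {
        threads.emplace_back([numberOfOpenHandlers, &finalCallbacks] {
            // Decrement and test in one step; only the thread that takes the
            // counter from 1 to 0 sees the result 0.
            if (--(*numberOfOpenHandlers) == 0) {
                ++finalCallbacks;
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return finalCallbacks.load();
}
```

Calling `countFinalCallbacks(8)` from a test harness should report that the final branch ran once, because each atomic decrement returns a distinct value and only one of them is zero.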
Are we guaranteed that |
You're right, I was convinced we were using |
…se (#15508) * avoid race condition causing double callback on close * Update pulsar-client-cpp/lib/ClientImpl.cc Co-authored-by: Yunze Xu <xyzinfernity@163.com>
…se (apache#15508) * avoid race condition causing double callback on close * Update pulsar-client-cpp/lib/ClientImpl.cc Co-authored-by: Yunze Xu <xyzinfernity@163.com> (cherry picked from commit b2cafb3)
Motivation
The pulsar-client-node library uses the C API to interact with the Pulsar client. The model for the C API is to pass a void *ctx to async functions along with a callback. Typically, these callbacks free the context and rely heavily on the callback being called exactly once. I was able to reproducibly cause the close callback to be invoked twice, resulting in a double-free error in pulsar-client-node.
Modifications
My running theory is that the cause is a race condition in decrementing the numberOfOpenHandlers shared pointer. It is possible for two threads to decrement the counter in parallel and for both to end up observing *numberOfOpenHandlers == 0. This fix assumes this can happen and synchronizes on the state, so even if it does happen, only one thread will be able to set the state to Closed and create the subsequent shutdown task.
Verifying this change
This change is already covered by existing tests, such as all of the tests in ClientTest.cc. Because this is a race condition, it cannot be reproduced consistently in a test. I will run the reproduction steps with pulsar-client-node.
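The idea described under Modifications, synchronizing on the state so that only one thread can move it to Closed, can be sketched roughly as follows. This is not the code from `ClientImpl.cc`: the `CloseGuard` class, `tryMarkClosed`, and `countShutdownTasks` are hypothetical names, and the sketch assumes a mutex-protected state that starts in a Closing state.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical client states for the sketch.
enum class State { Open, Closing, Closed };

// Hypothetical guard: the state transition is protected by a mutex, so even
// if several threads believe they closed the last handler, only one of them
// performs the Closing -> Closed transition.
class CloseGuard {
   public:
    // Returns true only for the first caller; later callers see Closed.
    bool tryMarkClosed() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ != State::Closing) {
            return false;
        }
        state_ = State::Closed;
        return true;
    }

   private:
    std::mutex mutex_;
    State state_{State::Closing};
};

// Simulates numThreads threads that all observed the handler counter at zero
// and therefore all try to schedule the shutdown task.
int countShutdownTasks(int numThreads) {
    CloseGuard guard;
    std::atomic<int> shutdownTasks{0};

    std::vector<std::thread> threads;
    threads.reserve(numThreads);
    for (int i = 0; i < numThreads; ++i) {
        threads.emplace_back([&guard, &shutdownTasks] {
            if (guard.tryMarkClosed()) {
                ++shutdownTasks;  // stands in for creating the shutdown task
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return shutdownTasks.load();
}
```

With this shape, the user-facing close callback would be reached once even when the counter check is racy, which is the property the `void *ctx` callbacks in the C API depend on.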
Does this pull request potentially affect one of the following parts:
Documentation
doc-required
no-need-doc
doc
doc-added