Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][c++ client] avoid race condition causing double callback on close #15508

Merged
merged 2 commits into from
May 9, 2022

Conversation

Matt-Esch
Copy link
Contributor

Motivation

The pulsar-client-node library uses the c api to interact with the pulsar client. The model for the c api is to pass in a void *ctx to async functions with a callback. Typically, these callbacks free the context and rely heavily on the fact that the callback is called once. I was able to reproducibly trigger the close callback to call twice, resulting in a double free error in pulsar-client-node.

Modifications

My running theory is that the cause for this is a race condition in the decrementing of numberOfOpenHandlers shared pointer. It is possible for two threads to decrement the number in parallel, and to both end up with *numberOfOpenHandlers == 0. This fix assumes this can happen and synchronizes on the state, so even if this does happen, only one thread will be able to set the state to Closed and create the subsequent shutdown task.

Verifying this change

  • Make sure that the change passes the CI checks.
  • Run reproduction steps with pulsar-client-node using the new version

This change is already covered by existing tests, such as all of the tests in ClientTest.cc. Given that this is a race condition it's not possible to reproduce consistently for tests. I will run reproduction steps with pulsar-client-node.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

Documentation

  • doc-required

  • no-need-doc

  • doc

  • doc-added

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label May 9, 2022
@BewareMyPower BewareMyPower added component/client-c++ type/bug The PR fixed a bug or issue reported a bug labels May 9, 2022
pulsar-client-cpp/lib/ClientImpl.cc Outdated Show resolved Hide resolved
@merlimat
Copy link
Contributor

merlimat commented May 9, 2022

The problem is that the decrement and checking of the counter are not done in atomic fashion:

    if (*numberOfOpenHandlers > 0) {
        --(*numberOfOpenHandlers);
    }
    if (*numberOfOpenHandlers == 0) {
        // .....
    }

We should instead convert into:

    if (--(*numberOfOpenHandlers) == 0) {
        // .....
    }

@Matt-Esch
Copy link
Contributor Author

Are we guaranteed that --(*numberOfOpenHandlers) == 0 is atomic if the underlying data type isn't atomic? The fix I have put in place allows the decrement race but still only one thread can win due to the lock.

@merlimat
Copy link
Contributor

merlimat commented May 9, 2022

You're right, I was convinced we were using std::atomic<int> but we're not..

@merlimat merlimat merged commit 421fa1f into apache:master May 9, 2022
merlimat pushed a commit that referenced this pull request May 9, 2022
…se (#15508)

* avoid race condition causing double callback on close

* Update pulsar-client-cpp/lib/ClientImpl.cc

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
merlimat pushed a commit that referenced this pull request May 9, 2022
…se (#15508)

* avoid race condition causing double callback on close

* Update pulsar-client-cpp/lib/ClientImpl.cc

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
merlimat pushed a commit that referenced this pull request May 9, 2022
…se (#15508)

* avoid race condition causing double callback on close

* Update pulsar-client-cpp/lib/ClientImpl.cc

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
@merlimat merlimat added cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 labels May 9, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 11, 2022
…se (apache#15508)

* avoid race condition causing double callback on close

* Update pulsar-client-cpp/lib/ClientImpl.cc

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit b2cafb3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 doc-not-needed Your PR changes do not impact docs release/2.8.4 release/2.9.3 release/2.10.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants