Skip to content

Commit

Permalink
[C++] Fix the race condition of connect timeout task (#14823)
Browse files Browse the repository at this point in the history
Fixes #14665

### Motivation

In C++ client, a connect timeout task is created each time before an
asynchronous connect operation is performed, if the connection cannot be
established in the configured timeout, the callback of the task will be
called to close the connection and then the `createProducer` or
`subscribe` methods will return `ResultConnectError`.

`ClientConnection::connectTimeoutTask_`, which is a shared pointer,
represents the timeout task. However, after `ClientConnection::close` is
called, the shared pointer will be reset, and the underlying `PeriodicTask`
object will be released. After that, when `stop` method is called on the
released `PeriodicTask` object in the callback (`handleTcpConnected`), a
segmentation fault will happen.

The root cause is that `connectTimeoutTask_` can be accessed in two
threads while one of them could release the memory. See #14665 for more
explanations. This race condition leads to flaky Python tests as well,
because we also have the similar test in Python tests. See
https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/python/pulsar_test.py#L1207-L1221

So this PR might also fix #14714.

### Modifications

Remove `connectTimeoutTask_.reset()` in `ClientConnection::close`. After
that, the `connectTimeoutTask_` will always points to the same
`PeriodicTask` object, whose methods are thread safe.

### Verifying this change

Execute the following command

```bash
./tests/main --gtest_filter='ClientTest.testConnectTimeout' --gtest_repeat=10
```

to runs the `testConnectTimeout` for 10 times. In my local env, it never
failed, while before applying this patch, it's very easy to fail.
  • Loading branch information
BewareMyPower authored Mar 23, 2022
1 parent f7cbc1e commit 0c3aad1
Showing 1 changed file with 1 addition and 4 deletions.
5 changes: 1 addition & 4 deletions pulsar-client-cpp/lib/ClientConnection.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1551,10 +1551,7 @@ void ClientConnection::close(Result result) {
consumerStatsRequestTimer_.reset();
}

if (connectTimeoutTask_) {
connectTimeoutTask_->stop();
connectTimeoutTask_.reset();
}
connectTimeoutTask_->stop();

lock.unlock();
LOG_INFO(cnxString_ << "Connection closed");
Expand Down

0 comments on commit 0c3aad1

Please sign in to comment.