-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] Fix UnknownError might be returned for a partitioned producer #15161
[C++] Fix UnknownError might be returned for a partitioned producer #15161
Conversation
Fixes apache#15078 ### Motivation When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. ### Modifications Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. ### Verifying this change Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`.
@BewareMyPower:Thanks for your contribution. For this PR, do we need to update docs? |
@BewareMyPower is this a bug fix and we do not need to update docs? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work @BewareMyPower
@BewareMyPower:Thanks for your contribution. For this PR, do we need to update docs? |
…pache#15161) Fixes apache#15078 ### Motivation When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. ### Modifications Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. ### Verifying this change Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`.
…15161) Fixes #15078 ### Motivation When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. ### Modifications Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. ### Verifying this change Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`. (cherry picked from commit 0f85596)
…15161) Fixes #15078 When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`. (cherry picked from commit 0f85596)
…pache#15161) Fixes apache#15078 ### Motivation When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. ### Modifications Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. ### Verifying this change Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`.
…15161) Fixes #15078 When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`. (cherry picked from commit 0f85596)
…pache#15161) Fixes apache#15078 When a partitioned producer is created and some of the partitioned failed to create, `closeAsync` will be called immediately, even if other partitions were still in progress of creating the associated single producers. Since `closeAsync` is called before calling `setFailed` on the `partitionedProducerCreatedPromise_` field, there is a race condition that all single producers are closed before the promise is set. Then the promise will be set with `ResultUnknownError`, see https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317. Only after all single producers failed or succeeded then call `closeAsync` if one of them failed. And ensure `partitionedProducerCreatedPromise_` was completed before calling `closeAsync`. This PR also makes the state of a partitioned producer atomic because using a mutex to protect it makes code hard to write. Create a separate namespace `public/test-backlog-quotas` to test the case when the backlog quota exceeds. Then add `testBacklogQuotasExceeded` test to make some backlog via creating a consumer and sending some messages to a partition of the topic. In this test, only 1 partition has backlog and will fail with the related error. So the test verifies that `createProducer` could return a correct error instead of `ResultUnknownError`. (cherry picked from commit 0f85596) (cherry picked from commit 95b5ef8)
Fixes #15078
Motivation
When a partitioned producer is created and some of the partitioned
failed to create,
closeAsync
will be called immediately, even if otherpartitions were still in progress of creating the associated single
producers.
Since
closeAsync
is called before callingsetFailed
on thepartitionedProducerCreatedPromise_
field, there is a race conditionthat all single producers are closed before the promise is set. Then the
promise will be set with
ResultUnknownError
, seepulsar/pulsar-client-cpp/lib/PartitionedProducerImpl.cc
Line 317 in 4aeeed5
Modifications
Only after all single producers failed or succeeded then call
closeAsync
if one of them failed. And ensurepartitionedProducerCreatedPromise_
was completed before callingcloseAsync
.This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.
Verifying this change
Create a separate namespace
public/test-backlog-quotas
to test thecase when the backlog quota exceeds. Then add
testBacklogQuotasExceeded
test to make some backlog via creating aconsumer and sending some messages to a partition of the topic.
In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that
createProducer
could return acorrect error instead of
ResultUnknownError
.Documentation
Check the box below or label this PR directly.
Need to update docs?
doc-required
(Your PR needs to update docs and you will update later)
no-need-doc
(Please explain why)
doc
(Your PR contains doc changes)
doc-added
(Docs have been already added)