Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Fix UnknownError might be returned for a partitioned producer #15161

Conversation

BewareMyPower
Copy link
Contributor

@BewareMyPower BewareMyPower commented Apr 13, 2022

Fixes #15078

Motivation

When a partitioned producer is created and some of the partitioned
failed to create, closeAsync will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since closeAsync is called before calling setFailed on the
partitionedProducerCreatedPromise_ field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with ResultUnknownError, see

partitionedProducerCreatedPromise_.setFailed(ResultUnknownError);

Modifications

Only after all single producers failed or succeeded then call
closeAsync if one of them failed. And ensure
partitionedProducerCreatedPromise_ was completed before calling
closeAsync.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

Verifying this change

Create a separate namespace public/test-backlog-quotas to test the
case when the backlog quota exceeds. Then add
testBacklogQuotasExceeded test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that createProducer could return a
correct error instead of ResultUnknownError.

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • no-need-doc
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-added
    (Docs have been already added)

Fixes apache#15078

### Motivation

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

### Modifications

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

### Verifying this change

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.
@github-actions
Copy link

@BewareMyPower:Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@Anonymitaet
Copy link
Member

@BewareMyPower is this a bug fix and we do not need to update docs?

@BewareMyPower BewareMyPower added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Apr 14, 2022
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work @BewareMyPower

@BewareMyPower BewareMyPower removed the doc-not-needed Your PR changes do not impact docs label Apr 14, 2022
@github-actions
Copy link

@BewareMyPower:Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@BewareMyPower BewareMyPower merged commit 0f85596 into apache:master Apr 14, 2022
@BewareMyPower BewareMyPower deleted the bewaremypower/cpp-par-tp-incorrect-error branch April 14, 2022 06:59
@BewareMyPower BewareMyPower added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Apr 14, 2022
aparajita89 pushed a commit to aparajita89/pulsar that referenced this pull request Apr 18, 2022
…pache#15161)

Fixes apache#15078

### Motivation

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

### Modifications

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

### Verifying this change

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.
codelipenghui pushed a commit that referenced this pull request Apr 19, 2022
…15161)

Fixes #15078

### Motivation

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

### Modifications

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

### Verifying this change

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.

(cherry picked from commit 0f85596)
codelipenghui pushed a commit that referenced this pull request Apr 19, 2022
…15161)

Fixes #15078

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.

(cherry picked from commit 0f85596)
@codelipenghui codelipenghui added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Apr 19, 2022
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
…pache#15161)

Fixes apache#15078

### Motivation

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

### Modifications

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

### Verifying this change

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.
codelipenghui pushed a commit that referenced this pull request May 20, 2022
…15161)

Fixes #15078

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.

(cherry picked from commit 0f85596)
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label May 20, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 23, 2022
…pache#15161)

Fixes apache#15078

When a partitioned producer is created and some of the partitioned
failed to create, `closeAsync` will be called immediately, even if other
partitions were still in progress of creating the associated single
producers.

Since `closeAsync` is called before calling `setFailed` on the
`partitionedProducerCreatedPromise_` field, there is a race condition
that all single producers are closed before the promise is set. Then the
promise will be set with `ResultUnknownError`, see
https://github.com/apache/pulsar/blob/4aeeed5dab9dfe9493526f36d539b3ef29cf6fe5/pulsar-client-cpp/lib/PartitionedProducerImpl.cc#L317.

Only after all single producers failed or succeeded then call
`closeAsync` if one of them failed. And ensure
`partitionedProducerCreatedPromise_` was completed before calling
`closeAsync`.

This PR also makes the state of a partitioned producer atomic because
using a mutex to protect it makes code hard to write.

Create a separate namespace `public/test-backlog-quotas` to test the
case when the backlog quota exceeds. Then add
`testBacklogQuotasExceeded` test to make some backlog via creating a
consumer and sending some messages to a partition of the topic.

In this test, only 1 partition has backlog and will fail with the
related error. So the test verifies that `createProducer` could return a
correct error instead of `ResultUnknownError`.

(cherry picked from commit 0f85596)
(cherry picked from commit 95b5ef8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life cherry-picked/branch-2.10 doc-not-needed Your PR changes do not impact docs release/2.8.4 release/2.9.3 release/2.10.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
4 participants