Skip to content

Fix bk write failure part 2 #5322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Oct 8, 2019

Conversation

jerrypeng
Copy link
Contributor

Master Issue: #5218

Motivation

Last PR to fix #5218 and a continuation of PR #5271. When a BK write failures happens, instead of waiting for 10 seconds before creating a new ledger, we should use proper signals to resume writing (create a new ledger) after the error recovery code has completed running to prevent any race conditions.

@jerrypeng jerrypeng added the type/bug The PR fixed a bug or issue reported a bug label Oct 5, 2019
@jerrypeng jerrypeng added this to the 2.4.2 milestone Oct 5, 2019
@jerrypeng jerrypeng self-assigned this Oct 5, 2019
@jerrypeng jerrypeng requested a review from srkukarni October 5, 2019 03:59
Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be some managed ledger unit tests that are dependent on the old behavior. We should fix them and add a new one to just test the readyToCreateNewLedger()

@jerrypeng jerrypeng force-pushed the fix_bk_write_failure_part_2 branch from 474607e to 8fb9c7d Compare October 6, 2019 01:19
@jerrypeng jerrypeng requested a review from merlimat October 6, 2019 01:20
@aahmed-se
Copy link
Contributor

run java8 tests

Copy link
Contributor

@merlimat merlimat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

if (exception instanceof ManagedLedgerFencedException) {
// If the managed ledger has been fenced, we cannot continue using it. We need to close and reopen
close();
} else {

// fence topic when failed to write a message to BK
isFenced = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we avoiding fencing ledger when topic actually sees ManagedLedgerFencedException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdhabalia I add this set here in a previous PR part of a line of PRs to fix problems in message deduplication (#5218). Setting "isFenced = true;" there when the exception is ManagedLedgerFencedException is inappropriate because of this check in close():

https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L895

There is already logic in the close() method to fence the topic

@jerrypeng
Copy link
Contributor Author

rerun java8 tests

@jerrypeng
Copy link
Contributor Author

rerun integration tests

@merlimat merlimat merged commit 60abcaa into apache:master Oct 8, 2019
wolfstudy pushed a commit that referenced this pull request Nov 20, 2019
* Bug in Message Deduplication that may cause incorrect behavior

* add tests

* fix error message

* fix client backoff

* fix tests

* cleaning up

* Fix handling of BK write failures for message dedup

* tests and clean up

* refactoring code

* fixing bugs

* addressing comments

* add missing license header

* Improve error handling of BK write failures

* fixing tests

* fixing bugs

* cleaning up

* addressing comments

* fixing tests

(cherry picked from commit 60abcaa)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enabling Deduplication may drop messages
4 participants