[fix] [ml] Add entry fail due to race condition about add entry failed/timeout and switch ledger #22221
Resolved review threads:
pulsar-common/src/main/java/org/apache/pulsar/common/mutable/AtomicMutableBoolean.java (outdated)
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpAddEntry.java (outdated)
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedLedgerImpl.java
/pulsarbot rerun-failure-checks
@poorbarcode Could you please rebase onto the master branch? The last commit of this PR was pushed a month ago.
…d/timeout and switch ledger
8f1969f
to
cda2b7d
Rebased from |
LGTM
This seems to be a pretty serious bug; it should be ported to all the active branches.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #22221 +/- ##
============================================
- Coverage 73.57% 73.20% -0.37%
+ Complexity 32624 2511 -30113
============================================
Files 1877 1889 +12
Lines 139502 141424 +1922
Branches 15299 15518 +219
============================================
+ Hits 102638 103536 +898
- Misses 28908 29883 +975
- Partials 7956 8005 +49
…d/timeout and switch ledger (#22221)
…d/timeout and switch ledger (apache#22221) (cherry picked from commit b798e7f)
@poorbarcode I have created #23208 about the excessive log warnings that this change causes. Please take a look.
PR that fixes the excessive warnings: #23209
Issue 1: managedLedgerAddEntryTimeoutSeconds is not enabled
Background: the flow when adding an entry fails
Issue
COMPLETED
failed [4]
Reproduce: there is no test that reproduces this case yet.
Issue 2: managedLedgerAddEntryTimeoutSeconds is enabled
Background: the flow when adding an entry times out
Issue
When an add-entry request times out, the flow closes the first add-entry request first, which can cause a sequence like this:
timeout task - step 1: closes the first add-entry request.
callback of OpAddEntry 1: discards the callback of the first OpAddEntry because it is closed, and recycles the OpAddEntry object.
callback of OpAddEntry 2: tries to do a completed callback … false, and because the first OpAddEntry has already been recycled, Pulsar gets an NPE [2].
Reproduce: the new test testAddEntryResponseTimeout.
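The race described above (a timeout task and a write callback both trying to finish the same pooled operation) can be illustrated with a small sketch. This is not the code from this PR, and every name in it (AddEntryRaceSketch, PendingOp, onTimeout, onWriteCallback) is hypothetical. It only shows one common way to make such a race benign: each path must win a compare-and-set on the operation's state before it touches or recycles the object, so a late callback is discarded instead of dereferencing cleared fields.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: none of these names are Pulsar's real classes or methods.
class AddEntryRaceSketch {

    enum State { OPEN, COMPLETED, CLOSED }

    /** Stand-in for a pooled add-entry operation whose fields are cleared on recycle. */
    static final class PendingOp {
        final AtomicReference<State> state = new AtomicReference<>(State.OPEN);
        volatile String payload;

        PendingOp(String payload) {
            this.payload = payload;
        }

        /** Clears fields the way an object pool would before reuse. */
        void recycle() {
            payload = null;
        }
    }

    /** Timeout path: only the caller that wins OPEN -> CLOSED may recycle the op. */
    static boolean onTimeout(PendingOp op) {
        if (!op.state.compareAndSet(State.OPEN, State.CLOSED)) {
            return false; // the write callback already finished this op
        }
        op.recycle();
        return true;
    }

    /**
     * Write-callback path: returns the payload length, or -1 if the op was
     * already closed, in which case its fields are never dereferenced.
     */
    static int onWriteCallback(PendingOp op) {
        if (!op.state.compareAndSet(State.OPEN, State.COMPLETED)) {
            return -1; // late callback for a timed-out op: discard it
        }
        int length = op.payload.length();
        op.recycle();
        return length;
    }

    public static void main(String[] args) {
        PendingOp timedOut = new PendingOp("entry-1");
        System.out.println(onTimeout(timedOut));       // timeout wins the transition
        System.out.println(onWriteCallback(timedOut)); // late callback is discarded

        PendingOp normal = new PendingOp("entry-2");
        System.out.println(onWriteCallback(normal));   // callback wins the transition
        System.out.println(onTimeout(normal));         // late timeout is a no-op
    }
}
```

Because exactly one path can move the state out of OPEN, the object is recycled at most once and the losing path never reads a cleared field. Whether the actual fix in OpAddEntry takes this shape is not stated in the text above.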
Modifications
COMPLETED
failed.
Footnotes
[1]: OpAddEntry.handleAddTimeoutFailure
[2]: OpAddEntry.run
[3]: ManagedLedgerImpl.createNewOpAddEntryForNewLedger
[4]: OpAddEntry.addComplete
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x