[fix][broker] Fix ownership loss #23515
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
@lhotari @BewareMyPower @poorbarcode @heesung-sn The PR description has been updated. Could you please review this PR?
It sounds like this inconsistency between the ownership metadata and the cache only occurs because of the incorrect test setup. Should we just fix the test code in this case?
@heesung-sn Sure, this is the root cause.
There is also a case where the user deletes the ownership in zk. This PR fixes that case too and helps the user recover ownership. What do you think?
Please consider adding comments to explain this. I think this PR is useful for fixing a possible inconsistent state, although it would be better to fix the root cause of that inconsistent state, if possible.
OK, I'll try to fix the root cause.
Signed-off-by: Zixuan Liu <nodeces@gmail.com> (cherry picked from commit 576d341)
Motivation
While testing Pulsar 3.0, I encountered a failure in the org.apache.pulsar.client.api.BrokerServiceLookupTest#testLookupConnectionNotCloseIfGetUnloadingExOrMetadataEx test, both locally and in the ascentstream/pulsar CI environment.
Ownership assignment and verification flow
Step 1: Ownership verification (Lookup phase)
The lookup mechanism first checks whether the bundle is present in ownedBundlesCache. If it is absent, it queries ZooKeeper (zk) to determine the ownership status. Details on this process can be found in NamespaceService#findBrokerServiceUrl and OwnershipCache#getOwnerAsync.
Step 2: Assigning ownership
If no owner is identified:
- NamespaceService#searchForCandidateBroker is used to find a broker.
- If the candidateBroker matches the local broker ID (pulsar.getBrokerId()), Pulsar caches the bundle by calling ownershipCache.tryAcquiringOwnership(bundle).

If an owner is found, the lookup result is returned to the client.
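Steps 1 and 2 can be summarized with a minimal, single-threaded model. It is a sketch under simplifying assumptions, not Pulsar's implementation: the class and its members are hypothetical, a fixed candidateBroker stands in for the result of NamespaceService#searchForCandidateBroker, a HashMap plays the role of the ownership data in zk, and a HashSet plays the role of ownedBundlesCache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded model of Steps 1 and 2; not Pulsar's API.
class OwnershipLookupSketch {
    final String localBrokerId;
    // Stand-in for the result of NamespaceService#searchForCandidateBroker.
    final String candidateBroker;
    // bundle -> owner broker ID, standing in for the ownership data in zk.
    final Map<String, String> zkOwners = new HashMap<>();
    // Bundles this broker has cached, standing in for ownedBundlesCache.
    final Set<String> ownedBundlesCache = new HashSet<>();

    OwnershipLookupSketch(String localBrokerId, String candidateBroker) {
        this.localBrokerId = localBrokerId;
        this.candidateBroker = candidateBroker;
    }

    // Returns the broker ID that the lookup resolves to.
    String lookup(String bundle) {
        // Step 1: the local cache is checked first.
        if (ownedBundlesCache.contains(bundle)) {
            return localBrokerId;
        }
        // Step 1 (cont.): fall back to the ownership recorded in zk.
        String owner = zkOwners.get(bundle);
        if (owner != null) {
            return owner;
        }
        // Step 2: no owner yet. If the candidate is this broker, acquire ownership.
        if (candidateBroker.equals(localBrokerId)) {
            tryAcquiringOwnership(bundle);
            return zkOwners.get(bundle); // whichever broker won the acquisition
        }
        // A real broker would redirect the lookup; here we simply return the candidate.
        return candidateBroker;
    }

    // Simplified stand-in for ownershipCache.tryAcquiringOwnership(bundle):
    // record ownership in "zk" and, on success, cache the bundle locally.
    boolean tryAcquiringOwnership(String bundle) {
        String previous = zkOwners.putIfAbsent(bundle, localBrokerId);
        boolean acquired = previous == null || previous.equals(localBrokerId);
        if (acquired) {
            ownedBundlesCache.add(bundle);
        }
        return acquired;
    }

    public static void main(String[] args) {
        OwnershipLookupSketch broker = new OwnershipLookupSketch("broker-1", "broker-1");
        System.out.println(broker.lookup("bundle-a")); // broker-1
        System.out.println(broker.ownedBundlesCache);  // [bundle-a]
    }
}
```

Running main prints broker-1 followed by [bundle-a]: no owner was found, this broker was the candidate, and acquiring ownership populated both the zk map and the cache.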
Step 3: Ownership verification for topic loading or creation
During topic loading or creation, BrokerService#loadOrCreatePersistentTopic invokes BrokerService#checkTopicNsOwnership, which verifies the bundle's presence in ownedBundlesCache. If the bundle is missing, an error is raised indicating that the namespace bundle is not managed by the current broker.
Ownership loss
Direct deletion of ownership information from ownedBundlesCache or zk is generally avoided. However, the test BrokerServiceLookupTest#testLookupConnectionNotCloseIfGetUnloadingExOrMetadataEx simulates such conditions to verify error handling in the lookup process.
Simulating ownership release
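Using bundleOfTopic.releaseBundleLockAndMakeAcquireFail():
- the bundle's ownership is released from ownedBundlesCache;
- subsequent attempts to acquire the ownership in zk fail with an OPERATIONTIMEOUT error.
Restoring zk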
The method bundleOfTopic.makeAcquireBundleLockSuccess() removes the OPERATIONTIMEOUT error. This action restores the ownership in zk, although ownedBundlesCache remains empty.
Loop triggered by ownership check during client connection
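When a client connects and the bundle is missing from ownedBundlesCache, the ownership check described in Step 3 fails and an error is logged. This triggers the client to retry, resulting in a loop of repeated ownership verification and lookup attempts. Since zk still shows the ownership, the lookup resolves to the current broker and NamespaceService#searchForCandidateBroker (and therefore ownershipCache.tryAcquiringOwnership) is bypassed, so ownedBundlesCache is never repopulated and the client keeps retrying.
Modifications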
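The retry loop from the previous section, and the idea behind the change, can be reproduced with a small synchronous model. This is a hypothetical sketch rather than Pulsar code: getOwner stands in for OwnershipCache#getOwnerAsync, checkTopicNsOwnership for BrokerService#checkTopicNsOwnership, and recoverCache is an invented flag that switches between the old behavior and cache recovery. Acquiring ownership through ownedBundlesCache is reduced here to adding the bundle to a set, and only the cache-loss case from the test is modeled (not a user deleting the ownership in zk).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, synchronous model of the cache/zk inconsistency; not Pulsar's API.
class OwnershipRecoverySketch {
    final String localBrokerId;
    final Map<String, String> zkOwners = new HashMap<>();  // ownership recorded in zk
    final Set<String> ownedBundlesCache = new HashSet<>(); // stand-in for ownedBundlesCache
    // Hypothetical switch: false = old behavior, true = recover the missing cache entry.
    final boolean recoverCache;

    OwnershipRecoverySketch(String localBrokerId, boolean recoverCache) {
        this.localBrokerId = localBrokerId;
        this.recoverCache = recoverCache;
    }

    // Stand-in for OwnershipCache#getOwnerAsync (synchronous here).
    Optional<String> getOwner(String bundle) {
        if (ownedBundlesCache.contains(bundle)) {
            return Optional.of(localBrokerId);
        }
        String owner = zkOwners.get(bundle);
        if (owner == null) {
            return Optional.empty();
        }
        if (recoverCache && owner.equals(localBrokerId)) {
            // zk says this broker owns the bundle but the cache lost it: repopulate the cache.
            ownedBundlesCache.add(bundle);
        }
        return Optional.of(owner);
    }

    // Stand-in for BrokerService#checkTopicNsOwnership: only the cache is consulted.
    boolean checkTopicNsOwnership(String bundle) {
        return ownedBundlesCache.contains(bundle);
    }

    // Client retry loop: lookup, then load the topic. Returns the successful attempt, or -1.
    int connect(String bundle, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean ownedHere = getOwner(bundle).map(localBrokerId::equals).orElse(false);
            if (ownedHere && checkTopicNsOwnership(bundle)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (boolean recover : new boolean[] {false, true}) {
            OwnershipRecoverySketch broker = new OwnershipRecoverySketch("broker-1", recover);
            // State after makeAcquireBundleLockSuccess(): owned in zk, missing from the cache.
            broker.zkOwners.put("bundle-1", "broker-1");
            System.out.println("recoverCache=" + recover + " -> " + broker.connect("bundle-1", 5));
        }
    }
}
```

With recoverCache=false the simulated client exhausts all five attempts and connect returns -1; with recoverCache=true the first getOwner call repopulates the cache and the first attempt succeeds.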
In getOwnerAsync, the broker tries to acquire ownership through ownedBundlesCache to avoid losing the ownership in the cache. There is also a case where the user deletes the ownership in zk; these changes fix that case as well and help the user recover ownership.
Documentation
doc
doc-required
doc-not-needed
doc-complete