-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pulsar-broker] close managed-ledgers before giving up bundle ownership to avoid bad zk-version #5599
Conversation
…ip to avoid bad zk-version
rerun integration tests |
rerun integration tests |
@@ -1860,7 +1860,7 @@ protected void unloadTopic(TopicName topicName, boolean authoritative) { | |||
validateTopicOwnership(topicName, authoritative); | |||
try { | |||
Topic topic = getTopicReference(topicName); | |||
topic.close().get(); | |||
topic.close(false).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have an enum here because it's not clear what false means in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
umm.. I think closing forcefully can be represent with boolean flag as we do similar thing at multiple places: PersistentTopic::delete(boolean... flags)
Also, I was also trying to think about how to accommodate enum here instead of flag. One thing I can think of is to add below enum under Topic
instead of flag.
enum CLOSE_ACTION { CLOSE_ALL, CLOSE_WITHOUT_CLIENT_WAIT }
But I feel enum is not helping much. Instead we can rename the flag to give more meaning closeWithoutWaitingClientDisconnect
.
So, for PersistentTopic
if flag is enabled then broker skips waiting on client-disconnect and immediately closes managed-ledger before giving up bundle ownership.
And for NonPersistentTopic
just completes the close if flag is enabled.
Can you please let me know if I am missing anything while renaming flag instead making enum.? any thoughts?
* @return | ||
*/ | ||
public CompletableFuture<Integer> unloadServiceUnit(NamespaceBundle serviceUnit) { | ||
public CompletableFuture<Integer> unloadServiceUnit(NamespaceBundle serviceUnit, boolean force) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, the force
meaning is not evident when calling this method.
@@ -110,7 +110,7 @@ default long getOriginalSequenceId() { | |||
|
|||
CompletableFuture<Void> checkReplication(); | |||
|
|||
CompletableFuture<Void> close(); | |||
CompletableFuture<Void> close(boolean force); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the bool vs enum discussion, why do we need 2 different behaviors? Can't we just consider "force" the only approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just consider "force" the only approach?
No, because first we want to close topic gracefully by closing all clients first and then close managed-ledger. If things don't get closed gracefully then close topic forcefully by closing managed-ledger before giving up ownership of the bundle. So, we need both behavior.
@rdhabalia l will change the |
@merlimat addressed your comments and renamed the flag with more meaningful name instead making it enum as enum doesn't seem appropriate to define this method behavior. can you please review it again or can you please let me know if you have any thought on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, overall lgtm
…ip to avoid bad zk-version (apache#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ```
…ip to avoid bad zk-version (apache#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ```
Add label 2.5.1, due to #6339 dependency |
### Motivation Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master ### Verifying this change This is a test change
…ip to avoid bad zk-version (apache#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ``` (cherry picked from commit 0a259ab)
### Motivation Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master ### Verifying this change This is a test change (cherry picked from commit 275854e)
…ip to avoid bad zk-version (#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ``` (cherry picked from commit 0a259ab)
…ip to avoid bad zk-version (apache#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ``` (cherry picked from commit 0a259ab)
### Motivation Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master ### Verifying this change This is a test change (cherry picked from commit 275854e)
…ip to avoid bad zk-version (apache#5599) ### Motivation We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception: `ManagedLedgerException$BadVersionException` right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and `ManagedLedgerFactoryImpl` closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk. ``` 01:01:13.452 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: my-property/my-cluster/my-ns/0xd0000000_0xe0000000 : 01:01:13.653 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.service.BrokerService - [persistent://my-property/my-cluster/my-ns/topic-partition-53] Unloading topic : 01:02:13.677 [shutdown-thread-57-1] INFO org.apache.pulsar.broker.namespace.OwnedBundle - Unloading my-property/my-cluster/my-ns/0xd0000000_0xe0000000 namespace-bundle with 0 topics completed in 60225.0 ms : 01:02:13.675 [shutdown-thread-57-1] ERROR org.apache.pulsar.broker.namespace.OwnedBundle - Failed to close topics in namespace my-property/my-cluster/my-ns/0xd0000000_0xe0000000 in 1/MINUTES timeout 01:02:13.677 [pulsar-ordered-OrderedExecutor-7-0-EventThread] INFO org.apache.pulsar.broker.namespace.OwnershipCache - [/namespace/my-property/my-cluster/my-ns/0xd0000000_0xe0000000] Removed zk lock for service unit: OK : 01:02:14.404 [shutdown-thread-57-1] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53] Closing managed ledger ``` ### Modification This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves ``` 01:02:30.995 [bookkeeper-ml-workers-OrderedExecutor-3-0] ERROR org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [my-property/my-cluster/my-ns/persistent/topic-partition-53][my-sub] Metadata ledger creation failed org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion Caused by: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.4.13.jar:3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03] at org.apache.bookkeeper.mledger.impl.MetaStoreImplZookeeper.lambda$null$125(MetaStoreImplZookeeper.java:288) ~[managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [managed-ledger-original-2.4.5-yahoo.jar:2.4.5-yahoo] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [bookkeeper-common-4.9.0.jar:4.9.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.32.Final.jar:4.1.32.Final] at java.lang.Thread.run(Thread.java:834) [?:?] ```
### Motivation Since apache#5599 merged, it introduce some conflict code with master branch, maybe the reason is apache#5599 not rebase with master ### Verifying this change This is a test change
Motivation
We have seen multiple below occurrence where unloading topic doesn't complete and gets stuck. and broker gives up ownership after a timeout and closing ml-factory closes unclosed managed-ledger which corrupts metadata zk-version and topic owned by new broker keeps failing with exception:
ManagedLedgerException$BadVersionException
right now, while unloading bundle: broker removes ownership of bundle after timeout even if topic's managed-ledger is not closed successfully and
ManagedLedgerFactoryImpl
closes unclosed ml-ledger on broker shutdown which causes bad zk-version in to the new broker and because of that cursors are not able to update cursor-metadata into zk.Modification
This fix will make sure that broker closes managed-ledger before giving up bundle ownership to avoid below exception at new broker where bundle moves