[fix][standalone] correctly delete bookie registration znode #23497

Merged
merged 1 commit into from
Oct 25, 2024

@nodece (Member) commented Oct 22, 2024

Motivation

The standalone server sometimes fails to start because the stale bookie registration znode /ledgers/available/%s is not deleted correctly, a regression introduced by #21407:

2024-10-21T09:51:29,569+0000 [BookieStateManagerService-0] INFO  org.apache.bookkeeper.discover.ZKRegistrationManager - Previous bookie registration znode: /ledgers/available/bk0test exists, so waiting zk sessiontimeout: 10000 ms for znode deletion
2024-10-21T09:51:29,570+0000 [BookieJournal-3181] INFO  org.apache.bookkeeper.bookie.JournalChannel - Opening journal data/standalone/bookkeeper0/current/192ae7cd847.txn
2024-10-21T09:51:35,584+0000 [SessionTracker] INFO  org.apache.zookeeper.server.ZooKeeperServer - Expiring session 0x100091b67e50000, timeout of 6000ms exceeded
2024-10-21T09:51:39,570+0000 [BookieStateManagerService-0] ERROR org.apache.bookkeeper.discover.ZKRegistrationManager - ZK exception checking and wait ephemeral znode /ledgers/available/bk0test expired : 
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /ledgers/available/bk0test
	at org.apache.bookkeeper.discover.ZKRegistrationManager.checkRegNodeAndWaitExpired(ZKRegistrationManager.java:198) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.discover.ZKRegistrationManager.doRegisterBookie(ZKRegistrationManager.java:264) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.discover.ZKRegistrationManager.registerBookie(ZKRegistrationManager.java:224) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.bookie.BookieStateManager.doRegisterBookie(BookieStateManager.java:293) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.bookie.BookieStateManager.doRegisterBookie(BookieStateManager.java:281) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.bookie.BookieStateManager$2.call(BookieStateManager.java:244) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.bookie.BookieStateManager$2.call(BookieStateManager.java:239) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
	at java.lang.Thread.run(Unknown Source) ~[?:?]
2024-10-21T09:51:39,575+0000 [main] ERROR org.apache.bookkeeper.bookie.Bookie - Couldn't register bookie with zookeeper, shutting down : 
java.util.concurrent.ExecutionException: java.io.IOException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: java.io.IOException: ZK exception checking and wait ephemeral znode /ledgers/available/bk0test expired
	at java.util.concurrent.FutureTask.report(Unknown Source) ~[?:?]
	at java.util.concurrent.FutureTask.get(Unknown Source) ~[?:?]
	at org.apache.bookkeeper.bookie.BookieImpl.start(BookieImpl.java:722) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:124) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:87) ~[org.apache.bookkeeper-bookkeeper-server-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) ~[org.apache.bookkeeper-bookkeeper-common-4.16.6.jar:4.16.6]
	at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.16.6.jar:4.16.6]
	at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422) ~[com.google.guava-guava-32.1.1-jre.jar:?]
	at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.16.6.jar:4.16.6]
	at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startBK(LocalBookkeeperEnsemble.java:460) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
	at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.runBookies(LocalBookkeeperEnsemble.java:326) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
	at org.apache.pulsar.zookeeper.LocalBookkeeperEnsemble.startStandalone(LocalBookkeeperEnsemble.java:437) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
	at org.apache.pulsar.PulsarStandalone.startBookieWithZookeeper(PulsarStandalone.java:482) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
	at org.apache.pulsar.PulsarStandalone.start(PulsarStandalone.java:301) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
	at org.apache.pulsar.PulsarStandaloneStarter.start(PulsarStandaloneStarter.java:121) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
	at org.apache.pulsar.PulsarStandaloneStarter.main(PulsarStandaloneStarter.java:171) ~[com.ascentstream.pulsar-pulsar-broker-3.0.8.0-SNAPSHOT-16a7bcc.jar:3.0.8.0-SNAPSHOT-16a7bcc]
Caused by: java.io.IOException: org.apache.bookkeeper.bookie.BookieException$MetadataStoreException: java.io.IOException: ZK exception checking and wait ephemeral znode /ledgers/available/bk0test expired
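
The log shows the failure sequence: the bookie finds a registration znode left over from a previous run, waits the ZooKeeper session timeout (10000 ms here) for it to disappear, and gives up with NodeExistsException when it is still there. The sketch below models that check-and-wait step against a hypothetical in-memory set of znode paths. `StaleZnodeCheck`, `waitForDeletion`, and `NodeStillExistsException` are illustrative names, not BookKeeper's API, and this simplified version polls, whereas a real ZooKeeper client would typically use a watch.

```java
import java.util.Set;

// Hypothetical model of the check-and-wait step, not BookKeeper code:
// a Set of paths stands in for the ZooKeeper namespace.
class StaleZnodeCheck {

    /** Thrown when the znode is still present at the deadline (mirrors NodeExists). */
    static class NodeStillExistsException extends Exception {
        NodeStillExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /**
     * Polls until {@code path} is gone from {@code znodes} or {@code timeoutMs} elapses.
     * Returns normally once the path is gone; throws if it is still there at the deadline.
     */
    static void waitForDeletion(Set<String> znodes, String path, long timeoutMs)
            throws InterruptedException, NodeStillExistsException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (znodes.contains(path)) {
            if (System.currentTimeMillis() >= deadline) {
                throw new NodeStillExistsException(path);
            }
            Thread.sleep(10); // fixed poll interval; a real client would use a watch
        }
    }
}
```

In the standalone case nothing removes the stale znode during the wait, so the check fails once the timeout elapses, as in the log above.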

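This happens because the standalone setup configures a custom bookie ID, and BookKeeper then uses that bookie ID, rather than the bookie's address and port, as the name of the registration znode: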

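    public static BookieId getBookieId(ServerConfiguration conf) throws UnknownHostException {
        // A bookieId set in the configuration takes precedence...
        String customBookieId = conf.getBookieId();
        if (customBookieId != null) {
            return BookieId.parse(customBookieId);
        }
        // ...otherwise the ID is derived from the bookie's address and port.
        return getBookieAddress(conf).toBookieId();
    }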
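
To make the consequence concrete: the registration znode lives under /ledgers/available/ and is named after whichever ID getBookieId returns, so with a custom bookie ID the path no longer has the address:port form. The dependency-free sketch below reproduces that precedence rule with plain strings. `RegistrationPath` and its parameters are hypothetical stand-ins for ServerConfiguration and BookieId, and the `/ledgers/available` root is taken from the log above.

```java
// Hypothetical, dependency-free model of how the registration znode path is chosen.
// It mirrors the precedence in getBookieId: a configured bookie ID wins, otherwise
// "address:port" is used.
class RegistrationPath {
    static final String AVAILABLE_ROOT = "/ledgers/available";

    /** Returns the custom bookie ID if set, else "address:port". */
    static String bookieId(String customBookieId, String advertisedAddress, int port) {
        if (customBookieId != null) {
            return customBookieId;
        }
        return advertisedAddress + ":" + port;
    }

    /** Returns the znode path the bookie registers under. */
    static String registrationZnode(String customBookieId, String advertisedAddress, int port) {
        return AVAILABLE_ROOT + "/" + bookieId(customBookieId, advertisedAddress, port);
    }
}
```

For example inputs, `registrationZnode("bk0test", "127.0.0.1", 3181)` yields `/ledgers/available/bk0test`, while passing `null` as the custom ID yields `/ledgers/available/127.0.0.1:3181`. A cleanup step that only removes the address-and-port form would leave the first znode behind.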

Modifications

  • Delete both the /ledgers/available/$bookieAdvertisedAddress:$bookiePort and /ledgers/available/$bookieId znodes before starting the bookie.
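
Deleting both candidate paths covers the case where a previous run registered under the address and port as well as the case where it registered under a custom bookie ID. The sketch below illustrates that idea against a hypothetical in-memory znode set. `StaleRegistrationCleanup` and `deleteStaleRegistrations` are illustrative names, not the code merged in this PR; a missing path is simply skipped here, which models ignoring a "no node" result from a real delete call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the cleanup step: remove any registration znode a
// previous run may have left behind, under either naming scheme.
class StaleRegistrationCleanup {
    static final String AVAILABLE_ROOT = "/ledgers/available";

    /**
     * Deletes both candidate registration znodes from {@code znodes}.
     * Paths that do not exist are skipped. Returns the paths actually removed.
     */
    static List<String> deleteStaleRegistrations(Set<String> znodes,
                                                 String advertisedAddress, int port,
                                                 String bookieId) {
        List<String> candidates = new ArrayList<>();
        candidates.add(AVAILABLE_ROOT + "/" + advertisedAddress + ":" + port);
        if (bookieId != null) {
            candidates.add(AVAILABLE_ROOT + "/" + bookieId); // custom bookie ID, if configured
        }
        List<String> removed = new ArrayList<>();
        for (String path : candidates) {
            if (znodes.remove(path)) {
                removed.add(path);
            }
        }
        return removed;
    }
}
```

Running the cleanup unconditionally before the bookie starts means the registration check never finds a leftover znode to wait on.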

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
@github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) on Oct 22, 2024
@nodece self-assigned this Oct 22, 2024
@nodece (Member, Author) commented Oct 22, 2024

/pulsarbot rerun-failure-checks

@nodece merged commit ebb3cb5 into apache:master Oct 25, 2024
60 of 63 checks passed
@nodece deleted the deleteBookieRegistrationZnode branch October 25, 2024 03:25
lhotari pushed a commit that referenced this pull request Oct 30, 2024
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit ebb3cb5)
nodece added a commit to ascentstream/pulsar that referenced this pull request Nov 6, 2024
…23497)

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit ebb3cb5)
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 7, 2024
…23497)

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit ebb3cb5)
(cherry picked from commit b0af444)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 7, 2024
…23497)

Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit ebb3cb5)
(cherry picked from commit b0af444)
lhotari pushed a commit that referenced this pull request Nov 13, 2024
Signed-off-by: Zixuan Liu <nodeces@gmail.com>
(cherry picked from commit ebb3cb5)

3 participants