Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][broker] Register the broker to metadata store without version id compare #23298

Conversation

BewareMyPower
Copy link
Contributor

@BewareMyPower BewareMyPower commented Sep 12, 2024

Motivation

It's observed in our test environment many times after restarting the only broker with the extensible load manager's broker registry service.

2024-09-12T14:00:20,557+0000 [main] ERROR org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl - Failed to start the extensible load balance and close broker registry org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl@6d6c4775.
org.apache.pulsar.broker.PulsarServerException: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /loadbalance/brokers/<broker-id>:8080 is already locked
        at org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl.start(BrokerRegistryImpl.java:112) ~
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.start(ExtensibleLoadManagerImpl.java:370) 
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerWrapper.start(ExtensibleLoadManagerWrapper.java:50) 
        at org.apache.pulsar.broker.PulsarService.startLoadManagementService(PulsarService.java:1426) 
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:986) 
        ...
Caused by: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /loadbalance/brokers/<broker-id>:8080 is already locked
        at org.apache.pulsar.metadata.api.MetadataStoreException.unwrap(MetadataStoreException.java:186) ~
        at org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl.register(BrokerRegistryImpl.java:131) ~
        at org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl.start(BrokerRegistryImpl.java:110) ~
        ... 16 more

The "Resource xxx is already locked" is usually caused by a BadVersionException. This issue can happen when:

  1. Broker somehow crashed (e.g. killed by kill -9 or the container crashed) so it did not delete the node from the metadata store.
  2. Broker restarted, but the node still existed so it failed to acquire the lock.
  3. This error can only be recovered by restarting the broker after the metadata store deleted that node (manually by operators or automatically by session timeout detected)

Actually, it's not a good use case of LockManager here. The broker registry only:

  1. Put it's own broker's lookup data under the of /loadbalance/brokers/<broker-id> when starting (in the main thread)
  2. Delete its own broker id from when closing.
  3. Query any broker's lookup data from the /loadbalance/brokers/<broker-id>
  4. List all children nodes of /loadbalance/brokers.

Currently, step 1 and 2 are performed by the distribution lock implemented by LockManager. However, there should be no race condition because the broker id is the unique identifier of a broker. There is no case that a broker starts or closes concurrently. In addition, it's meaningless to pass a version explicitly to the metadata store.

Regarding step 3 and 4, they are simply wrapping the MetadataCache's methods. They are actually never related to something like "lock" though they're named listLocks and readLock.

Modifications

  • Add a put method to MetadataCache, which just calls MetadataStore#put with Optional.empty() underlying.
  • Replace LockManager with MetadataCache in BrokerRegistryImpl.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 12, 2024
@BewareMyPower BewareMyPower self-assigned this Sep 12, 2024
@BewareMyPower BewareMyPower added the type/bug The PR fixed a bug or issue reported a bug label Sep 12, 2024
@BewareMyPower BewareMyPower added this to the 4.0.0 milestone Sep 12, 2024
@BewareMyPower BewareMyPower added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages and removed type/bug The PR fixed a bug or issue reported a bug labels Sep 12, 2024
@codelipenghui
Copy link
Contributor

Currently, step 1 and 2 are performed by the distribution lock implemented by LockManager. However, there should be no race condition because the broker id is the unique identifier of a broker. There is no case that a broker starts or closes concurrently. In addition, it's meaningless to pass a version explicitly to the metadata store.

@BewareMyPower But it could happen from user's incorrect configuration that more than 1 brokers use the same broker id especially for a on-prem environment. And the comment #23297 (comment) is also the same concern about the lock issue. It looks like something cause the lock didn't get released even if the broker unregister itself from the metadata server.

@lhotari
Copy link
Member

lhotari commented Sep 12, 2024

@BewareMyPower But it could happen from user's incorrect configuration that more than 1 brokers use the same broker id especially for a on-prem environment.

If multiple brokers have the same broker id, the configuration would have multiple other problems. Wouldn't it be a valid requirement that the admin of the Pulsar system ensures that broker ids are unique?

And the comment #23297 (comment) is also the same concern about the lock issue. It looks like something cause the lock didn't get released even if the broker unregister itself from the metadata server.

Yes, that seems to be a real issue that would need to be addressed in any case.

@BewareMyPower
Copy link
Contributor Author

But it could happen from user's incorrect configuration that more than 1 brokers use the same broker id especially for a on-prem environment.

I agree with Lari that we should not guarantee a certain behavior with wrong configuration. If you assumed invalid configuration could happen, what about two brokers configured different metadata stores? How can you prevent the broker with wrong metadata URL from starting?

@BewareMyPower
Copy link
Contributor Author

BewareMyPower commented Sep 13, 2024

From another perspective, this PR resolves the case the broker exited unexpectedly (e.g. by a crash). Just like the incident we met online:

Sep 12, 2024 @ 07:00:10.089	#  SIGSEGV (0xb) at pc=0x00000000000049c6, pid=1, tid=454

The JVM process exited by a SIGSEGV signal and no cleanup was done, even the metadata store client was not closed. Then when the broker restarted, it would fail with LockBusyException. After some time, the metadata store server detected the client's session timeout and then it deleted the node. Then restarting the broker would succeed.

When programming, we should handle exceptions rather than errors.

  • Unexpected crash of some other unexpected logics that skip the unregister are "exceptions": it can be recovered by ignoring the version compare, like what this PR does.
  • Misconfiguration at the beginning are "errors": it cannot be recovered until the operators realized this error and updated the correct configuration.

There is another solution that we can add retry logic to BrokerRegistryImpl#register, here is a simple example (without using Backoff or exception type check):

            final var deadline = System.currentTimeMillis() + pulsar.getConfig().getMetadataStoreSessionTimeoutMillis();
            Throwable throwable = null;
            while (true) {
                if (System.currentTimeMillis() > deadline) {
                    throw MetadataStoreException.unwrap(throwable == null ? new TimeoutException("Failed to register")
                            : throwable);
                }
                try {
                    this.brokerLookupDataLock = brokerLookupDataLockManager.acquireLock(keyPath(brokerId), brokerLookupData)
                            .get(conf.getMetadataStoreOperationTimeoutSeconds(), TimeUnit.SECONDS);
                    this.state = State.Registered;
                    break;
                } catch (InterruptedException | ExecutionException | TimeoutException e) {
                    throwable = e;
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException ex) {
                        throw MetadataStoreException.unwrap(ex);
                    }
                }
            }

@BewareMyPower
Copy link
Contributor Author

I just thought it again. Retry is not a good solution. If the broker crashed, the registered data could still exist in the metadata store so that this "dead" broker could be selected as the owner. However, even after it restarted, it could not serve any request until the metadata store session timeout. So many topics could be unavailable for a long time.

Therefore, it does not make sense to keep the current flaky logic just to fail fast for a misconfiguration case. @codelipenghui

@BewareMyPower BewareMyPower marked this pull request as draft September 13, 2024 04:05
@BewareMyPower BewareMyPower marked this pull request as ready for review September 13, 2024 06:34
@codelipenghui
Copy link
Contributor

I'm not going to say we should handle the incorrect configuration from the user side, but what is the expected behavior if user made an incorrect broker ID (same broker ID)? As I understand, only the first broker should register successfully, all the other brokers with the same broker ID should get failed?

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Sep 13, 2024

I'm not going to say we should handle the incorrect configuration from the user side, but what is the expected behavior if user made an incorrect broker ID (same broker ID)? As I understand, only the first broker should register successfully, all the other brokers with the same broker ID should get failed?

In this case, one of the benefits of this change is that in the case of broker crash, that the broker can recover immediately. The session of the crashed instance will block starting of a new instance if we don't make this change. I think that it's a different matter to detect if someone has duplicate broker ids in their Pulsar cluster configuration.

@BewareMyPower Perhaps the description could be updated to include the actual reason why we want to do this change, so that a crashed instance can recover immediately, without needing to wait until the session of the crashed instance expires?

Copy link
Member

@Demogorgon314 Demogorgon314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to apply the same change for ModularLoadManager?

@BewareMyPower
Copy link
Contributor Author

Perhaps the description could be updated to include the actual reason why we want to do this change,

@lhotari Updated.

but what is the expected behavior if user made an incorrect broker ID (same broker ID)? As I understand, only the first broker should register successfully, all the other brokers with the same broker ID should get failed?

@codelipenghui I'd like to say, undefined behavior. Kafka's behavior is also that only one broker will succeed if multiple brokers configure the same broker.id. However, this behavior is never documented. It only suggests generating broker id automatically by ZooKeeper:

To avoid conflicts between ZooKeeper generated broker id's and user configured broker id's, generated broker ids start from reserved.broker.max.id + 1.

Actually, in real world, only if two brokers were configured with the same advertisedAddress and webServicePort would they have the same broker id. Even if they both don't fail after the change of this PR, users can still easily found the service didn't work because the advertised address is wrong.

It would be good if users can quickly know they've configured the same broker id for different brokers. However, IMO, this issue caused by misconfiguration can hardly exist in practical rather than in theory.

@BewareMyPower
Copy link
Contributor Author

Do we need to apply the same change for ModularLoadManager?

@Demogorgon314 +1 to me

@BewareMyPower BewareMyPower marked this pull request as draft September 13, 2024 08:52
@BewareMyPower BewareMyPower marked this pull request as ready for review September 13, 2024 09:15
@BewareMyPower
Copy link
Contributor Author

I swallowed the exception in BrokerRegistryImpl#unregister().

BTW, I think all synchronous close operations for components should not throw any exception. Otherwise, the following close operations will be skipped. Just add error logs for unexpected exceptions and warning logs for expected exceptions.

@codecov-commenter
Copy link

codecov-commenter commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 62.50000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 74.56%. Comparing base (bbc6224) to head (acf0321).
Report is 579 commits behind head on master.

Files with missing lines Patch % Lines
.../pulsar/metadata/cache/impl/MetadataCacheImpl.java 40.00% 4 Missing and 2 partials ⚠️
...ker/loadbalance/extensions/BrokerRegistryImpl.java 78.57% 2 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23298      +/-   ##
============================================
+ Coverage     73.57%   74.56%   +0.99%     
- Complexity    32624    34308    +1684     
============================================
  Files          1877     1927      +50     
  Lines        139502   145040    +5538     
  Branches      15299    15860     +561     
============================================
+ Hits         102638   108150    +5512     
+ Misses        28908    28617     -291     
- Partials       7956     8273     +317     
Flag Coverage Δ
inttests 27.94% <41.66%> (+3.36%) ⬆️
systests 24.69% <4.16%> (+0.37%) ⬆️
unittests 73.90% <62.50%> (+1.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...ker/loadbalance/extensions/BrokerRegistryImpl.java 82.07% <78.57%> (-2.69%) ⬇️
.../pulsar/metadata/cache/impl/MetadataCacheImpl.java 85.49% <40.00%> (-3.77%) ⬇️

... and 567 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit 13c19b5 into apache:master Sep 13, 2024
51 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/broker-registry-improve branch September 13, 2024 11:07
@shibd shibd requested a review from poorbarcode September 18, 2024 02:35
BewareMyPower added a commit that referenced this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-picked/branch-3.3 doc-not-needed Your PR changes do not impact docs release/3.3.1 release/3.3.2 type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants