Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][broker] Fix the bug that elected leader thinks it's a follower #23138

Merged
merged 2 commits into from
Aug 8, 2024

Conversation

heesung-sn
Copy link
Contributor

Fixes #23125

Motivation

As reported in the issue, #23125, we have seen a case where the elected leader indefinitely thinks it's a follower. As a result, this issue can block load shedding(run by the leader) and cause broker load imbalance.

Additionally, I think the broker leadership should be checked more aggressively upon zk re-connection as well because znode deletion watch events could be missed while reconnected[1]

Watches are maintained locally at the ZooKeeper server to which the client is connected. This allows watches to be lightweight to set, maintain, and dispatch. When a client connects to a new server, the watch will be triggered for any session events. Watches will not be received while disconnected from a server. When a client reconnects, any previously registered watches will be reregistered and triggered if needed. In general this all occurs transparently. There is one case where a watch may be missed: a watch for the existence of a znode not yet created will be missed if the znode is created and deleted while disconnected.

Refs:
[1] https://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkWatches

Modifications

  • Fix a bug to return Leading LeaderElectionState when the leader is the same as the proposed and from the same session.
  • elect() more aggressively upon zk reconnection events.
  • Set LeaderElectionState to NoLeader when closing LeaderElection.
  • Add more logs to better debug leader election

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: heesung-sn#77

@heesung-sn heesung-sn requested review from merlimat and lhotari August 7, 2024 17:49
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 7, 2024
@heesung-sn heesung-sn self-assigned this Aug 7, 2024
@heesung-sn heesung-sn changed the title [fix][broker] Fix the error that elected leader thinks it's a follower [fix][broker] Fix the bug that elected leader thinks it's a follower Aug 7, 2024
@lhotari lhotari added this to the 3.4.0 milestone Aug 7, 2024
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch and good work @heesung-sn! LGTM

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.56%. Comparing base (bbc6224) to head (f797af3).
Report is 501 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23138      +/-   ##
============================================
+ Coverage     73.57%   74.56%   +0.99%     
- Complexity    32624    33576     +952     
============================================
  Files          1877     1919      +42     
  Lines        139502   144174    +4672     
  Branches      15299    15758     +459     
============================================
+ Hits         102638   107504    +4866     
+ Misses        28908    28436     -472     
- Partials       7956     8234     +278     
Flag Coverage Δ
inttests 27.75% <54.54%> (+3.17%) ⬆️
systests 24.75% <54.54%> (+0.42%) ⬆️
unittests 73.92% <100.00%> (+1.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...metadata/coordination/impl/LeaderElectionImpl.java 81.39% <100.00%> (+2.54%) ⬆️

... and 492 files with indirect coverage changes

@lhotari lhotari requested a review from poorbarcode August 8, 2024 08:18
@lhotari lhotari merged commit 3560ddb into apache:master Aug 8, 2024
51 checks passed
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
lhotari pushed a commit that referenced this pull request Aug 8, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 9, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 12, 2024
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] The Elected Leader Broker Thinks It's a Follower
3 participants