Deadlock in Raft config change in a two nodes cluster #4472

Closed
kikimo opened this issue Jul 28, 2022 · 1 comment
Labels
affects/master PR/issue: this bug affects master version. later Solution: this issue will be handle in later version process/done Process of bug severity/minor Severity of bug type/bug Type: something is unexpected wontfix Solution: this will not be worked on recently

Comments

@kikimo
Contributor

kikimo commented Jul 28, 2022


Describe the bug (required)

Nebula uses an apply-after-commit strategy for Raft membership changes, which suffers from a severe deadlock-like problem when there are only two nodes in the cluster. In a two-node cluster consisting of a leader and a follower:

  1. The user proposes a conf change command to remove the leader.
  2. The leader removes itself from the cluster after committing and applying this command, while the follower might not have received or committed it.
  3. At this point the follower can never make progress again, and the cluster stays in a deadlock-like state.

This problem might happen when we try to perform a balance data operation in a space with 1 replica.
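
The stall in step 3 comes down to majority arithmetic. Below is a minimal sketch, assuming a plain majority quorum over the voter set; the helper names `quorum` and `can_commit` are hypothetical and are not taken from the Nebula code base.

```python
def quorum(voters):
    """Smallest number of voters that forms a majority."""
    return len(voters) // 2 + 1


def can_commit(voters, reachable):
    """True if the reachable voters form a majority of the configured set."""
    return len(set(voters) & set(reachable)) >= quorum(voters)


# The follower never learned about the removal, so its configuration
# still lists both nodes (illustrative names, not real host addresses).
follower_view = {"leader", "follower"}

# The leader has applied the removal and left; only the follower is reachable.
stuck = not can_commit(follower_view, {"follower"})
```

With two configured voters the quorum is 2, so a follower left alone with the old configuration can neither commit the pending entry nor win an election.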

Your Environments (required)

  • OS: uname -a
  • Compiler: g++ --version or clang++ --version
  • CPU: lscpu
  • Commit id: all nebula versions


@kikimo kikimo added the type/bug Type: something is unexpected label Jul 28, 2022
@kikimo kikimo changed the title Dead lock in Raft config change in a two node cluster Dead lock in Raft config change in a two nodes cluster Jul 28, 2022
@kikimo kikimo changed the title Dead lock in Raft config change in a two nodes cluster Deadlock in Raft config change in a two nodes cluster Jul 28, 2022
@Sophie-Xie Sophie-Xie added this to the v3.3.0 milestone Jul 29, 2022
@critical27
Contributor

critical27 commented Aug 5, 2022

Bug confirmed; there may not be a good way to fix it. It is not quite the same as what the summary describes:

  1. The user proposes a conf change command to remove the leader.
  2. The follower removes the old leader from its peers; however, the old leader fails to do the same (RPC timeout).
  3. At this point the old leader has 2 replicas, while the follower (the newly added node) has only 1 replica, so the follower can elect itself as leader. The balance task would then fail at phase REMOVE_PEER.

A possible workaround is to delete the part from disk on the old leader and reboot.
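
The diverging membership views in the steps above can be modelled in a few lines. This is a rough sketch, assuming each node counts votes only from peers in its own view and needs a plain majority; the `Node` class and its methods are hypothetical and do not mirror Nebula's Raft implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Voting members as this node sees them, including itself.
    peers: set = field(default_factory=set)

    def quorum(self):
        return len(self.peers) // 2 + 1

    def can_win_election(self, votes):
        # Only votes from members of this node's own view are counted.
        return len(self.peers & votes) >= self.quorum()


old_leader = Node("old_leader", {"old_leader", "follower"})
follower = Node("follower", {"old_leader", "follower"})

# The follower applies the removal; the old leader's RPC times out,
# so the old leader's view is left unchanged.
follower.peers.discard("old_leader")

follower_wins = follower.can_win_election({"follower"})
old_leader_wins = old_leader.can_win_election({"old_leader"})
```

In this model the follower's view has shrunk to one member, so its own vote is enough, while the old leader still expects a second vote that never arrives.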

@Sophie-Xie Sophie-Xie added the wontfix Solution: this will not be worked on recently label Aug 5, 2022
@Sophie-Xie Sophie-Xie removed this from the v3.3.0 milestone Aug 31, 2022
@Sophie-Xie Sophie-Xie added later Solution: this issue will be handle in later version and removed wontfix Solution: this will not be worked on recently labels Aug 31, 2022
@jinyingsunny jinyingsunny added the severity/minor Severity of bug label Nov 11, 2022
@HarrisChu HarrisChu added the affects/none PR/issue: this bug affects none version. label Dec 1, 2022
@jinyingsunny jinyingsunny added affects/master PR/issue: this bug affects master version. and removed affects/none PR/issue: this bug affects none version. labels Dec 1, 2022
@Sophie-Xie Sophie-Xie added the wontfix Solution: this will not be worked on recently label Dec 9, 2022
@github-actions github-actions bot added the process/fixed Process of bug label Dec 9, 2022
@Hester-Gu Hester-Gu added the process/done Process of bug label Jan 13, 2023
@github-actions github-actions bot removed the process/fixed Process of bug label Jan 13, 2023
6 participants