PD keeps transferring leader to a down store #3353
Labels
severity/moderate
status/TODO
The issue will be done in the future.
type/bug
The issue is confirmed as a bug.
Comments
The current PD uses a timeout of 10 minutes (not configurable) for the entire operator including peer movement. It does not use a separate timeout for each step, which is the main cause of this bug.
Maybe we should strengthen the check of store state during scheduling.
nolouch added and removed the status/TODO label on Oct 14, 2021
disksing added a commit to oh-my-tidb/pd that referenced this issue on Oct 19, 2021: fix tikv#3353 (Signed-off-by: disksing <i@disksing.com>)
disksing added a commit to oh-my-tidb/pd that referenced this issue on Oct 19, 2021: close tikv#3353 (Signed-off-by: disksing <i@disksing.com>)
ti-chi-bot pushed a commit that referenced this issue on Nov 23, 2021: operator: check store status for running operators, close #3353; add test; add tests; address comment (Signed-off-by: disksing <i@disksing.com>)
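The commit above ("check store status for running operators") can be sketched as follows. This is a simplified illustration of the idea, not the actual patch; the `Store`/`Operator` types and `CheckStores` helper are hypothetical names.

```go
package main

import "fmt"

// Store is a hypothetical view of a TiKV store's liveness.
type Store struct {
	ID   uint64
	Down bool
}

// Operator is a hypothetical running scheduling operator and the
// stores it touches (source, target, leader destination, etc.).
type Operator struct {
	Desc     string
	StoreIDs []uint64
}

// CheckStores reports whether the operator may keep running: every
// store it involves must still be known and up. If a store has gone
// down, the operator should be canceled immediately instead of being
// retried until a long whole-operator timeout expires.
func CheckStores(op Operator, stores map[uint64]*Store) bool {
	for _, id := range op.StoreIDs {
		s, ok := stores[id]
		if !ok || s.Down {
			return false // a target store is gone or down: cancel
		}
	}
	return true
}

func main() {
	stores := map[uint64]*Store{
		1: {ID: 1},
		4: {ID: 4, Down: true}, // store 4 was killed, as in the report
	}
	op := Operator{Desc: "transfer leader to store 4", StoreIDs: []uint64{1, 4}}
	// false: the operator is canceled instead of retrying the transfer
	fmt.Println(CheckStores(op, stores))
}
```

Re-checking store status on every operator tick means a kill of the target store cancels the leader transfer within one scheduling cycle, which is the behavior the fix commit describes.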
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue on Nov 23, 2021: close tikv#3353 (Signed-off-by: disksing <i@disksing.com>)
IcePigZDB pushed a commit to IcePigZDB/pd that referenced this issue on Nov 29, 2021: operator: check store status for running operators, close tikv#3353; add test; add tests; address comment (Signed-off-by: disksing <i@disksing.com>)
disksing pushed a commit that referenced this issue on Nov 30, 2021
disksing pushed a commit that referenced this issue on Dec 1, 2021
ti-chi-bot added a commit that referenced this issue on Dec 1, 2021: operator: check store status for running operators, close #3353; add test; add tests; address comment; fix build; fix ci (try) (Signed-off-by: disksing <i@disksing.com>, Co-authored-by: disksing <i@disksing.com>)
Bug Report
What did you do?
I used a nightly build of PD to test joint consensus.
I enabled shuffle region scheduling and set max-store-down-time to 30s. After killing two stores in the same label, a region got stuck and kept 4 replicas for up to 10 minutes. One example looked like the following:
, and no pending peer was reported. Store 4 was killed and PD's log kept reporting
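The reproduction setup described above can be sketched with pd-ctl. These invocations are an assumption based on standard pd-ctl usage (the endpoint is illustrative), not commands taken verbatim from the report:

```shell
# Shorten the window after which a store is considered down (assumed
# default is much longer) to 30s, matching the report:
pd-ctl -u http://127.0.0.1:2379 config set max-store-down-time 30s

# Enable random region shuffling so PD keeps generating movement
# operators:
pd-ctl -u http://127.0.0.1:2379 scheduler add shuffle-region-scheduler

# Then kill two TiKV stores sharing the same label and watch for a
# region stuck with 4 replicas.
```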