[dr-autosync] scheduler resume after switching pd leader for 5min #6988

Closed
mayjiang0203 opened this issue Aug 25, 2023 · 2 comments · Fixed by #7044
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. severity/major type/bug The issue is confirmed as a bug.

Comments


mayjiang0203 commented Aug 25, 2023

Bug Report

What did you do?

Switched the PD leader, then checked the scheduler status.

What did you expect to see?

Scheduling should resume in less than 2 minutes after the leader switch.

What did you see instead?

Scheduling took about 5 minutes to resume. With the fix for #6920 merged in v6.5.4, the same issue as #6920 still occurs, although that fix did reduce how often it happens.

What version of PD are you using (pd-server -V)?

v6.5.4

@mayjiang0203 mayjiang0203 added the type/bug The issue is confirmed as a bug. label Aug 25, 2023
@mayjiang0203
Author

/severity major
/assign @HuSharp

@HuSharp
Member

HuSharp commented Aug 28, 2023

Consider this scenario:
pd1 is the leader, and the leader is then transferred to pd2. Assume the sync index on both pd1 and pd2 is 100 and that no sync index update happens as time passes. The leader is then transferred back to pd1.

When pd2 and pd1 sync regions, the next index on pd1 is compared with the index on pd2.
Because pd1 and pd2 were already at the same index, regions are no longer synchronized to pd1.
Since no sync happens, the region syncer client never sets the regions' fromHeartbeat to false,
so processRegionHeartbeat judges IsNew to be false for every region and cannot collect all regions.
The failed collection blocks scheduling for around 5 minutes.
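
The scenario above can be sketched as a small, self-contained Go model. This is only an illustration under simplifying assumptions, not PD's actual code: the `region` and `node` types, `syncFrom`, `handleHeartbeat`, and `simulate` are hypothetical names, and the index comparison is reduced to a single check. Only `fromHeartbeat` and the "is new" idea come from the description above.

```go
package main

import "fmt"

// region is a hypothetical, stripped-down stand-in for a cached region.
type region struct {
	id            uint64
	fromHeartbeat bool
}

// node models one PD member: its sync index and its region cache.
type node struct {
	syncIndex uint64
	regions   map[uint64]*region
}

func newNode(index uint64) *node {
	return &node{syncIndex: index, regions: map[uint64]*region{}}
}

// syncFrom models the follower-side sync (assumption: regions are only
// received when the leader's index is ahead). Received regions are
// stored with fromHeartbeat = false.
func (n *node) syncFrom(leader *node) int {
	if leader.syncIndex <= n.syncIndex {
		return 0 // indexes already equal: nothing is synchronized
	}
	synced := 0
	for id := range leader.regions {
		n.regions[id] = &region{id: id, fromHeartbeat: false}
		synced++
	}
	n.syncIndex = leader.syncIndex
	return synced
}

// handleHeartbeat models the leader-side check: a region counts as new
// (and is collected) only if it is unknown or its cached copy did not
// come from a heartbeat.
func (n *node) handleHeartbeat(id uint64) bool {
	origin, ok := n.regions[id]
	isNew := !ok || !origin.fromHeartbeat
	n.regions[id] = &region{id: id, fromHeartbeat: true}
	return isNew
}

// simulate runs pd1 -> pd2 -> pd1 and returns how many regions pd1
// collects after regaining leadership.
func simulate(indexAdvanced bool) int {
	ids := []uint64{1, 2, 3}
	pd1, pd2 := newNode(100), newNode(100)

	for _, id := range ids { // pd1 is leader
		pd1.handleHeartbeat(id)
	}
	for _, id := range ids { // pd2 becomes leader
		pd2.handleHeartbeat(id)
	}
	if indexAdvanced {
		pd2.syncIndex++
	}
	pd1.syncFrom(pd2) // pd1 is now a follower of pd2

	collected := 0
	for _, id := range ids { // pd1 becomes leader again
		if pd1.handleHeartbeat(id) {
			collected++
		}
	}
	return collected
}

func main() {
	fmt.Println("collected with unchanged index:", simulate(false))
	fmt.Println("collected with advanced index:", simulate(true))
}
```

In this model, `simulate(false)` collects 0 of the 3 regions because pd1 keeps its stale `fromHeartbeat = true` copies, while `simulate(true)` collects all 3. That mirrors why scheduling stays blocked in the equal-index case.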

@HuSharp HuSharp added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. and removed may-affects-5.2 may-affects-5.3 may-affects-5.4 may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Sep 21, 2023
@HuSharp HuSharp removed affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. labels Oct 11, 2023
ti-chi-bot bot added a commit that referenced this issue Oct 13, 2023
close #6988, close #7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
disksing added a commit to oh-my-tidb/pd that referenced this issue Nov 8, 2023
close tikv#6988, close tikv#7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Nov 14, 2023
close tikv#6988, close tikv#7016

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Nov 14, 2023
close tikv#6988, close tikv#7016

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@HuSharp HuSharp added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. labels Nov 20, 2023
ti-chi-bot bot added a commit that referenced this issue Nov 24, 2023
…7363)

close #6988, close #7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit that referenced this issue Dec 1, 2023
…7364)

close #6988, close #7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: husharp <jinhao.hu@pingcap.com>
Co-authored-by: Hu# <jinhao.hu@pingcap.com>
rleungx pushed a commit to rleungx/pd that referenced this issue Dec 1, 2023
close tikv#6988, close tikv#7016

Signed-off-by: husharp <jinhao.hu@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>