Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cluster: when rolling restart PD, confirm that each peers can serve #1032

Merged
merged 1 commit into from
Dec 31, 2020

Conversation

HunDunDM
Copy link
Contributor

@HunDunDM HunDunDM commented Dec 31, 2020

Signed-off-by: Zheng Xiangsheng hundundm@gmail.com

What problem does this PR solve?

  • When tiup reload pd is executed, PD service may be unavailable for about 10 seconds.
    • If the PD immediately becomes the target of the transfer leader when it is just started, the publish operation of the ETCD in the PD will block until it times out after 10 seconds, causing the PD Leader to be unavailable for a short time.

What is changed and how it works?

  • When rolling restart PD, confirm that each peers can serve after starting

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • sysbench
    • tiup reload pd

Related changes

  • Need to cherry-pick to the release branch

Release notes:

Fix: wait the restarted PD instance to be healthy before stopping next one during rolling restart of PD nodes.

…ve after starting

Signed-off-by: Zheng Xiangsheng <hundundm@gmail.com>
@CLAassistant
Copy link

CLAassistant commented Dec 31, 2020

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 31, 2020
@HunDunDM
Copy link
Contributor Author

/label type/bug-fix

@ti-srebot ti-srebot added the type/bug-fix Categorizes PR as a bug-fix label Dec 31, 2020
@codecov-io
Copy link

codecov-io commented Dec 31, 2020

Codecov Report

Merging #1032 (f86d28c) into master (d044652) will decrease coverage by 5.60%.
The diff coverage is 75.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1032      +/-   ##
==========================================
- Coverage   55.68%   50.08%   -5.61%     
==========================================
  Files         280      280              
  Lines       19742    19750       +8     
==========================================
- Hits        10994     9892    -1102     
- Misses       7042     8239    +1197     
+ Partials     1706     1619      -87     
Flag Coverage Δ
cluster 35.44% <75.00%> (-8.18%) ⬇️
dm 23.98% <0.00%> (-0.08%) ⬇️
integrate 43.92% <75.00%> (-6.05%) ⬇️
playground 20.31% <ø> (ø)
tiup 16.48% <0.00%> (-0.02%) ⬇️
unittest 22.27% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/cluster/spec/pd.go 56.05% <75.00%> (-12.41%) ⬇️
components/cluster/command/check.go 6.27% <0.00%> (-73.27%) ⬇️
pkg/cluster/task/limits.go 0.00% <0.00%> (-68.75%) ⬇️
pkg/cluster/task/sysctl.go 0.00% <0.00%> (-66.67%) ⬇️
pkg/cluster/task/tls.go 0.00% <0.00%> (-61.54%) ⬇️
components/cluster/command/audit.go 27.27% <0.00%> (-54.55%) ⬇️
pkg/cluster/manager/cacert.go 0.00% <0.00%> (-53.20%) ⬇️
pkg/cluster/operation/check.go 0.00% <0.00%> (-53.07%) ⬇️
pkg/cluster/manager/patch.go 0.00% <0.00%> (-51.81%) ⬇️
pkg/cluster/task/rmdir.go 0.00% <0.00%> (-50.00%) ⬇️
... and 56 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d044652...f86d28c. Read the comment docs.

@HunDunDM HunDunDM changed the title [DNM] cluster/spec: when rolling restart PD, confirm that each peer can serve [DNM] cluster/spec: when rolling restart PD, confirm that each peers can serve Dec 31, 2020
@HunDunDM HunDunDM changed the title [DNM] cluster/spec: when rolling restart PD, confirm that each peers can serve [DNM] cluster: when rolling restart PD, confirm that each peers can serve Dec 31, 2020
@King-Dylan King-Dylan added this to the v1.4.0 milestone Dec 31, 2020
@King-Dylan
Copy link
Contributor

The test results appear to be valid(affect about 1~2s):
[ 575s ] thds: 16 tps: 6177.44 qps: 6177.44 (r/w/o: 0.00/6177.44/0.00) lat (ms,95%): 3.96 err/s: 0.00 reconn/s: 0.00
[ 576s ] thds: 16 tps: 6429.72 qps: 6429.72 (r/w/o: 0.00/6429.72/0.00) lat (ms,95%): 3.25 err/s: 0.00 reconn/s: 0.00
[ 577s ] thds: 16 tps: 6286.55 qps: 6286.55 (r/w/o: 0.00/6286.55/0.00) lat (ms,95%): 3.55 err/s: 0.00 reconn/s: 0.00
[ 578s ] thds: 16 tps: 6504.44 qps: 6504.44 (r/w/o: 0.00/6504.44/0.00) lat (ms,95%): 3.19 err/s: 0.00 reconn/s: 0.00
[ 579s ] thds: 16 tps: 6643.56 qps: 6643.56 (r/w/o: 0.00/6643.56/0.00) lat (ms,95%): 3.02 err/s: 0.00 reconn/s: 0.00

[ 580s ] thds: 16 tps: 3950.13 qps: 3950.13 (r/w/o: 0.00/3950.13/0.00) lat (ms,95%): 3.36 err/s: 0.00 reconn/s: 0.00

[ 581s ] thds: 16 tps: 655.29 qps: 655.29 (r/w/o: 0.00/655.29/0.00) lat (ms,95%): 4.25 err/s: 0.00 reconn/s: 0.00

[ 582s ] thds: 16 tps: 4388.59 qps: 4388.59 (r/w/o: 0.00/4388.59/0.00) lat (ms,95%): 3.07 err/s: 0.00 reconn/s: 0.00
[ 583s ] thds: 16 tps: 5389.28 qps: 5389.28 (r/w/o: 0.00/5389.28/0.00) lat (ms,95%): 3.68 err/s: 0.00 reconn/s: 0.00
[ 584s ] thds: 16 tps: 6331.06 qps: 6331.06 (r/w/o: 0.00/6331.06/0.00) lat (ms,95%): 3.55 err/s: 0.00 reconn/s: 0.00
[ 585s ] thds: 16 tps: 6439.14 qps: 6439.14 (r/w/o: 0.00/6439.14/0.00) lat (ms,95%): 3.62 err/s: 0.00 reconn/s: 0.00

@HunDunDM HunDunDM changed the title [DNM] cluster: when rolling restart PD, confirm that each peers can serve cluster: when rolling restart PD, confirm that each peers can serve Dec 31, 2020
@King-Dylan King-Dylan mentioned this pull request Dec 31, 2020
2 tasks
@HunDunDM
Copy link
Contributor Author

The test results appear to be valid(affect about 1~2s):

Thanks!

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 31, 2020
@AstroProfundis AstroProfundis added the category/stability Categorizes issue or PR as a stability enhancement. label Dec 31, 2020
@AstroProfundis
Copy link
Contributor

/merge

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: f86d28c

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 31, 2020
@ti-chi-bot ti-chi-bot merged commit 811854d into pingcap:master Dec 31, 2020
@HunDunDM HunDunDM deleted the pd-restart branch December 31, 2020 06:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category/stability Categorizes issue or PR as a stability enhancement. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT1 Indicates that a PR has LGTM 1. type/bug-fix Categorizes PR as a bug-fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants