Fix/pd scale in topo not updated #824
Codecov Report
@@ Coverage Diff @@
## master #824 +/- ##
==========================================
+ Coverage 50.53% 52.73% +2.19%
==========================================
Files 258 262 +4
Lines 18805 18951 +146
==========================================
+ Hits 9504 9994 +490
+ Misses 7794 7399 -395
- Partials 1507 1558 +51
The test_scale_core_tls test case failed, and the failure can be reproduced in a local environment, e.g.:
bash /tiup-cluster/tests/tiup-cluster/run.sh test_scale_core_tls
It fails while scaling in PD:
start scale in pd
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/test_scale_core_25107/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/test_scale_core_25107/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.101
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.104
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.105
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.103
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.102
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.104
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.101
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.105
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.101
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.102
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.103
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.103
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.104
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.105
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.104
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.101
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.101
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.105
+ [Parallel] - UserSSH: user=tidb, host=172.19.0.103
+ [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles:[] Nodes:[172.19.0.103:2379] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: CleanupData:false CleanupLog:false RetainDataRoles:[] RetainDataNodes:[]}
Error: failed to scale in: no endpoint available, the last err is: error requesting https://172.19.0.105:2379/pd/api/v1/members/name/pd-172.19.0.103-2379, response: "etcdserver: unhealthy cluster"
, code 500
Digging into pd.log shows:
[2020/09/28 22:31:22.322 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45176] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.322 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45174] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.424 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45194] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.425 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45196] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.522 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45212] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.522 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45214] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.622 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45232] [server-name=] [error="remote error: tls: bad certificate"]
[2020/09/28 22:31:22.622 +00:00] [WARN] [config_logging.go:279] ["rejected connection"] [remote-addr=172.19.0.104:45234] [server-name=] [error="remote error: tls: bad certificate"]
Maybe the TLS certificate was not updated after the PD scale-in, or there is some other error. I'll dig into it later.
6d2098c to 2304ef8
Codecov Report
@@ Coverage Diff @@
## master #824 +/- ##
==========================================
+ Coverage 52.98% 53.01% +0.03%
==========================================
Files 261 261
Lines 19005 18999 -6
==========================================
+ Hits 10069 10072 +3
+ Misses 7376 7368 -8
+ Partials 1560 1559 -1
This may be merged after #836.
1af06e9 to df5f156
df5f156 to 0b80670
LGTM
What problem does this PR solve?
fix #786
What is changed and how it works?
Delete PD instances from topo.PDServers if the PD instances were removed.
Check List
Tests
Code changes
Side effects
Related changes
Release notes:
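Illustration for "What is changed and how it works?": the idea of dropping removed PD instances from topo.PDServers can be sketched roughly as below. This is a minimal sketch, not the code in this PR. The PDSpec and Topology types are simplified, hypothetical stand-ins for the real topology structures, and RemovePDServers is a hypothetical helper name. It assumes nodes are identified by "host:client_port", as in the Nodes:[172.19.0.103:2379] option seen in the scale-in log.

```go
package main

import "fmt"

// PDSpec is a simplified, hypothetical stand-in for one PD instance entry.
type PDSpec struct {
	Host       string
	ClientPort int
}

// ID returns the "host:port" identifier assumed to name a node.
func (s PDSpec) ID() string {
	return fmt.Sprintf("%s:%d", s.Host, s.ClientPort)
}

// Topology is a hypothetical, reduced topology holding only PD servers.
type Topology struct {
	PDServers []PDSpec
}

// RemovePDServers drops every PD entry whose ID is in removed and
// returns the number of entries that were dropped.
func RemovePDServers(topo *Topology, removed map[string]struct{}) int {
	kept := make([]PDSpec, 0, len(topo.PDServers))
	for _, pd := range topo.PDServers {
		if _, gone := removed[pd.ID()]; gone {
			continue // this instance was scaled in; do not keep it
		}
		kept = append(kept, pd)
	}
	dropped := len(topo.PDServers) - len(kept)
	topo.PDServers = kept
	return dropped
}

func main() {
	topo := &Topology{PDServers: []PDSpec{
		{Host: "172.19.0.103", ClientPort: 2379},
		{Host: "172.19.0.104", ClientPort: 2379},
		{Host: "172.19.0.105", ClientPort: 2379},
	}}
	removed := map[string]struct{}{"172.19.0.103:2379": {}}
	n := RemovePDServers(topo, removed)
	fmt.Printf("dropped %d, %d PD instance(s) remain\n", n, len(topo.PDServers))
}
```

Building a fresh slice rather than deleting in place keeps the loop simple and avoids index bookkeeping when several instances are removed at once.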