
cluster: Improve cluster restart messaging #2442

Merged Jul 22, 2024 (1 commit)

Conversation

zph (Contributor) commented Jul 19, 2024

What problem does this PR solve?

Cluster restart messaging indicates to the user that the:

  • Cluster will be unavailable:

    fmt.Sprintf("Will restart the cluster %s with nodes: %s roles: %s.\nCluster will be unavailable\nDo you want to continue? [y/N]:")

    (https://github.com/pingcap/tiup/blob/654b087a1d293128424c0bd04e1ba216b064ffd1/pkg/cluster/manager/basic.go#L235)

This message is confusing when the command only requests a single-node restart, especially for monitoring nodes or nodes in an HA deployment.

What is changed and how it works?

This commit changes the messaging so that users performing a full cluster restart see the same message as before, while users restarting only selected nodes or roles instead get a more detailed warning message (see the sketch below):

`Cluster functionality related to nodes: %s and roles: %s will be unavailable`
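For illustration, here is a minimal Go sketch of the conditional described above. It is a reconstruction under stated assumptions, not the exact code merged into pkg/cluster/manager/basic.go; the function name `restartPrompt` and the full-vs-partial check are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// restartPrompt builds the confirmation prompt shown before a restart.
// Sketch only: names and structure are hypothetical; the real change
// lives in pkg/cluster/manager/basic.go.
func restartPrompt(clusterName string, nodes, roles []string) string {
	// Assumption: a restart is "full" when the user did not narrow it
	// down to specific nodes or roles.
	if len(nodes) == 0 && len(roles) == 0 {
		// Full cluster restart: keep the original wording.
		return fmt.Sprintf(
			"Will restart the cluster %s with nodes: %s roles: %s.\nCluster will be unavailable\nDo you want to continue? [y/N]:",
			clusterName, strings.Join(nodes, ","), strings.Join(roles, ","))
	}
	// Partial restart: warn only about the affected nodes and roles.
	return fmt.Sprintf(
		"Will restart the cluster %s with nodes: %s roles: %s.\nCluster functionality related to nodes: %s and roles: %s will be unavailable\nDo you want to continue? [y/N]:",
		clusterName, strings.Join(nodes, ","), strings.Join(roles, ","),
		strings.Join(nodes, ","), strings.Join(roles, ","))
}

func main() {
	fmt.Println(restartPrompt("prod", nil, nil))                       // full cluster restart
	fmt.Println(restartPrompt("prod", []string{"10.0.0.1:9093"}, nil)) // single-node restart
}
```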

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    • I recompiled the code and ran both a full-cluster restart request and a single-node restart request

Code changes

  • N/a

Side effects

  • N/a

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release notes:

NONE

ti-chi-bot requested review from kaaaaaaang and nexustar (Jul 19, 2024 01:54)
ti-chi-bot (Contributor) commented Jul 19, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

CLAassistant commented Jul 19, 2024

CLA assistant check
All committers have signed the CLA.

ti-chi-bot (Contributor) commented Jul 19, 2024

Welcome @zph! It looks like this is your first PR to pingcap/tiup 🎉

ti-chi-bot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Jul 19, 2024
ti-chi-bot added the lgtm label on Jul 19, 2024
ti-chi-bot (Contributor) commented Jul 19, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-19 02:52:00.393137721 +0000 UTC m=+581542.384079185: ☑️ agreed by xhebox.

zph (Contributor, Author) commented Jul 19, 2024

I'm away from my computer tonight, but if there are persistent test failures I can retry my manual regression test.

bb7133 (Member) left a comment


LGTM, PTAL @kaaaaaaang

zph (Contributor, Author) commented Jul 19, 2024

Looking at the failing test, I'm unable to see enough detail to assess whether the failure stems from my changes or from unrelated test issues.

The part closest to pd receiving a signal is a connection error:

```
[2024/07/19 07:09:00.517 +00:00] [WARN] [server.go:530] ["Server.onConn handshake"] [conn=5506538949855674773] [error="read tcp 127.0.0.1:4000->127.0.0.1:55138: read: connection reset by peer"] ["remote addr"=127.0.0.1:55138]
[2024/07/19 07:09:01.300 +00:00] [INFO] [manager.go:263] ["revoke session"] ["owner info"="[log-backup] /tidb/br-stream/owner ownerManager 0491a53d-ac93-459c-b17c-1adcb6c564e1"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
...
check detail log from: /home/runner/work/tiup/tiup/go/src/github.com/pingcap/tiup/tests/tiup-playground/_tmp/home/data/test_play/tidb-0/tidb.log
pd quit: signal: killed
```

Please let me know if I should look into this test failure further or how I can run it locally to debug with more logs if it seems related to my change.

xhebox merged commit 8d7cb71 into pingcap:master on Jul 22, 2024 (19 of 20 checks passed)