move the version check from leader campaign to startup #7978

Closed
dbsid opened this issue Mar 26, 2024 · 0 comments · Fixed by #7981
Labels
type/enhancement The issue or PR belongs to an enhancement.


dbsid commented Mar 26, 2024

Enhancement Task

This is a proposal to move the version check from the leader campaign to startup.

We had a case where we lost the PD leader:

  1. The PD binary was built without a version tag.
  2. All 3 PD nodes were upgraded to this wrong PD build.
  3. After the upgrade completed, all 3 PD nodes went into a crash loop while campaigning for leader, so the cluster lost its PD leader and stopped functioning.

Here is the PD log showing the crash during the leader campaign:

{"level":"INFO","time":"2024/03/26 00:12:16.368 +00:00","caller":"versioninfo.go:89","message":"Welcome to Placement Driver (PD)"}
{"level":"INFO","time":"2024/03/26 00:12:16.368 +00:00","caller":"versioninfo.go:90","message":"PD","release-version":"62227fb4c"}
...
{"level":"INFO","time":"2024/03/26 00:43:45.706 +00:00","caller":"server.go:1670","message":"campaign PD leader ok","campaign-leader-name":"pd-1"}
{"level":"FATAL","time":"2024/03/26 00:43:46.950 +00:00","caller":"versioninfo.go:61","message":"version string is illegal","error":"[PD:semver:ErrSemverNewVersion]62227fb4c is not in dotted-tri format: 62227fb4c is not in dotted-tri format","errorVerbose":"[PD:semver:ErrSemverNewVersion]62227fb4c is not in dotted-tri format: 62227fb4c is not in dotted-tri format\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.(*Error).GenWithStackByCause\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/normalize.go:307\ngithub.com/tikv/pd/pkg/versioninfo.ParseVersion\n\t/mnt/tidb/pd/pkg/versioninfo/versioninfo.go:52\ngithub.com/tikv/pd/pkg/versioninfo.MustParseVersion\n\t/mnt/tidb/pd/pkg/versioninfo/versioninfo.go:59\ngithub.com/tikv/pd/server.CheckPDVersion\n\t/mnt/tidb/pd/server/util.go:40\ngithub.com/tikv/pd/server.(*Server).campaignLeader\n\t/mnt/tidb/pd/server/server.go:1743\ngithub.com/tikv/pd/server.(*Server).leaderLoop\n\t/mnt/tidb/pd/server/server.go:1639\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650","stack":"github.com/tikv/pd/pkg/versioninfo.MustParseVersion\n\t/mnt/tidb/pd/pkg/versioninfo/versioninfo.go:61\ngithub.com/tikv/pd/server.CheckPDVersion\n\t/mnt/tidb/pd/server/util.go:40\ngithub.com/tikv/pd/server.(*Server).campaignLeader\n\t/mnt/tidb/pd/server/server.go:1743\ngithub.com/tikv/pd/server.(*Server).leaderLoop\n\t/mnt/tidb/pd/server/server.go:1639"}
{"level":"WARN","time":"2024/03/26 00:43:51.984 +00:00","caller":"member.go:250","message":"the pd leader has not changed, delete and campaign again","old-pd-leader":"name:\"pd-1\" member_id:1438954984562261702 peer_urls:\"https://infra-tidb-pd-shopping-catalog-prod-0a019086.ec2.pin220.com:2380\" client_urls:\"https://infra-tidb-pd-shopping-catalog-prod-0a019086.ec2.pin220.com:2379\" "}
{"level":"INFO","time":"2024/03/26 00:43:51.986 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:43:52.187 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:43:52.388 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:43:58.012 +00:00","caller":"server.go:1607","message":"pd leader has changed, try to re-campaign a pd leader"}
{"level":"INFO","time":"2024/03/26 00:43:58.012 +00:00","caller":"server.go:1632","message":"skip campaigning of pd leader and check later","server-name":"pd-1","etcd-leader-id":626574301973153734,"member-id":1438954984562261702}
{"level":"INFO","time":"2024/03/26 00:44:04.477 +00:00","caller":"server.go:1607","message":"pd leader has changed, try to re-campaign a pd leader"}
{"level":"INFO","time":"2024/03/26 00:44:04.477 +00:00","caller":"server.go:1644","message":"start to campaign PD leader","campaign-leader-name":"pd-1"}
{"level":"INFO","time":"2024/03/26 00:44:04.481 +00:00","caller":"leadership.go:181","message":"check campaign resp","resp":{"header":{"cluster_id":8850434198915930927,"member_id":16443876602637797343,"revision":55937640,"raft_term":342},"succeeded":true,"responses":[{"Response":{"ResponsePut":{"header":{"revision":55937640}}}}]}}
{"level":"INFO","time":"2024/03/26 00:44:04.481 +00:00","caller":"server.go:1670","message":"campaign PD leader ok","campaign-leader-name":"pd-1"}
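
The FATAL entry shows `MustParseVersion` rejecting the release version `62227fb4c` (a bare commit hash) because it is not in MAJOR.MINOR.PATCH ("dotted-tri") form. The following is a minimal sketch of that kind of check, assuming a simplified regular expression in place of the semver parser PD actually calls; `dottedTri` and `isDottedTri` are hypothetical names used only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// dottedTri is a hypothetical, simplified pattern for MAJOR.MINOR.PATCH with
// an optional leading "v" and an optional "-prerelease" or "+build" suffix.
// It only approximates full SemVer validation; it is not PD's real parser.
var dottedTri = regexp.MustCompile(`^v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)([-+].+)?$`)

// isDottedTri reports whether s looks like a tagged release version.
// A build made without a version tag (e.g. a commit hash) fails this check.
func isDottedTri(s string) bool {
	return dottedTri.MatchString(s)
}

func main() {
	for _, v := range []string{"v8.0.0", "v7.5.1-alpha", "62227fb4c"} {
		fmt.Printf("%-14s dotted-tri=%v\n", v, isDottedTri(v))
	}
}
```

Because the real check only ran inside `campaignLeader`, each node passed startup, won the campaign, and then exited on this validation, which is the crash loop seen in the log.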

dbsid added the type/enhancement label Mar 26, 2024
JmPotato self-assigned this Mar 26, 2024
ti-chi-bot bot added a commit that referenced this issue Mar 27, 2024
…7981)

close #7978

Move the release version check to before startup so that an invalid version is detected as early as possible.

Signed-off-by: JmPotato <ghzpotato@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>