The etcd client does not use auto-sync and fails when the PD cluster address changed #8812

Closed
sdojjy opened this issue Apr 20, 2023 · 2 comments · Fixed by #8813
Labels
affects-6.1 affects-6.5 affects-7.1 area/ticdc Issues or PRs related to TiCDC. severity/major type/bug The issue is confirmed as a bug.

Comments

@sdojjy
Member

sdojjy commented Apr 20, 2023

Steps to reproduce

1. Start a TiDB cluster with 3 PDs (①, ②, ③) and a TiCDC instance connected to it.
2. Scale out 3 more PDs (④, ⑤, ⑥).
3. Wait 31 seconds.
4. Scale in the original PDs (①, ②, ③).

Observed result: TiCDC is restarted.
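
The steps above can be modeled with a small, self-contained toy. This is only a sketch of why a client with a fixed endpoint list loses the cluster once every original member is gone; it does not use TiCDC or etcd code. The `cluster`, `client`, and `simulate` names and the `pd-N` endpoint labels are all made up for illustration, and the "wait" step is assumed to be the window in which a periodic endpoint sync would run. In the real etcd Go client, the corresponding setting is the `AutoSyncInterval` field of `clientv3.Config`.

```go
package main

import (
	"fmt"
	"sort"
)

// cluster is a toy stand-in for the PD member list (hypothetical, not a real API).
type cluster struct {
	members map[string]bool
}

func (c *cluster) add(eps ...string) {
	for _, e := range eps {
		c.members[e] = true
	}
}

func (c *cluster) remove(eps ...string) {
	for _, e := range eps {
		delete(c.members, e)
	}
}

// list returns the current members in sorted order.
func (c *cluster) list() []string {
	out := make([]string, 0, len(c.members))
	for e := range c.members {
		out = append(out, e)
	}
	sort.Strings(out)
	return out
}

// client keeps a local copy of the endpoints it was started with.
type client struct {
	endpoints []string
}

// reachable reports whether any locally known endpoint is still a member.
func (cl *client) reachable(c *cluster) bool {
	for _, e := range cl.endpoints {
		if c.members[e] {
			return true
		}
	}
	return false
}

// sync refreshes the local endpoints from the cluster. It can only succeed
// while at least one known endpoint is still reachable.
func (cl *client) sync(c *cluster) bool {
	if !cl.reachable(c) {
		return false
	}
	cl.endpoints = c.list()
	return true
}

// simulate replays the reproduction steps and reports whether the client
// can still reach the cluster afterwards.
func simulate(autoSync bool) bool {
	c := &cluster{members: map[string]bool{}}
	c.add("pd-1", "pd-2", "pd-3")
	cl := &client{endpoints: c.list()}

	c.add("pd-4", "pd-5", "pd-6") // scale out
	if autoSync {
		cl.sync(c) // assumed to happen during the wait step
	}
	c.remove("pd-1", "pd-2", "pd-3") // scale in the original members
	return cl.reachable(c)
}

func main() {
	fmt.Println("static endpoints reachable:", simulate(false))
	fmt.Println("auto-synced endpoints reachable:", simulate(true))
}
```

In this model the static client ends up with no reachable endpoint, while the syncing client has already learned the new members before the old ones are removed.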

@asddongmen
Contributor

asddongmen commented Apr 20, 2023

TiCDC may shut down or get stuck in this scenario.
When it got stuck, it was blocked in this function:

// campaign to be an owner.
func (c *captureImpl) campaign(ctx context.Context) error {
	failpoint.Inject("capture-campaign-compacted-error", func() {
		failpoint.Return(errors.Trace(mvcc.ErrCompacted))
	})
	// TODO: `Campaign` gets stuck when SIGSTOP is sent to the PD leader.
	// When SIGSTOP is sent to the PD leader, cdc may call `cancel`
	// (caused by the `processor routine` exiting). Inside `Campaign`, the routine
	// returns from `waitDeletes` (https://github.com/etcd-io/etcd/blob/main/client/v3/concurrency/election.go#L93),
	// then calls `Resign` (note: it uses `client.Ctx`) on the etcd server. But the etcd server
	// (the one the client is connected to) has entered the STOP state, which means
	// the server cannot process the request but still maintains the gRPC
	// connection. So the routine blocks in `Resign`.
	return cerror.WrapError(cerror.ErrCaptureCampaignOwner, c.election.campaign(ctx, c.info.ID))
}

@asddongmen
Contributor

asddongmen commented Apr 20, 2023

ref: pingcap/tidb#42643

@ti-chi-bot ti-chi-bot bot closed this as completed in #8813 May 6, 2023
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this issue May 6, 2023