The etcd client does not use auto-sync and fails when the PD cluster address changed #8812

sdojjy · 2023-04-20T02:58:04Z

Steps to reproduce

1. Start a TiDB cluster with 3 PDs (①, ②, ③) and a TiCDC instance connected to it.
2. Scale out 3 more PDs (④, ⑤, ⑥).
3. Wait 31 seconds.
4. Scale in the original PDs (①, ②, ③).

TiCDC is restarted.
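The failure mode in these steps can be illustrated with a toy simulation. This is a minimal sketch in plain Go, not TiCDC or etcd code: `cluster`, `client`, `sync`, `reachable`, and `simulate` are hypothetical names, and the member names `pd-1` through `pd-6` stand in for PDs ① to ⑥. It assumes a client that was created with the three original addresses and only learns about new members if it refreshes its endpoint list from the cluster. In the real etcd Go client, that periodic refresh is controlled by the `AutoSyncInterval` field of `clientv3.Config`, which is disabled when left at zero.

```go
package main

import (
	"fmt"
	"sort"
)

// cluster is a toy stand-in for a PD cluster: the set of members currently alive.
type cluster map[string]bool

// client is a toy stand-in for an etcd client that holds a list of endpoints.
type client struct {
	endpoints []string
}

// sync replaces the endpoint list with the cluster's current members.
// This models, very roughly, what one auto-sync tick would do.
func (c *client) sync(cl cluster) {
	members := make([]string, 0, len(cl))
	for m := range cl {
		members = append(members, m)
	}
	sort.Strings(members) // deterministic order for readability
	c.endpoints = members
}

// reachable reports whether at least one known endpoint is still alive.
func (c *client) reachable(cl cluster) bool {
	for _, ep := range c.endpoints {
		if cl[ep] {
			return true
		}
	}
	return false
}

// simulate replays the scale-out / scale-in sequence from the issue.
func simulate(autoSync bool) bool {
	pd := cluster{"pd-1": true, "pd-2": true, "pd-3": true}
	c := &client{endpoints: []string{"pd-1", "pd-2", "pd-3"}}

	// Scale out: three new members join.
	for _, m := range []string{"pd-4", "pd-5", "pd-6"} {
		pd[m] = true
	}

	// Time passes; the endpoint list is refreshed only if auto-sync is on.
	if autoSync {
		c.sync(pd)
	}

	// Scale in: the original members leave.
	for _, m := range []string{"pd-1", "pd-2", "pd-3"} {
		delete(pd, m)
	}

	return c.reachable(pd)
}

func main() {
	fmt.Println("static endpoints reachable:", simulate(false))     // false
	fmt.Println("auto-synced endpoints reachable:", simulate(true)) // true
}
```

Without the refresh, every address the client knows has been removed by the end of the sequence, which matches the symptom reported here.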

asddongmen · 2023-04-20T04:06:54Z

TiCDC may shut down or get stuck in this scenario.
In this case it was stuck in the following function:

// campaign to be an owner.
func (c *captureImpl) campaign(ctx context.Context) error {
	failpoint.Inject("capture-campaign-compacted-error", func() {
		failpoint.Return(errors.Trace(mvcc.ErrCompacted))
	})
	// TODO: `Campaign` gets stuck when SIGSTOP is sent to the PD leader.
	// When SIGSTOP is sent to the PD leader, cdc may call `cancel`
	// (caused by the `processor routine` exiting). Inside `Campaign`, the routine
	// returns from `waitDeletes` (https://github.com/etcd-io/etcd/blob/main/client/v3/concurrency/election.go#L93)
	// and then calls `Resign` (note: it uses `client.Ctx`) against the etcd server. But the etcd server
	// the client is connected to has entered the STOP state, which means that
	// the server cannot process the request but still keeps the gRPC
	// connection open. So the routine blocks in `Resign`.
	return cerror.WrapError(cerror.ErrCaptureCampaignOwner, c.election.campaign(ctx, c.info.ID))
}

asddongmen · 2023-04-20T04:07:36Z

ref: pingcap/tidb#42643


sdojjy added the type/bug and area/ticdc labels Apr 20, 2023

sdojjy added affects-6.1 affects-6.5 affects-7.0 affects-5.3 affects-5.2 affects-5.4 affects-6.0 affects-6.2 affects-6.3 affects-6.4 affects-6.6 labels Apr 20, 2023

asddongmen mentioned this issue Apr 20, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client #8813

Merged

asddongmen added the severity/major label Apr 20, 2023

ti-chi-bot added may-affects-5.1 may-affects-7.1 labels Apr 20, 2023

asddongmen removed may-affects-5.1 may-affects-7.1 affects-5.3 affects-5.2 affects-5.4 affects-6.0 affects-6.2 affects-6.3 affects-6.4 labels Apr 20, 2023

asddongmen removed affects-6.6 affects-7.0 labels Apr 20, 2023

ti-chi-bot bot closed this as completed in #8813 May 6, 2023

ti-chi-bot bot pushed a commit that referenced this issue May 6, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (#8813)

58b465a

close #8812

This was referenced May 6, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (#8813) #8902

Closed

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (#8813) #8903

Merged

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this issue May 6, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (pingcap#8813)

887d751

close pingcap#8812

asddongmen added the affects-7.1 label May 8, 2023

ti-chi-bot mentioned this issue May 8, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (#8813) #8909

Merged

ti-chi-bot bot pushed a commit that referenced this issue May 9, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (#8813) (#8909)

b8c9eb7

close #8812

ti-chi-bot bot pushed a commit that referenced this issue May 9, 2023

Etcd (ticdc, dm):add AutoSyncInterval for Etcd client (#8813) (#8903)

bf96a69

close #8812

nongfushanquan mentioned this issue May 25, 2023

releases: add TiDB 7.1.0 release notes (stable version) pingcap/docs-cn#13896

Merged

16 tasks

nongfushanquan mentioned this issue Jun 9, 2023

add v6.5.3 release notes pingcap/docs-cn#14168

Merged

17 tasks

asddongmen mentioned this issue Jun 27, 2023

pd, changefeed (ticdc): fix pd related issues (#8884, #8813, #9106, #9174) #8901

Merged
