
TiCDC owner gets stuck when the PD leader shuts down and leadership transfers to a new node #3615

Closed
Tracked by #3844
amyangfei opened this issue Nov 25, 2021 · 2 comments

@amyangfei
Contributor

amyangfei commented Nov 25, 2021

What did you do?

  • Create a TiDB cluster with 6 TiKV nodes, 3 PD nodes, and 3 TiCDC nodes.
  • Shut down the PD leader at 2021/11/25 11:17:10.
  • Observe the TiCDC replication status.

What did you expect to see?

TiCDC recovers as soon as possible.

What did you see instead?

The TiCDC owner gets stuck and the changefeed checkpoint does not move forward.

I suspect the TiCDC owner is stuck in an etcd transaction committed by EtcdWorker; the owner goroutine is blocked here:

goroutine 577 [select]:
google.golang.org/grpc.(*pickerWrapper).pick(0xc000b9a600, 0x3bb1a90, 0xc00c7eaf60, 0xc00860ed00, 0x3540214, 0x14, 0x3bb1a90, 0xc00c7eaf60, 0xc004e36410, 0x52ea3c, ...)
        google.golang.org/grpc@v1.40.0/picker_wrapper.go:152 +0x19a
google.golang.org/grpc.(*ClientConn).getTransport(0xc000be0000, 0x3bb1a90, 0xc00c7eaf60, 0xc00c7eaf00, 0x3540214, 0x14, 0x110, 0x7fed1ece4d28, 0x120, 0xc006b3b0e0, ...)
        google.golang.org/grpc@v1.40.0/clientconn.go:881 +0x85
google.golang.org/grpc.(*clientStream).newAttemptLocked(0xc006b3b0e0, 0x0, 0x0, 0x0, 0x0, 0x0)
        google.golang.org/grpc@v1.40.0/stream.go:350 +0x18b
google.golang.org/grpc.newClientStream(0x3bb1a90, 0xc00c7eaf60, 0x63b9600, 0xc000be0000, 0x3540214, 0x14, 0xc0060ee880, 0x3, 0x4, 0x0, ...)
        google.golang.org/grpc@v1.40.0/stream.go:283 +0x98d
google.golang.org/grpc.invoke(0x3bb1a90, 0xc00c7eaed0, 0x3540214, 0x14, 0x32d7a40, 0xc00bd0ae60, 0x32d7b40, 0xc00c7eae70, 0xc000be0000, 0xc0060ee880, ...)
        google.golang.org/grpc@v1.40.0/call.go:66 +0x99
go.etcd.io/etcd/clientv3.(*Client).unaryClientInterceptor.func1(0x7fecf7d32a78, 0xc00c7eaed0, 0x3540214, 0x14, 0x32d7a40, 0xc00bd0ae60, 0x32d7b40, 0xc00c7eae70, 0xc000be0000, 0x3656ef8, ...)
        go.etcd.io/etcd@v0.5.0-alpha.5.0.20210512015243-d19fbe541bf9/clientv3/retry_interceptor.go:58 +0x45e
google.golang.org/grpc.(*ClientConn).Invoke(0xc000be0000, 0x7fecf7d32a78, 0xc000de39e0, 0x3540214, 0x14, 0x32d7a40, 0xc00bd0ae60, 0x32d7b40, 0xc00c7eae70, 0x63ba4c0, ...)
        google.golang.org/grpc@v1.40.0/call.go:35 +0x109
google.golang.org/grpc.Invoke(...)
        google.golang.org/grpc@v1.40.0/call.go:60
go.etcd.io/etcd/etcdserver/etcdserverpb.(*kVClient).Txn(0xc00000fee0, 0x7fecf7d32a78, 0xc000de39e0, 0xc00bd0ae60, 0x63ba4c0, 0x3, 0x3, 0x31bd480, 0x1, 0xc00bd0ae60)
        go.etcd.io/etcd@v0.5.0-alpha.5.0.20210512015243-d19fbe541bf9/etcdserver/etcdserverpb/rpc.pb.go:3497 +0xcf
go.etcd.io/etcd/clientv3.(*retryKVClient).Txn(0xc000bb66c0, 0x7fecf7d32a78, 0xc000de39e0, 0xc00bd0ae60, 0x63ba4c0, 0x3, 0x3, 0x0, 0x0, 0x0)
        go.etcd.io/etcd@v0.5.0-alpha.5.0.20210512015243-d19fbe541bf9/clientv3/retry.go:117 +0x7c
go.etcd.io/etcd/clientv3.(*txn).Commit(0xc00a4a2fc0, 0x0, 0x0, 0x0)
        go.etcd.io/etcd@v0.5.0-alpha.5.0.20210512015243-d19fbe541bf9/clientv3/txn.go:146 +0x158
github.com/pingcap/ticdc/pkg/orchestrator.(*EtcdWorker).commitChangedState(0xc00ba35180, 0x7fecf7d32a78, 0xc000de39e0, 0xc00c7ea1b0, 0x8a, 0x5, 0x8a)
        github.com/pingcap/ticdc/pkg/orchestrator/etcd_worker.go:365 +0xb63
github.com/pingcap/ticdc/pkg/orchestrator.(*EtcdWorker).applyPatchGroups(0xc00ba35180, 0x7fecf7d32a78, 0xc000de39e0, 0xc0083c2a80, 0x5, 0x5, 0x1, 0x1, 0x0, 0x0, ...)
        github.com/pingcap/ticdc/pkg/orchestrator/etcd_worker.go:316 +0x11b
github.com/pingcap/ticdc/pkg/orchestrator.(*EtcdWorker).Run(0xc00ba35180, 0x7fecf7d32a78, 0xc000de39e0, 0xc000c22060, 0xbebc200, 0x7fffd4aafe51, 0x11, 0x0, 0x0)
        github.com/pingcap/ticdc/pkg/orchestrator/etcd_worker.go:186 +0xabd
github.com/pingcap/ticdc/cdc/capture.(*Capture).runEtcdWorker(0xc000de83c0, 0x3bd4ae8, 0xc000de39e0, 0x3b51a00, 0xc00a342460, 0x3b84178, 0xc00b433590, 0xbebc200, 0x0, 0x0)
        github.com/pingcap/ticdc/cdc/capture/capture.go:304 +0x167
github.com/pingcap/ticdc/cdc/capture.(*Capture).campaignOwner(0xc000de83c0, 0x3bd4ae8, 0xc000de39e0, 0x3012360, 0x63eee70)
        github.com/pingcap/ticdc/cdc/capture/capture.go:282 +0x6c7
github.com/pingcap/ticdc/cdc/capture.(*Capture).run.func2(0xc000eb60a0, 0xc000de83c0, 0x3bd4ae8, 0xc000de39e0, 0xc000bb6080)
        github.com/pingcap/ticdc/cdc/capture/capture.go:203 +0xb5
created by github.com/pingcap/ticdc/cdc/capture.(*Capture).run
        github.com/pingcap/ticdc/cdc/capture/capture.go:197 +0x2de

Full goroutine dump: goroutines.tar.gz

CDC log: part_cdc.log.tar.gz

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

./tikv-server -V
TiKV
Release Version:   5.4.0-alpha
Edition:           Community
Git Commit Hash:   152f5ff11fc1f4769d2913bec663800e1c12327f
Git Commit Branch: master
UTC Build Time:    2021-11-19 09:17:09
Rust Version:      rustc 1.56.0-nightly (2faabf579 2021-07-27)
Enable Features:   jemalloc mem-profiling portable sse test-engines-rocksdb cloud-aws cloud-gcp
Profile:           dist_release

TiCDC version (execute cdc version):

./cdc version
Release Version: v5.2.0-master
Git Commit Hash: 00ea942a4ee87d8cd2bf7b4dfcb243e711719987
Git Branch: master
UTC Build Time: 2021-11-23 02:55:43
Go Version: go version go1.16.4 linux/amd64
Failpoint Build: false
amyangfei added the type/bug (the issue is confirmed as a bug), area/ticdc, and severity/critical labels on Nov 25, 2021
@Tammyxia

Tammyxia commented Nov 26, 2021

Reproduced this issue with these steps:

  • Start every cdc server with only --pd=pd-addr1; we expect TiCDC to discover the other PD addresses through pd-addr1.
  • Shut down pd-addr1.
  • Check the changefeed status: it always returns [CDC:ErrOwnerNotFound].

@overvenus
Member

Fixed by #3667
