CDC cloud: data inconsistency after CDC owner switching #2230
Comments
Trying to reproduce the issue at IDC; not reproduced yet.
Ran several times with load.
In an inconsistent scenario, we found multiple nodes writing to the same table in TiDB.
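One way to spot this symptom is to look for tables whose write windows overlap across nodes. The following is a minimal sketch, assuming write events have already been extracted from the logs into (timestamp, node, table) records; the `WriteEvent` shape, the node names, and the table names are hypothetical and are not TiCDC's log format.

```python
from collections import defaultdict
from typing import NamedTuple


class WriteEvent(NamedTuple):
    """One downstream write, as it might be extracted from logs (hypothetical shape)."""
    ts: float    # time of the write, in seconds
    node: str    # node that issued the write
    table: str   # table that was written


def find_overlapping_writers(events):
    """Return {table: [node pairs]} for tables whose per-node write windows overlap."""
    # Collect the first and last write time of every (table, node) pair.
    windows = defaultdict(dict)
    for ev in events:
        first, last = windows[ev.table].get(ev.node, (ev.ts, ev.ts))
        windows[ev.table][ev.node] = (min(first, ev.ts), max(last, ev.ts))

    overlaps = {}
    for table, per_node in windows.items():
        nodes = sorted(per_node)
        pairs = []
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                a_first, a_last = per_node[a]
                b_first, b_last = per_node[b]
                # Two closed intervals overlap when each starts before the other ends.
                if a_first <= b_last and b_first <= a_last:
                    pairs.append((a, b))
        if pairs:
            overlaps[table] = pairs
    return overlaps


events = [
    WriteEvent(1.0, "cdc-0", "t3"),
    WriteEvent(2.0, "cdc-1", "t3"),  # second node has started t3
    WriteEvent(3.0, "cdc-0", "t3"),  # first node still writes t3 afterwards
    WriteEvent(1.0, "cdc-0", "t1"),
    WriteEvent(4.0, "cdc-1", "t1"),  # clean handover: no overlap
]
print(find_overlapping_writers(events))
```

With the sample events this reports only `t3`, paired with `('cdc-0', 'cdc-1')`. The check is deliberately coarse: a table that legitimately moves away from a node and later returns to it would also be flagged, so a hit is a hint to read the logs, not proof of a bug.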
Cloud env log analysis
From the original log, there could be several reasons:
Reproduce
I have reproduced multiple TiCDC nodes writing to the same table in a test environment. To make the bug easier to reproduce, I injected a delay before a table pipeline is closed. ref: amyangfei@8fc643f. (The branch is based on
The test procedure is simple
Take
The
From these two logs we can confirm that the first TiCDC node writes to table t3 after t3 is started on the second TiCDC node.
cdc logs:
Cause analysis
Questions remained
From the cloud env log, this issue is the same scenario as #2244 (a new capture joins the TiCDC cluster and the TiCDC owner reschedules some tables to this new capture). IMHO we can close one of them @Tammyxia
One possible hotfix:
Since step 3 is too ugly, an alternative to step 3, use global
Bug Report
Please answer these questions before submitting your issue. Thanks!
Pause 1x changefeed
Load data to upstream tidb: $ bin/go-ycsb load mysql -P workloads/betting -p recordcount=9900000 -p mysql.host=xxx -p mysql.port=4000 --threads 200 -p dbnameprefix=testcc -p databaseproportions=1.0 -p unitnameprefix=unit1 -p unitscount=1 -p tablecount=200 -p loadbatchsize=500 -p mysql.password=12345678
Force a CDC owner switch: $ kubectl delete pod db-ticdc-1 --kubeconfig=/etc/kubernetes/cluster1.conf -n tidbxxx; delete ticdc-0, ticdc-1, and ticdc-2 one by one.
Load data to upstream again: $ bin/go-ycsb run mysql -P workloads/betting -p operationcount=5000000 -p mysql.host=xxx -p mysql.port=4000 --threads 200 -p dbnameprefix=testcc -p databaseproportions=1.0 -p unitnameprefix=unit1 -p unitscount=1 -p tablecount=200 -p mysql.password=12345678
Make all pd restart: $ kubectl delete pod db-pd-0 db-pd-1 db-pd-2 --kubeconfig=/etc/kubernetes/cluster1.conf -n tidbxxx
Check the changefeed status. Once upstream writes have stopped and sync has completed, check whether the data is consistent.
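The idea behind the final consistency check can be illustrated with an order-independent checksum per table. This is only a sketch of the general technique, assuming rows are available as in-memory tuples; it is not how sync-diff-inspector computes its checksums, and the function names and sample tables are hypothetical.

```python
import zlib


def table_checksum(rows):
    """XOR the CRC32 of every row so the result does not depend on row order."""
    checksum = 0
    for row in rows:
        # repr() gives a stable text form for tuples of ints and strings.
        checksum ^= zlib.crc32(repr(row).encode("utf-8"))
    return checksum


def inconsistent_tables(upstream, downstream):
    """Return the sorted names of tables whose checksums differ between the two sides."""
    names = set(upstream) | set(downstream)
    return sorted(
        name for name in names
        if table_checksum(upstream.get(name, [])) != table_checksum(downstream.get(name, []))
    )


upstream = {"t1": [(1, "a"), (2, "b")], "t2": [(1, "x")]}
downstream = {"t1": [(2, "b"), (1, "a")], "t2": [(1, "y")]}
print(inconsistent_tables(upstream, downstream))
```

Here `t1` matches despite the different row order, and only `t2` is reported. Because XOR cancels pairs of identical rows, this toy checksum would miss a row duplicated an even number of times; a real tool needs a stronger scheme.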
sync-diff log:
[2021/07/06 09:54:56.487 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_12`] [where="((TRUE) AND TRUE)"] ["source checksum"=3733393904] ["target checksum"=2605381674] ["get source checksum cost"=250.960066ms] ["get target checksum cost"=206.317042ms]
[2021/07/06 09:55:05.426 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_18`] [where="((TRUE) AND TRUE)"] ["source checksum"=3258024002] ["target checksum"=317619229] ["get source checksum cost"=183.408826ms] ["get target checksum cost"=87.105012ms]
[2021/07/06 09:55:21.657 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_11`] [where="((TRUE) AND TRUE)"] ["source checksum"=446946590] ["target checksum"=3514961352] ["get source checksum cost"=214.137411ms] ["get target checksum cost"=146.654629ms]
[2021/07/06 09:55:45.994 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_14`] [where="((TRUE) AND TRUE)"] ["source checksum"=4122421317] ["target checksum"=274903492] ["get source checksum cost"=209.464336ms] ["get target checksum cost"=109.468714ms]
[2021/07/06 09:56:19.416 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_79`] [where="((TRUE) AND TRUE)"] ["source checksum"=1531020405] ["target checksum"=3060465391] ["get source checksum cost"=78.444038ms] ["get target checksum cost"=8.828986ms]
[2021/07/06 09:56:31.858 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_1`] [where="((TRUE) AND TRUE)"] ["source checksum"=1478455333] ["target checksum"=1441817492] ["get source checksum cost"=538.675238ms] ["get target checksum cost"=434.188737ms]
[2021/07/06 09:57:15.840 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_statistics_agent_game_day`] [where="((TRUE) AND TRUE)"] ["source checksum"=3721642887] ["target checksum"=4238525603] ["get source checksum cost"=82.975369ms] ["get target checksum cost"=19.861323ms]
[2021/07/06 09:57:19.254 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_21`] [where="((TRUE) AND TRUE)"] ["source checksum"=4167676186] ["target checksum"=3207274853] ["get source checksum cost"=116.469711ms] ["get target checksum cost"=107.899528ms]
[2021/07/06 09:57:25.812 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_61`] [where="((TRUE) AND TRUE)"] ["source checksum"=1741577261] ["target checksum"=3192701895] ["get source checksum cost"=76.139605ms] ["get target checksum cost"=15.00056ms]
[2021/07/06 09:57:35.276 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_statistics_user_game_day`] [where="((TRUE) AND TRUE)"] ["source checksum"=3141552518] ["target checksum"=4014241092] ["get source checksum cost"=90.358791ms] ["get target checksum cost"=30.339078ms]
[2021/07/06 09:57:50.224 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] [table=`testcc0`.`unit10_game_bets_game_tag_analysis_31`] [where="((TRUE) AND TRUE)"] ["source checksum"=1131104107] ["target checksum"=2463968134] ["get source checksum cost"=139.692822ms] ["get target checksum cost"=86.867459ms]
For example, in table testcc0.unit10_statistics_agent_game_day, the row count is 4155 and the differing row is "id: 946".
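To get a quick list of affected tables out of a sync-diff log like this one, the mismatch entries can be parsed with a regular expression. The sketch below assumes each entry sits on a single line and that the table is logged as a backtick-quoted `db`.`table` pair; the helper name `mismatched_tables` is hypothetical, and the sample holds two entries from the log with the trailing timing fields omitted.

```python
import re

# Matches one "checksum is not equal" entry; assumes the table is logged as `db`.`table`.
MISMATCH_RE = re.compile(
    r'\["checksum is not equal"\] '
    r'\[table=`(?P<db>[^`]+)`\.`(?P<table>[^`]+)`\].*?'
    r'\["source checksum"=(?P<source>\d+)\] '
    r'\["target checksum"=(?P<target>\d+)\]'
)


def mismatched_tables(log_text):
    """Return (db.table, source checksum, target checksum) for every mismatch entry."""
    return [
        (f"{m['db']}.{m['table']}", int(m["source"]), int(m["target"]))
        for m in MISMATCH_RE.finditer(log_text)
    ]


sample = (
    '[2021/07/06 09:57:15.840 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] '
    '[table=`testcc0`.`unit10_statistics_agent_game_day`] [where="((TRUE) AND TRUE)"] '
    '["source checksum"=3721642887] ["target checksum"=4238525603]\n'
    '[2021/07/06 09:57:35.276 +00:00] [WARN] [diff.go:551] ["checksum is not equal"] '
    '[table=`testcc0`.`unit10_statistics_user_game_day`] [where="((TRUE) AND TRUE)"] '
    '["source checksum"=3141552518] ["target checksum"=4014241092]\n'
)
for name, source, target in mismatched_tables(sample):
    print(name, source, target)
```

Sorting or de-duplicating the resulting table names makes it easier to compare runs and to check whether the same tables fail after each owner switch.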
Versions of the cluster
Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):
TiCDC version (execute cdc version):