[ticdc 5.3.0] tikv_scale_plan_tidb case fail: 1. cdc oom, 2. data inconsistency, 3. changefeed stuck #3503

Closed
3 tasks done
Tracked by #3545
Tammyxia opened this issue Nov 17, 2021 · 4 comments
Labels: area/ticdc (Issues or PRs related to TiCDC), found/automation (Bugs found by automation cases), severity/major, subject/correctness (Denotes an issue or pull request is related to correctness), type/bug (The issue is confirmed as a bug)

Comments

Tammyxia commented Nov 17, 2021

What did you do?

// create 1 changefeed, 3 cdc, 3 tikv
// run tpcc 100 warehouse prepare
// run tpcc run, meanwhile, scale cluster tikv from 3 -> 7, after 15m, scale in to 3
// create table "finishmark"
// wait table "finishmark" sync to downstream
// check data consistency
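
The "wait for table finishmark to sync downstream" step can be sketched as a polling loop with a timeout. This is a minimal illustration, not the automation case's actual code: `wait_until`, its parameters, and the default timeout and interval are all hypothetical.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 600.0,
               interval_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True if the predicate succeeded, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a real run the predicate would check the downstream for the "finishmark" table. A timeout turns a stuck changefeed into an explicit failure instead of a hang.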

What did you expect to see?

pass

What did you see instead?

  • 1. CDC OOM
  • 2. Two tables have inconsistent data: warehouse, order_line
  • 3. The changefeed got stuck at about the time the table "finishmark" was created

testbed saved for 24 hours.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

TiKV 
Release Version:   5.3.0
Edition:           Community
Git Commit Hash:   d514230a40974393297050645c223bcf1db9aedc
Git Commit Branch: heads/refs/tags/v5.3.0
UTC Build Time:    2021-11-16 12:18:25
Rust Version:      rustc 1.56.0-nightly (2faabf579 2021-07-27)
Enable Features:   jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb cloud-aws cloud-gcp
Profile:           dist_release

TiCDC version (execute cdc version):

/cdc version
Release Version: v5.3.0
Git Commit Hash: f847b331572379527bf37a7f19be20448a74b2c2
Git Branch: heads/refs/tags/v5.3.0
UTC Build Time: 2021-11-16 11:54:34
Go Version: go version go1.16.4 linux/amd64
Failpoint Build: false
Tammyxia added the type/bug, area/ticdc, and severity/critical labels on Nov 17, 2021
@Tammyxia
Author

upstream tidb:
mysql> select count(1) from workload.order_line;
+----------+
| count(1) |
+----------+
| 41568034 |
+----------+

downstream tidb:
mysql> select count(1) from workload.order_line;
+----------+
| count(1) |
+----------+
| 25413084 |
+----------+

Roughly 1.6:1 (the downstream is missing 16,154,950 rows).

The warehouse row count is the same; only the column w_ytd is totally different.
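
To quantify the order_line gap from the two counts reported above, a small helper can compute the number of missing rows and the upstream-to-downstream ratio. This is only an arithmetic sketch; `count_gap` is a hypothetical name, and the inputs are the counts from the two queries.

```python
from typing import Tuple


def count_gap(upstream: int, downstream: int) -> Tuple[int, float]:
    """Return (rows missing downstream, upstream/downstream ratio)."""
    if downstream <= 0:
        raise ValueError("downstream count must be positive")
    return upstream - downstream, upstream / downstream


# Counts taken from the upstream and downstream queries in this comment.
missing, ratio = count_gap(41_568_034, 25_413_084)
print(f"missing rows: {missing}, ratio: {ratio:.2f}:1")
```

Row counts alone would not catch the warehouse problem, where the counts match but w_ytd differs; that needs a per-column or checksum comparison.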

@overvenus
Member

overvenus commented Nov 18, 2021

The CDC OOM is similar to https://github.com/pingcap/ticdc/pull/3439, which has been merged to master, and will be fixed in the next releases.

cyliu0 added the found/automation label on Nov 18, 2021
amyangfei added the subject/correctness label on Nov 18, 2021
overvenus added a commit to overvenus/ticdc that referenced this issue Nov 20, 2021
Note that this is a workaround that reduces the probability of pingcap#3503;
the sink may still report a checkpoint that is larger than the resolved ts,
which may cause data loss and a stuck changefeed.

Signed-off-by: Neil Shen <overvenus@gmail.com>
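
The invariant named in the commit message (a checkpoint should never be larger than the resolved ts) can be illustrated with a tiny sketch. This is not the code from that commit or from TiCDC; `violates_invariant` and `safe_checkpoint` are hypothetical helpers that only show what the check and a defensive clamp might look like.

```python
def violates_invariant(checkpoint_ts: int, resolved_ts: int) -> bool:
    """True if the reported checkpoint has run ahead of the resolved ts."""
    return checkpoint_ts > resolved_ts


def safe_checkpoint(sink_checkpoint_ts: int, resolved_ts: int) -> int:
    """Clamp the sink's checkpoint so it never exceeds the resolved ts.

    Assumption: clamping is one possible defensive measure; the actual
    workaround in the referenced commit may differ.
    """
    return min(sink_checkpoint_ts, resolved_ts)
```

A checkpoint ahead of the resolved ts claims that events have been flushed downstream before they are known to be complete, which matches the data loss and stuck changefeed described above.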
@overvenus
Member

#3540 has mitigated this issue; changing it to severity/major.


4 participants