Case "cdc_minor_components_unavaible" failed because changefeed stuck #3521

Closed
Tracked by #3545
Tammyxia opened this issue Nov 18, 2021 · 6 comments
Labels
area/ticdc Issues or PRs related to TiCDC. found/automation Bugs found by automation cases severity/moderate type/bug The issue is confirmed as a bug.

Comments

@Tammyxia

What did you do?

// testbed: 3 TiCDC, 3 TiDB, 3 PD, 3 TiKV
// create a changefeed; the downstream is MySQL 8
// pause the changefeed
// create database
// run tpcc prepare with 20 warehouses, then tpcc run for 20m
// resume the changefeed
// trigger a PD leader switch, and fail one TiDB and one TiKV, using pod-failure chaos
// create the table finishmark and wait for the sync task to complete

What did you expect to see?

pass

What did you see instead?

  • The sync task does not complete because the changefeed is stuck.
  • ticdc-0 is the owner; it hit OOM about 3 minutes after the changefeed was resumed, while scanning historical data.

Versions of the cluster

Upstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

TiKV 
Release Version:   5.3.0
Edition:           Community
Git Commit Hash:   d514230a40974393297050645c223bcf1db9aedc
Git Commit Branch: heads/refs/tags/v5.3.0
UTC Build Time:    2021-11-16 12:18:25
Rust Version:      rustc 1.56.0-nightly (2faabf579 2021-07-27)
Enable Features:   jemalloc mem-profiling portable sse protobuf-codec test-engines-rocksdb cloud-aws cloud-gcp
Profile:           dist_release

TiCDC version (execute cdc version):

/cdc version
Release Version: v5.3.0
Git Commit Hash: f847b331572379527bf37a7f19be20448a74b2c2
Git Branch: heads/refs/tags/v5.3.0
UTC Build Time: 2021-11-16 11:54:34
Go Version: go version go1.16.4 linux/amd64
Failpoint Build: false
@Tammyxia Tammyxia added type/bug The issue is confirmed as a bug. area/ticdc Issues or PRs related to TiCDC. severity/major labels Nov 18, 2021
@Tammyxia
Author

The checkpoint stopped at 2021-11-18 06:04:28.096 UTC (14:04:28 Beijing time, UTC+8), but the metrics show sink writes during 14:30-14:40.
image
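
The time arithmetic in this comment can be double-checked with a short Python sketch. It assumes a fixed UTC+8 offset for Beijing time and assumes the 14:30 start of the sink-write window falls on the same day; both are assumptions for illustration, not values taken from the metrics themselves.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset for Beijing time (assumption: no DST handling needed).
BEIJING = timezone(timedelta(hours=8))

# Checkpoint reported by the changefeed, in UTC.
checkpoint_utc = datetime(2021, 11, 18, 6, 4, 28, 96000, tzinfo=timezone.utc)
checkpoint_bj = checkpoint_utc.astimezone(BEIJING)

# Assumed start of the sink-write window seen in the metrics (Beijing time).
sink_write_start = datetime(2021, 11, 18, 14, 30, tzinfo=BEIJING)

# How long after the checkpoint stalled the sink was still writing.
gap = sink_write_start - checkpoint_bj

print(checkpoint_bj.strftime("%Y-%m-%d %H:%M:%S"))  # 2021-11-18 14:04:28
print(gap)                                          # 0:25:31.904000
```

Under these assumptions, the sink was still writing roughly 25 minutes after the checkpoint stopped advancing.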

@Tammyxia
Author

Tammyxia commented Nov 18, 2021

[
  {
    "id": "cdc-minor-components-unavailable",
    "summary": {
      "state": "stopped",
      "tso": 429186211668557827,
      "checkpoint": "2021-11-18 06:04:28.096",
      "error": {
        "addr": "upstream-ticdc-1.upstream-ticdc-peer.cdc-testbed--tps-420033-1-334.svc:8301",
        "code": "CDC:ErrOwnerUnknown",
        "message": "rpc error: code = Unknown desc = [PD:tso:ErrGenerateTimestamp]generate timestamp failed, requested pd is not leader of cluster"
      }
    }
  }
]

  • After resuming it, the changefeed checkpoint moves forward again.
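
The `tso` value in the output above can be decoded to confirm that it matches the `checkpoint` field. The sketch below assumes the usual PD TSO layout (the low 18 bits are a logical counter, the remaining high bits are physical milliseconds since the Unix epoch); `decode_tso` is a hypothetical helper name, not part of any TiCDC tooling.

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # assumed width of the logical counter in a PD TSO


def decode_tso(tso: int):
    """Split a TSO into (UTC wall-clock time, logical counter)."""
    physical_ms = tso >> LOGICAL_BITS
    logical = tso & ((1 << LOGICAL_BITS) - 1)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    wall = epoch + timedelta(milliseconds=physical_ms)
    return wall, logical


# TSO copied from the changefeed summary in this comment.
wall, logical = decode_tso(429186211668557827)
print(wall.isoformat(timespec="milliseconds"), logical)
# 2021-11-18T06:04:28.096+00:00 3
```

The decoded physical time agrees with the reported checkpoint, which suggests the `checkpoint` string is expressed in UTC.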

@Tammyxia Tammyxia added the found/automation Bugs found by automation cases label Nov 18, 2021
@Tammyxia
Author

Tammyxia commented Nov 18, 2021

  • After the changefeed sync task completed (the checkpoint reached the latest time and there were no more sink writes), the data consistency check failed.

upstream tidb:
mysql> select count(*) from workload.order_line;
+----------+
| count(*) |
+----------+
|  9977391 |
+----------+
downstream mysql:
mysql> select count(*) from workload.order_line;
+----------+
| count(*) |
+----------+
|  8551337 |
+----------+

  • All other tables match between upstream and downstream; only order_line differs.

mysql> show create table order_line;
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| order_line | CREATE TABLE order_line (
ol_o_id int NOT NULL,
ol_d_id int NOT NULL,
ol_w_id int NOT NULL,
ol_number int NOT NULL,
ol_i_id int NOT NULL,
ol_supply_w_id int DEFAULT NULL,
ol_delivery_d datetime DEFAULT NULL,
ol_quantity int DEFAULT NULL,
ol_amount decimal(6,2) DEFAULT NULL,
ol_dist_info char(24) DEFAULT NULL,
PRIMARY KEY (ol_w_id,ol_d_id,ol_o_id,ol_number)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
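
To put a number on the gap between the two `count(*)` results in this comment, here is a minimal Python sketch. The counts are copied from the queries above; `count_mismatches` is a hypothetical helper written only for this illustration, and a real check would fetch the counts from both databases.

```python
# Row counts copied from the query results in this comment.
upstream = {"workload.order_line": 9977391}
downstream = {"workload.order_line": 8551337}


def count_mismatches(up, down):
    """Return {table: rows missing downstream} for tables whose counts differ."""
    return {
        table: up[table] - down.get(table, 0)
        for table in up
        if up[table] != down.get(table, 0)
    }


mismatches = count_mismatches(upstream, downstream)
print(mismatches)  # {'workload.order_line': 1426054}
```

A count comparison only shows that rows are missing; it does not identify which primary-key ranges were lost.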

@overvenus
Member

#3540 has mitigated this issue; changing to severity/major.

@overvenus
Member

The bug has not happened again since then; changing to severity/moderate.

@overvenus
Member

The root cause has been fixed, see https://github.com/pingcap/ticdc/issues/3545


4 participants