
[Bug]: [date 12.29] standalone regression: updating 60 million rows and resuming CDC to MySQL takes about 3 hours #21017

Open
1 task done
heni02 opened this issue Dec 30, 2024 · 2 comments
Assignees
Labels
kind/bug Something isn't working phase/testing severity/s0 Extreme impact: Cause the application to break down and seriously affect the use
Milestone

Comments

@heni02
Contributor

heni02 commented Dec 30, 2024

Is there an existing issue for the same bug?

  • I have checked the existing issues.

Branch Name

2.0-dev

Commit ID

8d91524

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

CDC metrics for the entire sync phase:
https://shanghai.idc.matrixorigin.cn:30001/d/de1wtrqxri6f4b/frontend-metrics?orgId=1&var-interval=1m&var-datasource=prometheus-standalone&var-instance=10.222.1.128:7001&from=1735542622343&to=1735555287091
[WeCom screenshot: CDC metrics dashboard]

15:21–15:45: create cdc syncs the 60-million-row table (CDC performance meets expectations)
15:49–18:28: after the downstream finished syncing the 60 million rows, all 60 million rows were updated and the CDC task on the table was resumed (CDC performance does not meet expectations)

See Steps to Reproduce for the test steps.

mo log:
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%2210-222-1-128%5C%22,%20filename%3D%5C%22%2Fdata1%2Frunners%2Faction-runner%2F_work%2Fmo-nightly-regression%2Fmo-nightly-regression%2Fhead%2Fmo-service-8d91524f1-20241229-222811.log%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221735544745258%22,%22to%22:%221735554591076%22%7D%7D%7D&schemaVersion=1&orgId=1

Expected Behavior

No response

Steps to Reproduce

mysql -h 127.0.0.1 -uac1:admin -P6001 -p123 -e "CREATE TABLE test_db.LINEITEM( L_ORDERKEY    BIGINT NOT NULL, L_PARTKEY     INTEGER NOT NULL, L_SUPPKEY     INTEGER NOT NULL, L_LINENUMBER  INTEGER NOT NULL, L_QUANTITY    DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL, L_DISCOUNT    DECIMAL(15,2) NOT NULL, L_TAX         DECIMAL(15,2) NOT NULL, L_RETURNFLAG  VARCHAR(1) NOT NULL, L_LINESTATUS  VARCHAR(1) NOT NULL, L_SHIPDATE    DATE NOT NULL, L_COMMITDATE  DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE     CHAR(10) NOT NULL, L_COMMENT      VARCHAR(44) NOT NULL, PRIMARY KEY (L_ORDERKEY, L_LINENUMBER) );"
mysql -h 127.0.0.1 -uac1:admin -P6001 -p123 -e "load data infile '/data/mo-load-data/data/tpch_10/lineitem.tbl' into table test_db.lineitem FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' parallel 'true';"
cd $GITHUB_WORKSPACE/tool/mo-backup/
echo "Create cdc_resume_6kw cdc test"
./mo_cdc task create --task-name "cdc_resume_6kw" --source-uri="***127.0.0.1:6001" --sink-type="mysql" --sink-uri="***10.222.1.129:3306" --tables='test_db.lineitem:cdc_regression_db.lineitem' --level="account" --account="ac1"
sleep 1400s
echo "Show cdc_resume_6kw cdc test status"
./mo_cdc task show --task-name "cdc_resume_6kw" --source-uri="***127.0.0.1:6001"
echo "mysql data consistency verification, expect values=59986052"
mysql -h 10.222.1.129 -u root -P3306 -pmo123456 -e "select count(*) from cdc_regression_db.lineitem;"
echo "Pause cdc_resume_6kw cdc test"
./mo_cdc task pause --task-name "cdc_resume_6kw" --source-uri="***127.0.0.1:6001"
# mo update data
mysql -h 127.0.0.1 -uac1:admin -P6001 -p123 -e 'update test_db.lineitem set L_COMMENT="yyyyuuuu";'
echo "Show cdc_resume_6kw cdc test status"
./mo_cdc task show --task-name "cdc_resume_6kw" --source-uri="***127.0.0.1:6001"
echo "Resume cdc_resume_6kw cdc test"
./mo_cdc task resume --task-name "cdc_resume_6kw" --source-uri="***127.0.0.1:6001"
mysql -h 10.222.1.129 -u root -P3306 -pmo123456 -e 'select count(*) from cdc_regression_db.lineitem where L_COMMENT="yyyyuuuu";'

Additional information

No response

@heni02 heni02 added kind/bug Something isn't working severity/s0 Extreme impact: Cause the application to break down and seriously affect the use labels Dec 30, 2024
@heni02 heni02 added this to the 2.0.2 milestone Dec 30, 2024
@ck89119
Contributor

ck89119 commented Jan 2, 2025

The 2.0.2 cycle is taken up by the mo_pubs/subs restructuring, so there is no time to look at this; moving it to 2.1.0.

@ck89119 ck89119 modified the milestones: 2.0.2, 2.1.0 Jan 2, 2025
@ck89119
Contributor

ck89119 commented Jan 3, 2025

From the logs: the task starts by running the snapshot (full) sync, then switches to incremental sync in the middle. The incremental sync did not finish before the changes data was GC'd, causing a stale read; that triggered a restart of the CDC task, which began the snapshot sync again. This is why the chart above shows two peaks on the left and right (snapshot sync in progress) with a low stretch in the middle (incremental sync).

Disk monitoring of the machine hosting the downstream MySQL during the task; disk load during the middle stretch is very low:
[Screenshot: downstream MySQL disk monitoring]

I reran the task and watched CPU load in real time; the mysql process's CPU was saturated:
[WeCom screenshot: mysql process CPU fully utilized]

The middle stretch of the task is syncing incremental data changes. Here the changes are updates, and CDC rewrites each UPDATE statement into two SQL statements, a DELETE plus an INSERT, so the generated SQL looks like:

delete from t1 where pk = 1;
insert into t1 values (1, whatever ...);
delete from t1 where pk = 2;
insert into t1 values (2, whatever ...);
...
delete from t1 where pk = n;
insert into t1 values (n, whatever ...);

Compare this with the SQL generated when syncing the snapshot (existing) data:

insert into t1 values (1, whatever ...), (2, whatever ...), ... (n, whatever ...);

For SQL payloads of the same size, the snapshot-sync SQL clearly carries far more effective data: the insert into ... prefix appears only once, and everything after it is the rows being inserted. The incremental-sync SQL, to guarantee correctness, interleaves delete and insert statements and cannot merge rows into batches, so the insert and delete prefixes appear over and over and the actual row data makes up a very small fraction of the SQL.
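The gap in per-statement overhead can be sketched with a back-of-envelope calculation. The batch size below is an illustrative assumption, not a measurement from this task:

```python
# Rough comparison of how many statements the downstream MySQL must
# receive and parse under the two sync shapes described above.
# BATCH_ROWS is a hypothetical batch size, chosen only for illustration.

ROWS = 60_000_000        # rows updated in the test
BATCH_ROWS = 1_000       # assumed rows per multi-row snapshot INSERT

# Snapshot sync: one multi-row INSERT per batch of rows.
snapshot_stmts = ROWS // BATCH_ROWS

# Incremental sync: every updated row becomes one DELETE plus one INSERT.
incremental_stmts = ROWS * 2

print(f"snapshot   : {snapshot_stmts:>13,} statements to parse")
print(f"incremental: {incremental_stmts:>13,} statements to parse")
print(f"ratio      : {incremental_stmts / snapshot_stmts:,.0f}x more statements")
```

Under these assumptions the incremental stream asks MySQL to parse thousands of times more statements for the same number of rows, which matches the CPU-bound behavior seen in the monitoring.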

This roughly explains the monitoring above: MySQL spends a large amount of time receiving and parsing SQL statements while needing relatively little IO, so CPU load is very high while the disk is under almost no pressure. The performance bottleneck in this scenario is the CPU.

Under the current design, a saturated downstream CPU is the expected behavior for this workload.

@ck89119 ck89119 assigned heni02 and unassigned ck89119 Jan 6, 2025