*/: add pd write frequency control from cdc server parameter #937

Merged
amyangfei merged 6 commits into pingcap:master from pd-update-frequency on Sep 9, 2020

Conversation

amyangfei
Contributor

@amyangfei amyangfei commented Sep 9, 2020

What problem does this PR solve?

Currently, we use a watch mechanism to update the processor/changefeed checkpoint. The update frequency can reach 20+ writes per second, which is more frequent than most scenarios need.

What is changed and how it works?

Add parameters to the cdc server that control the checkpoint flush interval of the owner and of the processor.
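The effect of a flush interval on write frequency can be sketched as below. This is only an illustration under assumptions, not the code from this PR: `flushThrottle`, `shouldFlush`, and `simulate` are hypothetical names, and the simulated update rate of one update every 50ms (20 per second) is taken from the problem statement above.

```go
package main

import (
	"fmt"
	"time"
)

// flushThrottle is a hypothetical helper (not a type from this PR) that
// allows at most one flush per interval.
type flushThrottle struct {
	interval  time.Duration
	lastFlush time.Time
}

// shouldFlush reports whether a flush is allowed at time now. The first call
// always passes; later calls pass only once interval has elapsed since the
// last allowed flush.
func (t *flushThrottle) shouldFlush(now time.Time) bool {
	if !t.lastFlush.IsZero() && now.Sub(t.lastFlush) < t.interval {
		return false
	}
	t.lastFlush = now
	return true
}

// simulate feeds n checkpoint updates, one every stepMs milliseconds, through
// a throttle with the given interval and returns how many reach storage.
func simulate(intervalMs, stepMs, n int) int {
	t := &flushThrottle{interval: time.Duration(intervalMs) * time.Millisecond}
	start := time.Unix(0, 0)
	writes := 0
	for i := 0; i < n; i++ {
		now := start.Add(time.Duration(i*stepMs) * time.Millisecond)
		if t.shouldFlush(now) {
			writes++
		}
	}
	return writes
}

func main() {
	// 20 updates in one second, throttled to 100ms and to 200ms.
	fmt.Println("100ms interval:", simulate(100, 50, 20))
	fmt.Println("200ms interval:", simulate(200, 50, 20))
}
```

With these assumed numbers, a 100ms interval lets through every second update and a 200ms interval every fourth one.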

Before, with a 200ms interval and with a 500ms interval:

WeChat screenshot_20200909163819

After adding the resolved ts check, with processor flush interval=100ms and owner flush interval=200ms. This version has a bug: if neither the checkpoint ts nor the resolved ts advances and a new table is dispatched to this processor, replication is blocked. We should fix this.

image

Current status

image

Check List

Tests

  • Unit test
  • Integration test

Related changes

  • Need to update the documentation

Release note

  • Support controlling the write frequency of checkpoint flushes through cdc server parameters

@amyangfei amyangfei added this to the v4.0.6 milestone Sep 9, 2020
@amyangfei amyangfei added the release-blocker This issue blocks a release. Please solve it ASAP. label Sep 9, 2020
@zier-one
Contributor

zier-one commented Sep 9, 2020

please resolve the conflicts

@amyangfei amyangfei force-pushed the pd-update-frequency branch 2 times, most recently from 491b342 to 6a82f78 Compare September 9, 2020 07:10
Member

@overvenus overvenus left a comment

LGTM

Comment on lines +118 to +125
metrics := map[string]prometheus.Counter{
	etcd.EtcdPut:    etcdRequestCounter.WithLabelValues(etcd.EtcdPut, captureAddr),
	etcd.EtcdGet:    etcdRequestCounter.WithLabelValues(etcd.EtcdGet, captureAddr),
	etcd.EtcdDel:    etcdRequestCounter.WithLabelValues(etcd.EtcdDel, captureAddr),
	etcd.EtcdTxn:    etcdRequestCounter.WithLabelValues(etcd.EtcdTxn, captureAddr),
	etcd.EtcdGrant:  etcdRequestCounter.WithLabelValues(etcd.EtcdGrant, captureAddr),
	etcd.EtcdRevoke: etcdRequestCounter.WithLabelValues(etcd.EtcdRevoke, captureAddr),
}
Member

IMO, it's better to put the map in package pkg/etcd and pass the capture address to Wrap.

Contributor Author

The metrics have a cdc-specific label, capture, so I think it is better to build the metrics map in the upper-level caller.
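The design being discussed, where the caller builds a per-operation counter map and the wrapper only increments it, might look roughly like this. It is a simplified sketch with stand-in types: `Counter`, `memCounter`, `client`, and `wrap` are hypothetical, the in-memory `store` replaces a real etcd client, and the operation names are placeholders for the `etcd.EtcdPut`-style constants in the snippet above.

```go
package main

import "fmt"

// Counter is the only behaviour the wrapper needs. prometheus.Counter also
// has an Inc method, so a real counter would satisfy this interface.
type Counter interface{ Inc() }

// memCounter is an in-memory stand-in used to keep the sketch self-contained.
type memCounter struct{ n int }

func (c *memCounter) Inc() { c.n++ }

// Placeholder operation names (assumed, not the PR's constants).
const (
	opPut = "Put"
	opGet = "Get"
)

// client is a hypothetical wrapped client. It knows nothing about labels
// such as the capture address: the caller binds those when it builds the map.
type client struct {
	metrics map[string]Counter
	store   map[string]string
}

// wrap returns a client that counts each operation present in metrics.
func wrap(metrics map[string]Counter) *client {
	return &client{metrics: metrics, store: map[string]string{}}
}

// count increments the counter for op, if the caller registered one.
func (c *client) count(op string) {
	if m, ok := c.metrics[op]; ok {
		m.Inc()
	}
}

func (c *client) Put(key, value string) {
	c.count(opPut)
	c.store[key] = value
}

func (c *client) Get(key string) (string, bool) {
	c.count(opGet)
	v, ok := c.store[key]
	return v, ok
}

func main() {
	puts, gets := &memCounter{}, &memCounter{}
	cli := wrap(map[string]Counter{opPut: puts, opGet: gets})
	cli.Put("checkpoint", "100")
	cli.Put("checkpoint", "200")
	cli.Get("checkpoint")
	fmt.Println("puts:", puts.n, "gets:", gets.n)
}
```

Keeping the map in the caller means the wrapper package stays free of cdc-specific labels, at the cost of every caller having to construct the map itself.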

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 9, 2020
@ti-srebot ti-srebot removed the status/LGT1 Indicates that a PR has LGTM 1. label Sep 9, 2020
@ti-srebot ti-srebot added the status/LGT2 Indicates that a PR has LGTM 2. label Sep 9, 2020
@zier-one
Contributor

zier-one commented Sep 9, 2020

please resolve the conflicts

@zier-one
Contributor

zier-one commented Sep 9, 2020

/run-integration-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

1 similar comment
@zier-one
Contributor

zier-one commented Sep 9, 2020

/run-integration-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

- p.position.CheckPointTs is updated to the latest checkpoint, but the
interval has not reached flushCheckpointInterval, so it is not flushed
- the checkpoint then stops advancing, for example while waiting for a DDL
to execute, so the flushed checkpoint never moves forward
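One way to read the two points above is that a throttled flush must be retried on a later tick even when the checkpoint itself no longer changes. The sketch below shows that idea under assumptions: `checkpointFlusher`, `update`, and `tick` are hypothetical names, time is modelled as plain milliseconds, and this is not necessarily how the PR resolves the issue.

```go
package main

import "fmt"

// checkpointFlusher is a hypothetical model of a processor's flush logic.
type checkpointFlusher struct {
	intervalMs  int64
	lastFlushMs int64
	current     uint64 // latest in-memory checkpoint ts
	flushed     uint64 // last checkpoint ts written to storage
}

// update records a newer in-memory checkpoint; older values are ignored.
func (f *checkpointFlusher) update(ts uint64) {
	if ts > f.current {
		f.current = ts
	}
}

// tick runs periodically. It flushes when the in-memory value differs from
// the flushed one and the interval has elapsed, whether or not the
// checkpoint changed during this particular tick.
func (f *checkpointFlusher) tick(nowMs int64) bool {
	if f.current == f.flushed {
		return false // nothing pending
	}
	if nowMs-f.lastFlushMs < f.intervalMs {
		return false // throttled, the pending value waits for a later tick
	}
	f.flushed = f.current
	f.lastFlushMs = nowMs
	return true
}

func main() {
	f := &checkpointFlusher{intervalMs: 100}
	f.update(10)
	fmt.Println(f.tick(100), f.flushed) // interval elapsed
	f.update(20)
	fmt.Println(f.tick(150), f.flushed) // throttled, 20 stays pending
	// No further updates arrive, e.g. while waiting for a DDL.
	fmt.Println(f.tick(200), f.flushed) // pending value is flushed anyway
}
```

The key design choice is comparing `current` against `flushed` rather than reacting only to change events, so a value skipped by the throttle is never lost.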
@amyangfei
Contributor Author

/run-integration-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@codecov-commenter

codecov-commenter commented Sep 9, 2020

Codecov Report

Merging #937 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master       #937   +/-   ##
===========================================
  Coverage   33.4140%   33.4140%           
===========================================
  Files           100        100           
  Lines         11974      11974           
===========================================
  Hits           4001       4001           
  Misses         7581       7581           
  Partials        392        392           

@amyangfei
Contributor Author

/run-integration-tests tidb=release-4.0 tikv=release-4.0 pd=release-4.0

@amyangfei amyangfei merged commit 703e83d into pingcap:master Sep 9, 2020
@amyangfei amyangfei deleted the pd-update-frequency branch September 9, 2020 10:25