CDC/etcd_worker: add rate limiter to limit EtcdWorker tick frequency #3219
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. A reviewer can indicate their review by submitting an approval review.
/run-all-tests
the current etcd qps will increase exponentially with the number of tables that need to be replicated. ----- @asddongmen could you explain why? Is this PR a workaround or a solution?
I still don't get the point of your tests. "Write 500k rows of data to each table in upstream" implies a heavy continuous flow in the upstream, so it seems reasonable to increase the EtcdWorker ticks to speed things up. Am I right? Or would merging some resolvedTs or checkpointTs reports be a better solution, instead of just limiting the rate?
From what I understand, the point of this PR is to address the problem that, within a given period, the number of ticks on one node must be greater than or equal to the number of updates by all nodes. This behavior is suboptimal at a very low level. If by "merge some resolvedTs or checkpointTs report" you mean batching updates for different changefeeds, I think that would be a different optimization: the behavior of the business logic is orthogonal to what we are trying to fix here. We can investigate your proposal and address it in another PR.
Force-pushed from dde836d to 9adb87d.
/run-integration-tests
Codecov Report
@@             Coverage Diff              @@
##             master      #3219      +/-   ##
================================================
- Coverage    57.2251%   56.8692%   -0.3559%
================================================
  Files            163        211        +48
  Lines          19453      22768      +3315
================================================
+ Hits           11132      12948      +1816
- Misses          7261       8493      +1232
- Partials        1060       1327       +267
/merge
This pull request has been accepted and is ready to merge. Commit hash: ea892d8
/run-all-tests
/merge
/run-dm-integration-tests
In response to a cherrypick label: new pull request created: #3267. |
In response to a cherrypick label: new pull request created: #3268. |
In response to a cherrypick label: new pull request created: #3269. |
In response to a cherrypick label: new pull request created: #3270. |
In response to a cherrypick label: new pull request created: #3271. |
@asddongmen You don't seem to have filled out the release note correctly, so please fill out the release note for the next fix. |
What problem does this PR solve?
#3112
Overly frequent EtcdWorker ticks overburden etcd, and the current etcd QPS increases exponentially with the number of tables that need to be replicated.
What is changed and how it works?
Add a rate limiter to limit the EtcdWorker tick frequency.
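Conceptually, the change puts a token-bucket guard around the EtcdWorker tick loop so that pending watch events are coalesced into fewer etcd round trips. The following is a minimal, illustrative sketch (not the actual TiCDC code) of how such a limiter could be applied with golang.org/x/time/rate; the names etcdWorker and tick, the polling interval, and the 10 ticks/s figure are assumptions taken from the tests described below.

// Minimal sketch, not the actual TiCDC implementation: a token-bucket
// limiter from golang.org/x/time/rate caps how often the EtcdWorker may
// tick, so pending changes are batched into fewer etcd writes.
package main

import (
	"context"
	"log"
	"time"

	"golang.org/x/time/rate"
)

type etcdWorker struct {
	limiter *rate.Limiter // caps the tick frequency
}

func newEtcdWorker(ticksPerSecond float64) *etcdWorker {
	// Burst of 1: at most one tick can fire immediately when a token is available.
	return &etcdWorker{limiter: rate.NewLimiter(rate.Limit(ticksPerSecond), 1)}
}

// run polls frequently, but only calls tick() when the limiter grants a
// token; skipped rounds leave their pending changes for the next tick.
func (w *etcdWorker) run(ctx context.Context) error {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if !w.limiter.Allow() {
				continue // rate-limited: coalesce work into a later tick
			}
			if err := w.tick(ctx); err != nil {
				return err
			}
		}
	}
}

// tick stands in for applying pending state patches and writing to etcd.
func (w *etcdWorker) tick(ctx context.Context) error {
	log.Println("tick")
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_ = newEtcdWorker(10).run(ctx) // e.g. limit EtcdWorker to 10 ticks per second
}

Using Allow() rather than Wait() means the loop never blocks on the limiter; a throttled round simply defers its work to the next permitted tick.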
Check List
Tests
Summary: Limiting the tick frequency of EtcdWorker can reduce etcd QPS by about 50%, but it also reduces the replication speed by about 20%.
Test Environment:
Test1 (EtcdWorker ticks limited to 10 times/s)
Create 16 changefeeds, each replicating one table, and write 500k rows of data to each table in the upstream.
Test2 (EtcdWorker ticks limited to 10 times/s)
Create 30 changefeeds, each replicating one table, and write 500k rows of data to each table in the upstream.
Test3 (EtcdWorker ticks without limit)
Create 16 changefeeds, each replicating one table, and write 500k rows of data to each table in the upstream.
Test4 (EtcdWorker ticks without limit)
Create 30 changefeeds, each replicating one table, and write 500k rows of data to each table in the upstream.
Test5 (Incremental scan, EtcdWorker ticks limited to 10 times/s)
Test6 (Incremental scan, EtcdWorker ticks limited to 10 times/s)
When the changefeed was resumed, the owner went down immediately.
Side effects
Related changes
Release note