scheduler(cdc): add ProcessorEpoch #4768

Merged
merged 11 commits into from
Mar 7, 2022

Conversation

liuzix
Contributor

@liuzix liuzix commented Mar 4, 2022

What problem does this PR solve?

Issue Number: close #4769

What is changed and how it works?

  • Added epoch in Sync and DispatchTable messages, so that outdated dispatches will be ignored by the processor.
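The idea of ignoring outdated dispatches can be sketched as follows. This is a minimal illustration only: the `dispatchTableMessage` and `agent` types and the `handleDispatch` method are hypothetical stand-ins, not the actual TiCDC types or message schema. The assumption is that the processor compares the epoch carried by each dispatch with its own current epoch and drops the message on a mismatch.

```go
package main

import "fmt"

// dispatchTableMessage is a hypothetical stand-in for the owner-to-processor
// dispatch message; the field names are assumptions, not the real TiCDC schema.
type dispatchTableMessage struct {
	TableID int64
	Epoch   string // epoch the sender believes the processor is in
}

// agent is a hypothetical, simplified processor-side agent.
type agent struct {
	epoch      string  // current epoch, announced to the owner on sync
	dispatched []int64 // tables accepted so far
}

// handleDispatch applies msg only if it was issued for the agent's current
// epoch. It returns false when the message is stale and has been ignored.
func (a *agent) handleDispatch(msg dispatchTableMessage) bool {
	if msg.Epoch != a.epoch {
		return false // outdated dispatch: drop it instead of acting on it
	}
	a.dispatched = append(a.dispatched, msg.TableID)
	return true
}

func main() {
	a := &agent{epoch: "epoch-1"}
	fmt.Println(a.handleDispatch(dispatchTableMessage{TableID: 1, Epoch: "epoch-1"})) // true

	// The processor re-syncs and picks a new epoch; a delayed message that
	// still carries the old epoch is now ignored.
	a.epoch = "epoch-2"
	fmt.Println(a.handleDispatch(dispatchTableMessage{TableID: 1, Epoch: "epoch-1"})) // false
	fmt.Println(a.dispatched)                                                        // [1]
}
```

In this sketch a delayed duplicate of an earlier dispatch is dropped rather than treated as a second operation on the same table, which is the failure mode the PR description attributes to high inter-node latency.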

Check List

Tests

  • Unit test
  • Integration test
  • Manual test

Restarted all TiCDC nodes at once with 2000 ms of latency injected between each pair of nodes. The changefeed did not pause or report an error, and it recovered quickly.

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch

Release note

Fix ErrProcessorDuplicateOperations when the new scheduler is enabled (disabled by default)

@ti-chi-bot
Member

ti-chi-bot commented Mar 4, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • 3AceShowHand
  • amyangfei

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 4, 2022
@liuzix
Contributor Author

liuzix commented Mar 4, 2022

/run-leak-tests

@codecov-commenter

codecov-commenter commented Mar 4, 2022

Codecov Report

Merging #4768 (8dfd148) into master (9607554) will decrease coverage by 0.1719%.
The diff coverage is 54.1091%.

Flag Coverage Δ
cdc 59.6219% <54.1091%> (-0.3004%) ⬇️
dm 52.0344% <ø> (+0.0055%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #4768        +/-   ##
================================================
- Coverage   55.6402%   55.4682%   -0.1720%     
================================================
  Files           494        521        +27     
  Lines         61283      64289      +3006     
================================================
+ Hits          34098      35660      +1562     
- Misses        23750      25109      +1359     
- Partials       3435       3520        +85     

@liuzix
Contributor Author

liuzix commented Mar 4, 2022

/run-leak-tests

@liuzix liuzix changed the title [WIP]scheduler(cdc): add ProcessorEpoch scheduler(cdc): add ProcessorEpoch Mar 4, 2022
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2022
@liuzix
Contributor Author

liuzix commented Mar 4, 2022

/run-leak-tests

@ti-chi-bot ti-chi-bot added needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 4, 2022
@liuzix
Contributor Author

liuzix commented Mar 4, 2022

/run-leak-tests

@liuzix liuzix added the status/ptal Could you please take a look? label Mar 4, 2022
@liuzix
Contributor Author

liuzix commented Mar 4, 2022

/run-leak-tests

@@ -272,6 +289,7 @@ func (a *BaseAgent) processOperations(ctx context.Context) error {
	for tableID, op := range a.tableOperations {
		switch op.status {
		case operationReceived:
			a.logger.Info("Agent start processing operation", zap.Any("op", op))
Contributor

Is it acceptable that the operation-related log volume scales as O(#tables)?

Contributor Author

Without this log, it would be difficult to trace scheduling problems.

a.epochMu.Lock()
defer a.epochMu.Unlock()

a.epoch = uuid.New().String()
Contributor

We only need this epoch to be unique; we don't need any serialization (ordering) guarantee. Should we add a comment about that?

Contributor Author

Sure.
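
To illustrate the point in this thread, here is a minimal sketch of an epoch that is unique but carries no ordering. It is not the PR's code: the `epochHolder` type and its methods are hypothetical, and `newEpoch` uses `crypto/rand` from the standard library as a stand-in for the `uuid.New().String()` call in the diff so that the example has no external dependencies.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// epochHolder is a hypothetical type that guards the current epoch with a mutex,
// mirroring the epochMu/epoch pair in the reviewed snippet.
type epochHolder struct {
	mu    sync.Mutex
	epoch string
}

// newEpoch returns a random 128-bit identifier as a hex string.
// It is a stand-in for uuid.New().String(), assumed here for illustration.
func newEpoch() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b[:])
}

// reset replaces the current epoch with a fresh one and returns it.
// Epochs are only required to be unique; they are never compared for order.
func (h *epochHolder) reset() string {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.epoch = newEpoch()
	return h.epoch
}

// isCurrent reports whether e equals the current epoch. Equality is the only
// comparison that is meaningful for an unordered identifier.
func (h *epochHolder) isCurrent(e string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.epoch == e
}

func main() {
	h := &epochHolder{}
	old := h.reset()
	cur := h.reset()
	fmt.Println(h.isCurrent(old), h.isCurrent(cur))
}
```

Because the identifiers are random, a receiver can tell "same epoch" from "different epoch" but cannot tell which of two epochs is newer, which is why only uniqueness matters here.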

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 7, 2022
@liuzix liuzix requested a review from 3AceShowHand March 7, 2022 03:54
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 7, 2022
@liuzix
Contributor Author

liuzix commented Mar 7, 2022

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 7c093be

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 7, 2022
@liuzix
Contributor Author

liuzix commented Mar 7, 2022

/run-verify

@liuzix
Contributor Author

liuzix commented Mar 7, 2022

/run-integration-tests
/run-kafka-integration-test

@3AceShowHand
Contributor

/run-integration-tests
/run-kafka-integration-test

@liuzix
Contributor Author

liuzix commented Mar 7, 2022

/run-integration-tests

@ti-chi-bot ti-chi-bot merged commit 0578db3 into pingcap:master Mar 7, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Mar 7, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created: #4789.

ti-chi-bot added a commit that referenced this pull request Apr 20, 2022
Labels
needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. status/ptal Could you please take a look?
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[TiCDC] (unreleased feature) High inter-node latency causes ErrProcessorDuplicateOperations
5 participants