Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd/client(ticdc): add retry operation for etcd transaction api (#4248) #4474

Merged
merged 17 commits into from
Feb 8, 2022

Conversation

CharlesCheung96
Copy link
Contributor

@CharlesCheung96 CharlesCheung96 commented Jan 25, 2022

What problem does this PR solve?

Issue Number: close #4248

What is changed and how it works?

  1. Add retry strategy for etcd transaction, which uniformly handles errors related to etcd transactions and only throws errors that are not retryable, such as context.Canceled, context.DeadlineExceeded, errors.ErrReachMaxTry.
  2. Catch all of the EtcdError in the etcd_worker and retry the Txn operation again to prevent the cdc process from exiting abnormally.
  3. When the connection between cdc and etcd is abnormal, session will be responsible for disconnecting and throwing ErrEtcdSessionDone error, which causes the capture to automatically recover after suicide.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change

Side effects

  • Possible performance regression
  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch

Release note

Fix a bug that owner exits abnormally when PD leader is killed

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Jan 25, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • amyangfei
  • asddongmen

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 25, 2022
@CharlesCheung96
Copy link
Contributor Author

/run-all-tests

@@ -145,12 +159,31 @@ func isRetryableError(rpcName string) retry.IsRetryable {
return false
}
if rpcName == EtcdRevoke {
if etcdErr, ok := err.(rpctypes.EtcdError); ok && etcdErr.Code() == codes.NotFound {
if etcdErr, ok := err.(v3rpc.EtcdError); ok && etcdErr.Code() == codes.NotFound {
// it means the etcd lease is already expired or revoked
return false
}
}

Copy link
Contributor Author

@CharlesCheung96 CharlesCheung96 Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract a function, such as EtcdRetryableError() bool in the future, similar to the work in #3242

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL

@CharlesCheung96
Copy link
Contributor Author

/run-all-tests

@CharlesCheung96
Copy link
Contributor Author

/run-dm-integration-tests
/run-integration-tests

@amyangfei amyangfei added needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. type/bugfix This PR fixes a bug. labels Jan 25, 2022
@liuzix
Copy link
Contributor

liuzix commented Jan 25, 2022

Please add unit tests

@overvenus overvenus added needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. needs-cherry-pick-release-5.0 Should cherry pick this PR to release-5.0 branch. needs-cherry-pick-release-5.1 Should cherry pick this PR to release-5.1 branch. needs-cherry-pick-release-5.2 Should cherry pick this PR to release-5.2 branch. needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. component/owner Owner component. labels Jan 25, 2022
@CharlesCheung96 CharlesCheung96 force-pushed the fix_4248_owner_exit branch 2 times, most recently from bf43014 to 12d4b8e Compare January 28, 2022 03:05
pkg/etcd/client.go Show resolved Hide resolved
@@ -145,12 +159,31 @@ func isRetryableError(rpcName string) retry.IsRetryable {
return false
}
if rpcName == EtcdRevoke {
if etcdErr, ok := err.(rpctypes.EtcdError); ok && etcdErr.Code() == codes.NotFound {
if etcdErr, ok := err.(v3rpc.EtcdError); ok && etcdErr.Code() == codes.NotFound {
// it means the etcd lease is already expired or revoked
return false
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract a function, such as EtcdRetryableError() bool in the future, similar to the work in #3242

@codecov-commenter
Copy link

codecov-commenter commented Jan 28, 2022

Codecov Report

Merging #4474 (29cac10) into master (9607554) will decrease coverage by 0.0929%.
The diff coverage is 58.2043%.

Flag Coverage Δ
cdc 60.3270% <79.7202%> (+0.4047%) ⬆️
dm 51.5643% <52.0874%> (-0.4646%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@               Coverage Diff                @@
##             master      #4474        +/-   ##
================================================
- Coverage   55.6402%   55.5472%   -0.0930%     
================================================
  Files           494        497         +3     
  Lines         61283      61760       +477     
================================================
+ Hits          34098      34306       +208     
- Misses        23750      24023       +273     
+ Partials       3435       3431         -4     

@amyangfei
Copy link
Contributor

Has the case in #4248 been tested manually

  • How long will PD recover after kill -9
  • What is the behavior of TiCDC after PD is killed

@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4528.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4529.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4530.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4531.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4532.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 8, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Copy link
Member

In response to a cherrypick label: new pull request created: #4533.

overvenus pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 21, 2022
overvenus pushed a commit to ti-chi-bot/tiflow that referenced this pull request Feb 21, 2022
ti-chi-bot pushed a commit that referenced this pull request Feb 21, 2022
…) (#4474)

close #4248

Signed-off-by: Neil Shen <overvenus@gmail.com>
CharlesCheung96 added a commit that referenced this pull request Apr 28, 2022
etcd/client(ticdc): add retry operation for etcd transaction api (#4248) (#4474)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/owner Owner component. needs-cherry-pick-release-4.0 Should cherry pick this PR to release-4.0 branch. needs-cherry-pick-release-5.0 Should cherry pick this PR to release-5.0 branch. needs-cherry-pick-release-5.1 Should cherry pick this PR to release-5.1 branch. needs-cherry-pick-release-5.2 Should cherry pick this PR to release-5.2 branch. needs-cherry-pick-release-5.3 Should cherry pick this PR to release-5.3 branch. needs-cherry-pick-release-5.4 Should cherry pick this PR to release-5.4 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Owner exits unexpectedly when PD leader is killed -9
7 participants