kv (ticdc): fix kvClient reconnection downhill loop #10559

Merged
merged 14 commits into from
Jan 31, 2024

@asddongmen asddongmen commented Jan 29, 2024

What problem does this PR solve?

Issue Number: close #10584

What is changed and how it works?

  1. Add an ID to the eventFeedStream struct to identify a stream and prevent it from being deleted unexpectedly.
  2. Bind the cancel function of a gRPC stream to its eventFeedStream to prevent the stream from being canceled unexpectedly.
  3. Reduce the calls to s.deleteStream to a single call per stream to prevent the stream from being deleted unexpectedly.
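
The three changes above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the `session` type, `newStream`, `canceled`, and `nextStreamID` are hypothetical names, the real eventFeedStream wraps a gRPC stream and has more fields, and keying streams by store address is an assumption.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// nextStreamID is a hypothetical counter used to give every stream a unique ID.
var nextStreamID uint64

// eventFeedStream is a simplified stand-in for the struct named in the PR.
// The context is stored here only to keep the sketch short.
type eventFeedStream struct {
	id     uint64             // identifies this particular stream
	addr   string             // store address the stream is connected to
	ctx    context.Context    // context owned by this stream
	cancel context.CancelFunc // cancels this stream and nothing else
}

// canceled reports whether this stream's own context has been canceled.
func (st *eventFeedStream) canceled() bool { return st.ctx.Err() != nil }

// session is a hypothetical holder of the current stream per store address.
type session struct {
	mu      sync.Mutex
	streams map[string]*eventFeedStream
}

func newSession() *session {
	return &session{streams: make(map[string]*eventFeedStream)}
}

// newStream registers a stream for addr, replacing any previous entry.
func (s *session) newStream(addr string) *eventFeedStream {
	ctx, cancel := context.WithCancel(context.Background())
	st := &eventFeedStream{
		id:     atomic.AddUint64(&nextStreamID, 1),
		addr:   addr,
		ctx:    ctx,
		cancel: cancel,
	}
	s.mu.Lock()
	s.streams[addr] = st
	s.mu.Unlock()
	return st
}

// deleteStream removes target only if it is still the registered stream for
// its address, and cancels only target's own context. It returns true if the
// map entry was removed.
func (s *session) deleteStream(target *eventFeedStream) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	target.cancel() // bound to target, so a newer stream is never canceled here
	cur, ok := s.streams[target.addr]
	if !ok || cur.id != target.id {
		return false // stale request: a newer stream owns this address
	}
	delete(s.streams, target.addr)
	return true
}

func main() {
	s := newSession()
	old := s.newStream("tikv-1:20160")
	cur := s.newStream("tikv-1:20160") // reconnect replaces the old entry
	fmt.Println("stale delete removed entry:", s.deleteStream(old))
	fmt.Println("newer stream canceled:", cur.canceled())
}
```

In this sketch a late cleanup of the old stream leaves the newer stream registered and running, because both the map entry and the cancel function are matched to one specific stream.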

Check List

Tests

  • Manual test (add detailed scripts or steps below)
  1. Add a hard-coded time.Sleep(5 * time.Second) before the call to s.deleteStream in both the unfixed and the fixed version of the code.
  2. Deploy a TiDB cluster with 3 TiKV nodes and 1 TiCDC server, then create a changefeed.
  3. Replace the cdc binary with the patched build, restart a randomly chosen TiKV node, and observe the changefeed's lag.
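
To illustrate why a delay before s.deleteStream widens the problem window, here is a small Go simulation. It is only a sketch under assumptions: the `registry` type, `deleteByAddr`, `deleteByID`, and `simulate` are hypothetical, the delay is scaled down to milliseconds, and the premise that the unfixed code could remove a newer stream registered for the same store is an interpretation of the PR description, not something taken from the diff.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// registry maps a store address to the ID of its current stream (hypothetical).
type registry struct {
	mu     sync.Mutex
	byAddr map[string]uint64
}

func (r *registry) register(addr string, id uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byAddr[addr] = id
}

// deleteByAddr models an unguarded delete: whatever is registered is removed.
func (r *registry) deleteByAddr(addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.byAddr, addr)
}

// deleteByID models an ID-guarded delete: only the matching stream is removed.
func (r *registry) deleteByID(addr string, id uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.byAddr[addr]; ok && cur == id {
		delete(r.byAddr, addr)
	}
}

// simulate runs a delayed cleanup of stream 1 while stream 2 is registered
// for the same address. It reports whether stream 2 is still registered.
func simulate(delayMillis int, guarded bool) bool {
	const addr = "tikv-1:20160"
	r := &registry{byAddr: make(map[string]uint64)}
	r.register(addr, 1)

	replaced := make(chan struct{})
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Stand-in for the hard-coded sleep placed before the delete.
		time.Sleep(time.Duration(delayMillis) * time.Millisecond)
		<-replaced // make the ordering deterministic for the demo
		if guarded {
			r.deleteByID(addr, 1)
		} else {
			r.deleteByAddr(addr)
		}
	}()

	r.register(addr, 2) // reconnect happens while the cleanup is delayed
	close(replaced)
	<-done

	r.mu.Lock()
	defer r.mu.Unlock()
	return r.byAddr[addr] == 2
}

func main() {
	fmt.Println("unguarded delete, replacement survives:", simulate(10, false))
	fmt.Println("ID-guarded delete, replacement survives:", simulate(10, true))
}
```

Under these assumptions the unguarded delete removes the replacement stream, which would force yet another reconnection, while the ID-guarded delete leaves it in place.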

unfixed cdc:
image

fixed cdc:
image

The graphs above show that in the unfixed CDC, the resolvedTs lag can exceed 12 minutes when a TiKV node is restarted, whereas in the fixed CDC, the increase in resolvedTs lag is limited to at most 35 seconds. This demonstrates the effectiveness of the fix.

Moreover, when the hard-coded time.Sleep(5 * time.Second) is removed and the fixed version of CDC is tested again, the lag becomes even smaller:
image

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Fix a bug in the kv client that could increase changefeed lag when a TiKV node is restarted.

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 29, 2024
@asddongmen asddongmen added affects-6.5 affects-7.1 affects-7.5 needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. component/kv-client TiKV kv log client component. area/ticdc Issues or PRs related to TiCDC. and removed affects-6.5 affects-7.1 affects-7.5 labels Jan 29, 2024
@asddongmen asddongmen self-assigned this Jan 29, 2024
@3AceShowHand

/test verify

cdc/kv/client.go Outdated
cdc/kv/region_worker.go Outdated
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 30, 2024

codecov bot commented Jan 30, 2024

Codecov Report

Merging #10559 (6aad55e) into master (6600096) will increase coverage by 0.0797%.
Report is 6 commits behind head on master.
The diff coverage is 71.2707%.

Additional details and impacted files
Components Coverage Δ
cdc 61.9979% <71.2707%> (+0.1840%) ⬆️
dm 51.2214% <ø> (+0.0101%) ⬆️
engine 63.3717% <ø> (-0.0566%) ⬇️
Flag Coverage Δ
unit 57.5745% <71.2707%> (+0.0797%) ⬆️

@@               Coverage Diff                @@
##             master     #10559        +/-   ##
================================================
+ Coverage   57.4947%   57.5745%   +0.0797%     
================================================
  Files           848        848                
  Lines        126095     125889       -206     
================================================
- Hits          72498      72480        -18     
+ Misses        48136      47982       -154     
+ Partials       5461       5427        -34     

@asddongmen asddongmen removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 30, 2024
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 30, 2024
@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 30, 2024

ti-chi-bot bot commented Jan 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand, hicqu

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 30, 2024

ti-chi-bot bot commented Jan 30, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-01-30 05:03:27.863205456 +0000 UTC m=+1455449.427503161: ☑️ agreed by hicqu.
  • 2024-01-30 06:05:08.294276261 +0000 UTC m=+1459149.858573963: ☑️ agreed by 3AceShowHand.

@ti-chi-bot ti-chi-bot bot merged commit 98adc64 into pingcap:master Jan 31, 2024
28 checks passed
@ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-6.5: #10570.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jan 31, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-7.1: #10571.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jan 31, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot

In response to a cherrypick label: new pull request created to branch release-7.5: #10572.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jan 31, 2024
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
asddongmen added a commit to ti-chi-bot/tiflow that referenced this pull request Feb 7, 2024
@ti-chi-bot ti-chi-bot removed the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Feb 28, 2024
@ti-chi-bot ti-chi-bot removed the needs-cherry-pick-release-6.5 Should cherry pick this PR to release-6.5 branch. label Mar 30, 2024
Labels
approved area/ticdc Issues or PRs related to TiCDC. component/kv-client TiKV kv log client component. lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

changefeed may stuck when tikv upgrade/restart/evict leader
4 participants