server: pause scheduling immediately #7809

Merged
merged 1 commit into from
Feb 7, 2024

Conversation

rleungx
Member

@rleungx rleungx commented Feb 6, 2024

What problem does this PR solve?

Issue Number: Ref #5839.

What is changed and how does it work?

We sync the configuration to the scheduling server through a watch mechanism, so a change may take effect with some delay. In the online recovery scenario, scheduling must pause immediately. This PR therefore drops all region heartbeat responses from the scheduling server once IsSchedulingHalted becomes true.
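The idea can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `haltFlag`, `response`, and `forwarder` are hypothetical stand-ins, and only the method name `IsSchedulingHalted` comes from the description above. The point is that the forwarding side checks a locally available flag for every response, so it does not have to wait for the watch-based sync to reach the scheduling server.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// haltFlag is a hypothetical stand-in for the options object that
// exposes IsSchedulingHalted(); it is not the real config type.
type haltFlag struct{ halted int32 }

func (f *haltFlag) set(v bool) {
	var n int32
	if v {
		n = 1
	}
	atomic.StoreInt32(&f.halted, n)
}

func (f *haltFlag) IsSchedulingHalted() bool {
	return atomic.LoadInt32(&f.halted) == 1
}

// response is a hypothetical stand-in for a region heartbeat response.
type response struct{ regionID uint64 }

// forwarder relays responses and records the region IDs it forwarded.
type forwarder struct {
	opts *haltFlag
	sent []uint64
}

// handle drops the response while scheduling is halted and forwards
// it otherwise.
func (fw *forwarder) handle(resp response) {
	if fw.opts.IsSchedulingHalted() {
		return // dropped: the flag is checked on every response
	}
	fw.sent = append(fw.sent, resp.regionID)
}

func main() {
	opts := &haltFlag{}
	fw := &forwarder{opts: opts}

	fw.handle(response{regionID: 1}) // forwarded
	opts.set(true)
	fw.handle(response{regionID: 2}) // dropped while halted
	opts.set(false)
	fw.handle(response{regionID: 3}) // forwarded again

	fmt.Println(fw.sent) // prints [1 3]
}
```

Dropped responses are simply lost in this sketch; whatever produced them is not told that they were discarded.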

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Release note

None.

Signed-off-by: Ryan Leung <rleungx@gmail.com>
Contributor

ti-chi-bot bot commented Feb 6, 2024

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • CabinfeverB
  • JmPotato

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Feb 6, 2024
@ti-chi-bot ti-chi-bot bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 6, 2024
@rleungx rleungx requested review from CabinfeverB and removed request for disksing February 6, 2024 07:36

codecov bot commented Feb 6, 2024

Codecov Report

Merging #7809 (0dd5c4c) into master (f0699ba) will increase coverage by 0.04%.
Report is 2 commits behind head on master.
The diff coverage is 55.55%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7809      +/-   ##
==========================================
+ Coverage   73.45%   73.49%   +0.04%     
==========================================
  Files         432      432              
  Lines       47843    47860      +17     
==========================================
+ Hits        35142    35174      +32     
+ Misses       9663     9641      -22     
- Partials     3038     3045       +7     
Flag Coverage Δ
unittests 73.49% <55.55%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown.

@ti-chi-bot ti-chi-bot bot added the status/LGT1 Indicates that a PR has LGTM 1. label Feb 6, 2024
@@ -262,6 +263,10 @@ func forwardRegionHeartbeatToScheduling(forwardStream schedulingpb.Scheduling_Re
			errCh <- errors.WithStack(err)
			return
		}
		// TODO: find a better way to halt scheduling immediately.
		if rc.GetOpts().IsSchedulingHalted() {
			continue
Member

One problem is that the operators generated by the scheduling server still exist in memory. Could we cancel all of the scheduling server's operators after the online recovery completes, so that scheduling can resume as soon as possible?

Member Author

@rleungx rleungx Feb 7, 2024

It depends. IMO, if those operators can still take effect, cancelling all of them is not the best approach. But if a region has changed, we may need to cancel its operators. So I prefer to leave it as is for now.
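To make the trade-off in this thread concrete, here is a hedged sketch of selective cancellation. It is purely illustrative and not part of this PR: the `operator` type, the `pruneStale` helper, and the use of an epoch number to detect that "the region is changed" are all assumptions made for the example.

```go
package main

import "fmt"

// operator is a hypothetical in-memory operator that remembers the
// region it targets and the region epoch it was created against.
type operator struct {
	regionID uint64
	epoch    uint64
}

// pruneStale keeps operators whose recorded epoch still matches the
// current epoch of their region, and returns the region IDs of the
// operators that would be cancelled (epoch changed or region gone).
func pruneStale(ops []operator, current map[uint64]uint64) (kept []operator, cancelled []uint64) {
	for _, op := range ops {
		if cur, ok := current[op.regionID]; ok && cur == op.epoch {
			kept = append(kept, op)
			continue
		}
		cancelled = append(cancelled, op.regionID)
	}
	return kept, cancelled
}

func main() {
	ops := []operator{
		{regionID: 1, epoch: 5}, // unchanged, kept
		{regionID: 2, epoch: 7}, // epoch moved on, cancelled
		{regionID: 3, epoch: 1}, // region no longer known, cancelled
	}
	current := map[uint64]uint64{1: 5, 2: 8}

	kept, cancelled := pruneStale(ops, current)
	fmt.Println(len(kept), cancelled) // prints 1 [2 3]
}
```

Compared with cancelling everything, this keeps operators that can still take effect, at the cost of needing a reliable way to tell whether a region changed during recovery.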

@ti-chi-bot ti-chi-bot bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Feb 7, 2024
Member Author

rleungx commented Feb 7, 2024

/merge

Contributor

ti-chi-bot bot commented Feb 7, 2024

@rleungx: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Feb 7, 2024

This pull request has been accepted and is ready to merge.

Commit hash: 0dd5c4c

@ti-chi-bot ti-chi-bot bot added the status/can-merge Indicates a PR has been approved by a committer. label Feb 7, 2024
@ti-chi-bot ti-chi-bot bot merged commit 3965b4c into tikv:master Feb 7, 2024
24 of 27 checks passed
@rleungx rleungx deleted the stop-scheduling branch February 7, 2024 06:20
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
3 participants