Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support to pause scheduling on specific regions #4620

Closed
wants to merge 1 commit into from
Closed

Support to pause scheduling on specific regions #4620

wants to merge 1 commit into from

Conversation

lightmelodies
Copy link

@lightmelodies lightmelodies commented Jan 28, 2022

What problem does this PR solve?

Currently Pd can only pause scheduler for whole cluster, which does not work well for online bulk load. This PR add support to pause scheduling on specific regions.

What is changed and how it works?

  1. Add a TTLCache in RaftCluster struct to store pinned regions.
  2. Modify most schedulers and checkers to skip pinned regions when calc operators.
  3. Add rest api to pin regions by key range or regions id with given duration, as well as list currently pinned regions.
  4. The pin api should be used immediately after split and scatter in most case.

Check List

Tests

  • Unit test

Release note

Support to pause scheduling on specific regions

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Jan 28, 2022
@ti-chi-bot ti-chi-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 28, 2022
Signed-off-by: lightmelodies <lightmelodies@outlook.com>
@codecov
Copy link

codecov bot commented Feb 7, 2022

Codecov Report

Merging #4620 (e63348d) into master (c1d1ecd) will decrease coverage by 0.04%.
The diff coverage is 36.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4620      +/-   ##
==========================================
- Coverage   74.76%   74.71%   -0.05%     
==========================================
  Files         281      281              
  Lines       27706    27784      +78     
==========================================
+ Hits        20714    20760      +46     
- Misses       5131     5168      +37     
+ Partials     1861     1856       -5     
Flag Coverage Δ
unittests 74.71% <36.66%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
server/api/region.go 61.94% <0.00%> (-5.65%) ⬇️
server/schedule/checker/split_checker.go 60.00% <0.00%> (-8.19%) ⬇️
server/schedulers/hot_region.go 82.22% <0.00%> (-0.36%) ⬇️
server/schedulers/random_merge.go 58.06% <0.00%> (-5.27%) ⬇️
server/schedulers/shuffle_hot_region.go 64.35% <0.00%> (-1.31%) ⬇️
server/cluster/cluster.go 82.80% <15.38%> (-1.18%) ⬇️
pkg/mock/mockcluster/mockcluster.go 95.00% <100.00%> (+0.03%) ⬆️
server/api/router.go 97.34% <100.00%> (+0.02%) ⬆️
server/schedule/checker/merge_checker.go 74.60% <100.00%> (+1.26%) ⬆️
server/schedule/healthy.go 100.00% <100.00%> (ø)
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1d1ecd...e63348d. Read the comment docs.

@nolouch
Copy link
Contributor

nolouch commented Feb 8, 2022

I prefer to do this with region labels, cc @disksing @rleungx

@nolouch nolouch requested review from disksing and removed request for HunDunDM February 8, 2022 06:37
@lightmelodies
Copy link
Author

I prefer to do this with region labels, cc @disksing @rleungx

There are some problems with current region labels implemention.

  1. No TTL support, so the pinning status cannot be automatically recovered if client crash.
  2. Only Key range are support now, which may be slower than lookup of regions id.

Anyway, This PR can be changed to use region labels strategy by modify IsRegionPinned 's implemention like this, which is not a huge work.

@disksing
Copy link
Contributor

disksing commented Feb 9, 2022

region label +1

@lightmelodies
Copy link
Author

For region label way, check this commit @nolouch @disksing

@ti-chi-bot
Copy link
Member

@lightmelodies: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 22, 2022
@nolouch
Copy link
Contributor

nolouch commented Feb 28, 2022

Hi, @lightmelodies. Thanks for your contribution. Sorry for that I ignored the changed commit with the region label. I saw you reviewed the pr #4649, which already meets a part of your requirement. and why you need a region ID rather than the key range?

@lightmelodies
Copy link
Author

Hi, @lightmelodies. Thanks for your contribution. Sorry for that I ignored the changed commit with the region label. I saw you reviewed the pr #4649, which already meets a part of your requirement. and why you need a region ID rather than the key range?

Not really need region ID. Though it's easy to implement with TTL. #4649 (with #4695) should be enough to meet my requirement after some prototype test. So I'll close this PR.

@nolouch
Copy link
Contributor

nolouch commented Mar 4, 2022

Thank your contribution.

@lightmelodies lightmelodies deleted the lightning branch March 7, 2022 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants