
storage: Jitter the StoreRebalancer loop's timing #31227

Merged 1 commit into cockroachdb:master on Oct 11, 2018

Conversation

a-robinson (Contributor)

Release note: None

Just as a best practice. It may make failures like #31006 even less likely, although it's hard to say for sure.
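
For illustration, a minimal Go sketch of what jittering a loop interval typically looks like; the helper name `jitteredInterval` and the ±25% factor are assumptions here, not the actual code added by this PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredInterval returns a random duration in [0.75*interval, 1.25*interval).
// Both the helper name and the ±25% factor are illustrative assumptions.
func jitteredInterval(interval time.Duration) time.Duration {
	return time.Duration(float64(interval) * (0.75 + 0.5*rand.Float64()))
}

func main() {
	base := time.Minute
	for i := 0; i < 3; i++ {
		// Each wait comes out slightly different, so loops that start at the
		// same moment on different stores drift apart instead of staying in
		// lockstep.
		fmt.Println(jitteredInterval(base))
	}
}
```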

a-robinson requested review from a team and petermattis on October 10, 2018 at 22:40
cockroach-teamcity (Member)

This change is Reviewable

a-robinson (Contributor, Author)

@tschottdorf any thoughts on backporting this? It's much more likely to avoid problems than cause them, but man has it gotten late in the cycle.

bors r+

craig bot pushed a commit that referenced this pull request on Oct 11, 2018
31227: storage: Jitter the StoreRebalancer loop's timing r=a-robinson a=a-robinson

Release note: None

Just as a best practice. It may make failures like #31006 even less likely, although it's hard to say for sure.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
craig bot (Contributor) commented on Oct 11, 2018

Build succeeded

craig bot merged commit 82bef53 into cockroachdb:master on Oct 11, 2018
tbg (Member) commented on Oct 11, 2018

Is there a specific situation in which you expect it to cause problems?
Generally agreed that the jittering, if anything, is going to help.

a-robinson (Contributor, Author)

I don't think it would ever get persistently stuck, since the goal of the StoreRebalancer is to rebalance the store within one or two rounds of work. However, because of delays in propagating information about the number of replicas and the load on each store, running store x's rebalancer right after store y's means that store x probably doesn't know about any changes that store y may have just triggered. If both store x and store y need a few rounds of changes, all of store x's decisions may be suboptimal for multiple minutes, rather than just for a subset of those rounds. A sketch of the jittered loop shape follows the tl;dr below.

tl;dr It might reduce flakiness of the rebalance-replicas-by-load roachtest (since that only has 5 minutes to succeed), and it might allow load to rebalance more quickly in edge cases in a real cluster where multiple stores are overloaded, and it's generally just good practice. It's not solving any big problems.
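
To make that concrete, here is a hedged sketch of a rebalancer-style loop that re-randomizes its wait on every pass (assuming `math/rand` and `time` are imported). The names `rebalanceLoop` and `rebalanceOnce` and the ±25% jitter factor are illustrative assumptions, not the real StoreRebalancer code.

```go
// Illustrative sketch with assumed names; not the real StoreRebalancer code.
// Re-randomizing the sleep on every pass keeps different stores' loops from
// staying phase-locked, so each pass is more likely to observe the effects of
// changes made by other stores' previous passes.
func rebalanceLoop(stop <-chan struct{}, base time.Duration, rebalanceOnce func()) {
	// Jitter each wait by up to ±25% of the base interval (factor chosen for
	// illustration only).
	jitter := func() time.Duration {
		return time.Duration(float64(base) * (0.75 + 0.5*rand.Float64()))
	}
	timer := time.NewTimer(jitter())
	defer timer.Stop()
	for {
		select {
		case <-stop:
			return
		case <-timer.C:
			rebalanceOnce()
			// Pick a fresh jittered interval before the next pass.
			timer.Reset(jitter())
		}
	}
}
```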
