
storage: Jitter the StoreRebalancer loop's timing #31227

Merged 1 commit into cockroachdb:master on Oct 11, 2018

Conversation

a-robinson (Contributor)

Release note: None

Just as a best practice. It may make failures like #31006 even less likely, although it's hard to say for sure.
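
For illustration, a minimal Go sketch of what jittering a loop interval typically looks like; the helper name `jitteredInterval` and the ±25% factor are assumptions here, not the actual code added by this PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredInterval returns a random duration in [0.75*interval, 1.25*interval).
// Both the helper name and the ±25% factor are illustrative assumptions.
func jitteredInterval(interval time.Duration) time.Duration {
	return time.Duration(float64(interval) * (0.75 + 0.5*rand.Float64()))
}

func main() {
	base := time.Minute
	for i := 0; i < 3; i++ {
		// Each wait comes out slightly different, so loops that start at the
		// same moment on different stores drift apart instead of staying in
		// lockstep.
		fmt.Println(jitteredInterval(base))
	}
}
```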

a-robinson requested review from a team and petermattis on October 10, 2018 at 22:40
cockroach-teamcity (Member)

This change is Reviewable

a-robinson (Contributor, Author)

@tschottdorf any thoughts on backporting this? It's much more likely to avoid problems than cause them, but man has it gotten late in the cycle.

bors r+

craig bot pushed a commit that referenced this pull request on Oct 11, 2018
31227: storage: Jitter the StoreRebalancer loop's timing r=a-robinson a=a-robinson

Release note: None

Just as a best practice. It may make failures like #31006 even less likely, although it's hard to say for sure.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
craig bot (Contributor) commented on Oct 11, 2018

Build succeeded

craig bot merged commit 82bef53 into cockroachdb:master on Oct 11, 2018
tbg (Member) commented on Oct 11, 2018

Is there a specific situation in which you expect it to cause problems?
Generally agreed that the jittering, if anything, is going to help.

a-robinson (Contributor, Author)

I don't think it would ever get persistently stuck, since the goal of the StoreRebalancer is to rebalance the store within one or two rounds of work. However, because of delays in propagating information about the number of replicas and the load on each store, running store x's rebalancer right after store y's means that store x probably doesn't know about any changes that store y may have just triggered. If both store x and store y need a few rounds of changes, all of store x's decisions may be suboptimal for multiple minutes, rather than just for a subset of those rounds. A sketch of the jittered loop shape follows the tl;dr below.

tl;dr It might reduce flakiness of the rebalance-replicas-by-load roachtest (since that only has 5 minutes to succeed), and it might allow load to rebalance more quickly in edge cases in a real cluster where multiple stores are overloaded, and it's generally just good practice. It's not solving any big problems.
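
To make that concrete, here is a hedged sketch of a rebalancer-style loop that re-randomizes its wait on every pass (assuming `math/rand` and `time` are imported). The names `rebalanceLoop` and `rebalanceOnce` and the ±25% jitter factor are illustrative assumptions, not the real StoreRebalancer code.

```go
// Illustrative sketch with assumed names; not the real StoreRebalancer code.
// Re-randomizing the sleep on every pass keeps different stores' loops from
// staying phase-locked, so each pass is more likely to observe the effects of
// changes made by other stores' previous passes.
func rebalanceLoop(stop <-chan struct{}, base time.Duration, rebalanceOnce func()) {
	// Jitter each wait by up to ±25% of the base interval (factor chosen for
	// illustration only).
	jitter := func() time.Duration {
		return time.Duration(float64(base) * (0.75 + 0.5*rand.Float64()))
	}
	timer := time.NewTimer(jitter())
	defer timer.Stop()
	for {
		select {
		case <-stop:
			return
		case <-timer.C:
			rebalanceOnce()
			// Pick a fresh jittered interval before the next pass.
			timer.Reset(jitter())
		}
	}
}
```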
