-
Notifications
You must be signed in to change notification settings - Fork 4k
asim: make rebalancing more accurate #95685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously the testing utility to generate a simple workload existed within the `asim` pkg. This patch moves it into the `workload` pkg so that it can be depended upon by additional pkgs, such as metrics. Release note: None
Previously, the workload generator was ticked every background interval tick to reduce by enabling greater batching. This is problematic however when the workload ticks infrequently and the values are rated over a short period. An example problem is that it can lead to extremely high rated values for queries-per-second when a batch is applied for 10s of load in only a single tick. Release note: None
AlexTalks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/asim/config/settings.go line 37 at r5 (raw file):
var ( defaultStartTime = time.Date(2022, 03, 21, 11, 0, 0, 0, time.UTC)
nit: This could perhaps use a comment?
pkg/kv/kvserver/asim/state/impl.go line 717 at r3 (raw file):
store, ok := s.LeaseholderStore(rangeID) if !ok { panic(fmt.Sprintf("no leaseholder store found for range %d", storeID))
nit: any chance this may panic during races?
pkg/kv/kvserver/asim/state/load.go line 83 at r3 (raw file):
// information. func (rl *ReplicaLoadCounter) Load() allocator.RangeUsageInfo { if rl.loadStats == nil {
nit: are there any cases where loadStats can be nil anyway? It seems like there is one instance of a zero-value ReplicaLoadCounter{}, but otherwise most usages instantiate with NewReplicaLoadCounter already. This discrepancy maybe seems a bit subtle? Feels like maybe a nil *ReplicaLoadCounter should be used instead of a zero-value ReplicaLoadCounter with potentially nil fields that you have to check several places.
Previously, the allocator simulator used replica stats to track the rated stats of a replica. Replica load is a group of stats objects that is better suited. This patch replaces replica stats with replica load. Release note: None
Previously, the simulator relied on a bug where gossip shared a reference to each store descriptor in the cluster. This was fixed in cockroachdb#93945. This commit updates the simulator code to update the storepool state like the real code. Release note: None
Previously, the clock that is shared among all simulated stores was initialized to zero. The clock would then jump to the start time after simulator ticking begun. This is problematic for rated components such as replica stats which rely on the clock. This commit initializes the state clock with a passed in argument defining the start time. This commit also updates the simulation settings to use a start time and pass in a duration when running the simulation, as opposed to passing a start and end time. Release note: None
a0536b0 to
6185c28
Compare
kvoli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)
pkg/kv/kvserver/asim/config/settings.go line 37 at r5 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: This could perhaps use a comment?
Added.
pkg/kv/kvserver/asim/state/impl.go line 717 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: any chance this may panic during races?
Everything is built to run serially, mostly for determinism reasons. There are a lot of things that would panic during a race.
pkg/kv/kvserver/asim/state/load.go line 83 at r3 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: are there any cases where
loadStatscan be nil anyway? It seems like there is one instance of a zero-valueReplicaLoadCounter{}, but otherwise most usages instantiate withNewReplicaLoadCounteralready. This discrepancy maybe seems a bit subtle? Feels like maybe a nil*ReplicaLoadCountershould be used instead of a zero-valueReplicaLoadCounterwith potentially nil fields that you have to check several places.
Fair point here. I've updated to remove these checks and always return an instantiated ReplicaLoadCounter.
There's a TODO ontop of the load.go file which summarizes some changes that would include refining this interaction.
|
Bors r=alextalks |
|
Build succeeded: |
This series of commits resolves some inaccuracies in the allocator simulator. With this PR the simulated runs are now reasonably correct for replicate queue replica rebalancing, the store rebalancer (all) and load based splitting.
Notably the replicate queue simulation does not handle lease transfers nor any other action than rebalancing.
resolves: #83989
resolves: #83990
Release note: None