Skip to content

Conversation

@kvoli
Copy link
Contributor

@kvoli kvoli commented Jan 23, 2023

This series of commits resolves some inaccuracies in the allocator simulator. With this PR the simulated runs are now reasonably correct for replicate queue replica rebalancing, the store rebalancer (all) and load based splitting.

Notably the replicate queue simulation does not handle lease transfers nor any other action than rebalancing.

resolves: #83989
resolves: #83990

Release note: None

kvoli added 2 commits January 23, 2023 15:29
Previously the testing utility to generate a simple workload existed
within the `asim` pkg. This patch moves it into the `workload` pkg so
that it can be depended upon by additional pkgs, such as metrics.

Release note: None
Previously, the workload generator was ticked every background interval
tick to reduce by enabling greater batching. This is problematic however
when the workload ticks infrequently and the values are rated over a
short period. An example problem is that it can lead to extremely high
rated values for queries-per-second when a batch is applied for 10s of
load in only a single tick.

Release note: None
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kvoli kvoli changed the title asim: move test workload gen into workload pkg asim: make rebalancing more accurate Jan 23, 2023
@kvoli kvoli self-assigned this Jan 23, 2023
@kvoli kvoli marked this pull request as ready for review January 23, 2023 16:51
@kvoli kvoli requested a review from a team as a code owner January 23, 2023 16:51
@kvoli kvoli requested a review from AlexTalks January 23, 2023 16:52
Copy link
Contributor

@AlexTalks AlexTalks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Had a couple small questions/comments but overall looks good!

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 2 of 2 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/asim/config/settings.go line 37 at r5 (raw file):

var (
	defaultStartTime = time.Date(2022, 03, 21, 11, 0, 0, 0, time.UTC)

nit: This could perhaps use a comment?


pkg/kv/kvserver/asim/state/impl.go line 717 at r3 (raw file):

	store, ok := s.LeaseholderStore(rangeID)
	if !ok {
		panic(fmt.Sprintf("no leaseholder store found for range %d", storeID))

nit: any chance this may panic during races?


pkg/kv/kvserver/asim/state/load.go line 83 at r3 (raw file):

// information.
func (rl *ReplicaLoadCounter) Load() allocator.RangeUsageInfo {
	if rl.loadStats == nil {

nit: are there any cases where loadStats can be nil anyway? It seems like there is one instance of a zero-value ReplicaLoadCounter{}, but otherwise most usages instantiate with NewReplicaLoadCounter already. This discrepancy maybe seems a bit subtle? Feels like maybe a nil *ReplicaLoadCounter should be used instead of a zero-value ReplicaLoadCounter with potentially nil fields that you have to check several places.

kvoli added 3 commits January 27, 2023 23:35
Previously, the allocator simulator used replica stats to track the
rated stats of a replica. Replica load is a group of stats objects that
is better suited. This patch replaces replica stats with replica load.

Release note: None
Previously, the simulator relied on a bug where gossip shared a
reference to each store descriptor in the cluster.

This was fixed in cockroachdb#93945. This commit updates the simulator code to
update the storepool state like the real code.

Release note: None
Previously, the clock that is shared among all simulated stores was
initialized to zero. The clock would then jump to the start time after
simulator ticking begun. This is problematic for rated components such
as replica stats which rely on the clock. This commit initializes the
state clock with a passed in argument defining the start time. This
commit also updates the simulation settings to use a start time and
pass in a duration when running the simulation, as opposed to passing a
start and end time.

Release note: None
@kvoli kvoli force-pushed the 230123.asim-misc-fix branch from a0536b0 to 6185c28 Compare January 27, 2023 23:37
Copy link
Contributor Author

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks and @kvoli)


pkg/kv/kvserver/asim/config/settings.go line 37 at r5 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: This could perhaps use a comment?

Added.


pkg/kv/kvserver/asim/state/impl.go line 717 at r3 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: any chance this may panic during races?

Everything is built to run serially, mostly for determinism reasons. There are a lot of things that would panic during a race.


pkg/kv/kvserver/asim/state/load.go line 83 at r3 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

nit: are there any cases where loadStats can be nil anyway? It seems like there is one instance of a zero-value ReplicaLoadCounter{}, but otherwise most usages instantiate with NewReplicaLoadCounter already. This discrepancy maybe seems a bit subtle? Feels like maybe a nil *ReplicaLoadCounter should be used instead of a zero-value ReplicaLoadCounter with potentially nil fields that you have to check several places.

Fair point here. I've updated to remove these checks and always return an instantiated ReplicaLoadCounter.

There's a TODO ontop of the load.go file which summarizes some changes that would include refining this interaction.

@kvoli
Copy link
Contributor Author

kvoli commented Jan 28, 2023

Bors r=alextalks

@craig
Copy link
Contributor

craig bot commented Jan 28, 2023

Build succeeded:

@craig craig bot merged commit 502acb0 into cockroachdb:master Jan 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

asim: validate unified rebalancing asim: validate range count rebalancing

3 participants