
allocator: check IO overload on lease transfer #97587

Merged (1 commit) on Mar 7, 2023

Conversation

@kvoli (Collaborator) commented Feb 23, 2023

Previously, the allocator would return lease transfer targets without
considering the IO overload of the stores involved. When leases transferred
to IO overloaded stores, service latency tended to degrade.

This commit adds IO overload checks prior to lease transfers, similar to
the checks used when allocating replicas in #97142.

The checks work by comparing a candidate store's IO overload score against
kv.allocator.lease_io_overload_threshold (default 0.5) and the mean score of
the other candidates. If the candidate's score is greater than or equal to
both values, the store is considered IO overloaded.

The current leaseholder has to meet a higher bar to be considered IO
overloaded: it must have an IO overload score greater than or equal to
kv.allocator.lease_shed_io_overload_threshold (default 0.9).
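
As a rough illustration of the two checks above, here is a minimal Go sketch. The function names are hypothetical (this is not the actual allocator code); only the setting names and default values come from this description:

```go
package main

import "fmt"

// Defaults as stated in this description; both are cluster settings.
const (
	leaseIOOverloadThreshold     = 0.5 // kv.allocator.lease_io_overload_threshold
	leaseShedIOOverloadThreshold = 0.9 // kv.allocator.lease_shed_io_overload_threshold
)

// candidateIsIOOverloaded reports whether a candidate store should be
// excluded as a lease transfer target: its IO overload score must be at or
// above both the absolute threshold and the mean of the other candidates.
func candidateIsIOOverloaded(score, candidateMean float64) bool {
	return score >= leaseIOOverloadThreshold && score >= candidateMean
}

// leaseholderShouldShed reports whether the current leaseholder meets the
// higher bar required to consider actively shedding its lease.
func leaseholderShouldShed(score float64) bool {
	return score >= leaseShedIOOverloadThreshold
}

func main() {
	fmt.Println(candidateIsIOOverloaded(0.6, 0.3)) // true: at/above both threshold and mean
	fmt.Println(candidateIsIOOverloaded(0.6, 0.8)) // false: below the candidate mean
	fmt.Println(leaseholderShouldShed(0.95))       // true: above the shed threshold
}
```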

The level of enforcement for IO overload is controlled by
kv.allocator.lease_io_overload_threshold_enforcement, which determines the
action taken when a candidate store for a lease transfer is IO overloaded:

  • ignore: ignore IO overload scores entirely during lease transfers
    (effectively disabling this mechanism);
  • block_transfer_to: lease transfers only consider stores that aren't
    IO overloaded (existing leases on IO overloaded stores are left as
    is);
  • shed: actively shed leases from IO overloaded stores to less IO
    overloaded stores (this is a super-set of block_transfer_to).

The default is block_transfer_to.
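
A hedged sketch of how the enforcement level might gate those checks; the Go identifiers below are illustrative assumptions, not the PR's actual names:

```go
package main

import "fmt"

// LeaseIOOverloadEnforcement mirrors the three enforcement levels described
// above; the Go names here are illustrative.
type LeaseIOOverloadEnforcement int

const (
	EnforcementIgnore          LeaseIOOverloadEnforcement = iota // "ignore"
	EnforcementBlockTransferTo                                   // "block_transfer_to" (default)
	EnforcementShed                                              // "shed"
)

// excludeTransferTarget reports whether an IO overloaded candidate store is
// filtered out of the lease transfer target list.
func excludeTransferTarget(level LeaseIOOverloadEnforcement, candidateOverloaded bool) bool {
	return level != EnforcementIgnore && candidateOverloaded
}

// shedCurrentLease reports whether the current leaseholder actively moves
// its lease away; only the "shed" level does this.
func shedCurrentLease(level LeaseIOOverloadEnforcement, aboveShedThreshold bool) bool {
	return level == EnforcementShed && aboveShedThreshold
}

func main() {
	fmt.Println(excludeTransferTarget(EnforcementBlockTransferTo, true)) // true
	fmt.Println(shedCurrentLease(EnforcementBlockTransferTo, true))      // false: only "shed" sheds
}
```

The default, block_transfer_to, is the middle ground: it stops directing new leases at overloaded stores without forcing existing leases to move.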

This commit also updates the existing replica IO overload checks to be
prefixed with Replica, to avoid confusion between lease and replica
IO overload checks.

Resolves: #96508

Release note (ops change): Range leases will no longer be transferred to
stores which are IO overloaded.

@cockroach-teamcity (Member) commented: This change is Reviewable

@kvoli force-pushed the 230222.io-transfers branch 6 times, most recently from 6f1c3ab to ce89141 on February 23, 2023 20:04
@kvoli self-assigned this Feb 23, 2023
@kvoli force-pushed the 230222.io-transfers branch from ce89141 to a8b6fbc on February 23, 2023 20:28
@kvoli marked this pull request as ready for review February 23, 2023 21:05
@kvoli requested a review from a team as a code owner February 23, 2023 21:05
@kvoli requested a review from andrewbaptist February 23, 2023 21:05
@kvoli force-pushed the 230222.io-transfers branch from a8b6fbc to 965b6db on February 23, 2023 22:08
@kvoli changed the title allocator: [wip] check store health on lease transfer → allocator: check store health on lease transfer Feb 23, 2023
@kvoli force-pushed the 230222.io-transfers branch 3 times, most recently from 7638955 to d964813 on February 24, 2023 01:29
@irfansharif (Contributor) left a comment

Reviewed 1 of 1 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @kvoli)


-- commits line 18 at r2:
[mega nit] Avoid the "health" verbiage as best you can. Let's make this more precise: "store health" is both vague and ambiguous; we're just checking the store's IO overload score.


-- commits line 21 at r2:
"IO overload" -> "IO overload score"


-- commits line 32 at r2:
Could you explain why the lease transfer threshold has a higher bar? Shouldn't it have a lower bar, given lease transfers are cheaper and we ought to try to stop those first before considering the more heavyweight replica transfers?


-- commits line 35 at r2:
This combination of two cluster settings seems complex -- every time we debug this we'd have to go look at two things and sum them up. It also raises questions like "Can the delta be negative?", and knowing the value of just one of the settings is not helpful. Why not just have one specifically for leases?


-- commits line 44 at r2:
This help text feels like it's assuming familiarity with allocator internals with its reference to "candidates" or things we're "excluding". I find it hard to use. Let's instead say something like the following (note the change in cluster setting name and enums):

We can control the lease transfer behaviour in the presence of IO overloaded stores using kv.allocator.io_overload_threshold_lease_enforcement:
- ignore: ignore IO overload scores entirely during lease transfers (effectively disabling this mechanism);
- future_lease_transfers: for future lease transfers only consider stores that aren't IO overloaded (existing leases on IO overloaded stores are left as is);
- shed_current_leases: actively shed leases from IO overloaded stores to less IO overloaded stores (this is a super-set of future_lease_transfers).

pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2035 at r2 (raw file):

}

// healthyLeaseTargets returns a list of healthy replica targets and whether

[same mega nit] Please avoid this "health" moniker. We've overused this term in the allocator code; consider this code comment, for example:

	// StoreFilterSuspect requests that the returned store list additionally
	// exclude stores that have been suspected as unhealthy. We dont want unhealthy
	// stores to be considered for rebalancing or for lease transfers. i.e. we dont
	// actively shift leases or replicas away from them, but we dont allow them to
	// get any new ones until they get better.

But this has nothing to do with IO overload scores -- it's got to do with liveness heartbeat blips due to intermittent connectivity or write stalls.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 77 at r1 (raw file):

	// IO overload score greater than what's set. This is typically used in
	// conjunction with IOOverloadMeanThreshold below.
	DefaultIOOverloadThreshold = 0.5

This feels like a random change; what were the motivating experiments? If you're seeing the IO overload score thrashing between (0.4, 1.x) and we're not reactive enough at 0.8, consider exponential smoothing instead of just lowering the threshold and hoping for more sensitivity to a fluctuating signal.
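
For reference, exponential smoothing of the kind suggested here might look like this sketch (alpha and all names are illustrative assumptions, not CockroachDB code):

```go
package main

import "fmt"

// smoothedScore applies exponential smoothing to a stream of raw IO
// overload scores: each new observation is blended with the previous
// smoothed value, damping short-lived spikes.
func smoothedScore(raw []float64, alpha float64) float64 {
	if len(raw) == 0 {
		return 0
	}
	s := raw[0]
	for _, v := range raw[1:] {
		s = alpha*v + (1-alpha)*s
	}
	return s
}

func main() {
	// A thrashing signal bouncing between 0.4 and 1.2.
	scores := []float64{0.4, 1.2, 0.4, 1.1, 0.5, 1.2}
	fmt.Printf("%.2f\n", smoothedScore(scores, 0.3)) // 0.82: settles near the middle of the band
}
```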


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 183 at r2 (raw file):

	// considered as leaseholder targets for a range, regardless of the store IO
	// overload.
	LeaseIOOverloadThresholdNoAction LeaseIOOverloadEnforcementLevel = iota

Let's reuse the same types as above; these are symmetric modulo the "BlockAll" vs. "Shed" naming, but that seems superfluous. We can keep the separate cluster setting to control lease enforcement.

Also [nit] the log-only action (here and above) feels pointless -- who's actually looking at these logs? Unless you increment a metric counter which we can scan for in a CC-like environment, this is effectively dead code. Let's get rid of it, keep a metric, and log under non-zero verbosity for the "NoAction" variants. These cluster settings feel overly hedgy otherwise.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2317 at r2 (raw file):

}

func (o StoreHealthOptions) transferToStoreIsHealthy(

[same mega nit] Avoid the "healthy" verbiage. Also, this method and leaseStoreIsHealthy are identical and could be made into one by passing in a leaseholder bool parameter, right? Which we can pass at the callsite?
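
A sketch of the suggested merge, using a pared-down stand-in for the real StoreHealthOptions (the field and method names here are assumptions):

```go
package allocatorsketch

// StoreHealthOptions is a pared-down stand-in for the real options struct.
type StoreHealthOptions struct {
	IOOverloadThreshold          float64
	LeaseShedIOOverloadThreshold float64
}

// storeIsIOOverloaded folds the two near-identical methods into one: the
// leaseholder bool, passed at the callsite, selects the higher shed bar.
func (o StoreHealthOptions) storeIsIOOverloaded(score, candidateMean float64, leaseholder bool) bool {
	threshold := o.IOOverloadThreshold
	if leaseholder {
		threshold = o.LeaseShedIOOverloadThreshold
	}
	return score >= threshold && score >= candidateMean
}
```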


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2342 at r2 (raw file):

	// The store is only considered unhealthy when the enforcement level is
	// LeaseIOOverloadThresholdNoAction.

The comment does not match the code.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2346 at r2 (raw file):

}

func (o StoreHealthOptions) leaseStoreIsHealthy(

[same mega nit] Avoid the "healthy" verbiage.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1979 at r2 (raw file):

}

func TestAllocatorTransferLeaseTargetHealthCheck(t *testing.T) {

[same mega nit] Avoid the "healthy" verbiage.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 2033 at r2 (raw file):

			leaseholder: 5,
			expected:    3,
			enforcement: LeaseIOOverloadThresholdShed,

How does one read this subtest? The lease is on n5, with IO-overload-score=0.0 and lease-count=400, and we shed the lease to n3 with IO-overload-score=0.5 and lease-count=100?

@andrewbaptist (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @kvoli)


-- commits line 54 at r2:
We should add a release note as it changes the behavior when leases are transferred. There are a lot of comments on the PR text already, so I'll review again once you update.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2044 at r2 (raw file):

	leaseStoreID roachpb.StoreID,
	storeHealthOpts StoreHealthOptions,
) (healthyTargets []roachpb.ReplicaDescriptor, excludeLeaseRepl bool) {

I think this is a little wrong. If the current store is unhealthy, then we shouldn't exclude any other stores for health reasons unless they are worse than us. Otherwise, if for instance all the stores in the system are at 0.6, no lease transfers will happen even if the current leaseholder gets severely overloaded.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2198 at r2 (raw file):

	var excludeLeaseReplForHealth bool
	existing = a.ValidLeaseTargets(ctx, storePool, conf, existing, leaseRepl, opts)

Use a different variable name for existing after each check, and in the log message below print out more details about why we ended up with no valid lease targets. Tracking this down is hard even if you have the code, and even harder if you are just looking at the log. It would be nice to see that some stores were rejected because of their overload. Also, I'm concerned that if all stores in the system are overloaded, then no lease transfers will happen. I'm not sure what should happen in this case, but it seems we should still correctly handle lease preference movement.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 183 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Let's reuse the same types as above; these are symmetric modulo the "BlockAll" vs. "Shed" naming, but that seems superfluous. We can keep the separate cluster setting to control lease enforcement.

Also [nit] the log-only action (here and above) feels pointless -- who's actually looking at these logs? Unless you increment a metric counter which we can scan for in a CC-like environment, this is effectively dead code. Let's get rid of it, keep a metric, and log under non-zero verbosity for the "NoAction" variants. These cluster settings feel overly hedgy otherwise.

I agree with the log setting seeming unnecessary. It would only be useful during development/tuning, but at that point you could change the vmodule setting to get what you want.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2251 at r2 (raw file):

	LeaseEnforcementLevel              LeaseIOOverloadEnforcementLevel
	IOOverloadThreshold                float64
	LeaseIOOverloadThresholdShedBuffer float64

I'm a little concerned about consistency here. We could likely add a ReplicaIOOverloadThresholdShedBuffer in the future too, but there is only one IOOverloadThreshold that is used for both. Short term it might be easier to completely separate out the Lease vs. Replica values, as I could see the need to tune these separately. So in total there would be four values:
ReplicaIOOverloadThreshold (0.8) - value to block replica transfers
LeaseIOOverloadThreshold (0.5) - value to block lease transfers
ReplicaIOOverloadThresholdShedThreshold (2.0) - (future) value to begin shedding replicas - this should be considerably greater than 1.0, since we only want to shed when we are severely overloaded; shedding is expensive, and it is not that bad to hold replicas if we are not the leaseholder.
LeaseIOOverloadThresholdShedThreshold (0.9) - value to begin shedding leases - this should be a little less than 1.0, since we want to shed before we begin engaging AC to prevent system slowdowns.

I am not sure if I like "buffer" vs. "threshold" for shed, and I can see the argument for both. I slightly prefer threshold, though, as it can be bad if someone wants to adjust the overload threshold a little and that unintentionally pushes the sum above 1.0.
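
As a sketch, the four thresholds proposed above would lay out like this (names and values are the reviewer's proposal, not shipped code):

```go
package allocatorsketch

// The four separate thresholds proposed in this comment, with the
// suggested defaults.
const (
	ReplicaIOOverloadThreshold              = 0.8 // block replica transfers to stores at or above this score
	LeaseIOOverloadThreshold                = 0.5 // block lease transfers to stores at or above this score
	ReplicaIOOverloadThresholdShedThreshold = 2.0 // (future) begin shedding replicas; well above 1.0
	LeaseIOOverloadThresholdShedThreshold   = 0.9 // begin shedding leases, just before AC engages
)
```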


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2321 at r2 (raw file):

) bool {
	ioOverloadScore, _ := store.Capacity.IOThreshold.Score()
	if o.LeaseEnforcementLevel == LeaseIOOverloadThresholdNoAction ||

If you are keeping LogOnly don't you need to check it here? As Irfan mentioned, I would rather just ditch that option.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2351 at r2 (raw file):

	threshold := o.IOOverloadThreshold + o.LeaseIOOverloadThresholdShedBuffer
	ioOverloadScore, _ := store.Capacity.IOThreshold.Score()
	if o.LeaseEnforcementLevel == LeaseIOOverloadThresholdNoAction ||

If you are keeping LogOnly don't you need to check it here?

@andrewbaptist (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 198 at r2 (raw file):

	// range leases i.e. The lease will always transfer to a healthy and valid
	// store if one exists.
	LeaseIOOverloadThresholdShed

nit: The order of these matters, since there is a < check when determining whether they are enabled. That is somewhat unexpected, and we could easily introduce a bug if that wasn't apparent. I would either add a big warning to note that this order is important or, preferably, change the code below to only do equality comparisons on this.
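
To make the concern concrete, compare an order-dependent check with the equality-based form being suggested (a sketch; only the NoAction and Shed names appear in this review, the Block name is a stand-in):

```go
package allocatorsketch

type LeaseIOOverloadEnforcementLevel int

const (
	LeaseIOOverloadThresholdNoAction LeaseIOOverloadEnforcementLevel = iota
	LeaseIOOverloadThresholdBlock
	LeaseIOOverloadThresholdShed
)

// enforcedOrdered is fragile: it silently depends on the declaration order
// of the enum values above.
func enforcedOrdered(level LeaseIOOverloadEnforcementLevel) bool {
	return level > LeaseIOOverloadThresholdNoAction
}

// enforcedByEquality expresses the same intent with explicit cases, so
// reordering the constants cannot introduce a silent bug.
func enforcedByEquality(level LeaseIOOverloadEnforcementLevel) bool {
	switch level {
	case LeaseIOOverloadThresholdBlock, LeaseIOOverloadThresholdShed:
		return true
	default:
		return false
	}
}
```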

@kvoli force-pushed the 230222.io-transfers branch from d964813 to 4605c90 on February 27, 2023 21:53
@blathers-crl (bot) requested a review from irfansharif February 27, 2023 21:54
@kvoli (Collaborator, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andrewbaptist and @irfansharif)


-- commits line 18 at r2:

Previously, irfansharif (irfan sharif) wrote…

[mega nit] Avoid the "health" verbiage as best you can. Lets make this more precise, "store health" is both vague and ambiguous; we're just checking the store's IO overload score.

Fair enough. I've removed most references to it that I could find. My initial reasoning for using "health" was that I envisaged the enforcement level being general, with more signals eventually classifying a store as unhealthy from the storeHealthOptions perspective.

That hasn't really come to fruition: we spent time on other things and moved some of this into the balancing signal instead, since it is less prone to thrashing.


-- commits line 21 at r2:

Previously, irfansharif (irfan sharif) wrote…

"IO overload" -> "IO overload score"

Updated.


-- commits line 32 at r2:

Previously, irfansharif (irfan sharif) wrote…

Could you explain why the lease transfer threshold has a higher bar? Shouldn't it have a lower bar given lease transfers are cheaper, and we ought to try and stop those first before considering the more heavy weight replica transfers?

None of this code causes replica transfers; it only filters out stores as candidates.

The lease transfer threshold, on the other hand, causes an actual lease transfer.


-- commits line 35 at r2:

Previously, irfansharif (irfan sharif) wrote…

This combination of two cluster settings seems complex -- every time we debug this we'd always have to go look at two things and sum them up. It also raises questions like "Can the delta be -ve?", and knowing the value of just one of the settings is not helpful. Why not just have one specifically for leases?

I made it a buffer rather than a hard setting for the reason you mentioned above - I don't want it to be easy to set the shed threshold lower than the "blocking" threshold. The negative value concern seems relevant to any cluster setting which doesn't have validation.

I'm happy to change it if you feel strongly.


-- commits line 44 at r2:

Previously, irfansharif (irfan sharif) wrote…

This help text feels like it's assuming familiarity with allocator internals with its reference to "candidates" or things we're "excluding". I find it hard to use. Lets instead say something like the following (note the change in cluster setting name and enums):

We can control the lease transfer behaviour in the presence of IO overloaded stores using kv.allocator.io_overload_threshold_lease_enforcement:
- ignore: ignore IO overload scores entirely during lease transfers (effectively disabling this mechanism);
- future_lease_transfers: for future lease transfers only consider stores that aren't IO overloaded (existing leases on IO overloaded stores are left as is);
- shed_current_leases: actively shed leases from IO overloaded stores to less IO overloaded stores (this is a super-set of future_lease_transfers).

I agree the references could be improved; I've updated these. I don't think I agree on using future_lease_transfers or shed_current_leases. The setting itself has lease in the name; it seems unnecessary to repeat that when you enter a value for the cluster setting.

I've updated the commit message to be clearer and changed the setting names to be consistent along the lines of

ignore
block_transfer_to
shed


-- commits line 54 at r2:

Previously, andrewbaptist (Andrew Baptist) wrote…

We should add a release note as it changes the behavior when leases are transferred. There are a lot of comments on the PR text already, so I'll review again once you update.

Added


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2035 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[same mega nit] Please avoid this "health" moniker. We've overused this term in the allocator code, consider this code comment for example:

	// StoreFilterSuspect requests that the returned store list additionally
	// exclude stores that have been suspected as unhealthy. We dont want unhealthy
	// stores to be considered for rebalancing or for lease transfers. i.e. we dont
	// actively shift leases or replicas away from them, but we dont allow them to
	// get any new ones until they get better.

But this has nothing to do with IO overload scores -- it's got to do with liveness heartbeat blips due to intermittent connectivity or write stalls.

Removed all healthy refs


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2044 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

I think this is a little wrong. If the current store is unhealthy, then we shouldn't exclude any other stores due to health reasons unless they are worse than us. Otherwise if for instance, all the stores in the system are at 0.6, no lease transfers will happen even if the current leaseholder gets severely overloaded.

We discussed this over Slack. For other reviewers: the mean check is baked into any call to healthyTargets/leaseIsHealthy, so this shouldn't occur.


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2198 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Also I'm concerned that if all stores in the system are overloaded, then no lease transfers will happen. I'm not sure what should happen in this case, but it seems we should still correctly handle lease preference movement.

Lease preferences get filtered out before calling this fn. So we calc the avg based only on equally preferable and valid replicas. Since a store needs to be >1.1 * mean, if there's just one preferable replica it would never fail the io overload check.

Previously, andrewbaptist (Andrew Baptist) wrote…

Use a different variable name for existing after each check and in the log message below, print out more details about why we ended up with no valid lease targets. Tracking this down is hard even if you have the code, and even harder if you are just looking at the log.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 77 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This feels like a random change, what were the motivating experiments? If you're seeing IO overload score thrashing between (0.4,1.x) and we're not reactive enough at 0.8, consider exponential smoothing instead of just lowering the threshold hoping for more sensitivity to a fluctuating signal.

I removed this commit. There's still some stuff to do w.r.t smoothing - we should probably smooth at the source and then gossip.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 183 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

I agree with the log setting seeming unnecessary. It would only be useful during development/tuning, but at that point you could change the vmodule setting to get what you want.

Removed log only.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 198 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

nit: The order of these matter since there is a < check when determining whether they are enabled. That is somewhat unexpected and we could easily introduce a bug if that wasn't apparent. I would either add a big warning to note that this order is important or preferably change the code below to not only do equality comparisons on this.

Removed <> and moved to equality checks.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2251 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

I'm a little concerned about consistency here. We could likely add a ReplicaIOOverloadThresholdShedBuffer in the future too, but there is only one IOOverloadThreshold that is used for both. Short term it might be easier to completely separate out the Lease vs. Replica values, as I could see the need to tune these separately. So in total there would be four values:
ReplicaIOOverloadThreshold (0.8) - value to block replica transfers
LeaseIOOverloadThreshold (0.5) - value to block lease transfers
ReplicaIOOverloadThresholdShedThreshold (2.0) - (future) value to begin shedding replicas - this should be considerably greater than 1.0, since we only want to shed when we are severely overloaded; shedding is expensive, and it is not that bad to hold replicas if we are not the leaseholder.
LeaseIOOverloadThresholdShedThreshold (0.9) - value to begin shedding leases - this should be a little less than 1.0, since we want to shed before we begin engaging AC to prevent system slowdowns.

I am not sure if I like "buffer" vs. "threshold" for shed, and I can see the argument for both. I slightly prefer threshold, though, as it can be bad if someone wants to adjust the overload threshold a little and that unintentionally pushes the sum above 1.0.

I've roughly adopted this - I haven't added the replica shed threshold.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2317 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[same mega nit] Avoid the "healthy" verbiage. Also, this method and leaseStoreIsHealthy are identical and could be made into one by passing in a leaseholder bool parameter, right? Which we can pass at the callsite?

removed.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2321 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

If you are keeping LogOnly don't you need to check it here? As Irfan mentioned, I would rather just ditch that option.

Going to ditch the log only.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2342 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

The comment does not match the code.

Updated.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2346 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[same mega nit] Avoid the "healthy" verbiage.

removed.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go line 2351 at r2 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

If you are keeping LogOnly don't you need to check it here?

removed log only.


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 1979 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

[same mega nit] Avoid the "healthy" verbiage.

removed


pkg/kv/kvserver/allocator/allocatorimpl/allocator_test.go line 2033 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

How does one read this subtest? The lease is on n5, with IO-overload-score=0.0 and lease-count=400, and we shed the lease to n3 with IO-overload-score=0.5 and lease-count=100?

The IO score is below the average, so it is not excluded as a candidate. It has the fewest leases of all candidates.

@kvoli force-pushed the 230222.io-transfers branch from 4605c90 to 3f1263d on February 27, 2023 22:09
@kvoli requested a review from andrewbaptist February 27, 2023 22:09
@kvoli changed the title allocator: check store health on lease transfer → allocator: check IO overload on lease transfer Mar 6, 2023
@andrewbaptist (Contributor) left a comment

:lgtm: I'd love to see some of the results of running this on the index backfill test with shed enabled. Once I get back to fixing the TPC-E stuff, please rerun and post the results on the story; it might help us decide on the correct default. Any other tests that cause LSM inversion would be good to run as well.

Reviewed 1 of 7 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @kvoli)


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2044 at r2 (raw file):

Previously, kvoli (Austen) wrote…

We discussed over slack on this. For other reviewers - the mean check is baked into any call to healthyTargets/leaseIsHealthy. So this shouldn't occur.

Ack


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 2198 at r2 (raw file):

Previously, kvoli (Austen) wrote…

Also I'm concerned that if all stores in the system are overloaded, then no lease transfers will happen. I'm not sure what should happen in this case, but it seems we should still correctly handle lease preference movement.

Lease preferences get filtered out before calling this fn. So we calc the avg based only on equally preferable and valid replicas. Since a store needs to be >1.1 * mean, if there's just one preferable replica it would never fail the io overload check.

Use a different variable name for existing after each check and in the log message below, print out more details about why we ended up with no valid lease targets. Tracking this down is hard even if you have the code, and even harder if you are just looking at the log.

Ack

@kvoli (Collaborator, Author) commented Mar 6, 2023

@irfansharif did you have any other comments?

@kvoli (Collaborator, Author) commented Mar 7, 2023

bors r=andrewbaptist

@craig (bot) commented Mar 7, 2023

👎 Rejected by code reviews

@kvoli dismissed irfansharif's stale review March 7, 2023 15:32: "Discussed over Slack."

@kvoli (Collaborator, Author) commented Mar 7, 2023

bors r=andrewbaptist

@craig (bot) commented Mar 7, 2023

Build failed (retrying...):

@craig (bot) commented Mar 7, 2023

Build succeeded:
