kvcoord: Replace rangefeed catchup semaphore with rate limiter #110919
Conversation
"kv.rangefeed.catchup_scan_concurrency", | ||
"number of catchup scans that a single rangefeed can execute concurrently; 0 implies unlimited", | ||
8, | ||
"kv.rangefeed.catchup_scan_duration_estimate", |
Instead of framing this in terms of a catchup scan estimate, I'd just frame it directly as the rangefeed client rate. It's hard to come up with a general catchup scan estimate, because it depends on a lot of factors, and will often vary even between different rangefeeds on the same cluster.
if err != nil {
	return err
}
metrics.RangefeedCatchupRanges.Inc(1)
a.catchupRes = func() {
	metrics.RangefeedCatchupRanges.Dec(1)
	res.Release()
	alloc.Return()
Why do we return this? This subverts the rate limiting, since we're saying this request failed and should not count towards the rate limit.
var catchupScanBurst = settings.RegisterIntSetting(
	settings.TenantWritable,
	"kv.rangefeed.max_catchup_scans",
	"maximum number of ranges that may run concurrently; 0 implies unlimited",
This isn't really what it means -- this is only true if the catchup scan estimate is equal to or below the actual catchup scan duration, not if it's above.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @erikgrinaker, and @rachitgsrivastava)
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 69 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Instead of framing this in terms of a catchup scan estimate, I'd just frame it directly as the rangefeed client rate. It's hard to come up with a general catchup scan estimate, because it depends on a lot of factors, and will often vary even between different rangefeeds on the same cluster.
Thoughts on this new name?
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 79 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
This isn't really what it means -- this is only true if the catchup scan estimate is equal to or below the actual catchup scan duration, not if it's above.
Naming is darn hard; I wanted to avoid burst; but... you know, I think I'll use that instead. If it sounds too technical, then that's the goal.
Thoughts on this new name?
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 714 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Why do we return this? This subverts the rate limiting, since we're saying this request failed and should not count towards the rate limit.
Ahh.. I see; you're right.
Reviewed 3 of 8 files at r1, 2 of 4 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miretskiy, and @rachitgsrivastava)
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 69 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Thoughts on this new name?
Well, this isn't really correct either.
The rate limiter is a token bucket. The rate specifies the rate at which tokens are added to the bucket. Burst specifies the size of the bucket.
If we set rate to 1 per second, with 0 burst, then we will spawn 60 per minute (1 each second at a fixed rate). If we set burst to 100 then we'll spawn 100 immediately (without a rate limit), then 1 per second after that (because we never allow the bucket to fill back up, we just immediately consume the tokens as they're added). The rate and burst are independent parameters.
The way it's set up here, we'll spawn 180 immediately, then 60 per second after that.
Do we even need or want burst here, instead of just a smooth rate?
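For illustration, here is a minimal sketch of the rate/burst distinction using the standard golang.org/x/time/rate token bucket (an assumption for demonstration purposes; the PR itself may use a different limiter implementation). The numbers mirror the example above: rate 1/sec, burst 100.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// First argument: tokens added per second (the rate).
	// Second argument: burst, i.e. the size of the bucket.
	lim := rate.NewLimiter(rate.Limit(1), 100)

	ctx := context.Background()
	start := time.Now()
	for i := 0; i < 105; i++ {
		// The bucket starts full, so the first ~100 waits return immediately;
		// after that, each wait blocks roughly one second for a fresh token.
		if err := lim.Wait(ctx); err != nil {
			return
		}
		fmt.Printf("catchup scan %d started after %s\n", i, time.Since(start).Round(time.Second))
	}
}
```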
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 79 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
Naming is darn hard; I wanted to avoid burst; but... you know, I think I'll use that instead. If it sounds too technical, then that's the goal.
Thoughts on this new name?
Well, this isn't really correct either.
The rate limiter is a token bucket. The rate specifies the rate at which tokens are added to the bucket. Burst specifies the size of the bucket.
If we set rate to 1 per second, with 0 burst, then we will spawn 60 per minute (1 each second at a fixed rate). If we also set burst to 100 then we'll spawn 100 immediately (without a rate limit), then 1 per second after that (because we never allow the bucket to fill back up, we just immediately consume the tokens as they're added). The rate and burst are independent parameters.
The way it's set up here, we'll spawn 180 immediately, then 60 per second after that.
Do we even need or want burst here, instead of just a smooth rate?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miretskiy, and @rachitgsrivastava)
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 79 at r1 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Well, this isn't really correct either.
The rate limiter is a token bucket. The rate specifies the rate at which tokens are added to the bucket. Burst specifies the size of the bucket.
If we set rate to 1 per second, with 0 burst, then we will spawn 60 per minute (1 each second at a fixed rate). If we also set burst to 100 then we'll spawn 100 immediately (without a rate limit), then 1 per second after that (because we never allow the bucket to fill back up, we just immediately consume the tokens as they're added). The rate and burst are independent parameters.
The way it's set up here, we'll spawn 180 immediately, then 60 per second after that.
Do we even need or want burst here, instead of just a smooth rate?
Whoops, didn't mean to post this twice, was trying to move it to the other thread.
Kinda what I wanted to do...
I don't know if we need burst; just going to set 100/sec rate, 0 burst.
Kinda what I wanted to do...
Ah ok, I thought there might be a misunderstanding about how the rate limiter worked. Anyway, I don't have a good sense of what the optimal strategy is (might have performance implications), but since we want to reduce Go scheduler tail latencies it seems like we want to avoid bursts -- although I'm not sure it matters until we go past O(1000) goroutines anyway, and I guess the burst doesn't really make much of a difference either way in that case. This is simpler though.
Thanks!
Reviewed 1 of 8 files at r1, 4 of 4 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miretskiy, and @rachitgsrivastava)
pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go
line 70 at r4 (raw file):
settings.TenantWritable, "kv.rangefeed.client.stream_startup_rate", "controls the rate the client will initiate new rangefeed stream for a single range; 0 implies unlimited",
nit: per-second rate
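For reference, a hypothetical rendering of the setting with the nit applied. The call shape mirrors the excerpts above, but the variable name and description text below are illustrative assumptions, not the merged code.

```go
package kvcoord

import "github.com/cockroachdb/cockroach/pkg/settings"

// Illustrative only: one possible wording with the "per-second rate" nit applied.
var rangefeedClientStreamStartupRate = settings.RegisterIntSetting(
	settings.TenantWritable,
	"kv.rangefeed.client.stream_startup_rate",
	"controls the per-second rate at which the client initiates new rangefeed streams for a single range; 0 implies unlimited",
	100,
)
```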
bors r+
Build succeeded:
111312: kvcoord: Fix mux rangefeed startup deadlock r=miretskiy a=miretskiy

Previous PR #110919 modified rangefeed startup logic to rely on a rate limit instead of a semaphore. The issue exposed by the catchup scan rate limiter is that it allowed many more catchup scans to be started (100/sec vs. 8 previously). If the range resides on a local node, a local "bypass" RPC is created instead (rpc/context.go), and this RPC bypass is implemented via buffered channels. The deadlock occurs when the client attempts to send a request to the server while holding the lock, but blocks (because the channel is full -- i.e. we have sent 128 outstanding requests), and the server blocks for the same reason because the client mux goroutine is blocked attempting to acquire the lock to look up the stream for the rangefeed message.

The fix moves the state shared by the sender and the consumer -- namely the stream map -- outside of the sender lock, and adds a test that reliably fails under deadlock detection.

Fixes #111111
Fixes #111165

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
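To make the deadlock shape concrete, here is a hypothetical, simplified sketch; the types and names are illustrative stand-ins, not the actual rpc/context.go code. The sender holds a mutex while writing to a full buffered channel, while the consumer needs that same mutex to look up the stream, so neither side can make progress. The "after" shape corresponds to giving the stream map its own lock so the consumer never waits on the sender's lock.

```go
package muxdemo

import "sync"

type event struct{ streamID int64 }

// Before: a single mutex guards both the outbound channel send and the
// stream map. With a full buffer (e.g. 128 outstanding requests), send()
// blocks while holding mu, and consume() blocks acquiring mu -- a deadlock.
type muxBefore struct {
	mu      sync.Mutex
	out     chan event // buffered local "bypass" channel
	streams map[int64]chan event
}

func (m *muxBefore) send(ev event) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.out <- ev // can block on a full buffer while mu is held
}

func (m *muxBefore) consume() {
	for ev := range m.out {
		m.mu.Lock() // stuck: the blocked sender still holds mu
		dst := m.streams[ev.streamID]
		m.mu.Unlock()
		if dst != nil {
			dst <- ev
		}
	}
}

// After: the stream map lives behind its own lock, outside the sender's
// lock, so the consumer can always drain the channel and unblock the sender.
type muxAfter struct {
	sendMu sync.Mutex
	out    chan event

	streamsMu sync.Mutex
	streams   map[int64]chan event
}

func (m *muxAfter) lookup(id int64) chan event {
	m.streamsMu.Lock()
	defer m.streamsMu.Unlock()
	return m.streams[id]
}
```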
113966: kvcoord: Reintroduce catchup scan semaphore for regular rangefeed r=miretskiy a=miretskiy

Re-introduce the catchup scan semaphore limit, removed by #110919, for the regular rangefeed. This hard limit on the number of catchup scans is necessary to avoid OOMs when handling large rangefeeds (large fan-in factor) that execute over many non-local ranges.

Fixes #113489

Release note: None

114000: colfetcher: disable metamorphic randomization for direct scans r=yuzefovich a=yuzefovich

This commit makes it so that we no longer - for now - use metamorphic randomization for the default value of the `sql.distsql.direct_columnar_scans.enabled` cluster setting that controls whether direct columnar scans (aka "KV projection pushdown") are enabled. It appears that we might be missing some memory accounting in the local fast path of this feature, and some backup-related roachtests run into OOMs with binaries with "enabled assertions". Disabling this metamorphization for now seems good to silence failures from this now-known issue.

Informs: #113816

Epic: None

Release note: None

114026: kvnemesis: bump default steps to 100 r=erikgrinaker a=erikgrinaker

50 steps is usually too small to trigger interesting behaviors. Bump it to 100, which is still small enough to be easily debuggable. The nightlies already run with 1000 steps.

Epic: none

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
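A rough sketch of the shape that results from #113966: a hard concurrency cap (to bound memory) paired with the startup rate limit (to pace goroutine creation). The golang.org/x/sync/semaphore and golang.org/x/time/rate packages are used here purely for illustration; the actual change uses CockroachDB's internal primitives and cluster settings.

```go
package catchupdemo

import (
	"context"

	"golang.org/x/sync/semaphore"
	"golang.org/x/time/rate"
)

type catchupLimiter struct {
	sem *semaphore.Weighted // hard cap on concurrent catchup scans (bounds memory)
	lim *rate.Limiter       // paces how quickly new streams are started
}

func newCatchupLimiter(maxConcurrent int64, startsPerSec float64) *catchupLimiter {
	return &catchupLimiter{
		sem: semaphore.NewWeighted(maxConcurrent),
		lim: rate.NewLimiter(rate.Limit(startsPerSec), 1),
	}
}

// acquire blocks until a concurrency slot and a rate token are both available.
// The returned release func must be called when the catchup scan completes.
func (l *catchupLimiter) acquire(ctx context.Context) (release func(), _ error) {
	if err := l.sem.Acquire(ctx, 1); err != nil {
		return nil, err
	}
	if err := l.lim.Wait(ctx); err != nil {
		l.sem.Release(1)
		return nil, err
	}
	return func() { l.sem.Release(1) }, nil
}
```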
The catchup scan limit was added in #77725 in an attempt to prevent a single rangefeed from consuming all catchup scan slots on the KV side. This client-side limit has been largely ineffective.
More recently, this semaphore was co-opted in #109346 to pace the rate of goroutine creation on the client. This functionality is important, but the implementation is not very good: what we really want to control is the rate at which the client starts new catchup scans (and thus the rate of goroutine creation).

This change replaces the old implementation with a rate-limit-based approach. The rate limit is configured via the `kv.rangefeed.client.stream_startup_rate` setting, which sets the rate of newly established rangefeed streams (default 100 per second).
Closes #110439
Epic: CRDB-26372
Release note: None