kvcoord: Replace rangefeed catchup semaphore with rate limiter #110919

Merged: 1 commit, Sep 21, 2023

Conversation

@miretskiy (Contributor) commented Sep 19, 2023

The catchup scan limit was added in #77725 in an attempt to restrict
a single rangefeed from consuming all catchup scan slots on the
KV side. This client-side limit has been largely ineffective.

More recently, this semaphore was co-opted in #109346 to pace the
rate of goroutine creation on the client. This functionality is
important, but the implementation is not very good.

Namely, we are interested in controlling the rate at which the client
starts new catchup scans (and thus the rate of goroutine creation).
This change replaces the old implementation with a rate-limit-based
approach. The rate limit is configured via the
`kv.rangefeed.client.stream_startup_rate` setting, which caps the
number of newly established rangefeed streams per second (default 100).

Closes #110439
Epic: CRDB-26372

Release note: None
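
A minimal sketch of the before/after shapes described above, using golang.org/x/time/rate as a stand-in for the PR's internal plumbing (the helper names and the semaphore size are illustrative assumptions, not the actual kvcoord code):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Before (shape): a semaphore caps how many catchup scans run at
	// once, but does nothing to pace how fast new goroutines are spawned.
	sem := make(chan struct{}, 8)
	startWithSemaphore := func(run func()) {
		sem <- struct{}{} // blocks only once 8 scans are in flight
		go func() {
			defer func() { <-sem }()
			run()
		}()
	}

	// After (shape): a token-bucket rate limiter paces goroutine
	// creation itself, per kv.rangefeed.client.stream_startup_rate
	// (default 100 per second).
	lim := rate.NewLimiter(rate.Limit(100), 100)
	startWithRateLimit := func(ctx context.Context, run func()) error {
		if err := lim.Wait(ctx); err != nil {
			return err
		}
		go run()
		return nil
	}

	startWithSemaphore(func() { time.Sleep(time.Millisecond) })
	if err := startWithRateLimit(context.Background(), func() {}); err != nil {
		panic(err)
	}
	fmt.Println("ok")
}
```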

@miretskiy miretskiy requested review from erikgrinaker and a team September 19, 2023 17:36
@miretskiy miretskiy requested a review from a team as a code owner September 19, 2023 17:36
@cockroach-teamcity (Member) commented: This change is Reviewable

@miretskiy miretskiy force-pushed the catchup branch 3 times, most recently from cf5c211 to 5c38983 on September 19, 2023 19:08
@miretskiy miretskiy requested a review from a team as a code owner September 19, 2023 19:08
@miretskiy miretskiy requested review from rachitgsrivastava and DarrylWong and removed request for a team September 19, 2023 19:08
"kv.rangefeed.catchup_scan_concurrency",
"number of catchup scans that a single rangefeed can execute concurrently; 0 implies unlimited",
8,
"kv.rangefeed.catchup_scan_duration_estimate",
@erikgrinaker (Contributor) commented:
Instead of framing this in terms of a catchup scan estimate, I'd just frame it directly as the rangefeed client rate. It's hard to come up with a general catchup scan estimate, because it depends on a lot of factors, and will often vary even between different rangefeeds on the same cluster.

if err != nil {
	return err
}
metrics.RangefeedCatchupRanges.Inc(1)
// Cleanup closure, invoked when the catchup scan completes: decrement
// the metric and release the acquired resources.
a.catchupRes = func() {
	metrics.RangefeedCatchupRanges.Dec(1)
	res.Release()
	alloc.Return()
}
@erikgrinaker (Contributor) commented:
Why do we return this? This subverts the rate limiting, since we're saying this request failed and should not count towards the rate limit.

var catchupScanBurst = settings.RegisterIntSetting(
settings.TenantWritable,
"kv.rangefeed.max_catchup_scans",
"maximum number of ranges that may run concurrently; 0 implies unlimited",
@erikgrinaker (Contributor) commented:
This isn't really what it means -- this is only true if the catchup scan estimate is equal to or below the actual catchup scan duration, not if it's above.

@miretskiy (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @erikgrinaker, and @rachitgsrivastava)


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 69 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Instead of framing this in terms of a catchup scan estimate, I'd just frame it directly as the rangefeed client rate. It's hard to come up with a general catchup scan estimate, because it depends on a lot of factors, and will often vary even between different rangefeeds on the same cluster.

Thoughts on this new name?


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 79 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

This isn't really what it means -- this is only true if the catchup scan estimate is equal to or below the actual catchup scan duration, not if it's above.

Naming is darn hard; I wanted to avoid burst; but... you know, I think I'll use that instead. If it sounds too technical, then that's the goal.
Thoughts on this new name?


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 714 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Why do we return this? This subverts the rate limiting, since we're saying this request failed and should not count towards the rate limit.

Ahh.. I see; you're right.

@erikgrinaker (Contributor) left a comment

Reviewed 3 of 8 files at r1, 2 of 4 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miretskiy, and @rachitgsrivastava)


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 69 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Thoughts on this new name?

Well, this isn't really correct either.

The rate limiter is a token bucket. The rate specifies the rate at which tokens are added to the bucket. Burst specifies the size of the bucket.

If we set rate to 1 per second, with 0 burst, then we will spawn 60 per minute (1 each second at a fixed rate). If we set burst to 100 then we'll spawn 100 immediately (without a rate limit), then 1 per second after that (because we never allow the bucket to fill back up, we just immediately consume the tokens as they're added). The rate and burst are independent parameters.

The way it's set up here, we'll spawn 180 immediately, then 60 per second after that.

Do we even need or want burst here, instead of just a smooth rate?
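
For readers less familiar with token buckets, here is a small self-contained demonstration of the rate-vs-burst distinction described above, using golang.org/x/time/rate (an illustration of the semantics only, not the PR's actual machinery):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// rate = tokens added per second; burst = bucket capacity.
	// A new limiter starts with a full bucket, so with rate=1 and
	// burst=100 the first 100 acquisitions succeed immediately;
	// after that, tokens arrive at 1/sec and each Wait blocks ~1s.
	lim := rate.NewLimiter(rate.Limit(1), 100)

	start := time.Now()
	for i := 1; i <= 103; i++ {
		if err := lim.Wait(context.Background()); err != nil {
			panic(err)
		}
		if i >= 100 {
			fmt.Printf("acquisition %d after %s\n",
				i, time.Since(start).Round(100*time.Millisecond))
		}
	}
	// Prints roughly: 100 at 0s, 101 at 1s, 102 at 2s, 103 at 3s.
}
```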


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 79 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Naming is darn hard; I wanted to avoid burst; but... you know, I think I'll use that instead. If it sounds too technical, then that's the goal.
Thoughts on this new name?

Do we even need or want burst here, instead of just a smooth rate?

@erikgrinaker (Contributor) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miretskiy, and @rachitgsrivastava)


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 79 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…


Whoops, didn't mean to post this twice, was trying to move it to the other thread.

@miretskiy (Contributor, Author) left a comment

pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 69 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…
Do we even need or want burst here, instead of just a smooth rate?

pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 79 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Naming is darn hard; I wanted to avoid burst; but... you know, I think I'll use that instead. If it sounds too technical, then that's the goal.
Thoughts on this new name?

Previously, erikgrinaker (Erik Grinaker) wrote…

The way it's set up here, we'll spawn 180 immediately, then 60 per second after that.

Kinda what I wanted to do...

Do we even need or want burst here, instead of just a smooth rate?

I don't know if we need burst; Just going to set 100/sec rate, 0 burst.

@erikgrinaker (Contributor) left a comment

Kinda what I wanted to do...

Ah ok, I thought there might be a misunderstanding about how the rate limiter worked. Anyway, I don't have a good sense of what the optimal strategy is (might have performance implications), but since we want to reduce Go scheduler tail latencies it seems like we want to avoid bursts -- although I'm not sure it matters until we go past O(1000) goroutines anyway, and I guess the burst doesn't really make much of a difference either way in that case. This is simpler though.

Thanks!

Reviewed 1 of 8 files at r1, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DarrylWong, @miretskiy, and @rachitgsrivastava)


pkg/kv/kvclient/kvcoord/dist_sender_rangefeed.go line 70 at r4 (raw file):

	settings.TenantWritable,
	"kv.rangefeed.client.stream_startup_rate",
	"controls the rate the client will initiate new rangefeed stream for a single range; 0 implies unlimited",

nit: per-second rate
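
A hypothetical illustration of mapping the setting's value onto a limiter (the helper and its wiring are assumptions, not the PR's code). One wrinkle worth noting relative to the "0 burst" discussion above: in golang.org/x/time/rate, a burst of 0 admits no events at all, so a smooth-rate policy there still needs burst >= 1:

```go
package main

import (
	"fmt"

	"golang.org/x/time/rate"
)

// newStartupLimiter is a hypothetical helper translating the cluster
// setting's per-second value into a limiter.
func newStartupLimiter(perSecond int) *rate.Limiter {
	if perSecond == 0 {
		// "0 implies unlimited": an infinite rate never blocks.
		return rate.NewLimiter(rate.Inf, 1)
	}
	// burst == rate allows at most one second's worth of immediate
	// starts; x/time/rate requires burst >= 1 for any event to pass.
	return rate.NewLimiter(rate.Limit(perSecond), perSecond)
}

func main() {
	lim := newStartupLimiter(100) // 100/sec, the default from the PR description
	fmt.Println(lim.Limit(), lim.Burst())
}
```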

The catchup scan limit was added in cockroachdb#77725 in an attempt
to restrict a single rangefeed from consuming all catchup scan slots
on the KV side. This client-side limit has been largely ineffective.

More recently, this semaphore was co-opted in cockroachdb#109346 to
pace the rate of goroutine creation on the client. This functionality
is important, but the implementation is not very good.

Namely, we are interested in controlling the rate at which the client
starts new catchup scans (and thus the rate of goroutine creation).
This change replaces the old implementation with a rate-limit-based
approach. The rate limit is configured via the
`kv.rangefeed.client.stream_startup_rate` setting, which caps the
number of newly established rangefeed streams per second (default 100).

Closes cockroachdb#110439
Epic: CRDB-26372

Release note: None
@miretskiy (Contributor, Author) commented:

bors r+

@craig (craig bot) commented Sep 21, 2023

Build succeeded:

miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 26, 2023

Previous PR cockroachdb#110919 modified rangefeed startup logic to
rely on a rate limit instead of a semaphore.

The issue exposed by the catchup scan rate limiter is that it allowed
many more catchup scans to be started (100/sec vs. 8 previously).
If the range resides on a local node, a local "bypass" RPC is created
instead (rpc/context.go), and this RPC bypass is implemented via
buffered channels.

The deadlock occurs when the client attempts to send a request to the
server while holding the lock, but blocks (because the channel is
full, i.e. we have sent 128 outstanding requests), and the server
blocks for the same reason: the client mux goroutine is blocked
attempting to acquire the lock to look up the stream for the
rangefeed message.

The fix moves the state shared by the sender and the consumer --
namely the stream map -- outside of the sender lock.

This fix adds a test that reliably fails under deadlock detection.

Fixes cockroachdb#111111
Fixes cockroachdb#111165

Release note: None
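
A runnable sketch of the deadlock shape described above, with hypothetical types standing in for the mux rangefeed machinery. The PR's actual fix moves the stream map out from under the sender lock; the sketch shows the same hazard and the general remedy of keeping blocking channel operations outside the critical section:

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical stand-ins for the mux rangefeed pieces; buffer sizes are
// tiny to make the hazard easy to hit (the real bypass used 128).
type mux struct {
	mu       sync.Mutex
	streams  map[int64]string
	toServer chan int64 // client -> server requests (bounded)
	toClient chan int64 // server -> client events (bounded)
}

// Buggy shape: the sender holds mu across a send on a bounded channel.
// If toServer is full because the server is blocked on toClient, and
// the client mux goroutine is blocked on mu before it can drain
// toClient, the goroutines form a cycle and everything wedges.
func (m *mux) startBuggy(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.streams[id] = fmt.Sprintf("stream-%d", id)
	m.toServer <- id // may block while mu is held
}

// Fixed shape: shared state is updated under mu, but the potentially
// blocking send happens outside the critical section, so the mux
// goroutine can always acquire mu and keep draining.
func (m *mux) startFixed(id int64) {
	m.mu.Lock()
	m.streams[id] = fmt.Sprintf("stream-%d", id)
	m.mu.Unlock()
	m.toServer <- id
}

func main() {
	m := &mux{
		streams:  map[int64]string{},
		toServer: make(chan int64, 2),
		toClient: make(chan int64, 2),
	}
	go func() { // "server": echoes each request back as an event
		for id := range m.toServer {
			m.toClient <- id
		}
		close(m.toClient)
	}()
	done := make(chan struct{})
	go func() { // client mux: needs mu to look up the stream per event
		for id := range m.toClient {
			m.mu.Lock()
			_ = m.streams[id]
			m.mu.Unlock()
		}
		close(done)
	}()
	for i := int64(0); i < 1000; i++ {
		m.startFixed(i) // startBuggy here can wedge under load
	}
	close(m.toServer)
	<-done
	fmt.Println("no deadlock")
}
```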
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Sep 27, 2023
adityamaru pushed a commit to adityamaru/cockroach that referenced this pull request Sep 27, 2023
craig bot pushed a commit that referenced this pull request Sep 27, 2023
111312: kvcoord: Fix mux rangefeed startup deadlock r=miretskiy a=miretskiy


Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
THardy98 pushed a commit to THardy98/cockroach that referenced this pull request Oct 6, 2023
miretskiy pushed a commit to miretskiy/cockroach that referenced this pull request Nov 7, 2023

Re-introduce the catchup scan semaphore limit, removed by
cockroachdb#110919, for regular rangefeed. This hard limit on the
number of catchup scans is necessary to avoid OOMs when handling
rangefeeds with a large fan-in factor that execute over many
non-local ranges.

Fixes cockroachdb#113489

Release note: None
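
A sketch of how the two mechanisms compose after this change, under the assumption of a plain channel semaphore plus golang.org/x/time/rate (sizes and rates are illustrative): the semaphore bounds memory by capping concurrent catchup scans, while the limiter bounds scheduler pressure by pacing goroutine startup.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	sem := make(chan struct{}, 8)                // hard cap on in-flight scans (memory)
	lim := rate.NewLimiter(rate.Limit(100), 100) // pace goroutine startup (scheduler)

	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		sem <- struct{}{} // blocks once 8 scans are in flight
		if err := lim.Wait(context.Background()); err != nil {
			panic(err)
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			time.Sleep(10 * time.Millisecond) // stand-in for a catchup scan
		}()
	}
	wg.Wait()
	fmt.Println("all catchup scans finished")
}
```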
craig bot pushed a commit that referenced this pull request Nov 8, 2023
113966: kvcoord: Reintroduce catchup scan semaphore for regular rangefeed r=miretskiy a=miretskiy

114000: colfetcher: disable metamorphic randomization for direct scans r=yuzefovich a=yuzefovich

This commit makes it so that we no longer - for now - use metamorphic randomization for the default value of the `sql.distsql.direct_columnar_scans.enabled` cluster setting, which controls whether direct columnar scans (aka "KV projection pushdown") are enabled. It appears that we might be missing some memory accounting in the local fast path of this feature, and some backup-related roachtests run into OOMs with binaries that have assertions enabled. Disabling this metamorphization for now silences failures from this now-known issue.

Informs: #113816

Epic: None

Release note: None

114026: kvnemesis: bump default steps to 100 r=erikgrinaker a=erikgrinaker

50 steps is usually too small to trigger interesting behaviors. Bump it to 100, which is still small enough to be easily debuggable.

The nightlies already run with 1000 steps.

Epic: none
Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request Nov 8, 2023