Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rangefeed: tune catchup scan semaphores #110439

Closed
erikgrinaker opened this issue Sep 12, 2023 · 4 comments · Fixed by #110919
Closed

rangefeed: tune catchup scan semaphores #110439

erikgrinaker opened this issue Sep 12, 2023 · 4 comments · Fixed by #110919
Assignees
Labels
A-kv-rangefeed Rangefeed infrastructure, server+client C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Sep 12, 2023

We have both client-side and server-side catchup scan semaphores. It isn't clear that these are tuned optimally, or even that we want them both on the client and server side. We should reconsider these.

Jira issue: CRDB-31428

Epic CRDB-26372

@erikgrinaker erikgrinaker added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication A-kv-rangefeed Rangefeed infrastructure, server+client labels Sep 12, 2023
@blathers-crl
Copy link

blathers-crl bot commented Sep 12, 2023

cc @cockroachdb/replication

@miretskiy
Copy link
Contributor

I think we should reconsider client side ones -- at least.
We are currently using client side semaphore to pace goroutine creation -- we should instead use something
like quota pool, or just simple timer instead to control the rate.

I think the many changes that happen on KV side -- from lock table, to use of TBI iterator, make the client side
throttling unnecessary for the purpose of limiting the number of concurrent catchup scans.

@erikgrinaker
Copy link
Contributor Author

On the client side, I guess we could read off what the server-side semaphore size is, and have a per-node semaphore of the same size. It might overshoot in the case of many clients, but that doesn't matter as long as the size is small.

@aliher1911
Copy link
Contributor

Reading server side size is a bit iffy. We have server side limiter which is per store. You could also be competing with other changefeeds.
And if we remove client side limits what I think could go wrong is that if we create a rangefeed with lots of ranges, we could block the limiter by lots of queued rangefeed start attempts.

miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 19, 2023
The catchup scan limit was added in cockroachdb#77725 in order to attempt
to restrict a single rangefeed from consuming all catchup
scan slots on the KV side.  This client side limit has
been largely ineffective.

More recently, this semaphore has been coopted in cockroachdb#109346 in order
to pace goroutine creation rate on the client. This functionality
is important, but the implementation is not very good.

Namely, we are interested in controling the rate of new catchup scans
being started by the client (and thus, control the rate of goroutine
creation).  This implementation replaces the old implementation
with rate limit based approach.  The rate limits are configured using
two new settings: `kv.rangefeed.max_catchup_scans` which sets
the maximum burst rate of the number of currently running
catchup scans (default 180), and a
`kv.rangefeed.catchup_scan_duration_estimate` setting which
sets the expected duration of a single catchup scan.
Taken together, these settings by default, will limit
the number of catchup scans to ~60/second.

Closes cockroachdb#110439
Epic: CRDB-26372

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 20, 2023
The catchup scan limit was added in cockroachdb#77725 in order to attempt
to restrict a single rangefeed from consuming all catchup
scan slots on the KV side.  This client side limit has
been largely ineffective.

More recently, this semaphore has been coopted in cockroachdb#109346 in order
to pace goroutine creation rate on the client. This functionality
is important, but the implementation is not very good.

Namely, we are interested in controling the rate of new catchup scans
being started by the client (and thus, control the rate of goroutine
creation).  This implementation replaces the old implementation
with rate limit based approach.  The rate limits are configured using
two new settings: `kv.rangefeed.client.startup_range_burst` which sets
the maximum burst rate on the number of newly established rangefeed
streams (default 180), and a `kv.rangefeed.client.startup_window` setting
which sets the window period over which "burst" rangefeed connections
may be established.

Closes cockroachdb#110439
Epic: CRDB-26372

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 21, 2023
The catchup scan limit was added in cockroachdb#77725 in order to attempt
to restrict a single rangefeed from consuming all catchup
scan slots on the KV side.  This client side limit has
been largely ineffective.

More recently, this semaphore has been coopted in cockroachdb#109346 in order
to pace goroutine creation rate on the client. This functionality
is important, but the implementation is not very good.

Namely, we are interested in controling the rate of new catchup scans
being started by the client (and thus, control the rate of goroutine
creation).  This implementation replaces the old implementation
with rate limit based approach.  The rate limits are configured using
two new settings: `kv.rangefeed.client.startup_range_burst` which sets
the maximum burst rate on the number of newly established rangefeed
streams (default 180), and a `kv.rangefeed.client.startup_window` setting
which sets the window period over which "burst" rangefeed connections
may be established.

Closes cockroachdb#110439
Epic: CRDB-26372

Release note: None
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Sep 21, 2023
The catchup scan limit was added in cockroachdb#77725 in order to attempt
to restrict a single rangefeed from consuming all catchup
scan slots on the KV side.  This client side limit has
been largely ineffective.

More recently, this semaphore has been coopted in cockroachdb#109346 in order
to pace goroutine creation rate on the client. This functionality
is important, but the implementation is not very good.

Namely, we are interested in controling the rate of new catchup scans
being started by the client (and thus, control the rate of goroutine
creation).  This implementation replaces the old implementation
with rate limit based approach.  The rate limits are configured using
`kv.rangefeed.client.stream_startup_rate` setting which sets
the rate on the number of newly established rangefeed
streams (default 100).

Closes cockroachdb#110439
Epic: CRDB-26372

Release note: None
craig bot pushed a commit that referenced this issue Sep 21, 2023
110919: kvcoord: Replace rangefeed catchup semaphore with rate limiter r=miretskiy a=miretskiy

The catchup scan limit was added in #77725 in order to attempt
to restrict a single rangefeed from consuming all catchup
scan slots on the KV side.  This client side limit has
been largely ineffective.

More recently, this semaphore has been coopted in #109346 in order
to pace goroutine creation rate on the client. This functionality
is important, but the implementation is not very good.

Namely, we are interested in controling the rate of new catchup scans
being started by the client (and thus, control the rate of goroutine
creation).  This implementation replaces the old implementation
with rate limit based approach.  The rate limits are configured using
`kv.rangefeed.client.stream_startup_rate` setting which sets
the rate on the number of newly established rangefeed
streams (default 100).

Closes #110439
Epic: CRDB-26372

Release note: None

110929: dev: fix cross builds running in git worktrees r=rickystewart a=liamgillies

Beforehand, running a cross build inside a git worktree would always fail. This PR adds code to also mount the main git repo into docker when in a git worktree, so that running cross builds will succeed.

Fixes: #110735
Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Liam Gillies <liam@cockroachlabs.com>
@craig craig bot closed this as completed in db12b54 Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-rangefeed Rangefeed infrastructure, server+client C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants