
admission: improvements to KV admission #65957

Closed
3 of 5 tasks
sumeerbhola opened this issue Jun 1, 2021 · 5 comments
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@sumeerbhola
Collaborator

sumeerbhola commented Jun 1, 2021

Misc TODOs/limitations in KV admission control:

  • Most work initiated from within KV bypasses admission control. This is especially inappropriate for background work like GC, backups, etc. Fixing it mainly requires finding the places in the code where the low-priority work originates and plumbing the appropriate AdmissionHeader; see the sketch after this list. (Export and GC are now subject to admission control via "kvserver,backupccl: make export, gc subject to admission control" #71109, but there are TODOs regarding priority assignment.)
  • There is no KVLowPriWork WorkKind, which means even low-priority KV work will get prioritized over user-facing SQL processing. (Edit: with the elastic classification, this is no longer true, since low-priority KV work goes through a different grant coordinator).
  • The initial scan for RangeFeedRequest is not subject to admission control.
  • Read-only work is throttled when storage engines are overloaded due to writes. We need a way to distinguish work that includes writes, and make it pass through an additional WorkQueue that is hooked up to a Granter that only uses write overload as a resource signal.
  • When there are multiple stores, a single engine (for a particular store) being overloaded due to writes will cause work to be throttled for all stores.
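For concreteness, the sketch below shows the kind of plumbing the first bullet asks for: tagging a batch issued by background KV work with an explicit AdmissionHeader so it no longer bypasses admission control. It assumes the roachpb.AdmissionHeader fields (Priority, CreateTime, Source) and the ROOT_KV source referenced in the linked commits; the helper itself and its call site are hypothetical.

    // Hypothetical sketch only. Field names follow roachpb.AdmissionHeader as
    // referenced in the commits on this issue; makeBackgroundBatch is an
    // illustrative helper, not the actual integration point.
    package example

    import (
        "time"

        "github.com/cockroachdb/cockroach/pkg/roachpb"
        "github.com/cockroachdb/cockroach/pkg/util/admission"
    )

    func makeBackgroundBatch() *roachpb.BatchRequest {
        ba := &roachpb.BatchRequest{}
        ba.AdmissionHeader = roachpb.AdmissionHeader{
            // Low priority so interactive traffic is preferred. #71109 ended up
            // using NormalPri for export/GC, with TODOs to lower it when possible.
            Priority:   int32(admission.LowPri),
            CreateTime: time.Now().UnixNano(),
            // ROOT_KV marks work originating inside KV, which stops it from
            // bypassing admission control.
            Source: roachpb.AdmissionHeader_ROOT_KV,
        }
        return ba
    }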

Jira issue: CRDB-7813

sumeerbhola added C-enhancement and A-admission-control labels Jun 1, 2021
@ajwerner
Contributor

Should we do something about ExportRequest and AddSSTable?

@sumeerbhola
Collaborator Author

> Should we do something about ExportRequest and AddSSTable?

Yes. Since this is happening through InternalServer.Batch, it should be part of the first bullet.

@ajwerner
Contributor

Cool. For context, I'm raising this because of the recent issues involving the bespoke ExportRequestLimiter and its bad interactions with latching. I suspect we wouldn't have gotten ourselves into that pickle if we were thinking about admission control in a more centralized way.

    // ExportRequestsLimit is the number of Export requests that can run at once.
    // Each extracts data from RocksDB to a temp file and then uploads it to cloud
    // storage. In order to not exhaust the disk or memory, or saturate the network,
    // limit the number of these that can be run in parallel. This number was chosen
    // by guessing - it could be improved by more measured heuristics. Exported
    // here since we check it in the caller to limit generated requests as well,
    // to prevent excessive queuing.
    var ExportRequestsLimit = settings.RegisterIntSetting(
        "kv.bulk_io_write.concurrent_export_requests",
        "number of export requests a store will handle concurrently before queuing",
        3,
        settings.PositiveInt,
    )
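
For readers outside the bulk IO code, the pattern this setting drives is essentially a fixed-size semaphore consulted before evaluating an export. A rough sketch of that pattern, not the actual cockroach code (exportSem, runExport, and evalExport are made-up names):

    // Rough sketch of the bespoke limiting pattern under discussion: a channel
    // used as a counting semaphore, sized from a setting like
    // ExportRequestsLimit. Names here are hypothetical.
    package example

    import "context"

    var exportSem = make(chan struct{}, 3) // capacity mirrors the setting's default of 3

    func runExport(ctx context.Context, evalExport func(context.Context) error) error {
        select {
        case exportSem <- struct{}{}: // acquire a slot, queueing if all slots are busy
        case <-ctx.Done():
            return ctx.Err()
        }
        defer func() { <-exportSem }() // release the slot when evaluation finishes
        return evalExport(ctx)
    }

Acquiring such a slot while already holding latches is exactly the kind of interaction being called out in this thread.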

@sumeerbhola
Collaborator Author

> the bespoke ExportRequestLimiter and its bad interactions with latching

Yep, I noticed the silliness of rate limiting after acquiring latches (when it got fixed). I don't think admission control would have helped -- since the CPU would not be saturated, it would have let the low-priority requests through. It is not aware that latches are a resource too, and that reducing latch contention from low-priority traffic allows high-priority traffic to achieve higher throughput. Modeling latches as a resource would be hard, since we'd need to track contended spans.

craig bot pushed a commit that referenced this issue Aug 9, 2021
67717: admission,server: scope store overload admission control to write ope… r=sumeerbhola a=sumeerbhola

…rations on the overloaded store

Previously KVWork was all admitted through a single WorkQueue, and this
WorkQueue was paired with a GrantCoordinator that took into account
both CPU overload and storage overload. This meant a single overloaded
store (in a multi-store setting) would slow down all KV operations
including reads, and those on stores that were fine.

We now have multiple WorkQueues, one per store, that KV writes go
through, and then get to the shared WorkQueue that all operations go
through. The per-store WorkQueues are associated with their own
GrantCoordinators that listen to health state for their respective
stores. The per-store queues do not care about CPU and therefore do not
do grant chaining. Since admission through these per-store queues
happens before the shared queue, a store bottleneck will not cause KV
slots in the shared queue to be taken by requests that are still waiting
for admission elsewhere. The reverse can happen, and is considered
acceptable -- per-store tokens, when a store is overloaded, can be used
by requests that are now waiting for admission in the shared WorkQueue
because of a CPU bottleneck.

The code is significantly refactored for the above:
NewGrantCoordinators returns a container called StoreGrantCoordinators
which lazily initializes the relevant per-store GrantCoordinators when
it first fetches Pebble metrics, in addition to the shared
GrantCoordinator. The server code now integrates with both types and
the code in Node.Batch will sometimes subject a request to two
WorkQueues. The PebbleMetricsProvider now includes StoreIDs, and the
periodic ticking that fetches these metrics at 1min intervals, and does
1s ticks, is moved to StoreGrantCoordinators. This simplifies the
ioLoadListener which no longer does the periodic ticking and eliminates
some testing-only abstractions. The per-store WorkQueues share the
same metrics, which represent an aggregate across these queues.

Informs #65957

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
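
In outline, the admission ordering described above (per-store write admission first, then the shared CPU-based queue) looks roughly like the sketch below. The Admitter interface and the kvAdmission wrapper are simplified stand-ins, not the real WorkQueue/GrantCoordinator API.

    // Simplified sketch of the two-stage admission described in the commit
    // message above; Admitter, kvAdmission, and their methods are hypothetical
    // stand-ins for the per-store and shared WorkQueues.
    package example

    import "context"

    type Admitter interface {
        Admit(ctx context.Context, priority int8, createTime int64) error
    }

    type kvAdmission struct {
        storeQueues map[int32]Admitter // one per store; reacts to write/IO overload only
        sharedQueue Admitter           // shared by all KV work; reacts to CPU overload
    }

    // admit subjects a write to its store's queue first and only then to the
    // shared queue, so a bottlenecked store cannot occupy KV slots in the
    // shared queue with requests that are still waiting elsewhere.
    func (a *kvAdmission) admit(
        ctx context.Context, storeID int32, isWrite bool, pri int8, createTime int64,
    ) error {
        if isWrite {
            if q, ok := a.storeQueues[storeID]; ok {
                if err := q.Admit(ctx, pri, createTime); err != nil {
                    return err
                }
            }
        }
        return a.sharedQueue.Admit(ctx, pri, createTime)
    }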
craig bot pushed a commit that referenced this issue Aug 23, 2021
69050: sql: Set txn deadline to session expiry for mt pods. r=ajwerner a=rimadeodhar

For ephemeral SQL pods in the multi-tenant environment,
the transaction deadline should be set to 
min(sqlliveness.Session expiration, leased descriptor duration).
Since a sql instance ID can be reclaimed by another SQL pod once
the session for the old pod has expired, we need to ensure that
all transactions for a SQL pod commit before the session for
that SQL pod expires. In single-tenant environments, the session
expiry will not be used.

Release note: None

Release justification: bug fix for new sqlinstance functionality

Resolves: #67719

69241: tests,ui: add a multi-store test with single overloaded store r=sumeerbhola a=sumeerbhola

Admission control will slow down writes to that store without
affecting operations on the other store.

Also fixed some related graphs on the overload dashboard
- the "IO tokens exhausted" graph should have been using a rate
- the kv-stores metrics were missing

Informs #65955,#65957

Release note: None

Co-authored-by: rimadeodhar <rima@cockroachlabs.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Oct 4, 2021
They are now marked with AdmissionHeader_ROOT_KV, which stops
them from bypassing admission control. The priority is set to
admission.LowPri since these are background activities that
should not be preferred over interactive user traffic.

Informs cockroachdb#65957

Release note: None
craig bot pushed a commit that referenced this issue Oct 13, 2021
71109: kvserver,backupccl: make export, gc subject to admission control r=sumeerbhola a=sumeerbhola

They are now marked with AdmissionHeader_ROOT_KV, which stops
them from bypassing admission control. The priority is set to
admission.NormalPri since there are cases where they do need
to complete in a very timely manner. There are TODOs in the
integration code to make more sophisticated priority
assignments and use LowPri when possible.

Informs #65957

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
irfansharif changed the title from "admission control: improvements to KV admission" to "admission: improvements to KV admission" Oct 3, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Oct 24, 2022
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scans with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
craig bot pushed a commit that referenced this issue Nov 4, 2022
89709: kv,rangefeed: integrate catchup scans with elastic cpu r=irfansharif a=irfansharif

Part of #65957. First commit is from #89656.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In #89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scans with the elastic
CPU limiter we introduced in #86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri). Experimentally we observed that during
initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something #89589 can help with. We leave admission
integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to use at most 100ms of CPU time (it makes
for an ineffective controller otherwise), and to that end we introduce
the following component used within the rangefeed catchup iterator:

    // Pacer is used in tight loops (CPU-bound) for non-preemptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
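
A usage sketch of the Pacer pattern quoted above: Pace() on every iteration of the CPU-bound loop, Close() when done. The iterator and pacer interfaces here are hypothetical stand-ins for the rangefeed catchup iterator and the real Pacer type.

    // Hypothetical stand-ins for the rangefeed catchup iterator and the Pacer
    // described above; only the calling pattern is the point here.
    package example

    import "context"

    type iterator interface {
        Next() bool
        Cur() []byte
        Error() error
    }

    type pacer interface {
        Pace(ctx context.Context) error
        Close()
    }

    // catchupScan shows the intended pattern: Pace() every iteration of a
    // tight, CPU-bound loop (blocking there when elastic CPU tokens are
    // exhausted), and Close() once the scan is done.
    func catchupScan(ctx context.Context, it iterator, p pacer, emit func([]byte) error) error {
        defer p.Close()
        for it.Next() {
            if err := p.Pace(ctx); err != nil {
                return err
            }
            if err := emit(it.Cur()); err != nil {
                return err
            }
        }
        return it.Error()
    }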
@sumeerbhola
Collaborator Author

Closing this old issue since statements like "Most work initiated from within KV bypasses admission control. This is especially inappropriate for background work like GC, backups, etc." are stale and have been addressed.
