Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvfeed: fix scanRequestScanner.exportScan integration with admission control #88733

Closed
sumeerbhola opened this issue Sep 26, 2022 · 1 comment
Assignees
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@sumeerbhola
Copy link
Collaborator

sumeerbhola commented Sep 26, 2022

scanRequestScanner.exportScan does txn := p.db.NewTxn(ctx, "changefeed backfill") which creates a txn with (roachpb.AdmissionHeader_OTHER, admissionpb.NormalPri). OTHER bypasses admission control. It should use a variant of NewTxnRootKV to create a txn with (roachpb.AdmissionHeader_ROOT_KV, admissionpb.BulkNormalPrii).

This will likely not fix the performance isolation problems with backfills, as discussed in (a) https://cockroachlabs.slack.com/archives/C9TGBJB44/p1663864200211769?thread_ts=1663855656.539239&cid=C9TGBJB44, (b) the 16MB value of changefeed.backfill.scan_request_size is likely to be too large and fixing will require integrating low priority ScanRequests with the elastic cpu behavior. But it may help a bit, and this change is tiny enough to backport.

@cockroachdb/admission-control

Jira issue: CRDB-19940

@sumeerbhola sumeerbhola added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-admission-control labels Sep 26, 2022
@irfansharif irfansharif self-assigned this Oct 18, 2022
@irfansharif
Copy link
Contributor

There's a partial fix in #90093. Integration of rangefeed catch up scans will be done through #89709. For the initial scans I did try integrating the ScanRequests with the elastic CPU limiter but the benefits were negligible. What will be important to do is #90089. Finally, we have a roachtest now that can serve as a testbed for performance the aforementioned issues/PRs and for future work (#89656), so closing this issue.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Oct 24, 2022
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scan with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests rangefeed catchup scans are non-premptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-premptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Oct 28, 2022
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scan with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests rangefeed catchup scans are non-premptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-premptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 2, 2022
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scan with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests rangefeed catchup scans are non-premptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-premptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 3, 2022
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scan with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests rangefeed catchup scans are non-premptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-premptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Nov 24, 2022
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly
CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the
latency impact backfills can have on a moderately CPU-saturated cluster.
Similar to what we saw for backups, this CPU heavy nature can elevate Go
scheduling latencies which in turn translates to foreground latency
impact. This commit integrates rangefeed catchup scan with the elastic
CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves
of changefeed backfills. The second half is the initial scan -- scan
requests issued over some keyspan as of some timestamp. For that we
simply rely on the existing slots mechanism but now setting a lower
priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that
during initial scans the encoding routines in changefeedccl are the most
impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission
integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests rangefeed catchup scans are non-premptible. The
rangefeed RPC is a streaming one, and the catchup scan is done during
stream setup. So we don't have resumption tokens to propagate up to the
caller like we did for backups. We still want CPU-bound work going
through admission control to only use 100ms of CPU time, to exercise
better control over scheduling latencies. To that end, we introduce the
following component used within the rangefeed catchup iterator.

    // Pacer is used in tight loops (CPU-bound) for non-premptible
    // elastic work. Callers are expected to invoke Pace() every loop
    // iteration and Close() once done. Internally this type integrates
    // with elastic CPU work queue, acquiring tokens for the CPU work
    // being done, and blocking if tokens are unavailable. This allows
    // for a form of cooperative scheduling with elastic CPU granters.
    type Pacer struct
    func (p *Pacer) Pace(ctx context.Context) error { ... }
    func (p *Pacer) Close() { ... }

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admission-control C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

2 participants