bulk/kv: uploading to cloud storage in ExportRequest evaluation considered harmful #66486
Comments
It's worth pointing out that this would compose really badly with evaluation-time rate-limiting of ExportRequests. It should be less severe with #66338, as a few slow Exports will no longer cascade into blocking many ranges.
The If anything, I think we were planning to start making To fix both these, we've been working on instead chasing it so we write directly while iterating: we've changed the API for our external IO to return an It seems like #66485 should help a lot here, shouldn't it?
We should be able to evaluate the impact of always using
In the meantime we can/should add a setting to opt into proxying writes to the sql proc, to make it easier to measure the overhead and/or give users who want it -- and whatever that overhead is -- that choice. Opened #66540.
Related to cockroachdb#66486. Command evaluation is meant to operate in a sandbox. It certainly shouldn't have access to a DB handle.
There was discussion that moved to https://cockroachlabs.slack.com/archives/C2C5FKPPB/p1623782623130900.
Writes were moved out of request evaluation and into the processor.
In #66338, we found that rate-limiting ExportRequests during evaluation could lead to transitive stalls in workload traffic, if interleaved poorly with a Split. That PR moved rate-limiting of ExportRequest above latching to avoid holding latches when not strictly necessary. We've also discussed ideas around dropping latches earlier for reads, before evaluation, in #66485. That issue seems challenging and large in scope, but promising.

However, for now, we need to continue to be careful about the duration that read-only requests hold read latches.
During a code audit earlier today, we found that ExportRequest can be configured to upload files to cloud storage directly during evaluation, while holding latches:

cockroach/pkg/ccl/storageccl/export.go
Lines 194 to 203 in a61f01d
This seems potentially disastrous, as it means that we will be performing network operations during evaluation. In fact, we'll even retry this upload up to 5 times (maxUploadRetries). So it's hard to place any limit on the duration that a given Export request may run for. As a result, it's hard to place any limit on the duration that a given Export request may transitively block foreground reads and writes.

I'd like to learn whether we need this capability and push to get rid of it. Even once we address #66485, it still seems like an abuse to touch the network during request evaluation, which is meant to operate in a sandboxed scope of a replica. That is simply not what the framework is meant for.
Interestingly, we do have a separate code path that avoids this. We have a way to specify that an ExportRequest should return an SST (using ReturnSST) instead of immediately uploading it. We then can perform the upload from the DistSQL backupProcessor:

cockroach/pkg/ccl/backupccl/backup_processor.go
Lines 437 to 439 in a61f01d
This seems like a much more appropriate way to evaluate a backup. It also seems like it doesn't trade much in terms of performance when the backupProcessor is scheduled on the same node as the range's leaseholder. Either way, we're still pulling chunks into memory and then uploading them. The only difference is that we'll pull the chunk up a few levels higher in the stack.

Am I understanding all of this correctly? If so, what can we do here?
/cc. @dt @aayushshah15 @andreimatei