Skip to content

Commit

Permalink
Merge #139191
Browse files Browse the repository at this point in the history
139191: sql: add fast-path to maybeParallelizeLocalScans r=yuzefovich a=yuzefovich

`maybeParallelizeLocalScans` is responsible for deciding whether we want to plan multiple TableReaders on the gateway to serve one set of key spans. This is needed to power the lookup into remote regions in parallel with locality optimized search.

By construction, we will only plan multiple TableReaders if the set of key spans touches multiple ranges, so this commit introduces a quick check to see whether we have at least two ranges involved, and if not, then we can skip a somewhat heavy `PartitionSpans` call.


I temporarily added `SELECT k, v FROM kv WHERE k IN (1, 3)` "kv-read-two" query to `BenchmarkEndToEnd` to highlight the improvement:
```
name                                            old time/op    new time/op    delta
EndToEnd/kv-read-two/vectorize=on/Simple-24        431µs ± 2%     420µs ± 2%  -2.44%  (p=0.000 n=10+10)
EndToEnd/kv-read-two/vectorize=on/Prepared-24      295µs ± 2%     289µs ± 2%  -2.12%  (p=0.001 n=10+10)
EndToEnd/kv-read-two/vectorize=off/Simple-24       408µs ± 2%     411µs ± 3%    ~     (p=0.280 n=10+10)
EndToEnd/kv-read-two/vectorize=off/Prepared-24     278µs ± 2%     279µs ± 2%    ~     (p=0.447 n=10+9)

name                                            old alloc/op   new alloc/op   delta
EndToEnd/kv-read-two/vectorize=on/Prepared-24     30.1kB ± 0%    27.9kB ± 1%  -7.07%  (p=0.000 n=10+10)
EndToEnd/kv-read-two/vectorize=on/Simple-24       43.7kB ± 1%    41.5kB ± 0%  -4.98%  (p=0.000 n=10+9)
EndToEnd/kv-read-two/vectorize=off/Simple-24      40.6kB ± 0%    40.6kB ± 1%    ~     (p=0.399 n=9+9)
EndToEnd/kv-read-two/vectorize=off/Prepared-24    28.9kB ± 0%    29.0kB ± 0%    ~     (p=0.781 n=10+10)

name                                            old allocs/op  new allocs/op  delta
EndToEnd/kv-read-two/vectorize=on/Prepared-24        243 ± 0%       230 ± 0%  -5.19%  (p=0.000 n=8+10)
EndToEnd/kv-read-two/vectorize=on/Simple-24          346 ± 0%       334 ± 0%  -3.58%  (p=0.000 n=6+10)
EndToEnd/kv-read-two/vectorize=off/Simple-24         315 ± 0%       315 ± 0%    ~     (p=1.210 n=10+9)
EndToEnd/kv-read-two/vectorize=off/Prepared-24       223 ± 0%       223 ± 0%    ~     (all equal)
```

Note that `vectorize=off` setup is unaffected because the parallelize local scans feature is disabled when we use row-by-row engine.

Epic: None
Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
  • Loading branch information
craig[bot] and yuzefovich committed Jan 16, 2025
2 parents 51a1527 + 730988a commit 5a59e16
Show file tree
Hide file tree
Showing 2 changed files with 30 additions and 3 deletions.
24 changes: 24 additions & 0 deletions pkg/sql/distsql_physical_planner.go
Original file line number Diff line number Diff line change
Expand Up @@ -2247,6 +2247,30 @@ func (dsp *DistSQLPlanner) maybeParallelizeLocalScans(
!prohibitParallelScans &&
dsp.parallelLocalScansSem.ApproximateQuota() > 0 &&
planCtx.spanIter != nil { // This condition can only be false in tests.
// Do a quick check whether we will touch at least two ranges. If we
// only touch a single range, then we'll end up with a single span
// partition, so we can skip the PartitionSpans call below.
it, rangeID, foundTwoRanges := planCtx.spanIter, int64(0), false
for i := 0; i < len(info.spans) && !foundTwoRanges; i++ {
sp := info.spans[i]
for it.Seek(ctx, sp, kvcoord.Ascending); it.Valid(); it.Next(ctx) {
rID := int64(it.Desc().RangeID)
if rangeID == 0 {
rangeID = rID
} else if rangeID != rID {
foundTwoRanges = true
break
}
if sp.EndKey == nil || !it.NeedAnother() {
break
}
}
}
if !foundTwoRanges {
spanPartitions = []SpanPartition{{SQLInstanceID: dsp.gatewaySQLInstanceID, Spans: info.spans}}
parallelizeLocal = false
return spanPartitions, parallelizeLocal
}
parallelizeLocal = true
// Temporarily unset isLocal so that PartitionSpans divides all spans
// according to the respective leaseholders.
Expand Down
9 changes: 6 additions & 3 deletions pkg/sql/opt/exec/execbuilder/testdata/select
Original file line number Diff line number Diff line change
Expand Up @@ -1932,9 +1932,10 @@ SELECT id FROM system.namespace WHERE name='a'

# The span "sending partial batch" means that the scan was parallelized. We're
# seeing duplicate "querying next range" entries because we first use the range
# cache to try to partition the spans in order to have parallel TableReaders (we
# end up with a single partition though), and then we have a single TableReader
# performing the scan of two spans in parallel.
# cache to count the number of ranges affected and then to try to partition the
# spans in order to have parallel TableReaders (we end up with a single
# partition though). Then we have a single TableReader performing the scan of
# two spans in parallel.
# If this test is failing and doesn't have that span, it means that the scanNode
# was improperly configured to add a limit to the ScanRequest batch.
# See #30943 for more details.
Expand All @@ -1946,6 +1947,8 @@ WHERE message LIKE 'querying next range at /Table/$t_id3/1%' OR
querying next range at /Table/127/1/0/0
querying next range at /Table/127/1/10/0
querying next range at /Table/127/1/0/0
querying next range at /Table/127/1/10/0
querying next range at /Table/127/1/0/0
=== SPAN START: kv.DistSender: sending partial batch ===
querying next range at /Table/127/1/10/0

Expand Down

0 comments on commit 5a59e16

Please sign in to comment.