Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: propagate the set of needed columns in table reader spec #56540

Merged
merged 1 commit into from
Nov 11, 2020

Conversation

yuzefovich
Copy link
Member

This commit adds the propagation of the set of needed columns via the
table reader spec and that information is now used when setting up the
ColBatchScans. The row-by-row engine is not affected since it still
needs to set up the ProcOutputHelpers, but that is no longer needed in
the vectorized engine which gives us a couple of percent improvement on
KV microbenchmark.

Addresses: #53893

Release note: None

This commit adds the propagation of the set of needed columns via the
table reader spec and that information is now used when setting up the
ColBatchScans. The row-by-row engine is not affected since it still
needs to set up the ProcOutputHelpers, but that is no longer needed in
the vectorized engine which gives us a couple of percent improvement on
KV microbenchmark.

Release note: None
@yuzefovich yuzefovich requested review from asubiotto and a team November 11, 2020 03:12
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@asubiotto
Copy link
Contributor

a couple of percent improvement on KV microbenchmark.

That's huge. How much?

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/execinfrapb/processors_sql.proto, line 144 at r1 (raw file):

  // Indicates the ordinals of the columns values for which are needed by the
  // post-processing stage and, therefore, are to be populated. It is ignored
  // if is_check is true.

What happens if none are specified? Can that happen?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have called it out too early - further benchmarking showed no difference, and on actual KV95 workload there appears to be a slowdown of 0.2% or so. All the benchmarks I'm currently running have non-negligible degree of variance, so it might be noise since I expect this change to be beneficial because it removes one of the big sources of allocations from the heap profile. Anyway, I'll rerun them shortly.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/execinfrapb/processors_sql.proto, line 144 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

What happens if none are specified? Can that happen?

Then no values will be populated and all columns are projected out by the post-processing spec. This could happen for the query like SELECT count(*) FROM t.

Copy link
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)

@yuzefovich
Copy link
Member Author

Microbenchmarks:

  • full scan of an empty table with filter (what we have on master)
name                                          old time/op    new time/op    delta
FlowSetup/vectorize=true/distribute=true-24     22.5µs ± 6%    22.5µs ± 7%    ~     (p=1.000 n=19+20)
FlowSetup/vectorize=true/distribute=false-24    20.1µs ± 5%    21.4µs ±17%  +6.57%  (p=0.004 n=20+20)

name                                          old alloc/op   new alloc/op   delta
FlowSetup/vectorize=true/distribute=true-24     33.7kB ± 1%    32.8kB ± 0%  -2.82%  (p=0.000 n=20+19)
FlowSetup/vectorize=true/distribute=false-24    30.2kB ± 0%    29.3kB ± 0%  -3.08%  (p=0.000 n=19+18)

name                                          old allocs/op  new allocs/op  delta
FlowSetup/vectorize=true/distribute=true-24        316 ± 0%       312 ± 0%  -1.27%  (p=0.000 n=19+19)
FlowSetup/vectorize=true/distribute=false-24       285 ± 0%       281 ± 0%  -1.40%  (p=0.000 n=17+16)
  • modified benchmark to be a KV lookup with a single row in the table:
name                                           old time/op    new time/op    delta
FlowSetup/vectorize=true/distribute=true-24      21.9µs ± 9%    21.7µs ± 6%  -1.04%  (p=0.013 n=92+92)
FlowSetup/vectorize=true/distribute=false-24     20.5µs ± 6%    19.6µs ± 6%  -4.36%  (p=0.000 n=95+96)
FlowSetup/vectorize=false/distribute=true-24     20.9µs ± 8%    20.7µs ± 7%  -1.09%  (p=0.011 n=88+90)
FlowSetup/vectorize=false/distribute=false-24    19.6µs ± 5%    19.5µs ± 7%    ~     (p=0.057 n=95+98)

name                                           old alloc/op   new alloc/op   delta
FlowSetup/vectorize=true/distribute=true-24      33.7kB ± 0%    32.8kB ± 0%  -2.72%  (p=0.000 n=87+87)
FlowSetup/vectorize=true/distribute=false-24     30.3kB ± 0%    29.3kB ± 0%  -3.13%  (p=0.000 n=87+87)
FlowSetup/vectorize=false/distribute=true-24     34.3kB ± 0%    34.3kB ± 0%  -0.03%  (p=0.000 n=90+88)
FlowSetup/vectorize=false/distribute=false-24    30.9kB ± 0%    30.9kB ± 0%  +0.02%  (p=0.000 n=87+88)

name                                           old allocs/op  new allocs/op  delta
FlowSetup/vectorize=true/distribute=true-24         317 ± 1%       314 ± 1%  -0.98%  (p=0.000 n=100+100)
FlowSetup/vectorize=true/distribute=false-24        288 ± 0%       284 ± 0%  -1.39%  (p=0.000 n=100+100)
FlowSetup/vectorize=false/distribute=true-24        288 ± 0%       288 ± 0%  +0.11%  (p=0.000 n=100+100)
FlowSetup/vectorize=false/distribute=false-24       257 ± 0%       258 ± 0%  +0.58%  (p=0.000 n=100+100)

As I expected, the KV-like benchmark shows non-negligible improvement, but it has big variance, so I had to significantly increase the run count to filter the noise out.

I will run KV95 and TPCC benchmarks with this PR and see where we stand.

TFTR!

bors r+

@craig
Copy link
Contributor

craig bot commented Nov 11, 2020

Build succeeded:

@craig craig bot merged commit ba363c7 into cockroachdb:master Nov 11, 2020
@yuzefovich yuzefovich deleted the vec-scan branch November 11, 2020 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants