-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
colexec: add support for percent_rank and cume_dist window functions #42137
Conversation
c34e65d
to
31ebca3
Compare
31ebca3
to
68d88e6
Compare
Added support for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits.
Reviewed 6 of 6 files at r1, 2 of 2 files at r2, 10 of 10 files at r3, 10 of 10 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/colexec/rank_tmpl.go, line 52 at r3 (raw file):
if len(orderingCols) == 0 { if windowFn == execinfrapb.WindowerSpec_PERCENT_RANK { return NewConstOp(allocator, input, coltypes.Float64, float64(0), outputColIdx)
Out of curiosity, could you explain why we return a const op that returns a zero float?
pkg/sql/colexec/window_functions_util.go, line 17 at r1 (raw file):
const columnOmitted = -1 var windowFnNeedsPeersInfo map[execinfrapb.WindowerSpec_WindowFunc]bool
Could we add a comment to this? I personally have very little knowledge of window functions so some explanations are usually helpful.
pkg/sql/colexec/window_functions_util.go, line 24 at r4 (raw file):
// WindowFnOutputType is a mapping from a window function to its output type. // Only window functions supported by the vectorized engine are present. WindowFnOutputType map[execinfrapb.WindowerSpec_WindowFunc]*types.T
What if instead of doing this you created a function called GetWindowFnInfo()
that returns both the peer info an output type according to a switch statement? I think it would be easier to read. Maybe this should be shared across col and rowexec?
pkg/sql/colexec/window_peer_grouper_tmpl.go, line 33 at r1 (raw file):
// outputColIdx'th column (which is appended if needed) for every tuple that is // the first within its peer group. Peers are tuples that belong to the same // partition and are equal on the ordering columns. If ordCols is empty, than
nit s/than/then
. I would also rename ordCols
to orderingCols
since ord
makes me think of ordinal
rather than ordering
.
pkg/sql/colexec/window_peer_grouper_tmpl.go, line 34 at r1 (raw file):
// the first within its peer group. Peers are tuples that belong to the same // partition and are equal on the ordering columns. If ordCols is empty, than // all tuples within the partition are peers.
Window functions are confusing.
pkg/sql/distsql/columnar_operators_test.go, line 846 at r3 (raw file):
} func TestWindowFunctionsAgainstProcessor(t *testing.T) {
You probably already did this but I want to make sure we stress this test for a bit before merging.
pkg/sql/distsql/columnar_operators_test.go, line 879 at r3 (raw file):
continue } inputTypes := typs[:nCols:nCols]
Why the extra nCols
?
pkg/sql/logictest/testdata/logic_test/window, line 1 at r3 (raw file):
# LogicTest: local local-vec fakedist fakedist-metadata fakedist-vec 5node-local 5node-dist 5node-dist-metadata 5node-dist-disk
Super cool!
f896534
to
edd730f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)
pkg/sql/colexec/rank_tmpl.go, line 52 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Out of curiosity, could you explain why we return a const op that returns a zero float?
When there are no columns in ORDER BY
clause of window functions, then all rows are considered "peers". Percent rank is calculated as (rank - 1) / (total rows - 1)
, and in such scenario all rows' rank = 0
, so percentRank
will also be 0
for all rows.
yuzefovich=# create table temp (a int);
CREATE TABLE
yuzefovich=# insert into temp values (0), (1), (2);
INSERT 0 3
yuzefovich=# select a, percent_rank() over () from temp;
a | percent_rank
---+--------------
0 | 0
1 | 0
2 | 0
(3 rows)
pkg/sql/colexec/window_functions_util.go, line 17 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Could we add a comment to this? I personally have very little knowledge of window functions so some explanations are usually helpful.
Added a comment.
pkg/sql/colexec/window_functions_util.go, line 24 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
What if instead of doing this you created a function called
GetWindowFnInfo()
that returns both the peer info an output type according to a switch statement? I think it would be easier to read. Maybe this should be shared across col and rowexec?
We actually already have such function, I moved it from rowexec
intoexecinfrapb
package. Done.
pkg/sql/colexec/window_peer_grouper_tmpl.go, line 33 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit
s/than/then
. I would also renameordCols
toorderingCols
sinceord
makes me think ofordinal
rather thanordering
.
Done.
pkg/sql/colexec/window_peer_grouper_tmpl.go, line 34 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Window functions are confusing.
Indeed :)
pkg/sql/distsql/columnar_operators_test.go, line 846 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
You probably already did this but I want to make sure we stress this test for a bit before merging.
Yeah, I did 100 runs. Initially, this test confirmed the bug I realized I had in my original implementation of percentRank
when we have more than a single batch of data. (That's why I bumped the number of rows here.)
pkg/sql/distsql/columnar_operators_test.go, line 879 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Why the extra
nCols
?
This is a very subtle issue that I spent a few minutes trying to debug. Here is what happens:
- at the top of the test we instantiate
typs
up tomaxCols
- then, depending on the test spec, we might actually have less columns, so we slice it up, i.e.
inputTypes := typs[:nCols]
- later, we're appending to
inputTypes
the return type of the window function, and here is the catch - if we don't specify the capacity ofinputTypes
, then no new array is allocated and we, instead, overwrite one of thetyps
withreturnType
, but we assume thattyps
are immutable.
pkg/sql/logictest/testdata/logic_test/window, line 1 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Super cool!
Ack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r5, 2 of 2 files at r6, 13 of 13 files at r7, 10 of 11 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @yuzefovich)
pkg/sql/colexec/window_functions_util.go, line 24 at r4 (raw file):
Previously, yuzefovich wrote…
We actually already have such function, I moved it from
rowexec
intoexecinfrapb
package. Done.
Maybe we still want to do something like this for the peer info?
pkg/sql/colexec/window_functions_util.go, line 28 at r8 (raw file):
// SupportedWindowFns contains all window functions supported by the // vectorized engine. SupportedWindowFns = []execinfrapb.WindowerSpec_WindowFunc{
I would make this a map[wfn]struct{}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)
pkg/sql/colexec/window_functions_util.go, line 24 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Maybe we still want to do something like this for the peer info?
rowexec.Windower
doesn't care about this info (it has a separate PeerGroupIndicesHelper
utility struct). In colexec
we have slightly different model: we plan different operators on "top" each other, so we need to know this info during planning, in rowexec
that info is "assessed" dynamically.
I also think (but not sure) that row_number
might be the only window function that doesn't pay attention to "peers". If so, maybe we'll remove this map
later.
pkg/sql/colexec/window_functions_util.go, line 28 at r8 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I would make this a map[wfn]struct{}
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @yuzefovich)
pkg/sql/colexec/window_functions_util.go, line 24 at r4 (raw file):
Previously, yuzefovich wrote…
rowexec.Windower
doesn't care about this info (it has a separatePeerGroupIndicesHelper
utility struct). Incolexec
we have slightly different model: we plan different operators on "top" each other, so we need to know this info during planning, inrowexec
that info is "assessed" dynamically.I also think (but not sure) that
row_number
might be the only window function that doesn't pay attention to "peers". If so, maybe we'll remove thismap
later.
I guess I meant just a private function used by colexec
instead of a map. But up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @jordanlewis)
pkg/sql/colexec/window_functions_util.go, line 24 at r4 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I guess I meant just a private function used by
colexec
instead of a map. But up to you.
Oh, I see, I guess a function does look better. Done.
This commit adds an operator that populates a boolean column to signify whether the corresponding tuple is the start of a new peer group. Peers are such tuples that belong to the same partition and are equal on the ordering columns. Some window functions must return the same output for all peers in the peer group. Currently this operator is not used. Release note: None
Previously, rank and denseRank operators contained the logic to figure out the boundaries of the peer groups, but now that we have a window peer grouper, it is no longer necessary which simplified the code a little bit. Release note: None
This commit adds the support for PERCENT_RANK window function. This function differs from two other rank variances in that it needs to know the number of tuples in the partition. If there is no PARTITION BY clause, then we have no other choice but to buffer the input fully. If PARTITION BY clause is present, we need to buffer all tuples that belong to each partition before we can populate the output. However, for simplicity, the current implementation of the operator with PARTITION BY clause also fully buffers the whole input before emitting any output. This commit also adds a couple of "vec-on" configs to 'window' logic test. This will increase the test coverage of window functions supported by the vectorized engine. Release note: None
I amended the last commit to plan three streaming window functions with |
This commit adds support for CUME_DIST window function. This function is quite similar to PERCENT_RANK, so it reuses the same template. "percent_rank" things have been renamed to "relative_rank" things. This commit also enables running of `row_number`, `rank`, and `dense_rank` with `vectorize=auto`. The reason is that these window functions are streaming and internally they might a sorter which can fallback to disk if necessary. Release note: None
TFTR! bors r+ |
bors r+ |
Build succeeded |
colexec: add window peer grouper
This commit adds an operator that populates a boolean column to signify
whether the corresponding tuple is the start of a new peer group. Peers
are such tuples that belong to the same partition and are equal on the
ordering columns. Some window functions must return the same output for
all peers in the peer group. Currently this operator is not used.
Release note: None
colexec: minor refactor of rank and denseRank operators
Previously, rank and denseRank operators contained the logic to figure
out the boundaries of the peer groups, but now that we have a window
peer grouper, it is no longer necessary which simplified the code a
little bit.
Release note: None
colexec: add support for PERCENT_RANK window function
This commit adds the support for PERCENT_RANK window function. This
function differs from two other rank variances in that it needs to know
the number of tuples in the partition. If there is no PARTITION BY
clause, then we have no other choice but to buffer the input fully. If
PARTITION BY clause is present, we need to buffer all tuples that belong
to each partition before we can populate the output. However, for
simplicity, the current implementation of the operator with PARTITION BY
clause also fully buffers the whole input before emitting any output.
This commit also adds a couple of "vec-on" configs to 'window' logic
test. This will increase the test coverage of window functions supported
by the vectorized engine.
Addresses: #37035.
Release note: None
colexec: add support for cume_dist window function
This commit adds support for CUME_DIST window function. This function is
quite similar to PERCENT_RANK, so it reuses the same template.
"percent_rank" things have been renamed to "relative_rank" things.
This commit also enables running of
row_number
,rank
, anddense_rank
withvectorize=auto
. The reason is that these windowfunctions are streaming and internally they might a sorter which can
fallback to disk if necessary.
Addresses: #37035.
Release note: None