-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exec: fallback to distsql on aggregator with no aggregate functions #38809
Conversation
Would it be possible to easily support this? Please also expand the commit message. |
Updated the commit message. Here it is: It is possible for an aggregator to be planned without aggregate I started adding support for it but was hitting null pointer errors, so I decided that it was just easier to reject such a query in vectorized. I'll give it another shot right now. |
Isn't this what the various |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constOp
s still have an input operator, and that input decides on when to emit zero-length batch.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference with constOp
is that it is planned when we need constants to interact with other operators, for example, in query like SELECT a + 1 FROM t
. In this case, we get a chain like colBatchScan -> constOp -> projectionOp
.
But in this edge case (SELECT 1 FROM t HAVING true
), 1
is in the render expressions of Post
of aggregator spec, and we need something like singleTupleNoInputOp -> projectionOp
. My hesitation partially comes from not knowing whether that special operator should always emit the first batch of length 1 or some other length (with the consecutive batch of length zero).
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've just pushed a second commit that actually supports this case through vectorize. Please let me know what you think.
Yesterday I ran into a few problems but didn't realize that I needed an operator that outputs a single batch with non-zero length first. I'm still unsure whether there are cases when we want the first batch to have a length greater than 1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
Guys, please take another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/distsqlrun/column_exec_setup.go, line 142 at r1 (raw file):
// SELECT 1 FROM t HAVING true. This breaks some of the assumptions, so // we'll kick the query back to DistSQL. return nil, nil, errors.Newf("aggregator with no aggregate functions is unsupported in vectorized")
Sounds good. Can we ask the optimizer team to stop planning aggregators in this case? Is it equivalent to a simple filter?
We add a special singleTupleNoInputOperator that on the first call to Next() outputs a batch of length 1 with no actual columns and outputs zero-length batches on all consecutive calls. This allows us to execute queries that have only render expressions through vectorized engine. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)
pkg/sql/distsqlrun/column_exec_setup.go, line 142 at r1 (raw file):
Previously, jordanlewis (Jordan Lewis) wrote…
Sounds good. Can we ask the optimizer team to stop planning aggregators in this case? Is it equivalent to a simple filter?
I pinged them in the channel. I'm not sure when this case occurs tbh.
Radu said, "SELECT 1 FROM t HAVING true
returns 1 row regardless of how many rows there are in the table," so aggregation appears to be mandatory here.
38809: exec: fallback to distsql on aggregator with no aggregate functions r=yuzefovich a=yuzefovich It is possible for an aggregator to be planned without aggregate functions, for example, in the query like: SELECT 1 FROM t HAVING true. It is quite tricky to make this work through vectorize, so we should just fall back to distsql. The difficulty arises from the fact that we need to introduce an operator that produces a batch and takes no input (like colBatchScan) but actually does no work (like noop) except for emitting a single batch with non-zero length. Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Build succeeded |
It is possible for an aggregator to be planned without aggregate
functions, for example, in the query like:
SELECT 1 FROM t HAVING true.
It is quite tricky to make this work through vectorize, so we
should just fall back to distsql. The difficulty arises from the
fact that we need to introduce an operator that produces a batch
and takes no input (like colBatchScan) but actually does no work
(like noop) except for emitting a single batch with non-zero length.
Release note: None