exec: fallback to distsql on aggregator with no aggregate functions #38809

yuzefovich · 2019-07-11T01:46:30Z

It is possible for an aggregator to be planned without aggregate
functions, for example, in a query like:
SELECT 1 FROM t HAVING true.
It is quite tricky to make this work through the vectorized engine, so we
should just fall back to DistSQL. The difficulty arises from the
fact that we would need to introduce an operator that produces a batch
and takes no input (like colBatchScan) but does no actual work
(like noop) other than emitting a single batch with non-zero length.

Release note: None
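A minimal sketch of the fallback pattern described above: the vectorized setup rejects an aggregator spec with no aggregate functions, and the caller reacts to the error by using DistSQL instead. `aggregatorSpec`, `planVectorized`, and `plan` are hypothetical simplifications, not the actual code in `column_exec_setup.go`; only the error message is taken from the review thread.

```go
package main

import (
	"errors"
	"fmt"
)

// aggregatorSpec is a hypothetical, heavily simplified stand-in for the
// aggregator processor spec; only the field this check needs is modeled.
type aggregatorSpec struct {
	Aggregations []string // aggregate function names, e.g. "count", "sum"
}

// errNoAggFuncs reuses the error message quoted in the review thread.
var errNoAggFuncs = errors.New("aggregator with no aggregate functions is unsupported in vectorized")

// planVectorized is a hypothetical planning hook that refuses specs the
// vectorized engine cannot handle by returning an error.
func planVectorized(spec aggregatorSpec) (string, error) {
	if len(spec.Aggregations) == 0 {
		// e.g. SELECT 1 FROM t HAVING true: kick the query back to DistSQL.
		return "", errNoAggFuncs
	}
	return "vectorized", nil
}

// plan tries the vectorized engine first and falls back to the
// row-at-a-time DistSQL processors on any setup error.
func plan(spec aggregatorSpec) string {
	if engine, err := planVectorized(spec); err == nil {
		return engine
	}
	return "distsql"
}

func main() {
	fmt.Println(plan(aggregatorSpec{Aggregations: []string{"count"}})) // vectorized
	fmt.Println(plan(aggregatorSpec{}))                                 // distsql
}
```

The appeal of this approach is that it needs no new operator: any spec the vectorized setup cannot handle is simply executed by the existing row-based processors.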

cockroach-teamcity · 2019-07-11T01:46:37Z

This change is

asubiotto · 2019-07-11T14:15:02Z

Would it be possible to easily support this? Please also expand the commit message.

yuzefovich · 2019-07-11T16:08:50Z

Updated the commit message. Here it is:

It is possible for an aggregator to be planned without aggregate
functions, for example, in a query like:
SELECT 1 FROM t HAVING true.
It is quite tricky to make this work through the vectorized engine, so we
should just fall back to DistSQL. The difficulty arises from the
fact that we would need to introduce an operator that produces a batch
and takes no input (like colBatchScan) but does no actual work
(like noop) other than emitting a single batch with non-zero length.

I started adding support for it but kept hitting nil pointer errors, so I decided it was easier to reject such queries in the vectorized engine. I'll give it another shot right now.

jordanlewis · 2019-07-11T16:25:41Z

The difficulty arises from the fact that we need to introduce an operator that produces a batch and takes no input (like colBatchScan) but actually does no work (like noop) except for emitting a single batch with non-zero length.

Isn't this what the various constOp implementations do, for the most part?

yuzefovich

constOps still have an input operator, and that input decides when to emit the zero-length batch.
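To make the distinction concrete, here is a minimal sketch of an input-driven constant operator. The `batch` and `operator` types are hypothetical simplifications of the vectorized engine's batch and Operator interface (zero-length batch means "no more data"), and `sliceSource` stands in for colBatchScan. The point it illustrates: the constant operator never decides when to stop; it only forwards whatever length its input produces, so it cannot serve as a source on its own.

```go
package main

import "fmt"

// batch is a hypothetical, minimal columnar batch: a length plus int64 columns.
type batch struct {
	length int
	cols   [][]int64
}

// operator is a simplified Operator contract: Next returns batches until a
// zero-length batch signals the end of the stream.
type operator interface {
	Next() batch
}

// sliceSource plays the role of colBatchScan: it emits its rows in batches
// of at most batchSize (assumed > 0), then zero-length batches forever.
type sliceSource struct {
	rows      []int64
	batchSize int
	pos       int
}

func (s *sliceSource) Next() batch {
	n := len(s.rows) - s.pos
	if n > s.batchSize {
		n = s.batchSize
	}
	col := s.rows[s.pos : s.pos+n]
	s.pos += n
	return batch{length: n, cols: [][]int64{col}}
}

// constOp appends a constant column to every batch of its input.
type constOp struct {
	input operator
	c     int64
}

func (o *constOp) Next() batch {
	b := o.input.Next()
	if b.length == 0 {
		return b // termination is driven entirely by the input
	}
	col := make([]int64, b.length)
	for i := range col {
		col[i] = o.c
	}
	b.cols = append(b.cols, col)
	return b
}

// countRows drains an operator and reports the total rows and non-empty batches.
func countRows(op operator) (rows, batches int) {
	for {
		b := op.Next()
		if b.length == 0 {
			return rows, batches
		}
		rows += b.length
		batches++
	}
}

func main() {
	// Roughly colBatchScan -> constOp for SELECT a + 1 FROM t (projection elided).
	op := &constOp{input: &sliceSource{rows: []int64{10, 20, 30}, batchSize: 2}, c: 1}
	fmt.Println(countRows(op)) // 3 2
}
```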


yuzefovich

The difference with constOp is that it is planned when we need constants to interact with other operators, for example, in a query like SELECT a + 1 FROM t. In this case, we get a chain like colBatchScan -> constOp -> projectionOp.

But in this edge case (SELECT 1 FROM t HAVING true), 1 is in the render expressions of the aggregator spec's Post, and we need something like singleTupleNoInputOp -> projectionOp. My hesitation partially comes from not knowing whether that special operator should always emit a first batch of length 1 or of some other length (followed by a zero-length batch).


yuzefovich

Ok, I've just pushed a second commit that actually supports this case through vectorize. Please let me know what you think.

Yesterday I ran into a few problems because I hadn't realized that I needed an operator that first outputs a single batch with non-zero length. I'm still unsure whether there are cases where we want the first batch to have a length greater than 1.


yuzefovich · 2019-07-15T21:22:44Z

Guys, please take another look.

jordanlewis


pkg/sql/distsqlrun/column_exec_setup.go, line 142 at r1:

			// SELECT 1 FROM t HAVING true. This breaks some of the assumptions, so
			// we'll kick the query back to DistSQL.
			return nil, nil, errors.Newf("aggregator with no aggregate functions is unsupported in vectorized")

Sounds good. Can we ask the optimizer team to stop planning aggregators in this case? Is it equivalent to a simple filter?

We add a special singleTupleNoInputOperator that, on the first call to Next(), outputs a batch of length 1 with no actual columns and outputs zero-length batches on all subsequent calls. This allows us to execute queries that have only render expressions through the vectorized engine.

Release note: None
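A minimal sketch of the behavior described in this commit message, not the code from the PR. `batch`, `operator`, `renderConstOp`, and `collect` are hypothetical simplifications; `renderConstOp` stands in for the projection that evaluates the render expression `1`. Draining the chain singleTupleNoInputOperator -> renderConstOp yields exactly one row, which is what SELECT 1 FROM t HAVING true should return.

```go
package main

import "fmt"

// batch is a hypothetical, minimal columnar batch.
type batch struct {
	length int
	cols   [][]int64
}

// operator is a simplified Operator contract; a zero-length batch ends the stream.
type operator interface {
	Next() batch
}

// singleTupleNoInputOperator takes no input. The first Next returns a batch
// of length 1 with no columns; every later call returns a zero-length batch.
type singleTupleNoInputOperator struct {
	done bool
}

func (o *singleTupleNoInputOperator) Next() batch {
	if o.done {
		return batch{}
	}
	o.done = true
	return batch{length: 1}
}

// renderConstOp (hypothetical) evaluates a constant render expression by
// appending a column filled with c, sized to the incoming batch.
type renderConstOp struct {
	input operator
	c     int64
}

func (o *renderConstOp) Next() batch {
	b := o.input.Next()
	if b.length == 0 {
		return b
	}
	col := make([]int64, b.length)
	for i := range col {
		col[i] = o.c
	}
	b.cols = append(b.cols, col)
	return b
}

// collect drains an operator, gathering column 0 of each batch
// (it assumes every non-empty batch has at least one column).
func collect(op operator) []int64 {
	var out []int64
	for {
		b := op.Next()
		if b.length == 0 {
			return out
		}
		out = append(out, b.cols[0][:b.length]...)
	}
}

func main() {
	// SELECT 1 FROM t HAVING true: exactly one row with value 1.
	fmt.Println(collect(&renderConstOp{input: &singleTupleNoInputOperator{}, c: 1})) // [1]
}
```

Emitting a length-1 batch with zero columns is what lets downstream projections size their output correctly without any real input data.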

yuzefovich

TFTR!

bors r+


pkg/sql/distsqlrun/column_exec_setup.go, line 142 at r1:

Previously, jordanlewis (Jordan Lewis) wrote…

Sounds good. Can we ask the optimizer team to stop planning aggregators in this case? Is it equivalent to a simple filter?

I pinged them in the channel. I'm not sure when this case occurs tbh.

Radu said, "SELECT 1 FROM t HAVING true returns 1 row regardless of how many rows there are in the table," so aggregation appears to be mandatory here.
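Radu's point can be illustrated with a small sketch contrasting a plain filter with scalar aggregation. The functions below are hypothetical models of SQL semantics, not CockroachDB code: a HAVING clause without GROUP BY makes the query a scalar aggregation, which forms exactly one group over the whole table (even an empty one), so rewriting it as a simple filter would change the row count.

```go
package main

import "fmt"

// asFilter models SELECT 1 FROM t WHERE true: one output row per input row.
func asFilter(t []int64) []int64 {
	out := []int64{}
	for range t {
		out = append(out, 1)
	}
	return out
}

// asScalarAgg models SELECT 1 FROM t HAVING <having>: the table contents do
// not matter, there is always exactly one group, and HAVING decides whether
// that single group survives before the render emits 1 for it.
func asScalarAgg(t []int64, having bool) []int64 {
	if !having {
		return []int64{}
	}
	return []int64{1}
}

func main() {
	for _, t := range [][]int64{{}, {5, 6, 7}} {
		fmt.Println(len(asFilter(t)), len(asScalarAgg(t, true)))
	}
	// 0 1
	// 3 1
}
```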

38809: exec: fallback to distsql on aggregator with no aggregate functions r=yuzefovich a=yuzefovich

It is possible for an aggregator to be planned without aggregate functions, for example, in a query like: SELECT 1 FROM t HAVING true. It is quite tricky to make this work through the vectorized engine, so we should just fall back to DistSQL. The difficulty arises from the fact that we would need to introduce an operator that produces a batch and takes no input (like colBatchScan) but does no actual work (like noop) other than emitting a single batch with non-zero length.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>

craig · 2019-07-17T00:55:32Z

Build succeeded

yuzefovich requested review from jordanlewis, solongordon and a team July 11, 2019 01:46

yuzefovich force-pushed the exec-no-agg branch from 579ee59 to d56c2a4 Compare July 11, 2019 16:05

yuzefovich commented Jul 11, 2019

yuzefovich added the do-not-merge label (bors won't merge a PR with this label) Jul 11, 2019

rafiss mentioned this pull request Jul 16, 2019

exec: 'index out of range' errors for some aggregations #38750

Closed

jordanlewis approved these changes Jul 16, 2019

yuzefovich force-pushed the exec-no-agg branch from 948ad07 to c5d8528 Compare July 16, 2019 22:44

yuzefovich removed the do-not-merge label (bors won't merge a PR with this label) Jul 16, 2019

yuzefovich force-pushed the exec-no-agg branch from c5d8528 to 95bbda7 Compare July 16, 2019 23:10

yuzefovich commented Jul 17, 2019

craig bot merged commit 95bbda7 into cockroachdb:master Jul 17, 2019

yuzefovich deleted the exec-no-agg branch July 19, 2019 03:56

knz mentioned this pull request Nov 10, 2019

User-facing changes in 19.2 that were not picked up in release notes cockroachdb/docs#5819

Closed