Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exec: fallback to distsql on aggregator with no aggregate functions #38809

Merged
merged 1 commit into from
Jul 17, 2019

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Jul 11, 2019

It is possible for an aggregator to be planned without aggregate
functions, for example, in the query like:
SELECT 1 FROM t HAVING true.
It is quite tricky to make this work through vectorize, so we
should just fall back to distsql. The difficulty arises from the
fact that we need to introduce an operator that produces a batch
and takes no input (like colBatchScan) but actually does no work
(like noop) except for emitting a single batch with non-zero length.

Release note: None

@yuzefovich yuzefovich requested review from jordanlewis, solongordon and a team July 11, 2019 01:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@asubiotto
Copy link
Contributor

Would it be possible to easily support this? Please also expand the commit message.

@yuzefovich
Copy link
Member Author

Updated the commit message. Here it is:

It is possible for an aggregator to be planned without aggregate
functions, for example, in the query like:
SELECT 1 FROM t HAVING true.
It is quite tricky to make this work through vectorize, so we
should just fall back to distsql. The difficulty arises from the
fact that we need to introduce an operator that produces a batch
and takes no input (like colBatchScan) but actually does no work
(like noop) except for emitting a single batch with non-zero length.

I started adding support for it but was hitting null pointer errors, so I decided that it was just easier to reject such a query in vectorized. I'll give it another shot right now.

@jordanlewis
Copy link
Member

The difficulty arises from the fact that we need to introduce an operator that produces a batch and takes no input (like colBatchScan) but actually does no work (like noop) except for emitting a single batch with non-zero length.

Isn't this what the various constOp implementations do, for the most part?

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constOps still have an input operator, and that input decides on when to emit zero-length batch.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference with constOp is that it is planned when we need constants to interact with other operators, for example, in query like SELECT a + 1 FROM t. In this case, we get a chain like colBatchScan -> constOp -> projectionOp.

But in this edge case (SELECT 1 FROM t HAVING true), 1 is in the render expressions of Post of aggregator spec, and we need something like singleTupleNoInputOp -> projectionOp. My hesitation partially comes from not knowing whether that special operator should always emit the first batch of length 1 or some other length (with the consecutive batch of length zero).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)

Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've just pushed a second commit that actually supports this case through vectorize. Please let me know what you think.

Yesterday I ran into a few problems but didn't realize that I needed an operator that outputs a single batch with non-zero length first. I'm still unsure whether there are cases when we want the first batch to have a length greater than 1.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Jul 11, 2019
@yuzefovich
Copy link
Member Author

Guys, please take another look.

Copy link
Member

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @solongordon, and @yuzefovich)


pkg/sql/distsqlrun/column_exec_setup.go, line 142 at r1 (raw file):

			// SELECT 1 FROM t HAVING true. This breaks some of the assumptions, so
			// we'll kick the query back to DistSQL.
			return nil, nil, errors.Newf("aggregator with no aggregate functions is unsupported in vectorized")

Sounds good. Can we ask the optimizer team to stop planning aggregators in this case? Is it equivalent to a simple filter?

@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Jul 16, 2019
We add a special singleTupleNoInputOperator that on the first call
to Next() outputs a batch of length 1 with no actual columns and
outputs zero-length batches on all consecutive calls. This allows
us to execute queries that have only render expressions through
vectorized engine.

Release note: None
Copy link
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @solongordon)


pkg/sql/distsqlrun/column_exec_setup.go, line 142 at r1 (raw file):

Previously, jordanlewis (Jordan Lewis) wrote…

Sounds good. Can we ask the optimizer team to stop planning aggregators in this case? Is it equivalent to a simple filter?

I pinged them in the channel. I'm not sure when this case occurs tbh.

Radu said, "SELECT 1 FROM t HAVING true returns 1 row regardless of how many rows there are in the table," so aggregation appears to be mandatory here.

craig bot pushed a commit that referenced this pull request Jul 17, 2019
38809: exec: fallback to distsql on aggregator with no aggregate functions r=yuzefovich a=yuzefovich

It is possible for an aggregator to be planned without aggregate
functions, for example, in the query like:
SELECT 1 FROM t HAVING true.
It is quite tricky to make this work through vectorize, so we
should just fall back to distsql. The difficulty arises from the
fact that we need to introduce an operator that produces a batch
and takes no input (like colBatchScan) but actually does no work
(like noop) except for emitting a single batch with non-zero length.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jul 17, 2019

Build succeeded

@craig craig bot merged commit 95bbda7 into cockroachdb:master Jul 17, 2019
@yuzefovich yuzefovich deleted the exec-no-agg branch July 19, 2019 03:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants