[SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator #48854
The implementation for this PR ended up getting too big. I can split this into separate PRs.
This PR now just covers the
cc @gengliangwang @cloud-fan this one is ready for review at your convenience.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/pipeOperators.scala
PipeExpression(newChild, isAggregate, clause)
override lazy val replacement: Expression = if (isAggregate) {
// Make sure the child expression contains at least one aggregate function.
var foundAggregate = false
nit:
val hasAggFunc = this.exists(_.isInstanceOf[AggregateFunction])
if (isAggregate && !hasAggFunc) error ...
else if (!isAggregate && hasAggFunc) error...
...
Thanks, this is simpler. (I had to keep the recursive method in order to stop traversing through window expressions, but this logic is better.)
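The check discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code merged in the PR: the `Expr` hierarchy, `hasAggFunc`, and `check` are hypothetical stand-ins for Catalyst's classes, and the error strings are placeholders. It shows why a recursive helper is still useful: the traversal stops at window expressions, so an aggregate function used as a window function is not counted.

```scala
object PipeAggregateCheckSketch {
  // Hypothetical, simplified expression tree; these are NOT Spark's Catalyst classes.
  sealed trait Expr { def children: Seq[Expr] }
  final case class Col(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  final case class Plus(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  final case class AggFunc(name: String, arg: Expr) extends Expr {
    def children: Seq[Expr] = Seq(arg)
  }
  final case class Window(function: Expr) extends Expr {
    def children: Seq[Expr] = Seq(function)
  }

  // Returns true if the tree contains an aggregate function that is not
  // nested inside a window expression.
  def hasAggFunc(e: Expr): Boolean = e match {
    case _: AggFunc => true
    case _: Window  => false // stop traversing through window expressions
    case other      => other.children.exists(hasAggFunc)
  }

  // Mirrors the two-sided check suggested in the review: aggregate clauses
  // need an aggregate function, all other clauses must not contain one.
  def check(e: Expr, isAggregate: Boolean, clause: String): Either[String, Expr] = {
    val found = hasAggFunc(e)
    if (isAggregate && !found) {
      Left(s"$clause requires at least one aggregate function") // placeholder message
    } else if (!isAggregate && found) {
      Left(s"aggregate functions are not allowed in $clause") // placeholder message
    } else {
      Right(e)
    }
  }
}
```

Under these assumptions, `check(Window(AggFunc("sum", Col("x"))), isAggregate = false, "EXTEND")` is accepted, while the same call with the `Window` wrapper removed is rejected.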
-- !query
table t
|> extend *
Unrelated to this PR, but do we support SELECT * in pipe?
respond to code review comments
Thanks @cloud-fan for your reviews!!
thanks, merging to master!
What changes were proposed in this pull request?
This PR adds SQL pipe syntax support for the EXTEND operator.
This operator preserves the existing input table and adds one or more new computed columns whose values are the results of evaluating the specified expressions. This is equivalent to SELECT *, <newExpressions> in the SQL compiler. It is provided as a convenience feature, and some functionality overlaps with lateral column aliases. For example:
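The semantics described above can be modeled roughly as follows. This is a minimal sketch and not Spark code: `Row`, `ExtendSketch`, and `extend` are hypothetical names, rows are simplified to integer-valued maps, and the new column names are assumed not to collide with existing ones. Each new column is computed from the input row only, which corresponds to SELECT *, <newExpressions> without lateral column aliases.

```scala
object ExtendSketch {
  // Hypothetical row model: column name -> integer value.
  type Row = Map[String, Int]

  // Keeps every existing column of each input row and appends the computed
  // columns, loosely mirroring `table t |> extend <expr> as <name>`.
  def extend(rows: Seq[Row], newColumns: Seq[(String, Row => Int)]): Seq[Row] =
    rows.map { row =>
      newColumns.foldLeft(row) { case (acc, (name, compute)) =>
        // Evaluate against the original input row, not the accumulated one.
        acc + (name -> compute(row))
      }
    }
}
```

For instance, extending a table with columns `x` and `y` by a column `z` defined as `x + y` returns every input row unchanged apart from the added `z` entry.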
Like the |> SELECT operator, aggregate functions are not allowed in these expressions. While developing reasonable error messages for this, I found that the SQL pipe syntax research paper also specifies that the |> AGGREGATE operator should require each non-grouping expression to contain at least one aggregate function; I added a check and a reasonable error message for this case as well.
Why are the changes needed?
The SQL pipe operator syntax will let users compose queries in a more flexible fashion.
Does this PR introduce any user-facing change?
Yes, see above.
How was this patch tested?
This PR adds a few unit test cases, but mostly relies on golden file test coverage. I did this to make sure the answers are correct as this feature is implemented and also so we can look at the analyzer output plans to ensure they look right as well.
Was this patch authored or co-authored using generative AI tooling?
No