Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator #48854

Closed
wants to merge 11 commits into from

Conversation

dtenedor
Copy link
Contributor

@dtenedor dtenedor commented Nov 14, 2024

What changes were proposed in this pull request?

This PR adds SQL pipe syntax support for the EXTEND operator.

This operator preserves the existing input table and adds one or more new computed columns whose values are equal to evaluating the specified expressions. This is equivalent to SELECT *, <newExpressions> in the SQL compiler. It is provided as a convenience feature and some functionality overlap exists with lateral column aliases.

For example:

CREATE TABLE t(x INT, y STRING) USING CSV;
INSERT INTO t VALUES (0, 'abc'), (1, 'def');

TABLE t
|> EXTEND x + LENGTH(y) AS z;

+----+-----+-----+
|  x |   y |   z |
+----+-----+-----+
|  0 | abc |   3 |
|  1 | def |   4 |
+----+-----+-----+

Like the |> SELECT operator, aggregate functions are not allowed in these expressions. During the course of developing reasonable error messages for this, I found that the SQL pipe syntax research paper also specified that the |> AGGREGATE operator should require that each non-grouping expression contains at least one aggregate function; I added a check and reasonable error message for this case as well.

Why are the changes needed?

The SQL pipe operator syntax will let users compose queries in a more flexible fashion.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds a few unit test cases, but mostly relies on golden file test coverage. I did this to make sure the answers are correct as this feature is implemented and also so we can look at the analyzer output plans to ensure they look right as well.

Was this patch authored or co-authored using generative AI tooling?

No

commit

commit

commit

commit

commit

commit

commit

fix tests

commit

commit

commit

commit

commit

commit
@dtenedor dtenedor force-pushed the pipe-syntax-projections branch from 5b48b7e to 09b04ed Compare November 18, 2024 07:24
@dtenedor
Copy link
Contributor Author

The implementation for this PR ended up getting too big. I can split this into separate PRs.

@dtenedor dtenedor changed the title [WIP][SPARK-49566][SQL] Implement SQL pipe syntax EXTEND + SET + DROP + AS operators [WIP][SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator Nov 18, 2024
@dtenedor
Copy link
Contributor Author

This PR now just covers the EXTEND operator.

@dtenedor dtenedor changed the title [WIP][SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator Nov 18, 2024
@dtenedor dtenedor changed the title SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator [SPARK-49566][SQL] Implement SQL pipe syntax EXTEND + SET + DROP + AS operators Nov 18, 2024
@dtenedor dtenedor changed the title [SPARK-49566][SQL] Implement SQL pipe syntax EXTEND + SET + DROP + AS operators [SPARK-49566][SQL] Add SQL pipe syntax for the EXTEND operator Nov 18, 2024
@dtenedor dtenedor marked this pull request as ready for review November 18, 2024 23:19
@dtenedor
Copy link
Contributor Author

cc @gengliangwang @cloud-fan this one is ready for review at your convenience.

@dtenedor dtenedor requested a review from cloud-fan November 20, 2024 23:42
PipeExpression(newChild, isAggregate, clause)
override lazy val replacement: Expression = if (isAggregate) {
// Make sure the child expression contains at least one aggregate function.
var foundAggregate = false
Copy link
Contributor

@cloud-fan cloud-fan Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

val hasAggFunc = this.exists(_.isInstanceOf[AggregateFunction])
if (isAggregate && !hasAggFunc) error ...
else if (!isAggregate && hasAggFunc) error...
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is simpler. (I had to keep the recursive method in order to stop traversing through window expressions, but this logic is better.)


-- !query
table t
|> extend *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated to this PR, but do we support SELECT * in pipe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

respond to code review comments

respond to code review comments

respond to code review comments
@dtenedor dtenedor force-pushed the pipe-syntax-projections branch from e780314 to 80421df Compare November 21, 2024 21:24
Copy link
Contributor Author

@dtenedor dtenedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cloud-fan for your reviews!!


-- !query
table t
|> extend *
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PipeExpression(newChild, isAggregate, clause)
override lazy val replacement: Expression = if (isAggregate) {
// Make sure the child expression contains at least one aggregate function.
var foundAggregate = false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is simpler. (I had to keep the recursive method in order to stop traversing through window expressions, but this logic is better.)

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in ea222a3 Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants