Skip to content

Conversation

@mihailotim-db
Copy link
Contributor

@mihailotim-db mihailotim-db commented Apr 16, 2025

What changes were proposed in this pull request?

This is a followup to #43797 and #50461 where hacks were introduced in order to solve the issue of repeated analysis on plans that have a group by ordinal. The latter PR caused a regression so the hack needs to be removed. This PR proposed a move of UnresolvedOrdinal construction before Analyzer runs.

Why are the changes needed?

We are reverting a hack introduced in the previous PRs to improve the behavior of group by ordinal and additionally fix the issue that #50461 was trying to solve.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 16, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch 5 times, most recently from da5bdea to d0a8187 Compare April 16, 2025 19:35
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing some test cases from AggregateResolverSuite because we need to support the new behavior in single-pass analyzer as well, but that might bloat this PR too much. I would like to do that in a followup where I will revert this change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you copy these test contents to the Jira so we don't forget?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch from d0a8187 to daa2aa3 Compare April 16, 2025 19:46
@mihailotim-db mihailotim-db changed the title group by ordinal [SPARK-51820][SQL] Move UnresolvedOrdinal construction before analysis to avoid issue with group by ordinal Apr 16, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch from daa2aa3 to 56bbcb1 Compare April 16, 2025 20:02
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you copy these test contents to the Jira so we don't forget?

@cloud-fan
Copy link
Contributor

shall we also handle Spark Connect queries in SparkConnectPlanner?

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch from 56bbcb1 to 33ec241 Compare April 17, 2025 07:23
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch from 33ec241 to 464d2ab Compare April 17, 2025 07:31
@mihailotim-db
Copy link
Contributor Author

shall we also handle Spark Connect queries in SparkConnectPlanner?

Sure, done!

@vladimirg-db
Copy link
Contributor

Just one thing we need to check - if the view is persisted with ORDER_BY_ORDINAL conf ON, what happens if we read this view with ORDER_BY_ORDINAL conf OFF? This might be an issue, since we moved the conf check to the parser.

The view must keep its confs.

@mihailotim-db
Copy link
Contributor Author

mihailotim-db commented Apr 17, 2025

Just one thing we need to check - if the view is persisted with ORDER_BY_ORDINAL conf ON, what happens if we read this view with ORDER_BY_ORDINAL conf OFF? This might be an issue, since we moved the conf check to the parser.

The view must keep its confs.

Confirmed the correct behavior in the shell. With conf off, the query should error out with MISSING_AGGREGATION and the view should execute correctly
image

@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch from 464d2ab to 62729a8 Compare April 17, 2025 08:47
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/new_group_by_ordinal branch from 62729a8 to e62c9d6 Compare April 17, 2025 08:49
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 290502d Apr 18, 2025
cloud-fan pushed a commit that referenced this pull request Apr 23, 2025
### What changes were proposed in this pull request?
This is a followup to #50606 to fix the origin and context of newly created `UnresolvedOrdinal`.

### Why are the changes needed?
Origin and context of `UnresolvedOrdinal` should be the same as for the original `Literal`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50676 from mihailotim-db/mihailotim-db/fix_origin.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Apr 29, 2025
…`order` by ordinal approach

### What changes were proposed in this pull request?
This is a followup to #50606, to address remaining issues in fixed-point analyzer and parser. This PR does the following:

1. Set correct origin in `SparkConnect` and `Dataframe` API
2. Handle remaining cases for order by ordinal in `Dataframe` API
3. Add a test suite for Dataframe use cases since we are missing those

### Why are the changes needed?
To address current issue with `UnresolvedOrdinal`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new test suite and existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50699 from mihailotim-db/mihailotim-db/fix_ordinal_followup.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 20, 2025
…hen appending grouping to aggregate expressions

### What changes were proposed in this pull request?
Don't add `UnresolvedOrdinal` when appending grouping to aggregate expressions in Spark Connect.

### Why are the changes needed?
Change is needed to fix a regression caused by #50606 where `UnresolvedOrdinal` would end up in aggregate expression and propagate all the way to `CheckAnalysis` where it would throw an error

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a test case for the correct behavior.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #50958 from mihailotim-db/mihailotim-db/fix_unresolved_ordinal.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
…`order` by ordinal approach

### What changes were proposed in this pull request?
This is a followup to apache#50606, to address remaining issues in fixed-point analyzer and parser. This PR does the following:

1. Set correct origin in `SparkConnect` and `Dataframe` API
2. Handle remaining cases for order by ordinal in `Dataframe` API
3. Add a test suite for Dataframe use cases since we are missing those

### Why are the changes needed?
To address current issue with `UnresolvedOrdinal`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new test suite and existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50699 from mihailotim-db/mihailotim-db/fix_ordinal_followup.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
yhuang-db pushed a commit to yhuang-db/spark that referenced this pull request Jun 9, 2025
…hen appending grouping to aggregate expressions

### What changes were proposed in this pull request?
Don't add `UnresolvedOrdinal` when appending grouping to aggregate expressions in Spark Connect.

### Why are the changes needed?
Change is needed to fix a regression caused by apache#50606 where `UnresolvedOrdinal` would end up in aggregate expression and propagate all the way to `CheckAnalysis` where it would throw an error

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a test case for the correct behavior.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50958 from mihailotim-db/mihailotim-db/fix_unresolved_ordinal.

Authored-by: Mihailo Timotic <mihailo.timotic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants