[SPARK-40800][SQL] Always inline expressions in OptimizeOneRowRelationSubquery #38260
What changes were proposed in this pull request?
This PR modifies the optimizer rule OptimizeOneRowRelationSubquery to always collapse projects and inline non-volatile expressions.
Why are the changes needed?
SPARK-39699 made CollapseProject more conservative. This has affected correlated subqueries that Spark used to be able to support. For example, Spark used to be able to execute this correlated subquery:
But after SPARK-39699, it throws the exception "Unexpected operator Join Inner" because the projects inside the subquery can no longer be collapsed. We should always inline expressions if possible, both to support a broader range of correlated subqueries and to avoid adding expensive domain joins.
Does this PR introduce any user-facing change?
Yes. It will allow Spark to execute more types of correlated subqueries.
How was this patch tested?
Unit test.
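Illustration (not part of the patch): the idea of collapsing stacked projects by inlining, as described above, can be sketched with a toy model. This is a minimal sketch and not Spark's Catalyst code. The classes Col, Lit and Call, the VOLATILE_FUNCS set, and the collapse helper are all hypothetical names invented for this example, and the rule that the toy refuses to collapse whenever any lower expression is volatile is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


# Hypothetical mini expression tree; these are NOT Catalyst classes.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Col, Lit, Call]

# Assumed set of volatile functions, for illustration only.
VOLATILE_FUNCS = {"rand", "uuid"}


def is_volatile(e: Expr) -> bool:
    """True if the expression contains a call to a volatile function."""
    if isinstance(e, Call):
        return e.func in VOLATILE_FUNCS or any(is_volatile(a) for a in e.args)
    return False


def substitute(e: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Replace column references with the lower project's definitions."""
    if isinstance(e, Col):
        return aliases.get(e.name, e)
    if isinstance(e, Call):
        return Call(e.func, tuple(substitute(a, aliases) for a in e.args))
    return e


def collapse(upper: Dict[str, Expr],
             lower: Dict[str, Expr]) -> Optional[Dict[str, Expr]]:
    """Merge two stacked projects into one by inlining the lower aliases.

    Returns None when any lower expression is volatile (toy simplification).
    """
    if any(is_volatile(e) for e in lower.values()):
        return None
    return {name: substitute(e, lower) for name, e in upper.items()}


# Lower project defines `rank`; upper project uses it with an outer column.
lower = {"rank": Call("map", (Lit("a"), Lit(1)))}
upper = {"x": Call("element_at", (Col("rank"), Col("outer_key")))}
merged = collapse(upper, lower)
```

In this toy, merged holds a single project list in which the reference to rank has been replaced by its definition, so only one project remains above the one-row relation. A lower project containing rand would make collapse return None instead.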