[SPARK-40800][SQL] Always inline expressions in OptimizeOneRowRelationSubquery #38260

allisonwang-db · 2022-10-14T16:06:54Z

What changes were proposed in this pull request?

This PR modifies the optimizer rule OptimizeOneRowRelationSubquery to always collapse projects and inline non-volatile expressions.

Why are the changes needed?

SPARK-39699 made CollapseProject more conservative. As a result, some correlated subqueries that Spark used to support no longer work. For example, Spark used to be able to execute this correlated subquery:

SELECT (
  SELECT array_sort(a, (i, j) -> rank[i] - rank[j]) AS sorted
  FROM (SELECT MAP('a', 1, 'b', 2) rank)
) FROM t1

But after SPARK-39699, this query fails with the exception "Unexpected operator Join Inner", because the projects inside the subquery can no longer be collapsed. We should always inline expressions where possible, both to support a broader range of correlated subqueries and to avoid adding expensive domain joins.
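To illustrate what "collapsing projects and inlining expressions" means here, the following is a minimal toy sketch in Python. It is not Spark's Catalyst API: the tuple-based expression encoding and the helper names `inline`, `collapse`, and `columns` are hypothetical and exist only for this illustration, and the sketch ignores the non-volatile restriction the rule applies. It models the subquery above as two stacked projections and shows that, once the inner alias `rank` is inlined, the only column still referenced is the outer column `a`.

```python
# Toy expression encoding (an assumption for illustration, not Catalyst):
#   ("col", name)      reference to a column or alias
#   ("lit", value)     literal value
#   ("var", name)      lambda variable
#   (op, child, ...)   any other operator with child expressions

def inline(expr, aliases):
    """Replace each reference to an alias with the expression it names."""
    kind = expr[0]
    if kind == "col":
        return aliases.get(expr[1], expr)
    if kind in ("lit", "var"):
        return expr
    return (kind, *(inline(child, aliases) for child in expr[1:]))


def collapse(outer, inner):
    """Merge two stacked projections (outer reads from inner) into one."""
    return {name: inline(expr, inner) for name, expr in outer.items()}


def columns(expr):
    """Collect the names of all columns an expression still references."""
    kind = expr[0]
    if kind == "col":
        return {expr[1]}
    if kind in ("lit", "var"):
        return set()
    return set().union(*(columns(child) for child in expr[1:]))


# Inner project: SELECT MAP('a', 1, 'b', 2) AS rank
inner = {"rank": ("map", ("lit", "a"), ("lit", 1), ("lit", "b"), ("lit", 2))}

# Outer project: SELECT array_sort(a, (i, j) -> rank[i] - rank[j]) AS sorted
outer = {
    "sorted": (
        "array_sort",
        ("col", "a"),
        ("lambda", ("sub",
                    ("get", ("col", "rank"), ("var", "i")),
                    ("get", ("col", "rank"), ("var", "j")))),
    )
}

merged = collapse(outer, inner)
# After inlining, the alias `rank` is gone and only the outer column remains.
print(columns(merged["sorted"]))
```

In this toy model the collapsed projection depends only on `a`, which is the situation in which the one-row subquery can be rewritten without a join.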

Does this PR introduce any user-facing change?

Yes. It will allow Spark to execute more types of correlated subqueries.

How was this patch tested?

Unit test.

allisonwang-db · 2022-10-20T04:00:35Z

cc @cloud-fan

cloud-fan · 2022-10-20T04:30:16Z

The test failure seems unrelated, can you retrigger?

LuciferYang · 2022-10-21T02:32:03Z

@allisonwang-db You can rebase the code. As a workaround, I pinned the Java version to 8u345 in #38311, so GA can pass without waiting for #38317.

cloud-fan · 2022-10-24T03:12:42Z

thanks, merging to master!

Closes apache#38260 from allisonwang-db/spark-40800-inline-expr-subquery.

Authored-by: allisonwang-db <allison.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

github-actions bot added the SQL label Oct 14, 2022

allisonwang-db force-pushed the spark-40800-inline-expr-subquery branch from 3f0b1a3 to 2e641c9 Compare October 19, 2022 18:40

cloud-fan approved these changes Oct 20, 2022


allisonwang-db mentioned this pull request Oct 20, 2022

[SPARK-40851][INFRA][SQL][TESTS] Make GA run successfully with the latest Java 8/11/17 #38317

Closed

allisonwang-db force-pushed the spark-40800-inline-expr-subquery branch from 2e641c9 to 9a55c95 Compare October 21, 2022 04:03

allisonwang-db added 2 commits October 21, 2022 14:31

inline exprs

33fdef9

rebase and fix tests

8122e23

allisonwang-db force-pushed the spark-40800-inline-expr-subquery branch from 9a55c95 to 8122e23 Compare October 21, 2022 22:03

cloud-fan closed this in 58490da Oct 24, 2022
