Skip to content

Conversation

@cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

In CheckAnalysis, we inline CTE relations first and then check the plan. This causes an issue if the CTE relation is not used, as the relation will be removed after inline. Then we will hit the last safeguard in CheckAnalysis:

    plan.foreachUp {
      case o if !o.resolved =>
        failAnalysis(s"unresolved operator ${o.simpleString(SQLConf.get.maxToStringFields)}")
      case _ =>
    }

This produces pretty bad error messages.

To fix this issue, this PR does an extra analysis check for CTE relations that are not used.

Why are the changes needed?

better error message.

Does this PR introduce any user-facing change?

no

How was this patch tested?

new test

@cloud-fan
Copy link
Contributor Author

cc @MaxGekk @srielau

@github-actions github-actions bot added the SQL label Sep 28, 2022
@MaxGekk
Copy link
Member

MaxGekk commented Sep 28, 2022

+1, LGTM. Merging to master.
Thank you, @cloud-fan and @amaliujia for review.

@MaxGekk MaxGekk closed this in 6adda25 Sep 28, 2022
cloud-fan pushed a commit that referenced this pull request Jan 6, 2023
### What changes were proposed in this pull request?

The commit #38029 actually intended to do the right thing: it checks CTE more aggressively even if a CTE is not used, which is ok. However, it triggers an existing issue where a subquery checks itself but in the CTE case if the subquery contains a CTE which is defined outside of the subquery, the check will fail as CTE not found (e.g. key not found).

So it is:

the commit checks more thus in the repro examples, every CTE is checked now (in the past only used CTE is checked).

One of the CTE that is checked after the commit in the example contains subquery.

The subquery contains another CTE which is defined outside of the subquery.

The subquery checks itself thus fail due to CTE not found.

This PR fixes the issue by removing the subquery self-validation on CTE case.

### Why are the changes needed?

This fixed a regression that
```
    val df = sql("""
                   |    WITH
                   |    cte1 as (SELECT 1 col1),
                   |    cte2 as (SELECT (SELECT MAX(col1) FROM cte1))
                   |    SELECT * FROM cte1
                   |""".stripMargin
    )
    checkAnswer(df, Row(1) :: Nil)
```

cannot pass analyzer anymore.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

UT

Closes #39414 from amaliujia/fix_subquery_validate.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request May 7, 2024
### What changes were proposed in this pull request?

This is a follow-up of #38029 . Some custom check rules need to see the entire query plan tree to get some context, but #38029 breaks it as it checks the query plan of dangling CTE relations recursively.

This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan.

### Why are the changes needed?

Revert the breaking change to custom check rules

### Does this PR introduce _any_ user-facing change?

No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules.

### How was this patch tested?

existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46439 from cloud-fan/check.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request May 7, 2024
backport #46439 to 3.5

### What changes were proposed in this pull request?

This is a follow-up of #38029 . Some custom check rules need to see the entire query plan tree to get some context, but #38029 breaks it as it checks the query plan of dangling CTE relations recursively.

This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan.

### Why are the changes needed?

Revert the breaking change to custom check rules

### Does this PR introduce _any_ user-facing change?

No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules.

### How was this patch tested?

existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46442 from cloud-fan/check2.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
turboFei pushed a commit to turboFei/spark that referenced this pull request Nov 6, 2025
apache#389)

* [SPARK-48173][SQL][3.5] CheckAnalysis should see the entire query plan

backport apache#46439 to 3.5

### What changes were proposed in this pull request?

This is a follow-up of apache#38029 . Some custom check rules need to see the entire query plan tree to get some context, but apache#38029 breaks it as it checks the query plan of dangling CTE relations recursively.

This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan.

### Why are the changes needed?

Revert the breaking change to custom check rules

### Does this PR introduce _any_ user-facing change?

No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules.

### How was this patch tested?

existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#46442 from cloud-fan/check2.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

* fix

---------

Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants