-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-40595][SQL] Improve error message for unused CTE relations #38029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Contributor
Author
MaxGekk
approved these changes
Sep 28, 2022
amaliujia
approved these changes
Sep 28, 2022
Member
|
+1, LGTM. Merging to master. |
cloud-fan
pushed a commit
that referenced
this pull request
Jan 6, 2023
### What changes were proposed in this pull request? The commit #38029 actually intended to do the right thing: it checks CTE more aggressively even if a CTE is not used, which is ok. However, it triggers an existing issue where a subquery checks itself but in the CTE case if the subquery contains a CTE which is defined outside of the subquery, the check will fail as CTE not found (e.g. key not found). So it is: the commit checks more thus in the repro examples, every CTE is checked now (in the past only used CTE is checked). One of the CTE that is checked after the commit in the example contains subquery. The subquery contains another CTE which is defined outside of the subquery. The subquery checks itself thus fail due to CTE not found. This PR fixes the issue by removing the subquery self-validation on CTE case. ### Why are the changes needed? This fixed a regression that ``` val df = sql(""" | WITH | cte1 as (SELECT 1 col1), | cte2 as (SELECT (SELECT MAX(col1) FROM cte1)) | SELECT * FROM cte1 |""".stripMargin ) checkAnswer(df, Row(1) :: Nil) ``` cannot pass analyzer anymore. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes #39414 from amaliujia/fix_subquery_validate. Authored-by: Rui Wang <rui.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun
pushed a commit
that referenced
this pull request
May 7, 2024
### What changes were proposed in this pull request? This is a follow-up of #38029 . Some custom check rules need to see the entire query plan tree to get some context, but #38029 breaks it as it checks the query plan of dangling CTE relations recursively. This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan. ### Why are the changes needed? Revert the breaking change to custom check rules ### Does this PR introduce _any_ user-facing change? No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules. ### How was this patch tested? existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46439 from cloud-fan/check. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun
pushed a commit
that referenced
this pull request
May 7, 2024
backport #46439 to 3.5 ### What changes were proposed in this pull request? This is a follow-up of #38029 . Some custom check rules need to see the entire query plan tree to get some context, but #38029 breaks it as it checks the query plan of dangling CTE relations recursively. This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan. ### Why are the changes needed? Revert the breaking change to custom check rules ### Does this PR introduce _any_ user-facing change? No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules. ### How was this patch tested? existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes #46442 from cloud-fan/check2. Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com> Co-authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
turboFei
pushed a commit
to turboFei/spark
that referenced
this pull request
Nov 6, 2025
apache#389) * [SPARK-48173][SQL][3.5] CheckAnalysis should see the entire query plan backport apache#46439 to 3.5 ### What changes were proposed in this pull request? This is a follow-up of apache#38029 . Some custom check rules need to see the entire query plan tree to get some context, but apache#38029 breaks it as it checks the query plan of dangling CTE relations recursively. This PR fixes it by putting back the dangling CTE relation in the main query plan and then check the main query plan. ### Why are the changes needed? Revert the breaking change to custom check rules ### Does this PR introduce _any_ user-facing change? No for most users. This restores the behavior of Spark 3.3 and earlier for custom check rules. ### How was this patch tested? existing tests. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#46442 from cloud-fan/check2. Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com> Co-authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> * fix --------- Signed-off-by: Dongjoon Hyun <dhyun@apple.com> Co-authored-by: Wenchen Fan <cloud0fan@gmail.com> Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
In
CheckAnalysis, we inline CTE relations first and then check the plan. This causes an issue if the CTE relation is not used, as the relation will be removed after inline. Then we will hit the last safeguard inCheckAnalysis:This produces pretty bad error messages.
To fix this issue, this PR does an extra analysis check for CTE relations that are not used.
Why are the changes needed?
better error message.
Does this PR introduce any user-facing change?
no
How was this patch tested?
new test