Skip to content

Commit fae2503

Browse files
cloud-fandongjoon-hyun
authored andcommitted
[SPARK-31577][SQL] Fix case-sensitivity and forward name conflict problems when check name conflicts of CTE relations
### What changes were proposed in this pull request? This is a followup of #28318, to make the code more readable, by adding some comments to explain the trick and simplify the code to use a boolean flag instead of 2 string sets. This PR also fixes various problems: 1. the name check should consider case sensitivity 2. forward name conflicts like `with t as (with t2 as ...), t2 as ...` is not a real conflict and we shouldn't fail. ### Why are the changes needed? correct the behavior ### Does this PR introduce any user-facing change? yes, fix the fore-mentioned behaviors. ### How was this patch tested? new tests Closes #28371 from cloud-fan/followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> (cherry picked from commit 2f4f38b) Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent d272482 commit fae2503

File tree

5 files changed

+182
-24
lines changed

5 files changed

+182
-24
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala

Lines changed: 30 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -17,9 +17,11 @@
1717

1818
package org.apache.spark.sql.catalyst.analysis
1919

20+
import scala.collection.mutable
21+
2022
import org.apache.spark.sql.AnalysisException
2123
import org.apache.spark.sql.catalyst.expressions.SubqueryExpression
22-
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, With}
24+
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias, With}
2325
import org.apache.spark.sql.catalyst.rules.Rule
2426
import org.apache.spark.sql.internal.SQLConf
2527
import org.apache.spark.sql.internal.SQLConf.{LEGACY_CTE_PRECEDENCE_POLICY, LegacyBehaviorPolicy}
@@ -41,34 +43,44 @@ object CTESubstitution extends Rule[LogicalPlan] {
4143
}
4244

4345
/**
44-
* Check the plan to be traversed has naming conflicts in nested CTE or not, traverse through
45-
* child, innerChildren and subquery expressions for the current plan.
46+
* Spark 3.0 changes the CTE relations resolution, and inner relations take precedence. This is
47+
* correct but we need to warn users about this behavior change under EXCEPTION mode, when we see
48+
* CTE relations with conflicting names.
49+
*
50+
* Note that, before Spark 3.0 the parser didn't support CTE in the FROM clause. For example,
51+
* `WITH ... SELECT * FROM (WITH ... SELECT ...)` was not supported. We should not fail for this
52+
* case, as Spark versions before 3.0 can't run it anyway. The parameter `startOfQuery` is used
53+
* to indicate where we can define CTE relations before Spark 3.0, and we should only check
54+
* name conflicts when `startOfQuery` is true.
4655
*/
4756
private def assertNoNameConflictsInCTE(
4857
plan: LogicalPlan,
49-
outerCTERelationNames: Set[String] = Set.empty,
50-
namesInSubqueries: Set[String] = Set.empty): Unit = {
58+
outerCTERelationNames: Seq[String] = Nil,
59+
startOfQuery: Boolean = true): Unit = {
60+
val resolver = SQLConf.get.resolver
5161
plan match {
52-
case w @ With(child, relations) =>
53-
val newNames = relations.map {
54-
case (cteName, _) =>
55-
if (outerCTERelationNames.contains(cteName)) {
56-
throw new AnalysisException(s"Name $cteName is ambiguous in nested CTE. " +
62+
case With(child, relations) =>
63+
val newNames = mutable.ArrayBuffer.empty[String]
64+
newNames ++= outerCTERelationNames
65+
relations.foreach {
66+
case (name, relation) =>
67+
if (startOfQuery && outerCTERelationNames.exists(resolver(_, name))) {
68+
throw new AnalysisException(s"Name $name is ambiguous in nested CTE. " +
5769
s"Please set ${LEGACY_CTE_PRECEDENCE_POLICY.key} to CORRECTED so that name " +
5870
"defined in inner CTE takes precedence. If set it to LEGACY, outer CTE " +
5971
"definitions will take precedence. See more details in SPARK-28228.")
60-
} else {
61-
cteName
6272
}
63-
}.toSet ++ namesInSubqueries
64-
assertNoNameConflictsInCTE(child, outerCTERelationNames, newNames)
65-
w.innerChildren.foreach(assertNoNameConflictsInCTE(_, newNames, newNames))
73+
// CTE relation is defined as `SubqueryAlias`. Here we skip it and check the child
74+
// directly, so that `startOfQuery` is set correctly.
75+
assertNoNameConflictsInCTE(relation.child, newNames)
76+
newNames += name
77+
}
78+
assertNoNameConflictsInCTE(child, newNames, startOfQuery = false)
6679

6780
case other =>
68-
other.subqueries.foreach(
69-
assertNoNameConflictsInCTE(_, namesInSubqueries, namesInSubqueries))
81+
other.subqueries.foreach(assertNoNameConflictsInCTE(_, outerCTERelationNames))
7082
other.children.foreach(
71-
assertNoNameConflictsInCTE(_, outerCTERelationNames, namesInSubqueries))
83+
assertNoNameConflictsInCTE(_, outerCTERelationNames, startOfQuery = false))
7284
}
7385
}
7486

sql/core/src/test/resources/sql-tests/inputs/cte-nested.sql

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -111,3 +111,28 @@ WHERE c IN (
111111
WITH t(c) AS (SELECT 2)
112112
SELECT * FROM t
113113
);
114+
115+
-- forward name conflict is not a real conflict
116+
WITH
117+
t AS (
118+
WITH t2 AS (SELECT 1)
119+
SELECT * FROM t2
120+
),
121+
t2 AS (SELECT 2)
122+
SELECT * FROM t;
123+
124+
-- case insensitive name conflicts: in other CTE relations
125+
WITH
126+
abc AS (SELECT 1),
127+
t AS (
128+
WITH aBc AS (SELECT 2)
129+
SELECT * FROM aBC
130+
)
131+
SELECT * FROM t;
132+
133+
-- case insensitive name conflicts: in subquery expressions
134+
WITH abc AS (SELECT 1)
135+
SELECT (
136+
WITH aBc AS (SELECT 2)
137+
SELECT * FROM aBC
138+
);

sql/core/src/test/resources/sql-tests/results/cte-legacy.sql.out

Lines changed: 41 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
-- Automatically generated by SQLQueryTestSuite
2-
-- Number of queries: 13
2+
-- Number of queries: 16
33

44

55
-- !query
@@ -179,3 +179,43 @@ WHERE c IN (
179179
struct<c:int>
180180
-- !query output
181181
1
182+
183+
184+
-- !query
185+
WITH
186+
t AS (
187+
WITH t2 AS (SELECT 1)
188+
SELECT * FROM t2
189+
),
190+
t2 AS (SELECT 2)
191+
SELECT * FROM t
192+
-- !query schema
193+
struct<1:int>
194+
-- !query output
195+
1
196+
197+
198+
-- !query
199+
WITH
200+
abc AS (SELECT 1),
201+
t AS (
202+
WITH aBc AS (SELECT 2)
203+
SELECT * FROM aBC
204+
)
205+
SELECT * FROM t
206+
-- !query schema
207+
struct<1:int>
208+
-- !query output
209+
1
210+
211+
212+
-- !query
213+
WITH abc AS (SELECT 1)
214+
SELECT (
215+
WITH aBc AS (SELECT 2)
216+
SELECT * FROM aBC
217+
)
218+
-- !query schema
219+
struct<scalarsubquery():int>
220+
-- !query output
221+
1

sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out

Lines changed: 45 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
-- Automatically generated by SQLQueryTestSuite
2-
-- Number of queries: 13
2+
-- Number of queries: 16
33

44

55
-- !query
@@ -64,10 +64,9 @@ WITH
6464
)
6565
SELECT * FROM t2
6666
-- !query schema
67-
struct<>
67+
struct<scalarsubquery():int>
6868
-- !query output
69-
org.apache.spark.sql.AnalysisException
70-
Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.;
69+
2
7170

7271

7372
-- !query
@@ -186,3 +185,45 @@ struct<>
186185
-- !query output
187186
org.apache.spark.sql.AnalysisException
188187
Name t is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.;
188+
189+
190+
-- !query
191+
WITH
192+
t AS (
193+
WITH t2 AS (SELECT 1)
194+
SELECT * FROM t2
195+
),
196+
t2 AS (SELECT 2)
197+
SELECT * FROM t
198+
-- !query schema
199+
struct<1:int>
200+
-- !query output
201+
1
202+
203+
204+
-- !query
205+
WITH
206+
abc AS (SELECT 1),
207+
t AS (
208+
WITH aBc AS (SELECT 2)
209+
SELECT * FROM aBC
210+
)
211+
SELECT * FROM t
212+
-- !query schema
213+
struct<>
214+
-- !query output
215+
org.apache.spark.sql.AnalysisException
216+
Name aBc is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.;
217+
218+
219+
-- !query
220+
WITH abc AS (SELECT 1)
221+
SELECT (
222+
WITH aBc AS (SELECT 2)
223+
SELECT * FROM aBC
224+
)
225+
-- !query schema
226+
struct<>
227+
-- !query output
228+
org.apache.spark.sql.AnalysisException
229+
Name aBc is ambiguous in nested CTE. Please set spark.sql.legacy.ctePrecedencePolicy to CORRECTED so that name defined in inner CTE takes precedence. If set it to LEGACY, outer CTE definitions will take precedence. See more details in SPARK-28228.;

sql/core/src/test/resources/sql-tests/results/cte-nonlegacy.sql.out

Lines changed: 41 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
-- Automatically generated by SQLQueryTestSuite
2-
-- Number of queries: 13
2+
-- Number of queries: 16
33

44

55
-- !query
@@ -179,3 +179,43 @@ WHERE c IN (
179179
struct<c:int>
180180
-- !query output
181181

182+
183+
184+
-- !query
185+
WITH
186+
t AS (
187+
WITH t2 AS (SELECT 1)
188+
SELECT * FROM t2
189+
),
190+
t2 AS (SELECT 2)
191+
SELECT * FROM t
192+
-- !query schema
193+
struct<1:int>
194+
-- !query output
195+
1
196+
197+
198+
-- !query
199+
WITH
200+
abc AS (SELECT 1),
201+
t AS (
202+
WITH aBc AS (SELECT 2)
203+
SELECT * FROM aBC
204+
)
205+
SELECT * FROM t
206+
-- !query schema
207+
struct<2:int>
208+
-- !query output
209+
2
210+
211+
212+
-- !query
213+
WITH abc AS (SELECT 1)
214+
SELECT (
215+
WITH aBc AS (SELECT 2)
216+
SELECT * FROM aBC
217+
)
218+
-- !query schema
219+
struct<scalarsubquery():int>
220+
-- !query output
221+
2

0 commit comments

Comments
 (0)