[SPARK-32528][SQL][TEST] The analyze method should make sure the plan is analyzed #29349
|
Test build #127055 has finished for PR 29349 at commit
|
| class AnalysisSuite extends AnalysisTest with Matchers { | ||
| import org.apache.spark.sql.catalyst.analysis.TestRelations._ | ||
|
|
||
| test("fail for unresolved plan") { |
Shall we add a JIRA prefix?
This is very general and not a regression test, maybe not needed?
| EliminateSubqueryAliases(analysis.SimpleAnalyzer.execute(logicalPlan)) | ||
| def analyze: LogicalPlan = { | ||
| val analyzed = analysis.SimpleAnalyzer.execute(logicalPlan) | ||
| analysis.SimpleAnalyzer.checkAnalysis(analyzed) |
Oh, this change can find many miswritten tests..., nice.
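To illustrate why the extra check surfaces miswritten tests, here is a minimal stand-alone sketch. It is a toy model, not Spark code: `Plan`, `Relation`, `Filter`, `ToyAnalyzer`, and `analyzeUnchecked` are hypothetical names standing in for Catalyst's logical plans, the analyzer's `execute`, and `checkAnalysis`. The real analyzer and its checks are far richer.

```scala
// Toy model only: hypothetical stand-ins for Catalyst classes, no Spark dependency.
sealed trait Plan { def resolved: Boolean }

final case class Relation(columns: Set[String]) extends Plan {
  val resolved: Boolean = true
}

// A filter that refers to a column by name; `bound` records whether
// resolution managed to find that column in the child's output.
final case class Filter(column: String, child: Plan, bound: Boolean = false) extends Plan {
  def resolved: Boolean = bound && child.resolved
}

object ToyAnalyzer {
  // Columns visible from a plan node.
  private def output(plan: Plan): Set[String] = plan match {
    case Relation(cols)      => cols
    case Filter(_, child, _) => output(child)
  }

  // Best-effort resolution: unknown columns stay unbound instead of failing.
  def execute(plan: Plan): Plan = plan match {
    case r: Relation => r
    case Filter(col, child, _) =>
      val analyzedChild = execute(child)
      Filter(col, analyzedChild, bound = output(analyzedChild).contains(col))
  }

  // Fails loudly if anything is still unresolved.
  def checkAnalysis(plan: Plan): Unit =
    if (!plan.resolved) throw new IllegalStateException(s"unresolved plan: $plan")

  // Old behaviour in this model: may silently return an unresolved plan.
  def analyzeUnchecked(plan: Plan): Plan = execute(plan)

  // New behaviour in this model: resolve first, then verify.
  def analyze(plan: Plan): Plan = {
    val analyzed = execute(plan)
    checkAnalysis(analyzed)
    analyzed
  }
}
```

In this model, a test that filters on a column the relation does not have gets an unresolved plan back from `analyzeUnchecked` without any error, while `analyze` throws at the point where the test builds its input.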
| comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer) | ||
| } | ||
|
|
||
| test("join condition pushdown: deterministic and non-deterministic") { |
This was added in #14815.
It's not valid anymore, as we no longer allow nondeterministic join conditions.
|
Test build #127142 has finished for PR 29349 at commit
|
|
It's ready for review. @viirya @maropu @dongjoon-hyun |
| comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer.analyze, | ||
| checkAnalysis = false) |
We set `checkAnalysis` to false, so we never noticed that the plan was not resolved?
Yes. It's true that sometimes we need to construct a special unresolved plan to test some special branches of the optimizer rule. But it's not the case here.
|
|
||
| def analyze: LogicalPlan = | ||
| EliminateSubqueryAliases(analysis.SimpleAnalyzer.execute(logicalPlan)) | ||
| def analyze: LogicalPlan = { |
This looks nice. It seems to be used only in tests.
| val originalQuery = nonNullableRelation | ||
| .where(EqualTo(fieldA1, fieldA2)) | ||
| .analyze | ||
| val originalQuery = nonNullableRelation.where(EqualTo(fieldA1, fieldA2)) |
Here we construct the resolved plan directly, because we need to rely on case-insensitive resolution and `.analyze` uses a case-sensitive analyzer.
I looked at this test and its original PR. `fieldA1` and `fieldA2` do not differ in letter case; they differ in the name carried by `GetStructField`. That is why the comment above says that `GetStructField` expressions with different names are semantically equal.
| val originalQuery = EventTimeWatermark('b, interval, testRelation) | ||
| .where('a === 5 && 'b === 10 && 'c === 5) | ||
| val originalQuery = EventTimeWatermark('b, interval, relation) | ||
| .where('a === 5 && 'b === new java.sql.Timestamp(0) && 'c === 5) |
The watermark column must be a timestamp.
| t1.join(t2, Inner, Some(nameToAttr("t1.k-1-2") === nameToAttr("t2.k-1-5"))) | ||
| .hint("broadcast") | ||
| .join(t4, Inner, Some(nameToAttr("t4.v-1-10") === nameToAttr("t3.v-1-100"))) | ||
| .join(t4, Inner, Some(nameToAttr("t1.k-1-2") === nameToAttr("t4.k-1-2"))) |
The join condition was written incorrectly: `t3` is not accessible here.
| val scalarSubquery = | ||
| testRelation | ||
| .where(ScalarSubquery(subPlan)) | ||
| .where(ScalarSubquery(subPlan) === 1) |
`where` takes a boolean expression.
| val input = LocalRelation('a.array(ArrayType(IntegerType, true))) | ||
| val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze | ||
| val attr = input.output.head | ||
| val plan = input.select(attr.cast(ArrayType(IntegerType, false)).as("casted")) |
Here we are testing something that can't pass analysis, just to make sure the optimizer rule is robust.
|
thanks for review, merging to master! |
|
+1 Looks good! |
|
+1, late LGTM. |
… is analyzed

This PR updates the `analyze` method to make sure the plan can be resolved. It also fixes some miswritten optimizer tests.

It's error-prone if the `analyze` method can return an unresolved plan.

no

test only

Closes apache#29349 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… plan is analyzed

### What changes were proposed in this pull request?
backport #29349 to 3.0.

This PR updates the `analyze` method to make sure the plan can be resolved. It also fixes some miswritten optimizer tests.

### Why are the changes needed?
It's error-prone if the `analyze` method can return an unresolved plan.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
test only

Closes #29400 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
### What changes were proposed in this pull request?
This PR updates the `analyze` method to make sure the plan can be resolved. It also fixes some miswritten optimizer tests.

### Why are the changes needed?
It's error-prone if the `analyze` method can return an unresolved plan.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
test only