Skip to content

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Aug 4, 2020

What changes were proposed in this pull request?

This PR updates the analyze method to make sure the plan can be resolved. It also fixes some miswritten optimizer tests.

Why are the changes needed?

It's error-prone if the analyze method can return an unresolved plan.

Does this PR introduce any user-facing change?

no

How was this patch tested?

test only

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127055 has finished for PR 29349 at commit 90c75fb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

class AnalysisSuite extends AnalysisTest with Matchers {
import org.apache.spark.sql.catalyst.analysis.TestRelations._

test("fail for unresolved plan") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a JIRA prefix?

Copy link
Contributor Author

@cloud-fan cloud-fan Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very general and not a regression test, maybe not needed?

EliminateSubqueryAliases(analysis.SimpleAnalyzer.execute(logicalPlan))
def analyze: LogicalPlan = {
val analyzed = analysis.SimpleAnalyzer.execute(logicalPlan)
analysis.SimpleAnalyzer.checkAnalysis(analyzed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this change can find many miswritten tests..., nice.

comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer)
}

test("join condition pushdown: deterministic and non-deterministic") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added in #14815

It's not valid anymore, as we don't allow nondeterministic join condition now.

@SparkQA
Copy link

SparkQA commented Aug 6, 2020

Test build #127142 has finished for PR 29349 at commit 5a71545.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

It's ready for review. @viirya @maropu @dongjoon-hyun

Comment on lines -1181 to -1182
comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer.analyze,
checkAnalysis = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set checkAnalysis as false so didn't find it is not resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It's true that sometimes we need to construct a special unresolved plan to test some special branches of the optimizer rule. But it's not the case here.


def analyze: LogicalPlan =
EliminateSubqueryAliases(analysis.SimpleAnalyzer.execute(logicalPlan))
def analyze: LogicalPlan = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice. This seems only used in test.

val originalQuery = nonNullableRelation
.where(EqualTo(fieldA1, fieldA2))
.analyze
val originalQuery = nonNullableRelation.where(EqualTo(fieldA1, fieldA2))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we construct the resolved plan directly, because we need to reply on case insensitive and .analyze use case sensitive analyzer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I go to look at this test and its original PR. fieldA1 and fieldA2 are not different in letter case, but the name in GetStructField. That is why above comment is GetStructField with different names are semantically equal.

val originalQuery = EventTimeWatermark('b, interval, testRelation)
.where('a === 5 && 'b === 10 && 'c === 5)
val originalQuery = EventTimeWatermark('b, interval, relation)
.where('a === 5 && 'b === new java.sql.Timestamp(0) && 'c === 5)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watermark column must be timestamp.

t1.join(t2, Inner, Some(nameToAttr("t1.k-1-2") === nameToAttr("t2.k-1-5")))
.hint("broadcast")
.join(t4, Inner, Some(nameToAttr("t4.v-1-10") === nameToAttr("t3.v-1-100")))
.join(t4, Inner, Some(nameToAttr("t1.k-1-2") === nameToAttr("t4.k-1-2")))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the join condition was wrong written. t3 is not accessible here.

val scalarSubquery =
testRelation
.where(ScalarSubquery(subPlan))
.where(ScalarSubquery(subPlan) === 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where takes a boolean expression

val input = LocalRelation('a.array(ArrayType(IntegerType, true)))
val plan = input.select('a.cast(ArrayType(IntegerType, false)).as("casted")).analyze
val attr = input.output.head
val plan = input.select(attr.cast(ArrayType(IntegerType, false)).as("casted"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we are testing something that can't pass analysis, just to make sure the optimizer rule is robust.

@cloud-fan
Copy link
Contributor Author

thanks for review, merging to master!

@cloud-fan cloud-fan closed this in d5682c1 Aug 7, 2020
@HyukjinKwon
Copy link
Member

+1 Looks good!

@dongjoon-hyun
Copy link
Member

+1, late LGTM.
Can we have this nice patch in branch-3.0 too, @cloud-fan ?

cloud-fan added a commit to cloud-fan/spark that referenced this pull request Aug 10, 2020
… is analyzed

This PR updates the `analyze` method to make sure the plan can be resolved. It also fixes some miswritten optimizer tests.

It's error-prone if the `analyze` method can return an unresolved plan.

no

test only

Closes apache#29349 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
dongjoon-hyun pushed a commit that referenced this pull request Aug 10, 2020
… plan is analyzed

### What changes were proposed in this pull request?

backport #29349 to 3.0.
This PR updates the `analyze` method to make sure the plan can be resolved. It also fixes some miswritten optimizer tests.

### Why are the changes needed?

It's error-prone if the `analyze` method can return an unresolved plan.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test only

Closes #29400 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants