Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused by InferFiltersFromConstraints #19149

Closed
wants to merge 1 commit into from

Conversation

gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Sep 6, 2017

What changes were proposed in this pull request?

The optimizer rule InferFiltersFromConstraints could trigger our batch Operator Optimizations exceeds the max iteration limit (i.e., 100) so that the final plan might not be properly optimized. The rule InferFiltersFromConstraints could conflict with the other Filter/Join predicate reduction rules. Thus, we need to separate InferFiltersFromConstraints from the other rules.

This PR is to separate InferFiltersFromConstraints from the main batch Operator Optimizations .

How was this patch tested?

The existing test cases.

@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81469 has finished for PR 19149 at commit 9c57c8b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class NettyMemoryMetrics implements MetricSet

@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81470 has finished for PR 19149 at commit 41785dd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile force-pushed the inferFilterRule branch 2 times, most recently from 0be3b67 to 6fc7140 Compare September 6, 2017 20:08
@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81475 has finished for PR 19149 at commit 0be3b67.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 6, 2017

Test build #81476 has finished for PR 19149 at commit 6fc7140.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile changed the title [SPARK-21652][SQL][FOLLOW-UP] Fix rule confliction between InferFiltersFromConstraints and ConstantPropagation [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between InferFiltersFromConstraints and ConstantPropagation Sep 6, 2017
@SparkQA
Copy link

SparkQA commented Sep 7, 2017

Test build #81483 has finished for PR 19149 at commit e5501e1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 7, 2017

Test build #81495 has finished for PR 19149 at commit 1a22533.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 7, 2017

Test build #81502 has finished for PR 19149 at commit 1a22533.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Sep 8, 2017

Hi, @gatorsmile .
According to the PR description, it's about PruneFilters. Do we need a test case because SPARK-21652 is about ConstantPropagation, not PruneFilters. Also, PR title needs to be updated like description.

@dongjoon-hyun
Copy link
Member

Except that, Isolation of InferFiltersFromConstraints looks good to me.

@yhuai
Copy link
Contributor

yhuai commented Sep 29, 2017

Can we add a test?

@adrian-ionescu
Copy link
Contributor

I think the idea is good, but maybe the fix can be refined by identifying which are the rules that conflict with InferFilterFromConstraints and only rerunning those ones? Seems like a waste right now to run rules like CombineUnions or RewriteCorrelatedScalarSubquery twice (before and after), when they're obviously not impacted by InferFilterFromConstraints.

@adrian-ionescu
Copy link
Contributor

At least part of the issue was solved by #19201.
Unless we have a repro case, let's not pursue this.

@gatorsmile
Copy link
Member Author

gatorsmile commented Oct 3, 2017

Actually, the root issue is not resolve by #19201. Will add a unit test case later.

@gatorsmile gatorsmile changed the title [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between InferFiltersFromConstraints and ConstantPropagation [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused by InferFiltersFromConstraints Dec 17, 2017
@SparkQA
Copy link

SparkQA commented Dec 17, 2017

Test build #85015 has finished for PR 19149 at commit 9b6fe36.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 17, 2017

Test build #85018 has finished for PR 19149 at commit 9b6fe36.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

PushDownPredicate,
LimitPushDown,
ColumnPruning,
InferFiltersFromConstraints,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we still have it in the big batch?

Copy link
Member Author

@gatorsmile gatorsmile Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is filtered out in the first and the third batch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, why not remove it from the big batch?

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 19, 2017

Test build #85095 has finished for PR 19149 at commit 9b6fe36.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please

1 similar comment
@cloud-fan
Copy link
Contributor

retest this please

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SparkQA
Copy link

SparkQA commented Dec 19, 2017

Test build #85108 has finished for PR 19149 at commit 9b6fe36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

Thanks! Merged to master.

@asfgit asfgit closed this in ef10f45 Dec 19, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Jan 17, 2018
## What changes were proposed in this pull request?

Previously, PR apache#19201 fix the problem of non-converging constraints.
After that PR apache#19149 improve the loop and constraints is inferred only once.
So the problem of non-converging constraints is gone.

However, the case below will fail.

```

spark.range(5).write.saveAsTable("t")
val t = spark.read.table("t")
val left = t.withColumn("xid", $"id" + lit(1)).as("x")
val right = t.withColumnRenamed("id", "xid").as("y")
val df = left.join(right, "xid").filter("id = 3").toDF()
checkAnswer(df, Row(4, 3))

```

Because `aliasMap` replace all the aliased child. See the test case in PR for details.

This PR is to fix this bug by removing useless code for preventing non-converging constraints.
It can be also fixed with apache#20270, but this is much simpler and clean up the code.

## How was this patch tested?

Unit test

Author: Wang Gengliang <ltnwgl@gmail.com>

Closes apache#20278 from gengliangwang/FixConstraintSimple.
asfgit pushed a commit that referenced this pull request Jan 17, 2018
## What changes were proposed in this pull request?

Previously, PR #19201 fix the problem of non-converging constraints.
After that PR #19149 improve the loop and constraints is inferred only once.
So the problem of non-converging constraints is gone.

However, the case below will fail.

```

spark.range(5).write.saveAsTable("t")
val t = spark.read.table("t")
val left = t.withColumn("xid", $"id" + lit(1)).as("x")
val right = t.withColumnRenamed("id", "xid").as("y")
val df = left.join(right, "xid").filter("id = 3").toDF()
checkAnswer(df, Row(4, 3))

```

Because `aliasMap` replace all the aliased child. See the test case in PR for details.

This PR is to fix this bug by removing useless code for preventing non-converging constraints.
It can be also fixed with #20270, but this is much simpler and clean up the code.

## How was this patch tested?

Unit test

Author: Wang Gengliang <ltnwgl@gmail.com>

Closes #20278 from gengliangwang/FixConstraintSimple.

(cherry picked from commit 8598a98)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants