Conversation

ericl (Contributor) commented Jul 31, 2016

#14425 rebased for branch-2.0

…sets of partitions

This fixes a bug where the file scan operator does not take partition pruning into account in its implementation of `sameResult()`. As a result, execution may produce incorrect results for self-joins over the same base file relation.
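For context, a hedged repro sketch of the failure mode described above; the path, column names, and session setup are illustrative and not taken from the patch:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch only: the path and schema are made up for illustration.
val spark = SparkSession.builder().master("local[*]").appName("spark-16818-sketch").getOrCreate()

// A relation assumed to be partitioned by `part`, read twice with different partition predicates.
val base  = spark.read.parquet("/tmp/events")
val left  = base.where("part = 1")
val right = base.where("part = 2")

// With spark.sql.exchange.reuse enabled, a sameResult() that ignores partition
// pruning lets the planner reuse one side's scan/exchange for the other,
// yielding incorrect results for this self-join over the same base relation.
val joined = left.join(right, "id")
```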

The patch here is minimal, but we should reconsider relying on `metadata` for implementing `sameResult()` in the future, as string representations may not be uniquely identifying.
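A conceptual sketch of that concern, using hypothetical types rather than the actual Spark classes: if equivalence is decided from rendered metadata strings alone, any difference not captured in those strings is invisible to the comparison.

```scala
// Hypothetical types, for illustration only (not the actual Spark classes).
case class ScanInfo(metadata: Map[String, String], prunedPartitions: Seq[String])

// Deciding equivalence from the rendered metadata strings alone: two scans that
// prune different partitions compare equal whenever the difference is not
// (uniquely) captured in those strings.
def sameResultByMetadata(a: ScanInfo, b: ScanInfo): Boolean =
  a.metadata == b.metadata

// Safer: also compare the semantically relevant pieces of the scan directly.
def sameResultWithPruning(a: ScanInfo, b: ScanInfo): Boolean =
  sameResultByMetadata(a, b) && a.prunedPartitions == b.prunedPartitions
```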

cc rxin

Unit tests.

Author: Eric Liang <ekl@databricks.com>

Closes apache#14425 from ericl/spark-16818.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
rxin (Contributor) commented Jul 31, 2016

LGTM pending Jenkins.

rxin (Contributor) commented on sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala:

    }

    test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
      spark.conf.set("spark.sql.exchange.reuse", true)

ah, actually just realized we could have improved this by using "withSQLConf" -- it makes sure the configs get reset after the test case finishes running.
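For reference, a minimal sketch of that pattern, assuming the suite mixes in Spark's SQLTestUtils (which provides withSQLConf); the test body is elided:

```scala
// The conf is set only for the duration of the block and restored to its
// previous value afterwards, even if an assertion inside the block fails.
test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
  withSQLConf("spark.sql.exchange.reuse" -> "true") {
    // ... assertions on plan/exchange reuse go here ...
  }
}
```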

ericl (Contributor, Author) replied:

Oh, I assumed the test already did so since I've seen this pattern elsewhere. If it affects more than just the suite, I can submit a follow-up fix.

On Sat, Jul 30, 2016, 10:59 PM Reynold Xin notifications@github.com wrote:

In sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala, #14427 (comment):

        spark.range(100)
          .selectExpr("id", "id as b")
          .write
          .partitionBy("id")
          .parquet(tempDir)
        val df = spark.read.parquet(tempDir)
        def getPlan(df: DataFrame): SparkPlan = {
          df.queryExecution.executedPlan
        }
        assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2"))))
        assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3"))))
      }
    }

    test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
      spark.conf.set("spark.sql.exchange.reuse", true)

ah, actually just realized we could have improved this by using "withSQLConf" -- it makes sure the configs get reset after the test case finishes running.



rxin (Contributor) replied:

Yeah, I think those places were not correctly using the confs either.

SparkQA commented Jul 31, 2016

Test build #63053 has finished for PR 14427 at commit ef60367.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin (Contributor) commented Aug 2, 2016

Merging in branch-2.0.

rxin (Contributor) commented Aug 2, 2016

@ericl can you close this?

asfgit pushed a commit that referenced this pull request Aug 2, 2016
…sets of partitions

#14425 rebased for branch-2.0

Author: Eric Liang <ekl@databricks.com>

Closes #14427 from ericl/spark-16818-br-2.
ericl (Contributor, Author) commented Aug 2, 2016

Done

@ericl ericl closed this Aug 2, 2016