Conversation

ericl (Contributor) commented Jul 31, 2016

#14425 rebased for branch-2.0

…sets of partitions

This fixes a bug where the file scan operator does not take partition pruning into account in its implementation of `sameResult()`. As a result, execution may produce incorrect results for self-joins over the same base file relation.
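For context, a hedged repro sketch of the failure mode described above; the path, column names, and session setup are illustrative and not taken from the patch:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch only: the path and schema are made up for illustration.
val spark = SparkSession.builder().master("local[*]").appName("spark-16818-sketch").getOrCreate()

// A relation assumed to be partitioned by `part`, read twice with different partition predicates.
val base  = spark.read.parquet("/tmp/events")
val left  = base.where("part = 1")
val right = base.where("part = 2")

// With spark.sql.exchange.reuse enabled, a sameResult() that ignores partition
// pruning lets the planner reuse one side's scan/exchange for the other,
// yielding incorrect results for this self-join over the same base relation.
val joined = left.join(right, "id")
```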

The patch here is minimal, but we should reconsider relying on `metadata` for implementing `sameResult()` in the future, as string representations may not be uniquely identifying.
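A conceptual sketch of that concern, using hypothetical types rather than the actual Spark classes: if equivalence is decided from rendered metadata strings alone, any difference not captured in those strings is invisible to the comparison.

```scala
// Hypothetical types, for illustration only (not the actual Spark classes).
case class ScanInfo(metadata: Map[String, String], prunedPartitions: Seq[String])

// Deciding equivalence from the rendered metadata strings alone: two scans that
// prune different partitions compare equal whenever the difference is not
// (uniquely) captured in those strings.
def sameResultByMetadata(a: ScanInfo, b: ScanInfo): Boolean =
  a.metadata == b.metadata

// Safer: also compare the semantically relevant pieces of the scan directly.
def sameResultWithPruning(a: ScanInfo, b: ScanInfo): Boolean =
  sameResultByMetadata(a, b) && a.prunedPartitions == b.prunedPartitions
```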

cc rxin

Unit tests.

Author: Eric Liang <ekl@databricks.com>

Closes apache#14425 from ericl/spark-16818.

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
rxin (Contributor) commented Jul 31, 2016

LGTM pending Jenkins.

rxin (Contributor) commented on sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala:

    }

    test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
      spark.conf.set("spark.sql.exchange.reuse", true)

ah, actually just realized we could have improved this by using "withSQLConf" -- it makes sure the configs get reset after the test case finishes running.
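For reference, a minimal sketch of that pattern, assuming the suite mixes in Spark's SQLTestUtils (which provides withSQLConf); the test body is elided:

```scala
// The conf is set only for the duration of the block and restored to its
// previous value afterwards, even if an assertion inside the block fails.
test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
  withSQLConf("spark.sql.exchange.reuse" -> "true") {
    // ... assertions on plan/exchange reuse go here ...
  }
}
```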

ericl (Contributor, Author) replied:

Oh, I assumed the test already did so since I've seen this pattern elsewhere. If it affects more than just the suite, I can submit a follow-up fix.

On Sat, Jul 30, 2016, 10:59 PM Reynold Xin notifications@github.com wrote:

In sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala, #14427 (comment):

        spark.range(100)
          .selectExpr("id", "id as b")
          .write
          .partitionBy("id")
          .parquet(tempDir)
        val df = spark.read.parquet(tempDir)
        def getPlan(df: DataFrame): SparkPlan = {
          df.queryExecution.executedPlan
        }
        assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2"))))
        assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3"))))
      }
    }

    test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
      spark.conf.set("spark.sql.exchange.reuse", true)

ah, actually just realized we could have improved this by using "withSQLConf" -- it makes sure the configs get reset after the test case finishes running.



rxin (Contributor) replied:

Yeah, I think those places were not correctly using the confs either.

SparkQA commented Jul 31, 2016

Test build #63053 has finished for PR 14427 at commit ef60367.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin (Contributor) commented Aug 2, 2016

Merging in branch-2.0.

rxin (Contributor) commented Aug 2, 2016

@ericl can you close this?

asfgit pushed a commit that referenced this pull request Aug 2, 2016
…sets of partitions

#14425 rebased for branch-2.0

Author: Eric Liang <ekl@databricks.com>

Closes #14427 from ericl/spark-16818-br-2.
ericl (Contributor, Author) commented Aug 2, 2016

Done

@ericl ericl closed this Aug 2, 2016