[SPARK-16818] Exchange reuse incorrectly reuses scans over different sets of partitions #14427
…sets of partitions

This fixes a bug where the file scan operator does not take partition pruning into account in its implementation of `sameResult()`. As a result, executions may be incorrect on self-joins over the same base file relation.

The patch here is minimal, but we should reconsider relying on `metadata` for implementing `sameResult()` in the future, as string representations may not be uniquely identifying.

cc rxin

Unit tests.

Author: Eric Liang <ekl@databricks.com>

Closes apache#14425 from ericl/spark-16818.

Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
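To make the failure mode in the description concrete, here is a minimal sketch in plain Scala. `FileScan`, its fields, and both comparison functions are hypothetical stand-ins invented for illustration; they are not Spark's actual classes or the code changed by this patch. The sketch only assumes what the description states: a comparison that looks at the relation and `metadata` but not at the pruned partitions will treat two different scans as interchangeable.

```scala
object SameResultSketch {
  // Hypothetical, simplified stand-in for a file scan node; not Spark's actual class.
  final case class FileScan(
      relation: String,
      metadata: Map[String, String],
      selectedPartitions: Set[String])

  // Compares only the relation and metadata, so partition pruning is ignored.
  def sameResultIgnoringPartitions(a: FileScan, b: FileScan): Boolean =
    a.relation == b.relation && a.metadata == b.metadata

  // Additionally requires both scans to read the same set of partitions.
  def sameResultWithPartitions(a: FileScan, b: FileScan): Boolean =
    sameResultIgnoringPartitions(a, b) && a.selectedPartitions == b.selectedPartitions

  // Two scans over the same base relation that differ only in the pruned partition.
  val scan2: FileScan = FileScan("t", Map("Format" -> "Parquet"), Set("id=2"))
  val scan3: FileScan = FileScan("t", Map("Format" -> "Parquet"), Set("id=3"))
}
```

Under these assumptions, the first comparison would let one scan's exchange be reused for the other, which is the incorrect self-join behavior the description mentions; the second comparison keeps them apart.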
LGTM pending Jenkins.
      }

      test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
        spark.conf.set("spark.sql.exchange.reuse", true)
Ah, actually I just realized we could have improved this by using `withSQLConf`; it makes sure the configs get reset after the test case finishes running.
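The set-then-restore pattern this comment refers to can be sketched in plain Scala as follows. `ConfResetSketch`, `conf`, and `withConf` are hypothetical names used only for illustration; this is a simplified model of the idea, not the implementation of Spark's `withSQLConf` test helper.

```scala
import scala.collection.mutable

object ConfResetSketch {
  // Hypothetical stand-in for a session's mutable SQL configuration.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets the given pairs, runs the body, then restores the previous state,
  // even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember the prior value (or its absence) for every key we touch.
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old     // key existed: put the old value back
      case (k, None)      => conf.remove(k)    // key was unset: unset it again
    }
  }
}
```

Compared with a bare `spark.conf.set(...)` at the top of a test, a wrapper of this shape keeps the setting from leaking into tests that run later in the same session.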
Oh, I assumed the test already did so since I've seen this pattern
elsewhere. If it affects more than just the suite, I can submit a follow-up
fix.
On Sat, Jul 30, 2016, 10:59 PM Reynold Xin notifications@github.com wrote:

In sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala #14427 (comment):

          spark.range(100).selectExpr("id", "id as b").write.partitionBy("id").parquet(tempDir)
          val df = spark.read.parquet(tempDir)
          def getPlan(df: DataFrame): SparkPlan = {
            df.queryExecution.executedPlan
          }
          assert(getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 2"))))
          assert(!getPlan(df.where("id = 2")).sameResult(getPlan(df.where("id = 3"))))
        }
      }

      test("[SPARK-16818] exchange reuse respects differences in partition pruning") {
        spark.conf.set("spark.sql.exchange.reuse", true)
Yeah, I think those places were not correctly using the confs either.
Test build #63053 has finished for PR 14427 at commit

Merging in branch-2.0.

@ericl can you close this?
…sets of partitions

#14425 rebased for branch-2.0

Author: Eric Liang <ekl@databricks.com>

Closes #14427 from ericl/spark-16818-br-2.
Done