Update SortMergeJoinExec.scala #18836
Conversation
fix a bug in outputOrdering
You didn't read the link above, I take it?

Can one of the admins verify this patch?

Thanks for fixing this. Please follow the contribution guidelines. Also, you need to add a test case. You can follow what we did in this PR: #17339
A test case to make the existing code fail:

```Scala
import org.apache.spark.sql.SparkSession

object Test extends App {
  val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
  import spark.sqlContext.implicits._

  case class T(i: Int)
  spark.sparkContext.parallelize(List(T(1), T(3), T(3))).toDF.createOrReplaceTempView("T")

  val in = "select distinct a.i + 1,a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2"
  val sql = spark.sql(in)
  sql.queryExecution.executedPlan.children.map { x =>
    x.children.map { x =>
      x.children.map { x =>
        x.children.map { x =>
          x.children.map { x =>
            x.children.map { x =>
              println(x.outputOrdering)
            }
          }
        }
      }
    }
  }
}
```
@BoleynSu Do you want to continue the PR, or do you want us to take it over?

@gatorsmile I am not familiar with the PR process, so it would be great if you could take it over. Thanks.

@BoleynSu Sure, I can do it. I will give all the credit to you. Please continue to help us by reporting new issues or fixes. Thanks!
```diff
   override def outputOrdering: Seq[SortOrder] = joinType match {
     // For inner join, orders of both sides keys should be kept.
-    case Inner =>
+    case _: InnerLike =>
```
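To make the diff above concrete, here is a minimal, self-contained sketch rather than the actual Spark source: the type names mirror Catalyst's join-type hierarchy (where both Inner and Cross extend InnerLike), but the ordering values and the `OutputOrderingSketch` object are made up for illustration. The point is that matching on `Inner` alone lets `Cross` fall through to the catch-all that throws, while `_: InnerLike` covers both.

```Scala
// Simplified stand-in for Catalyst's join-type hierarchy: in Spark,
// both Inner and Cross extend InnerLike, which is what makes the fix work.
sealed trait JoinType
sealed trait InnerLike extends JoinType
case object Inner extends InnerLike
case object Cross extends InnerLike
case object FullOuter extends JoinType

object OutputOrderingSketch extends App {
  // Before the fix: only the exact Inner type is handled, so Cross hits the catch-all.
  def orderingBeforeFix(joinType: JoinType): Seq[String] = joinType match {
    case Inner     => Seq("leftKey ASC")   // orders of both sides' keys are kept
    case FullOuter => Nil                  // null rows on both sides: no order
    case x => throw new IllegalArgumentException(
      s"SortMergeJoinExec should not take $x as the JoinType")
  }

  // After the fix: any InnerLike join (Inner or Cross) keeps the key ordering.
  def orderingAfterFix(joinType: JoinType): Seq[String] = joinType match {
    case _: InnerLike => Seq("leftKey ASC")
    case FullOuter    => Nil
    case x => throw new IllegalArgumentException(
      s"SortMergeJoinExec should not take $x as the JoinType")
  }

  println(orderingAfterFix(Cross))                              // List(leftKey ASC)
  println(scala.util.Try(orderingBeforeFix(Cross)).isFailure)   // true: Cross hits the catch-all
}
```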
Can someone explain to me what is being fixed here? The other InnerLike variant, Cross, does not get planned using a SortMergeJoin.
I think we can get a SortMergeJoin plan with Cross, e.g. select distinct a.i + 1, a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2.
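One way to check this claim empirically is to collect the sort-merge join nodes from the executed plan and look at the join type they carry. The following is a sketch using Spark's public APIs plus the internal SortMergeJoinExec node (the class path is confirmed by the stack trace below); the `CheckPlan` object name and the tiny input data are assumptions, and whether the planner actually picks a sort-merge join depends on the data and on spark.sql.autoBroadcastJoinThreshold, which is disabled here.

```Scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.joins.SortMergeJoinExec

object CheckPlan extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("smj-cross-check")
    // Disable broadcast joins so the planner has to fall back to a sort-merge join.
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")
    .getOrCreate()
  import spark.implicits._

  Seq(1, 2, 3).toDF("i").createOrReplaceTempView("T")

  val df = spark.sql(
    "select distinct a.i + 1, a.* from T a cross join T t " +
    "where a.i > 1 and t.i = a.i group by a.i having a.i > 2")

  // Collect any sort-merge join nodes and print the join type they carry.
  // Note: on builds without the fix, preparing the executed plan may already
  // throw the IllegalArgumentException reported in this thread.
  val smjJoinTypes = df.queryExecution.executedPlan.collect {
    case smj: SortMergeJoinExec => smj.joinType
  }
  println(smjJoinTypes)  // expected to contain Cross if the report holds

  spark.stop()
}
```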
Even worse, this could cause an exception:

```Scala
val df = Seq((1, 1)).toDF("i", "j")
df.createOrReplaceTempView("T")
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
  sql("select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
    "cross join T t2 where t2.i = t1.i").explain(true)
}
```

It will return the following error:

```
SortMergeJoinExec should not take Cross as the JoinType
java.lang.IllegalArgumentException: SortMergeJoinExec should not take Cross as the JoinType
  at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputOrdering(SortMergeJoinExec.scala:100)
  at org.apache.spark.sql.execution.ProjectExec
```
We need to backport it to 2.2
Yeah that makes sense
### What changes were proposed in this pull request?

author: BoleynSu
closes #18836

```Scala
val df = Seq((1, 1)).toDF("i", "j")
df.createOrReplaceTempView("T")
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
  sql("select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
    "cross join T t2 where t2.i = t1.i").explain(true)
}
```

The above code could cause the following exception:

```
SortMergeJoinExec should not take Cross as the JoinType
java.lang.IllegalArgumentException: SortMergeJoinExec should not take Cross as the JoinType
  at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputOrdering(SortMergeJoinExec.scala:100)
```

Our SortMergeJoinExec supports CROSS. We should not hit such an exception. This PR is to fix the issue.

### How was this patch tested?

Modified the two existing test cases.

Author: Xiao Li <gatorsmile@gmail.com>
Author: Boleyn Su <boleyn.su@gmail.com>

Closes #18863 from gatorsmile/pr-18836.

(cherry picked from commit bbfd6b5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
fix a bug in outputOrdering
What changes were proposed in this pull request?
Change `case Inner` to `case _: InnerLike` so that Cross will be handled properly.

How was this patch tested?
No unit tests are needed.
Please review http://spark.apache.org/contributing.html before opening a pull request.
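For completeness, here is a self-contained way to reproduce the exception reported in the review thread. This is a sketch, not the test that was merged (the merged change modified two existing test cases instead): the `ReproduceSmjCross` object name is an assumption, and because the `withSQLConf` helper used above belongs to Spark's internal test utilities, the broadcast threshold is disabled directly on the session here.

```Scala
import org.apache.spark.sql.SparkSession

object ReproduceSmjCross extends App {
  val spark = SparkSession.builder()
    .master("local")
    .appName("smj-cross-repro")
    // Stands in for withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1").
    .config("spark.sql.autoBroadcastJoinThreshold", "-1")
    .getOrCreate()
  import spark.implicits._

  Seq((1, 1)).toDF("i", "j").createOrReplaceTempView("T")

  // Before the fix this throws:
  //   java.lang.IllegalArgumentException:
  //   SortMergeJoinExec should not take Cross as the JoinType
  spark.sql(
    "select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
    "cross join T t2 where t2.i = t1.i").explain(true)

  spark.stop()
}
```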