Skip to content

Conversation

@BoleynSu
Copy link

@BoleynSu BoleynSu commented Aug 3, 2017

fix a bug in outputOrdering

What changes were proposed in this pull request?

Change case Inner to case _: InnerLike so that Cross will be handled properly.

How was this patch tested?

No unit tests are needed.

Please review http://spark.apache.org/contributing.html before opening a pull request.

fix a bug in outputOrdering
@srowen
Copy link
Member

srowen commented Aug 3, 2017

You didn't read the link above, I take it?
http://spark.apache.org/contributing.html

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@gatorsmile
Copy link
Member

Thanks for fixing this. Please follow the contribution guideline.

Also, you need to add a test case. You can follow what we did in this PR: #17339

@BoleynSu
Copy link
Author

BoleynSu commented Aug 3, 2017

A test case to make the existing code fail.
@srowen I am sorry that this pull request is not well formatted but I just want to help.

import org.apache.spark.sql.SparkSession

object Test extends App {
  val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
  import spark.sqlContext.implicits._
  case class T(i: Int)
  spark.sparkContext.parallelize(List(T(1), T(3), T(3))).toDF.createOrReplaceTempView("T")
  val in = "select distinct a.i + 1,a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2"
  val sql = spark.sql(in)
  sql.queryExecution.executedPlan.children.map { x =>
    x.children.map { x =>
      x.children.map { x =>
        x.children.map { x =>
          x.children.map { x =>
            x.children.map { x =>
              println(x.outputOrdering)
            }
          }
        }
      }
    }
  }
}

@gatorsmile
Copy link
Member

@BoleynSu Do you want to continue the PR? or you want us to take it over?

@BoleynSu
Copy link
Author

BoleynSu commented Aug 3, 2017

@gatorsmile I am not familiar with the PR process, it is great that you can take it over. Thanks.

@gatorsmile
Copy link
Member

@BoleynSu Sure, I can do it. Will give all the credits to you. Please continue to help us report new issues or fixes. Thanks!

override def outputOrdering: Seq[SortOrder] = joinType match {
// For inner join, orders of both sides keys should be kept.
case Inner =>
case _: InnerLike =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can someone explain to me what is being fixed here? The other InnerLike variant, Cross, does not get planned using a SortMergeJoin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get a SortMergeJoin paln with Cross, e.g. select distinct a.i + 1,a.* from T a cross join T t where a.i > 1 and t.i = a.i group by a.i having a.i > 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even worse, this could cause an exception

val df = Seq((1, 1)).toDF("i", "j")
df.createOrReplaceTempView("T")
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
  sql("select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
    "cross join T t2 where t2.i = t1.i").explain(true)
}

It will return the following error:

SortMergeJoinExec should not take Cross as the JoinType
java.lang.IllegalArgumentException: SortMergeJoinExec should not take Cross as the JoinType
	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputOrdering(SortMergeJoinExec.scala:100)
	at org.apache.spark.sql.execution.ProjectExec

We need to backport it to 2.2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense

asfgit pushed a commit that referenced this pull request Aug 7, 2017
### What changes were proposed in this pull request?
author: BoleynSu
closes #18836

```Scala
val df = Seq((1, 1)).toDF("i", "j")
df.createOrReplaceTempView("T")
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
  sql("select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
    "cross join T t2 where t2.i = t1.i").explain(true)
}
```
The above code could cause the following exception:
```
SortMergeJoinExec should not take Cross as the JoinType
java.lang.IllegalArgumentException: SortMergeJoinExec should not take Cross as the JoinType
	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputOrdering(SortMergeJoinExec.scala:100)
```

Our SortMergeJoinExec supports CROSS. We should not hit such an exception. This PR is to fix the issue.

### How was this patch tested?
Modified the two existing test cases.

Author: Xiao Li <gatorsmile@gmail.com>
Author: Boleyn Su <boleyn.su@gmail.com>

Closes #18863 from gatorsmile/pr-18836.

(cherry picked from commit bbfd6b5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in bbfd6b5 Aug 7, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
### What changes were proposed in this pull request?
author: BoleynSu
closes apache#18836

```Scala
val df = Seq((1, 1)).toDF("i", "j")
df.createOrReplaceTempView("T")
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
  sql("select * from (select a.i from T a cross join T t where t.i = a.i) as t1 " +
    "cross join T t2 where t2.i = t1.i").explain(true)
}
```
The above code could cause the following exception:
```
SortMergeJoinExec should not take Cross as the JoinType
java.lang.IllegalArgumentException: SortMergeJoinExec should not take Cross as the JoinType
	at org.apache.spark.sql.execution.joins.SortMergeJoinExec.outputOrdering(SortMergeJoinExec.scala:100)
```

Our SortMergeJoinExec supports CROSS. We should not hit such an exception. This PR is to fix the issue.

### How was this patch tested?
Modified the two existing test cases.

Author: Xiao Li <gatorsmile@gmail.com>
Author: Boleyn Su <boleyn.su@gmail.com>

Closes apache#18863 from gatorsmile/pr-18836.

(cherry picked from commit bbfd6b5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants