da-liii (Contributor) commented Aug 29, 2018

What changes were proposed in this pull request?

For the test "SPARK-5775 read array from partitioned_parquet_with_key_and_complextypes", the cause is that Range.toString differs between Scala versions:

Scala 2.12:

scala> (1 to 10).toString
res4: String = Range 1 to 10

Scala 2.11:

scala> (1 to 10).toString
res2: String = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

And the test helper prepareAnswer sorts unsorted answers by their string form:

  def prepareAnswer(answer: Seq[Row], isSorted: Boolean): Seq[Row] = {
    val converted: Seq[Row] = answer.map(prepareRow)
    if (!isSorted) converted.sortBy(_.toString()) else converted
  }

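Sorting by _.toString is not a good idea: the resulting order depends on how the running Scala version renders each value, such as a Range.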
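A minimal sketch of the problem, using plain tuples as hypothetical stand-ins for Spark Rows (the object and method names are illustrative, not taken from the Spark test suite): sorting by the rendered string makes the order depend on how the Scala version prints a Range, while sorting by an explicit key does not.

```scala
object SortByToStringDemo {
  // Hypothetical stand-in for rows of (array column, int key).
  val rows: Seq[(Seq[Int], Int)] = Seq((1 to 10, 10), (1 to 2, 2), (1 to 9, 9))

  // The order depends on Range.toString, which differs between Scala 2.11 and 2.12.
  def keysSortedByString: Seq[Int] = rows.sortBy(_.toString).map(_._2)

  // The order depends only on the data, so it is the same on every Scala version.
  def keysSortedByKey: Seq[Int] = rows.sortBy(_._2).map(_._2)

  def main(args: Array[String]): Unit = {
    println(s"by toString: $keysSortedByString")
    println(s"by key:      $keysSortedByKey")
  }
}
```

Comparing the rendered strings character by character, the 2.12 form "Range 1 to 10" sorts the key 10 first, while the 2.11 form "Range(1, 2, ..., 10)" sorts it last; only the key-based order is stable.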

The other failures are caused by:

Array(Int.box(1)).toSeq == Array(Double.box(1.0)).toSeq

It is false in Scala 2.12.2+ and true in 2.11.x, 2.12.0, and 2.12.1.
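One version-independent way to compare such sequences in test code is to compare element by element with Scala's ==, which applies cooperative numeric equality to boxed numbers. This is a sketch under that assumption, not the change made in this PR; the helper name sameValues is hypothetical.

```scala
object CooperativeEqualityDemo {
  // Compare two sequences element by element using Scala's `==`,
  // which treats boxed 1 and boxed 1.0 as equal (cooperative equality).
  // `corresponds` also returns false when the lengths differ.
  def sameValues(a: Seq[Any], b: Seq[Any]): Boolean =
    a.corresponds(b)((x, y) => x == y)

  def main(args: Array[String]): Unit = {
    val ints: Seq[Any] = Array[Any](Int.box(1)).toSeq
    val doubles: Seq[Any] = Array[Any](Double.box(1.0)).toSeq
    // Collection equality: Scala-version dependent, per the description above.
    println(s"ints == doubles: ${ints == doubles}")
    // Element-wise `==`: true on both 2.11 and 2.12.
    println(s"sameValues:      ${sameValues(ints, doubles)}")
  }
}
```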

How was this patch tested?

This patch only modifies a specific unit test.

@da-liii da-liii changed the title [WIP][SPARK-25044][SQL] Plan mismatch errors in Hive tests in 2.12 [WIP][SPARK-25256][SQL] Plan mismatch errors in Hive tests in 2.12 Aug 29, 2018
srowen (Member) commented Aug 29, 2018

Hah yeah I hope this is all there is to it!
Are you able to run the Scala 2.12 build locally to see if this resolves it? The PR builder here will check 2.11.

da-liii (Contributor, Author) commented Aug 29, 2018

@srowen Still a work in progress. I'm running the Scala 2.12 build locally. Before working on this one, I submitted another PR, #22260, to fix the compilation failure.

There are three unit test failures in total.

The first one is fixed.

And this is the cause of the second one:

Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala> import collection.mutable.WrappedArray
import collection.mutable.WrappedArray

scala> new WrappedArray.ofRef(Array(1, 1).map(_.asInstanceOf[AnyRef]))
res0: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1, 1)

scala> new WrappedArray.ofRef(Array(1.0, 1.0).map(_.asInstanceOf[AnyRef]))
res1: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1.0, 1.0)

scala> res0 == res1
res2: Boolean = true

Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112).
Type in expressions for evaluation. Or try :help.

scala> new WrappedArray.ofRef(Array(1, 1).map(_.asInstanceOf[AnyRef]))
res18: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1, 1)

scala> new WrappedArray.ofRef(Array(1.0, 1.0).map(_.asInstanceOf[AnyRef]))
res19: scala.collection.mutable.WrappedArray.ofRef[AnyRef] = WrappedArray(1.0, 1.0)

scala> res18.getClass
res20: Class[_ <: scala.collection.mutable.WrappedArray.ofRef[AnyRef]] = class scala.collection.mutable.WrappedArray$ofRef

scala> res19.getClass
res21: Class[_ <: scala.collection.mutable.WrappedArray.ofRef[AnyRef]] = class scala.collection.mutable.WrappedArray$ofRef

scala> res18 == res19
res22: Boolean = false

This is a behavior change introduced by scala/scala#5551, and effectively by the merged PR scala/scala#5607.
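The difference comes down to which notion of equality is applied to the boxed elements. Scala's == on boxed numbers is cooperative, while Java's equals on java.lang.Integer and java.lang.Double is not. The sketch below only demonstrates that distinction; that the newer WrappedArray.ofRef equality compares elements with equals is an assumption here, not something verified against the Scala sources, and the object name is hypothetical.

```scala
object EqualsVersusDoubleEqualsDemo {
  val boxedInt: AnyRef = Int.box(1)
  val boxedDouble: AnyRef = Double.box(1.0)

  // Scala `==` on AnyRef routes boxed numbers through cooperative equality.
  def scalaEq: Boolean = boxedInt == boxedDouble

  // Java `equals` requires the same boxed class, so Integer(1) is not equal to Double(1.0).
  def javaEquals: Boolean = boxedInt.equals(boxedDouble)

  // java.util.Arrays.equals(Object[], Object[]) also uses `equals` on each element.
  def arraysEquals: Boolean =
    java.util.Arrays.equals(Array[AnyRef](boxedInt), Array[AnyRef](boxedDouble))

  def main(args: Array[String]): Unit =
    println(s"== : $scalaEq, equals: $javaEquals, Arrays.equals: $arraysEquals")
}
```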

I will continue my work later.

da-liii (Contributor, Author) commented Aug 29, 2018

To simplify:

Array(Int.box(1)).toSeq == Array(Double.box(1.0)).toSeq

is false in Scala 2.12.2+ and true in 2.11.x, 2.12.0, and 2.12.1.

srowen (Member) commented Aug 29, 2018

Thanks a lot for your help, and for the lead on what's happening here. You're not saying it's a Scala issue, right? Some behavior changed, but it's arguably a fix?

The ideal outcome in Spark is some implementation that works in both 2.11 and 2.12 with the same semantics. Let me know how far you get or if I can help test. This might be the last line of issues for 2.12 support.

@da-liii da-liii changed the title [WIP][SPARK-25256][SQL] Plan mismatch errors in Hive tests in 2.12 [SPARK-25256][SQL] Plan mismatch errors in Hive tests in 2.12 Aug 30, 2018
da-liii (Contributor, Author) commented Aug 30, 2018

@srowen Please review. This PR should be rebased on #22260 and then tested.

@da-liii da-liii closed this Aug 30, 2018
@da-liii da-liii reopened this Aug 30, 2018
@da-liii da-liii changed the title [SPARK-25256][SQL] Plan mismatch errors in Hive tests in 2.12 [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive tests in 2.12 Aug 30, 2018
da-liii (Contributor, Author) commented Aug 30, 2018

The fix works for both 2.11 and 2.12.

I also reported a bug: scala/bug#11123.

maropu (Member) commented Aug 30, 2018

Change "in 2.12" to "in Scala 2.12" in the title?

@da-liii da-liii changed the title [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive tests in 2.12 [SPARK-25256][SQL][TEST] Plan mismatch errors in Hive tests in Scala 2.12 Aug 30, 2018
da-liii (Contributor, Author) commented Aug 30, 2018

Comment resolved, please review. @srowen @maropu

SparkQA commented Aug 30, 2018

Test build #4304 has finished for PR 22264 at commit d7f2e37.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Aug 30, 2018

Test build #4309 has finished for PR 22264 at commit d7f2e37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

srowen (Member) commented Aug 30, 2018

@sadhen this looks plausible. The fact that it only touches tests makes it low-risk to merge. My only larger concern is: is there a behavior change that will impact user code, that we are merely working around in tests here?

I'm OK with getting to a passing 2.12 build that we can release as a 'beta' with some known issues, so this is probably fine to merge as it won't affect 2.11 core functionality.

da-liii (Contributor, Author) commented Aug 31, 2018

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/

Using Scala version 2.12.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val rowOfDoubleWrappedArray = spark.sql("select array(cast (1.0 as double) , cast(1.0 as double))").collect.head
rowOfDoubleWrappedArray: org.apache.spark.sql.Row = [WrappedArray(1.0, 1.0)]

scala> val rowOfIntWrappedArray = spark.sql("select array(1, 1)").collect.head
rowOfIntWrappedArray: org.apache.spark.sql.Row = [WrappedArray(1, 1)]

scala> rowOfDoubleWrappedArray == rowOfIntWrappedArray
res5: Boolean = false

For Scala 2.11, the same comparison returns true.

da-liii (Contributor, Author) commented Aug 31, 2018

scala/bug#11123 has been added to the milestone https://github.com/scala/bug/milestone/93.

I will spend some time working on it.

srowen (Member) commented Aug 31, 2018

Yeah, OK. I think this is acceptable as a potential "known issue" for Scala 2.12 support, which we can accept for a beta release of 2.12 support with Spark 2.4. I think I'd merge this and then see where we are.

@asfgit asfgit closed this in f29c2b5 Aug 31, 2018
da-liii (Contributor, Author) commented Aug 31, 2018

@srowen A PR for this "bug" is proposed: scala/scala#7156

Hopefully, Scala 2.12.7 will fix it.

@da-liii da-liii deleted the SPARK25256 branch August 31, 2018 08:09
fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Aug 31, 2018
…2.12

## What changes were proposed in this pull request?

### For `SPARK-5775 read array from partitioned_parquet_with_key_and_complextypes`:

scala2.12
```
scala> (1 to 10).toString
res4: String = Range 1 to 10
```

scala2.11
```
scala> (1 to 10).toString
res2: String = Range(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
```
And

```
  def prepareAnswer(answer: Seq[Row], isSorted: Boolean): Seq[Row] = {
    val converted: Seq[Row] = answer.map(prepareRow)
    if (!isSorted) converted.sortBy(_.toString()) else converted
  }
```
sortBy `_.toString` is not a good idea.

### Other failures are caused by

```
Array(Int.box(1)).toSeq == Array(Double.box(1.0)).toSeq
```

It is false in 2.12.2+ and true in 2.11.x, 2.12.0, and 2.12.1.

## How was this patch tested?

This is a patch on a specific unit test.

Closes apache#22264 from sadhen/SPARK25256.

Authored-by: 忍冬 <rendong@wacai.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>