-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-30009][CORE][SQL] Support different floating-point Ordering for Scala 2.12 / 2.13 #26654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #114364 has finished for PR 26654 at commit
|
|
Test build #114367 has finished for PR 26654 at commit
|
|
It seems reasonable to me. |
|
We can make a Jenkins job later; it's not nearly passing yet. I am testing locally on 2.13. There is at least one large change that I want to hold until after branch-3.0 is cut, and at least one we can't resolve yet. These changes are relatively easy ones with little or no impact on 2.12. |
|
Shall we remove |
|
Yep, sounds like there's no objection to this approach. |
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Merged to master.
Thank you, @srowen and @gengliangwang !
|
How about the code gen results? Are they consistent? |
|
@gatorsmile good question. This change won't affect Scala 2.12 at all. How it affects 2.13 is still unknown; it isn't compilable just yet, so can't run the tests. The compile changes are large enough that I'm tackling them iteratively, but, at the end we may revise the 2.13 changes to ensure correctness, etc. This is a good example. |
|
In codegen we use |
|
Yeah, that method does what TotalOrdering does. We could adopt it instead of using Scala 2.13 TotalOrdering; it won't matter. But the two sites that changed here used Scala 2.12's Ordering.Double, which does not work like that method. For the SorterSuite? won't matter. For numerics.scala, it could matter. We can either leave this in place to maintain 2.12 behavior, or if we think that the current 2.12 behavior in numerics.scala is a bit wrong and inconsistent, we can instead standardize on the Utils method everywhere and remove the need for OrderingUtil. (There will be other reasons we need the parallel build tree though.) WDYT? |
|
Some extra context. Here's the key difference between the two 2.13 orderings: The weird thing is Scala 2.12's implementation of
Actually, do we need So, back to the point: the question is whether this difference matters in the current build. The difference arises in a few places:
I don't know if these two occurrences' ordering actually matters, as on a glance, it looks like these objects do not use them for ordering. I don't know the details enough to figure whether there is a subtle issue there. I didn't immediately see a test which exercises something like sorting NaNs with non-NaN values, and am not sure how to write it to check code gen vs interpreted. But that would tell us one way or the other whether there is an issue. In any event, it seems like:
|
| * It functions like Ordering.Double.TotalOrdering in Scala 2.13, which matches java.lang.Double | ||
| * rather than Scala 2.12's Ordering.Double in handling of NaN. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, doesn't Scala 2.12's Ordering.Double.compare delegates to java.lang.Double.compare?
why this matches java.lang.Double, but not Scala 2.12's Ordering.Double?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, for compare. The superclass Ordering then defines operations like lt, lteq, etc in terms of compare. But 2.12 Ordering.Double overrides them to use operations like <, <=. As far as I can tell it presents a consistent total ordering via compare already (as it appears java.lang.Double.compare does), but its comparisons aren't consistent with how NaNs behave in lt, etc. Then again... perhaps neither is Java. Its Comparators.natural() would work consistently, but the comparisons don't match the Java operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the question is, when to the Orderings get used by Spark? it's not immediately clear if they are, even. If they're used for sorting, all is well... I think, as sorting would use compare and all of the impls in question behave the same way.
If it's used to evaluate how doubles compare somewhere, then, should those answers be consistent with the sort ordering? or Java/Scala operators? I'd presume the former, but, that's not how it works right now. And the choice to use TotalOrdering changes that in 2.13.
- If we think the current behavior is correct, and matters, then 2.12 is OK and then we use
IeeeOrderingin 2.13 to be conservative - If the current behavior doesn't matter, it doesn't matter what we choose.
TotalOrderingfeels more logical. - If the current behavior is wrong, we can patch 2.12 to work like 2.13's
TotalOrdering. Then the 2.13 choice is already correct,TotalOrdering.
I actually suspect it doesn't matter, doesn't get used.
You mean current master? I compared Utils.nanSafeCompareDoubles with DoubleExactNumeric.compare: |
|
Yeah, that's to be expected; the |
|
The ANSI/SQL standard seems to define the order, Oracle and Spark currently follow this, too. So, in terms of the SQL behaviours, |
|
@maropu yep that's good, I don't expect that changes. What about the equivalent of |
|
Yea, that should be consisntent. Anyway, the fix in this pr looks pretty reasonable to me. |
| */ | ||
| private[spark] object OrderingUtil { | ||
|
|
||
| def compareDouble(x: Double, y: Double): Int = Ordering.Double.TotalOrdering.compare(x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after another thought, shall we just use java.lang.Double.compare? This is how Scala 2.13 implements Ordering.Double.TotalOrdering:
https://github.com/scala/scala/blob/d0bd8241bb60bebc2bf0cbd2e9b01212fd1de93b/src/library/scala/math/Ordering.scala#L452
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also how Scala 2.12 implements Ordering.Double.compare:
https://github.com/scala/scala/blob/2.12.x/src/library/scala/math/Ordering.scala#L295
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we don't need to have branches between Scala 2.12 and 2.13. We can also use java.lang.Double.compare to replace Utils.nanSafeCompareDoubles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that sounds fine and simpler. If compare is the only functionality being used, this collapses to something simpler. I can make a follow-up. (We'll need the separate source trees for other reasons though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #26761
…afeCompare{Doubles,Floats} and use java.lang.{Double,Float}.compare directly
### What changes were proposed in this pull request?
Follow up on #26654 (comment)
Instead of OrderingUtil or Utils.nanSafeCompare{Doubles,Floats}, just use java.lang.{Double,Float}.compare directly. All work identically w.r.t. NaN when used to `compare`.
### Why are the changes needed?
Simplification of the previous change, which existed to support Scala 2.13 migration.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests
Closes #26761 from srowen/SPARK-30009.2.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…r Scala 2.12 / 2.13 ### What changes were proposed in this pull request? Make separate source trees for Scala 2.12/2.13 in order to accommodate mutually-incompatible support for Ordering of double, float. Note: This isn't the last change that will need a split source tree for 2.13. But this particular change could go several ways: - (Split source tree) - Inline the Scala 2.12 implementation - Reflection For this change alone any are possible, and splitting the source tree is a bit overkill. But if it will be necessary for other JIRAs (see umbrella SPARK-25075), then it might be the easiest way to implement this. ### Why are the changes needed? Scala 2.13 split Ordering.Double into Ordering.Double.TotalOrdering and Ordering.Double.IeeeOrdering. Neither can be used in a single build that supports 2.12 and 2.13. TotalOrdering works like java.lang.Double.compare. IeeeOrdering works like Scala 2.12 Ordering.Double. They differ in how NaN is handled - compares always above other values? or always compares as 'false'? In theory they have different uses: TotalOrdering is important if floating-point values are sorted. IeeeOrdering behaves like 2.12 and JVM comparison operators. I chose TotalOrdering as I think we care more about stable sorting, and because elsewhere we rely on java.lang comparisons. It is also possible to support with two methods. ### Does this PR introduce any user-facing change? Pending tests, will see if it obviously affects any sort order. We need to see if it changes NaN sort order. ### How was this patch tested? Existing tests so far. Closes apache#26654 from srowen/SPARK-30009. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…afeCompare{Doubles,Floats} and use java.lang.{Double,Float}.compare directly
### What changes were proposed in this pull request?
Follow up on apache#26654 (comment)
Instead of OrderingUtil or Utils.nanSafeCompare{Doubles,Floats}, just use java.lang.{Double,Float}.compare directly. All work identically w.r.t. NaN when used to `compare`.
### Why are the changes needed?
Simplification of the previous change, which existed to support Scala 2.13 migration.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests
Closes apache#26761 from srowen/SPARK-30009.2.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Make separate source trees for Scala 2.12/2.13 in order to accommodate mutually-incompatible support for Ordering of double, float.
Note: This isn't the last change that will need a split source tree for 2.13. But this particular change could go several ways:
For this change alone any are possible, and splitting the source tree is a bit overkill. But if it will be necessary for other JIRAs (see umbrella SPARK-25075), then it might be the easiest way to implement this.
Why are the changes needed?
Scala 2.13 split Ordering.Double into Ordering.Double.TotalOrdering and Ordering.Double.IeeeOrdering. Neither can be used in a single build that supports 2.12 and 2.13.
TotalOrdering works like java.lang.Double.compare. IeeeOrdering works like Scala 2.12 Ordering.Double. They differ in how NaN is handled - compares always above other values? or always compares as 'false'? In theory they have different uses: TotalOrdering is important if floating-point values are sorted. IeeeOrdering behaves like 2.12 and JVM comparison operators.
I chose TotalOrdering as I think we care more about stable sorting, and because elsewhere we rely on java.lang comparisons. It is also possible to support with two methods.
Does this PR introduce any user-facing change?
Pending tests, will see if it obviously affects any sort order. We need to see if it changes NaN sort order.
How was this patch tested?
Existing tests so far.