Skip to content

Conversation

@nrchandan
Copy link

No description provided.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@davies
Copy link
Contributor

davies commented Aug 6, 2014

lgtm

@rxin
Copy link
Contributor

rxin commented Aug 6, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1787. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17968/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1787:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17968/consoleFull

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a documented Scala bug? If so - can we link to it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have the Scala bug raised today. Will update it once done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K - thanks! if we cast link to an issue or something, would be great.

@nrchandan
Copy link
Author

Added Scala bug ID. Fixed the coding convention. Ready to retest. Cc @davies @rxin

@pwendell
Copy link
Contributor

pwendell commented Aug 6, 2014

Jenkins, test this please.

1 similar comment
@pwendell
Copy link
Contributor

pwendell commented Aug 6, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 7, 2014

QA tests have started for PR 1787. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18084/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 7, 2014

QA results for PR 1787:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18084/consoleFull

@pwendell
Copy link
Contributor

pwendell commented Aug 7, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Aug 7, 2014

QA tests have started for PR 1787. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18088/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 7, 2014

QA results for PR 1787:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18088/consoleFull

@pwendell
Copy link
Contributor

pwendell commented Aug 8, 2014

This seems to be legitimately breaking a test.

@nrchandan
Copy link
Author

Sorry couldn't look into this yesterday. I plan to fix this today.

@nrchandan
Copy link
Author

Scala's NumericRange and the range generated using the 'to' method should produce similar results but there are rounding differences. I need to spend more time investigating the cause. Meanwhile, the Scala bug is documented at https://issues.scala-lang.org/browse/SI-8782.

@srowen
Copy link
Member

srowen commented Aug 8, 2014

Yeah, the problem is now this, essentially:

scala> (1.0 to (2.0, 1.0/10.0)).toArray
res1: Array[Double] = Array(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7000000000000002, 1.8, 1.9, 2.0)

The problem is that naively adding the increment makes for increasing rounding errors as the range goes on. At 100 slices:

scala> (1.0 to (2.0, 1.0/100.0)).takeRight(10)
res4: scala.collection.immutable.IndexedSeq[Double] = Vector(1.9100000000000008, 1.9200000000000008, 1.9300000000000008, 1.9400000000000008, 1.9500000000000008, 1.9600000000000009, 1.9700000000000009, 1.9800000000000009, 1.9900000000000009, 2.000000000000001)

Here's an attempt to write a version that is both more accurate and forces the end of the range to be correct:

def range(min: Double, max: Double, steps: Int): IndexedSeq[Double] = {
  val span = max - min
  Range.Int(0, steps, 1).map(s => min + (s * span) / steps) :+ max
}

A little bit inelegant, and still not perfect, but better and fixes the original problem still:

scala> range(1.0, 2.0, 10)
res5: IndexedSeq[Double] = Vector(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0)

scala> range(1.0, 2.0, 100).takeRight(10)
res7: IndexedSeq[Double] = Vector(1.9100000000000001, 1.92, 1.9300000000000002, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99, 2.0)

What do you guys think? @nrchandan

@nrchandan
Copy link
Author

@srowen Your approach should work. I will give it a try tomorrow.

@nrchandan
Copy link
Author

@pwendell @srowen This version passes all test cases. Also added a new test case (the one specified in JIRA #SPARK-2862)

@pwendell
Copy link
Contributor

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 13, 2014

QA tests have started for PR 1787. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18462/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 13, 2014

QA results for PR 1787:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18462/consoleFull

@nrchandan nrchandan changed the title [SPARK-2862] Use shorthand range notation to avoid Scala bug [SPARK-2862] histogram method fails on some choices of bucketCount Aug 14, 2014
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save some FLOPs by computing the stepSize first:

val stepSize = (max - min) / steps
Range.Int(0, steps, 1).map(s => min + s * stepSize) :+ max

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the order of operations would be important to get the result, so wrote it that way on purpose, but, it's actually fine this way too. I think the win here is not repeatedly adding 1/steps alone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both approaches should give reasonable results because the last one is skipped. The edge case is that both min and max are very very small values (customRange1 computes stepSize first):

scala> customRange(1e-322, 2e-322, 10)
res15: IndexedSeq[Double] = Vector(1.0E-322, 1.1E-322, 1.2E-322, 1.3E-322, 1.4E-322, 1.5E-322, 1.58E-322, 1.7E-322, 1.8E-322, 1.9E-322, 2.0E-322)

scala> customRange1(1e-322, 2e-322, 10)
res14: IndexedSeq[Double] = Vector(1.0E-322, 1.1E-322, 1.2E-322, 1.3E-322, 1.4E-322, 1.5E-322, 1.58E-322, 1.7E-322, 1.8E-322, 1.9E-322, 2.0E-322)

scala> customRange(1e-323, 2e-323, 10)
res12: IndexedSeq[Double] = Vector(1.0E-323, 1.0E-323, 1.0E-323, 1.5E-323, 1.5E-323, 1.5E-323, 1.5E-323, 1.5E-323, 2.0E-323, 2.0E-323, 2.0E-323)

scala> customRange1(1e-323, 2e-323, 10)
res13: IndexedSeq[Double] = Vector(1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 1.0E-323, 2.0E-323)

I think we can ignore this edge case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

customRange1 actually fails the original test case. Continuing with customRange code by @srowen

scala> customRange1(1.0, 2.0, 10)
res2: IndexedSeq[Double] = Vector(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7000000000000002, 1.8, 1.9, 2.0)

scala> customRange(1.0, 2.0, 10)
res0: IndexedSeq[Double] = Vector(1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually both are correct in this case. This is why we should tolerate small numeric difference in tests. @srowen 's version looks good.

[SPARK-2862] Fix a typo, add a test case, modify a test case
@mengxr
Copy link
Contributor

mengxr commented Aug 18, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Aug 18, 2014

QA tests have started for PR 1787 at commit a76bbf6.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Aug 18, 2014

QA tests have finished for PR 1787 at commit a76bbf6.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Copy link
Contributor

mengxr commented Aug 18, 2014

LGTM. Merged into master and branch-1.1. Thanks!!

@asfgit asfgit closed this in f45efbb Aug 18, 2014
asfgit pushed a commit that referenced this pull request Aug 18, 2014
Author: Chandan Kumar <chandan.kumar@imaginea.com>

Closes #1787 from nrchandan/spark-2862 and squashes the following commits:

a76bbf6 [Chandan Kumar] [SPARK-2862] Fix for a broken test case and add new test cases
4211eea [Chandan Kumar] [SPARK-2862] Add Scala bug id
13854f1 [Chandan Kumar] [SPARK-2862] Use shorthand range notation to avoid Scala bug

(cherry picked from commit f45efbb)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Chandan Kumar <chandan.kumar@imaginea.com>

Closes apache#1787 from nrchandan/spark-2862 and squashes the following commits:

a76bbf6 [Chandan Kumar] [SPARK-2862] Fix for a broken test case and add new test cases
4211eea [Chandan Kumar] [SPARK-2862] Add Scala bug id
13854f1 [Chandan Kumar] [SPARK-2862] Use shorthand range notation to avoid Scala bug
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants