Conversation

@naftaliharris
Contributor
This avoids basically doing 1 - 1, for example:

```python
>>> from math import exp
>>> margin = -40
>>> 1 - 1 / (1 + exp(margin))
0.0
>>> exp(margin) / (1 + exp(margin))
4.248354255291589e-18
>>>
```
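
The fix follows the standard numerically stable sigmoid pattern: branch on the sign of the margin so that exp() is only ever called on a non-positive argument. A minimal standalone sketch (the stable_sigmoid helper is illustrative, not the exact PySpark diff):

```python
from math import exp

def stable_sigmoid(margin):
    """Probability 1 / (1 + exp(-margin)), computed so that exp() only
    ever sees a non-positive argument. (Illustrative helper, not the
    actual PySpark code.)"""
    if margin > 0:
        # exp(-margin) <= 1 here, so no overflow is possible
        return 1.0 / (1.0 + exp(-margin))
    # For margin <= 0, exp(margin) <= 1; this form is algebraically equal
    # to 1 / (1 + exp(-margin)) but keeps tiny probabilities representable
    exp_margin = exp(margin)
    return exp_margin / (1.0 + exp_margin)

print(stable_sigmoid(-40))  # 4.248354255291589e-18, matching the second REPL expression above
```

Branching on the sign also means exp() can never overflow, even for margins far beyond ±40, which the single-expression form cannot guarantee.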

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jul 30, 2014

Y'know, there's a similar issue in LogisticGradient.scala, in lines like:

```scala
math.log(1 + math.exp(margin))
```

For margin = -40, this gives 0.0, when really it should be about math.exp(-40) = 4.248354255291589e-18, since log(1 + x) ~= x for very small x. This one can be fixed up with:

```scala
math.log1p(math.exp(margin))
```
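
The same numerics are easy to reproduce from a Python REPL, since Python's math.log1p mirrors the Math.log1p that the Scala fix calls:

```python
>>> from math import exp, log, log1p
>>> margin = -40
>>> log(1 + exp(margin))   # 1 + 4.25e-18 rounds to exactly 1.0
0.0
>>> log1p(exp(margin))     # log1p(x) stays accurate for tiny x
4.248354255291589e-18
>>>
```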

I'll have a look for other instances beyond the 4 I see and open a JIRA? I could mention this PR too to bring it under one umbrella.

@srowen
Member

srowen commented Jul 30, 2014

See also https://issues.apache.org/jira/browse/SPARK-2748 and #1659 . This could be considered part of SPARK-2748.

@mengxr
Contributor

mengxr commented Jul 30, 2014

Jenkins, add to whitelist.

@mengxr
Contributor

mengxr commented Jul 30, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Jul 30, 2014

QA tests have started for PR 1652. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17453/consoleFull

asfgit pushed a commit that referenced this pull request Jul 30, 2014
…Math.exp, Math.log

In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the result is 0.0. However, the correct answer is very near `p`. This is why `Math.log1p` exists.

Similarly for one instance of `exp(m) - 1` in GraphX; there's a special `Math.expm1` method.

While the errors occur only for very small arguments, such arguments are entirely possible given these expressions' use in machine learning algorithms.

Also note the related PR for Python: #1652

Author: Sean Owen <srowen@gmail.com>

Closes #1659 from srowen/SPARK-2748 and squashes the following commits:

c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments
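
The expm1 case mentioned in the commit message behaves the same way; a quick check of the numerics from Python, whose math.expm1 mirrors the Math.expm1 used in the GraphX fix:

```python
>>> from math import exp, expm1
>>> m = 1e-18
>>> exp(m) - 1     # exp(1e-18) rounds to exactly 1.0, so the subtraction cancels
0.0
>>> expm1(m)       # computed directly, preserving the tiny result
1e-18
>>>
```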
@SparkQA

SparkQA commented Jul 30, 2014

QA results for PR 1652:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17453/consoleFull

@mengxr
Contributor

mengxr commented Jul 30, 2014

LGTM. Merged into master. Thanks!

@asfgit closed this in e3d85b7 Jul 30, 2014
@naftaliharris
Contributor Author

Awesome, thank you! :-)

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
…Math.exp, Math.log

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014

Author: Naftali Harris <naftaliharris@gmail.com>

Closes apache#1652 from naftaliharris/patch-2 and squashes the following commits:

0d55a9f [Naftali Harris] Avoid numerical instability
