@mengxr
Contributor

mengxr commented Jul 19, 2014

to avoid overflow in exp(x) if x is large.
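
For illustration, a minimal sketch of the overflow in question, assuming the naive form `1 / (1 + exp(-margin))`. The names `naive_logistic` and `overflows` are hypothetical and not taken from the patch.

```python
from math import exp


def naive_logistic(margin):
    # Hypothetical naive form: exp(-margin) grows without bound as margin
    # becomes very negative, and math.exp raises OverflowError once the
    # result no longer fits in a double.
    return 1.0 / (1.0 + exp(-margin))


def overflows(margin):
    # Report whether the naive form fails for this margin.
    try:
        naive_logistic(margin)
        return False
    except OverflowError:
        return True
```

Under these assumptions, `overflows(-1000)` is true while moderate margins evaluate normally.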

@mengxr
Contributor Author

mengxr commented Jul 19, 2014

Jenkins, test this please.

@SparkQA

SparkQA commented Jul 19, 2014

QA tests have started for PR 1493. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16847/consoleFull

@mengxr
Contributor Author

mengxr commented Jul 19, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 19, 2014

QA tests have started for PR 1493. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16849/consoleFull

@rxin
Contributor

rxin commented Jul 20, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Jul 20, 2014

QA tests have started for PR 1493. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16867/consoleFull

@SparkQA

SparkQA commented Jul 20, 2014

QA results for PR 1493:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16867/consoleFull

Contributor

Better would be

prob = exp(margin)/(1 + exp(margin))

because for very small probabilities the current computation, 1 - 1 / (1 + exp(margin)), is essentially doing 1 - 1 in floating point and loses all precision. For example:

>>> from math import exp
>>> margin = -40
>>> 1 - 1 / (1 + exp(margin))
0.0
>>> exp(margin)/(1 + exp(margin))
4.248354255291589e-18
>>>

Contributor Author

Yes, that is definitely better. Could you submit a PR? We don't need a JIRA for small changes. Btw, please cache exp(margin) instead of computing it twice.
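
A rough sketch of what that could look like, with `exp(margin)` computed once and reused. The function name `stable_logistic` and the branch on the sign of `margin` are assumptions for illustration, not the code from either pull request.

```python
from math import exp


def stable_logistic(margin):
    # Sketch only: pick the form whose exp() argument is non-positive,
    # so exp() can underflow to 0.0 but never overflow.
    if margin > 0:
        return 1.0 / (1.0 + exp(-margin))
    # Cache exp(margin) instead of computing it twice.
    exp_margin = exp(margin)
    return exp_margin / (1.0 + exp_margin)
```

With this form, `stable_logistic(-40)` stays positive instead of rounding to 0.0, and very large margins in either direction return without raising.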

Contributor

Sure, pull request here! #1652

Thanks a lot! :-)

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
to avoid overflow in `exp(x)` if `x` is large.

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#1493 from mengxr/py-logistic and squashes the following commits:

259e863 [Xiangrui Meng] stabilize logistic function in pyspark