[SPARK-2552][MLLIB] stabilize logistic function in pyspark #1493

mengxr · 2014-07-19T08:53:17Z

To avoid overflow in `exp(x)` when `x` is large.
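Here is a minimal sketch of the problem and one way around it. It assumes the fix branches on the sign of the margin. `naive_logistic` and `branched_logistic` are hypothetical names, not PySpark API. CPython's `math.exp` raises `OverflowError` instead of returning `inf` once its argument exceeds about 709.

```python
from math import exp


def naive_logistic(margin):
    # Direct formula: exp(-margin) overflows for very negative margins.
    return 1 / (1 + exp(-margin))


def branched_logistic(margin):
    # Branch on the sign of margin so exp() never gets a positive argument.
    if margin > 0:
        return 1 / (1 + exp(-margin))
    # Here margin <= 0, so exp(margin) <= 1 and cannot overflow.
    # Caveat: this subtraction loses relative precision when the
    # probability is tiny (raised later in the review of this PR).
    return 1 - 1 / (1 + exp(margin))


if __name__ == "__main__":
    try:
        naive_logistic(-1000.0)
    except OverflowError as err:
        print("naive_logistic(-1000.0) raised OverflowError:", err)
    print(branched_logistic(-1000.0), branched_logistic(1000.0))  # 0.0 1.0
```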

mengxr · 2014-07-19T09:16:59Z

Jenkins, test this please.

SparkQA · 2014-07-19T09:23:05Z

QA tests have started for PR 1493. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16847/consoleFull

mengxr · 2014-07-19T17:46:10Z

Jenkins, retest this please.

SparkQA · 2014-07-19T17:48:23Z

QA tests have started for PR 1493. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16849/consoleFull

rxin · 2014-07-20T08:15:24Z

Jenkins, retest this please.

SparkQA · 2014-07-20T08:18:04Z

QA tests have started for PR 1493. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16867/consoleFull

SparkQA · 2014-07-20T09:57:20Z

QA results for PR 1493:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16867/consoleFull

naftaliharris · 2014-07-29T18:20:59Z

python/pyspark/mllib/classification.py

Better would be

prob = exp(margin)/(1 + exp(margin))

because for very small probabilities, `1 - 1 / (1 + exp(margin))` is essentially computing 1 - 1. For example:

    >>> from math import exp
    >>> margin = -40
    >>> 1 - 1 / (1 + exp(margin))
    0.0
    >>> exp(margin)/(1 + exp(margin))
    4.248354255291589e-18
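Here is why `1 - 1 / (1 + exp(margin))` returns `0.0`. In double precision, `exp(margin)` drops below half of `sys.float_info.epsilon` (about 1.1e-16) near margin = -36.7. From that point on, `1 + exp(margin)` rounds to exactly `1.0` and the subtraction cancels completely. The sketch below prints both forms for a few margins. The helper names are hypothetical.

```python
import sys
from math import exp

# Half the spacing between 1.0 and the next double (~1.1e-16).
HALF_EPS = sys.float_info.epsilon / 2


def subtract_form(margin):
    # 1 - 1/(1 + e^m): cancels to exactly 0.0 once 1 + e^m rounds to 1.0.
    return 1 - 1 / (1 + exp(margin))


def ratio_form(margin):
    # e^m / (1 + e^m): no subtraction, so small values keep their precision.
    e = exp(margin)
    return e / (1 + e)


if __name__ == "__main__":
    for margin in (-10, -20, -30, -37, -40):
        e = exp(margin)
        print(margin, e < HALF_EPS, subtract_form(margin), ratio_form(margin))
```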

Yes, that is definitely better. Could you submit a PR? We don't need a JIRA for small changes. Btw, please cache exp(margin) instead of computing it twice.
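The sketch below combines the suggested formula with the request to cache `exp(margin)`. It assumes the code keeps a branch on the sign of `margin`. The function name is hypothetical, and this is not necessarily the code that landed in #1652.

```python
from math import exp


def logistic_prob(margin):
    """P(y = 1) = 1 / (1 + exp(-margin)), computed without overflow
    and without cancellation for very negative margins."""
    if margin > 0:
        # exp(-margin) is below 1 here: no overflow, no subtraction.
        return 1 / (1 + exp(-margin))
    # margin <= 0: exp(margin) is at most 1; compute it once and reuse it.
    e = exp(margin)
    return e / (1 + e)
```

Only the non-positive branch changes. The positive branch `1 / (1 + exp(-margin))` involves no subtraction, so it was already accurate.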

Sure, pull request here! #1652

Thanks a lot! :-)

to avoid overflow in `exp(x)` if `x` is large.

Author: Xiangrui Meng <meng@databricks.com>

Closes apache#1493 from mengxr/py-logistic and squashes the following commits:

259e863 [Xiangrui Meng] stabilize logistic function in pyspark

stabilize logistic function in pyspark

259e863

asfgit closed this in b86db51 Jul 21, 2014

mateiz mentioned this pull request Jul 29, 2014

Check if margin > 0, not if prob > 0.5 #1057

Closed

naftaliharris reviewed Jul 29, 2014

naftaliharris Jul 29, 2014

mengxr Jul 30, 2014

naftaliharris Jul 30, 2014

4 participants
