@jkbradley
Member

Reinstated LogisticRegression.threshold Param for binary compatibility. Param thresholds overrides threshold, if set.

CC: @mengxr @dbtsai @feynmanliang

Contributor

Why not provide the implementation in the trait, since it is the same in the concrete classes?

Member Author

I think it would still require overriding for Java compatibility. This way, we at least don't copy the doc.

@feynmanliang
Contributor

Just curious, is there any reason we have to mix in HasThreshold and physically keep threshold around, instead of just providing threshold methods that wrap thresholds?

@jkbradley
Member Author

That's what I originally intended, but as Xiangrui pointed out, if users are using the Param threshold directly, then we'd break their code. This is better for legacy, though it complicates the API a bit.
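
The "threshold methods that wrap thresholds" idea can be sketched in plain Python. This is a minimal illustration, not the code in this PR: the helper names are hypothetical, and it assumes the binary convention that a threshold `t` is equivalent to the two-class vector `[1 - t, t]`.

```python
from typing import List, Sequence


def thresholds_from_threshold(t: float) -> List[float]:
    """Hypothetical helper: express a binary threshold as a two-class vector."""
    if not 0.0 <= t <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {t}")
    # Assumed convention: threshold t <=> thresholds [1 - t, t].
    return [1.0 - t, t]


def threshold_from_thresholds(ts: Sequence[float]) -> float:
    """Hypothetical helper: recover the binary threshold from a two-class vector."""
    if len(ts) != 2:
        raise ValueError("a binary threshold needs exactly two class thresholds")
    if ts[1] <= 0.0:
        raise ValueError("this sketch needs a positive threshold for class 1")
    # Only the ratio ts[0] / ts[1] matters, so [0.7, 0.3] and [1.4, 0.6] agree.
    return 1.0 / (1.0 + ts[0] / ts[1])
```

A wrapper like this keeps a single source of truth, but as noted above it would not help callers who read or write the `threshold` Param object directly.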

@SparkQA

SparkQA commented Aug 10, 2015

Test build #40323 has finished for PR 8079 at commit 3d7501f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member Author

jenkins test this please

@jkbradley
Member Author

jenkins test this please

@SparkQA

SparkQA commented Aug 10, 2015

Test build #40341 has finished for PR 8079 at commit fbf9e39.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@feynmanliang
Contributor

Makes sense, LGTM

@jkbradley
Member Author

Just realized I neglected PySpark. I'll add that since the 2 changes should go together.

@jkbradley
Member Author

@feynmanliang Would you mind taking another look? Thanks!

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40359 has finished for PR 8079 at commit da7dd04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

It seems that we now prefer thresholds, and I agree that it is more flexible for multiclass problems. However, users may mistakenly set both to different values, or a user who previously called setThreshold may switch to setThresholds, which could cause confusion. How about throwing an exception when the two disagree? Also, whenever one is set, we could simply unset the other.

Member Author

I like failing whenever they disagree. I'll do that.

I'd prefer not to modify one when the other gets set. I think that's more confusing: If a user sets both, the user may not know what they are doing and should get a failure/warning. It's also unclear what the semantics should be if the user passes both Param values together in a single ParamMap.
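
The "fail whenever they disagree" behaviour could look roughly like the plain-Python sketch below. It is an illustration only: the class and method names are hypothetical, `ValueError` stands in for the JVM's `IllegalArgumentException`, the default of 0.5 is an assumption, and the check assumes that threshold `t` is equivalent to thresholds `[1 - t, t]`.

```python
from typing import Optional, Sequence

DEFAULT_THRESHOLD = 0.5  # assumed default for this sketch


class BinaryThresholdParams:
    """Hypothetical stand-in for an estimator that keeps both params."""

    def __init__(self) -> None:
        self.threshold: Optional[float] = None
        self.thresholds: Optional[Sequence[float]] = None

    def _implied_threshold(self) -> float:
        # Binary threshold implied by a two-class thresholds vector.
        ts = self.thresholds
        if ts is None or len(ts) != 2 or ts[1] <= 0.0:
            raise ValueError("sketch expects two thresholds, the second positive")
        return 1.0 / (1.0 + ts[0] / ts[1])

    def check_consistency(self) -> None:
        """Raise if both params are set and describe different thresholds."""
        if self.threshold is None or self.thresholds is None:
            return  # at most one is set, so nothing can conflict
        implied = self._implied_threshold()
        if abs(implied - self.threshold) > 1e-9:
            raise ValueError(
                f"threshold={self.threshold} conflicts with "
                f"thresholds={list(self.thresholds)} (implies {implied})"
            )

    def get_threshold(self) -> float:
        # The check runs when the value is read, not when it is set.
        self.check_consistency()
        if self.thresholds is not None:
            return self._implied_threshold()
        if self.threshold is not None:
            return self.threshold
        return DEFAULT_THRESHOLD
```

Because the setters here do no validation, an inconsistent pair is only reported once the value is actually read.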

@dbtsai
Member

dbtsai commented Aug 11, 2015

LGTM except for a few comments. Thanks.

@jkbradley
Member Author

OK, that update should enforce equivalence when both are set.

@jkbradley
Member Author

jenkins test this please

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40484 has finished for PR 8079 at commit 4ea4f2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Where do we throw IllegalArgumentException?

Member

I mean that the IllegalArgumentException is thrown in checkThresholdConsistency, not here. Should we document that the exception is thrown when the code runs rather than when the Param is set?

Member Author

Will do

@jkbradley
Member Author

Change of plans: I realized that, since we do not allow users to call Params.clear, I'll need to change the semantics to what you suggested: Whenever threshold or thresholds is set, it clears the other's value. (Otherwise, when fit is called with a threshold, then the resulting model cannot have its threshold changed very easily.)
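
The revised semantics, where setting one Param clears the other, might be sketched as follows. This is a plain-Python illustration with hypothetical names, not the Scala or PySpark code from this PR.

```python
from typing import List, Optional, Sequence


class ExclusiveThresholdParams:
    """Hypothetical holder in which threshold and thresholds exclude each other."""

    def __init__(self) -> None:
        self._threshold: Optional[float] = None
        self._thresholds: Optional[List[float]] = None

    def set_threshold(self, value: float) -> "ExclusiveThresholdParams":
        # Setting the scalar clears the vector, so the last call wins.
        self._thresholds = None
        self._threshold = value
        return self

    def set_thresholds(self, value: Sequence[float]) -> "ExclusiveThresholdParams":
        # Setting the vector clears the scalar, symmetrically.
        self._threshold = None
        self._thresholds = list(value)
        return self

    def is_set(self, name: str) -> bool:
        # name is "threshold" or "thresholds".
        return getattr(self, "_" + name) is not None
```

With this design the two values can never disagree after a setter call, so a model fitted with one of them can still be reconfigured through the other.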

@jkbradley
Member Author

Uh oh, my proposal won't work currently, but I think it can be fixed. Explanation here: [https://issues.apache.org/jira/browse/SPARK-9847]

@SparkQA

SparkQA commented Aug 11, 2015

Test build #40511 has finished for PR 8079 at commit 1a4b922.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * Set thresholds in multiclass (or binary) classification to adjust the probability of

@dbtsai
Member

dbtsai commented Aug 11, 2015

Then the easiest way is probably just overwriting the other when one is set. :)

@jkbradley
Member Author

That's true for the setter methods, but it doesn't work for fit() or transform() given ParamMaps. fit and transform don't use the setter methods, so we don't have a chance to unset the other parameter. I'll wait on this PR until the other patch gets merged.
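
The gap described here can be illustrated with a small plain-Python model. All names are hypothetical and the dictionary merge is only an assumed stand-in for how extra Param values get applied during `fit()` or `transform()`: values supplied that way are copied in directly, so the clearing logic in the setters never runs.

```python
from typing import Any, Dict, List, Optional, Sequence


class ParamHolder:
    """Hypothetical holder that tracks explicitly set params in a dict."""

    def __init__(self, values: Optional[Dict[str, Any]] = None) -> None:
        self._values: Dict[str, Any] = dict(values or {})

    def set_threshold(self, value: float) -> "ParamHolder":
        self._values.pop("thresholds", None)  # the setter clears the other param
        self._values["threshold"] = value
        return self

    def set_thresholds(self, value: Sequence[float]) -> "ParamHolder":
        self._values.pop("threshold", None)  # the setter clears the other param
        self._values["thresholds"] = list(value)
        return self

    def copy(self, extra: Dict[str, Any]) -> "ParamHolder":
        # Extra values are merged directly and bypass the setters above.
        merged = dict(self._values)
        merged.update(extra)
        return ParamHolder(merged)

    def explicitly_set(self) -> List[str]:
        return sorted(self._values)


base = ParamHolder().set_thresholds([0.5, 0.5])
fitted = base.copy({"threshold": 0.3})
# Both params are now set on the copy, and they disagree.
print(fitted.explicitly_set())
```

In this model the conflict can only be resolved inside the copy step itself, which matches the comment above about waiting for a separate fix.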

@jkbradley force-pushed the logreg-reinstate-threshold branch from 1a4b922 to 8a0c3e0 August 12, 2015 18:22
@jkbradley
Member Author

Rebased... though I guess I could have merged. The copyValues fix should fix the unit tests.

@dbtsai
Member

dbtsai commented Aug 12, 2015

LGTM. Waiting for tests

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40651 has finished for PR 8079 at commit 8a0c3e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * Set thresholds in multiclass (or binary) classification to adjust the probability of

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40655 has finished for PR 8079 at commit af3d07a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * Set thresholds in multiclass (or binary) classification to adjust the probability of

@jkbradley
Member Author

@dbtsai Thanks for checking this! I'll go ahead and merge it with master and branch-1.5

asfgit pushed a commit that referenced this pull request Aug 12, 2015
Reinstated LogisticRegression.threshold Param for binary compatibility.  Param thresholds overrides threshold, if set.

CC: mengxr dbtsai feynmanliang

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #8079 from jkbradley/logreg-reinstate-threshold.

(cherry picked from commit 551def5)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit closed this in 551def5 Aug 12, 2015
@jkbradley deleted the logreg-reinstate-threshold branch August 12, 2015 22:29
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015