-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-9789] [ML] Added logreg threshold param back #8079
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wy not provide implementation in trait since they are the same in the concrete classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would require overriding still for Java compatibility. This way, we at least don't copy the doc.
|
Just curious, any reason we have to mix in |
|
That's what I originally intended, but as Xiangrui pointed out, if users are using the Param threshold directly, then we'd break their code. This is better for legacy, though it complicates the API a bit. |
|
Test build #40323 has finished for PR 8079 at commit
|
|
jenkins test this please |
1 similar comment
|
jenkins test this please |
|
Test build #40341 has finished for PR 8079 at commit
|
|
Makes sense, LGTM |
|
Just realized I neglected PySpark. I'll add that since the 2 changes should go together. |
|
@feynmanliang Would you mind taking another look? Thanks! |
|
Test build #40359 has finished for PR 8079 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that we are preferring thresholds now, and I agree that it will be more flexible for multi-classes problem. However, users may set both of them with different values mistakenly, or for a old param, users use setThreshold, but now he switched to setThresholds which may cause some confusion. How about we throw Exception when they are not agreed? Also, whenever one is set, we just make the other unset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like failing whenever they disagree. I'll do that.
I'd prefer not to modify one when the other gets set. I think that's more confusing: If a user sets both, the user may not know what they are doing and should get a failure/warning. It's also unclear what the semantics should be if the user passes both Param values together in a single ParamMap.
|
LGTM except few comments. Thanks. |
|
OK, that update should enforce equivalence when both are set. |
|
jenkins test this please |
|
Test build #40484 has finished for PR 8079 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we throw IllegalArgumentException?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, the IllegalArgumentException is thrown when we checkThresholdConsistency not here. Should we document it that the exception will be thrown when the code is run not in the setting time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
|
Change of plans: I realized that, since we do not allow users to call Params.clear, I'll need to change the semantics to what you suggested: Whenever threshold or thresholds is set, it clears the other's value. (Otherwise, when fit is called with a threshold, then the resulting model cannot have its threshold changed very easily.) |
|
Uh oh, my proposal won't work currently, but I think it can be fixed. Explanation here: [https://issues.apache.org/jira/browse/SPARK-9847] |
|
Test build #40511 has finished for PR 8079 at commit
|
|
Then the most easy way is probably just overwriting the other when one is set. :) |
|
That's true for the setter methods, but it doesn't work for fit() or transform() given ParamMaps. fit and transform don't use the setter methods, so we don't have a chance to unset the other parameter. I'll wait on this PR until the other patch gets merged. |
1a4b922 to
8a0c3e0
Compare
|
Rebased... though I guess I could have merged. The copyValues fix should fix the unit tests. |
|
LGTM. Waiting for tests |
|
Test build #40651 has finished for PR 8079 at commit
|
|
Test build #40655 has finished for PR 8079 at commit
|
|
@dbtsai Thanks for checking this! I'll go ahead and merge it with master and branch-1.5 |
Reinstated LogisticRegression.threshold Param for binary compatibility. Param thresholds overrides threshold, if set. CC: mengxr dbtsai feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8079 from jkbradley/logreg-reinstate-threshold. (cherry picked from commit 551def5) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Reinstated LogisticRegression.threshold Param for binary compatibility. Param thresholds overrides threshold, if set. CC: mengxr dbtsai feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#8079 from jkbradley/logreg-reinstate-threshold.
Reinstated LogisticRegression.threshold Param for binary compatibility. Param thresholds overrides threshold, if set.
CC: @mengxr @dbtsai @feynmanliang