[SPARK-19336][ML][Pyspark]: LinearSVC Python API #16694
cc @hhbyyh Thanks!
Test build #71945 has finished for PR 16694 at commit
Test build #71946 has finished for PR 16694 at commit
Test build #71948 has finished for PR 16694 at commit
jkbradley left a comment
Thanks for the PR! Small comments only.
There's no need to change this. Most other algorithms use "set" not "sets"
python/pyspark/ml/classification.py
Outdated
Have you tried generating the docs? Check out other examples to see how to do links.
OK. I will fix it. Thanks!
python/pyspark/ml/classification.py
Outdated
Rename bdf -> df
python/pyspark/ml/classification.py
Outdated
I'd simplify this example since it is going to be part of the documentation:
- Remove "weight"
- Just use dense vectors to make the doc clearer. Sparse vectors are tested elsewhere for Python and should be tested in Scala for LinearSVC (for which I'll make a JIRA).
- Make the feature vectors be length 2 or 3
OK. I will modify it.
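A minimal sketch of what the simplified doc example requested above might look like: no "weight" column, dense vectors only, and feature vectors of length 3. The names `EXAMPLE_ROWS` and `fit_example` and the data values are hypothetical and chosen only for illustration; they are not taken from the PR. The `LinearSVC`, `Vectors.dense`, and `createDataFrame` calls assume a PySpark release that includes this Python API.

```python
# Hypothetical illustration data: (label, dense feature values of length 3).
# There is deliberately no "weight" column and no sparse vector.
EXAMPLE_ROWS = [
    (1.0, [1.0, 1.0, 1.0]),
    (0.0, [1.0, 2.0, 3.0]),
]


def fit_example(spark):
    """Fit a LinearSVC on EXAMPLE_ROWS using an existing SparkSession.

    The pyspark imports are done lazily so that this module can be
    loaded on a machine without Spark installed.
    """
    from pyspark.ml.classification import LinearSVC
    from pyspark.ml.linalg import Vectors

    # Build a two-column DataFrame with dense feature vectors.
    df = spark.createDataFrame(
        [(label, Vectors.dense(values)) for label, values in EXAMPLE_ROWS],
        ["label", "features"],
    )
    # Hyperparameter values are illustrative assumptions, not tuned settings.
    svm = LinearSVC(maxIter=5, regParam=0.01)
    return svm.fit(df)
```

With a running `SparkSession` named `spark`, `fit_example(spark)` would return a fitted model whose `coefficients` and `intercept` can be inspected.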
python/pyspark/ml/classification.py
Outdated
No need to test sparse vectors here
python/pyspark/ml/classification.py
Outdated
Put this in a unit test (tests.py), not here in the doc tests (though I also don't think you really need this test)
I followed the LogisticRegression example when creating this test. I will remove it. Thanks!
I know, there are some not great examples to follow. It'd be nice to clean those out sometime...
Test build #72080 has finished for PR 16694 at commit
LGTM, thank you!
## What changes were proposed in this pull request?

Add Python API for the newly added LinearSVC algorithm.

## How was this patch tested?

Add new doc string test.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes apache#16694 from wangmiao1981/ser.