[SPARK-5986][MLLib] Add save/load for k-means #4951
Test build #28393 has started for PR 4951 at commit
Test build #28394 has started for PR 4951 at commit
Test build #28393 has finished for PR 4951 at commit
Test PASSed.
Test build #28394 has finished for PR 4951 at commit
Test PASSed.
- We don't need wrapper.serialize for vectors.
- We need to store cluster indices. If the centers are saved to more than one partition, we cannot easily load them back in the original order.
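The second point can be illustrated with a minimal sketch in plain Scala. It is not the code from this PR: `IndexedCenter`, `ClusterOrderSketch`, `toRecords`, and `fromRecords` are hypothetical names, `Array[Double]` stands in for MLlib's `Vector` so the example runs without Spark, and a shuffle simulates rows coming back from several partitions in arbitrary order.

```scala
import scala.util.Random

// Hypothetical storage record: one row per cluster center, tagged with its index.
// Array[Double] stands in for MLlib's Vector so the sketch runs without Spark.
final case class IndexedCenter(id: Int, center: Array[Double])

object ClusterOrderSketch {
  // Tag each center with its position before writing it out.
  def toRecords(centers: Array[Array[Double]]): List[IndexedCenter] =
    centers.zipWithIndex.map { case (c, i) => IndexedCenter(i, c) }.toList

  // Rebuild the original array no matter what order the rows come back in.
  def fromRecords(records: List[IndexedCenter]): Array[Array[Double]] =
    records.sortBy(_.id).map(_.center).toArray

  def main(args: Array[String]): Unit = {
    val centers = Array(Array(0.0, 0.0), Array(5.0, 5.0), Array(9.0, 1.0))
    // Shuffling simulates rows read back from multiple partitions (an assumption
    // made for illustration; a real load would read from storage).
    val loaded = new Random(42).shuffle(toRecords(centers))
    val restored = fromRecords(loaded)
    // The stored index is what makes the original order recoverable.
    assert(restored.map(_.toList).toList == centers.map(_.toList).toList)
    println(restored.map(_.mkString("[", ", ", "]")).mkString(" "))
  }
}
```

Without the `id` field, nothing in the loaded rows would say which center belonged to which cluster index, so predictions from a reloaded model could map points to different cluster numbers than the original model did.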
Test build #28408 has started for PR 4951 at commit
How about using Int here to represent indexes? I think there is no need to use Long.
Yes, Int is sufficient. This class should be private. Please also check the scope of other classes.
Make this private. The class name could be changed to case class Cluster(id: Int, center: Vector). IndexedPoint is too general.
Test build #28408 has finished for PR 4951 at commit
Test PASSed.
Be specific about the imports. You only need Vector.
Test build #28453 has started for PR 4951 at commit
Test build #28453 has finished for PR 4951 at commit
Test PASSed.
LGTM. Merged into master. Thanks!
This PR adds save/load for K-means as described in SPARK-5986. The Python version will be added in another PR.