-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-5987] [MLlib] Save/load for GaussianMixtureModels #4986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc: @mengxr @jkbradley |
|
Test build #28481 has started for PR 4986 at commit
|
|
Test build #28481 has finished for PR 4986 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it should be private, but then it should be private in all other files as well.
|
Test FAILed. |
|
jenkins, test this please |
|
@mengxr I am not sure if we should flatten it or not, would it be worth if the number of clusters is large? Also I think it would be better if we deal with MatrixUDT after this PR is done with. wdyt? |
|
Test build #28584 has started for PR 4986 at commit
|
|
The number of clusters won't be very large. Flattening an |
|
Test FAILed. |
|
jenkins, test this please |
|
Test build #28588 has started for PR 4986 at commit
|
|
Test build #28588 has finished for PR 4986 at commit
|
|
Test PASSed. |
|
@mengxr I thing I have addressed your comments. sigmas is now stored as an Array of Doubles, Do you have any more comments? Thanks! |
|
Test build #28607 has started for PR 4986 at commit
|
|
Test build #28607 has finished for PR 4986 at commit
|
|
Test PASSed. |
|
@mengxr I rebased over master and used MatrixUDT. Please review! :) |
|
Test build #28937 has started for PR 4986 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not efficient because it may trigger multiple passes to the parquet file. Let's call collect() directly.
|
@mengxr fixed ! |
|
Test build #29101 has started for PR 4986 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also update the Java example.
|
Test build #29101 has finished for PR 4986 at commit
|
|
Test PASSed. |
|
@mengxr I have addressed your comments. Please have a look ! |
|
Test build #29148 has started for PR 4986 at commit
|
|
Test build #29148 has finished for PR 4986 at commit
|
|
Test PASSed. |
|
LGTM. Merged into master. Thanks!! |
|
@mengxr thanks for the merge! For supporting this in PySpark, we would need support for MatrixUDT, which would need support for sparse matrices right? I could not find any existing JIRA related to sparse matrix support, if you are able to please link me to it. |
Should be self explanatory.