[SPARK-15643][DOC][ML] Add breaking changes to ML migration guide #13924
Will be merged once #13378 is merged.
Test build #61303 has finished for PR 13924 at commit
I just merged #13378.
28e0412 to 6ef09a3
@yanboliang @jkbradley @mengxr updated.
Test build #61408 has finished for PR 13924 at commit
docs/mllib-guide.md
Outdated
**Linear algebra classes for DataFrame-based APIs**

Spark's linear algebra dependencies were moved to a new project, `spark-mllib-local`
Should be "mllib-local" (no "spark-")
Done with review pass. Thanks for the PR!
docs/mllib-guide.md
Outdated
# convert DataFrame columns
convertedVecDF = MLUtils.convertVectorColumnsToML(vecDF)
convertedMatrxDF = MLUtils.convertMatrixColumnsToML(matrixDF)
Note: it looks like we don't have single-instance conversion methods (asML / fromML) in the Python linalg classes (I commented on SPARK-15944).
Not sure if this is intended or we just missed them. One can do newVec = Vectors.dense(oldVec) (or vice versa for sparse) in Python directly, so if that is the expected way to do things, I can add that here.
That may have just been overlooked, but that's a good point that there is already a decent way to do the conversion. Could you please just note that way here?
@jkbradley Ah sorry - I misspoke. It happens to work for dense vectors because it effectively calls np.array(DenseVector), but not for sparse. The workaround is fairly ugly: mlSV = NewVectors.sparse(mllibSV.size, zip(mllibSV.indices, mllibSV.values)), or something similar.
I'd say we should have some convenience methods like in Scala/Java?
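For reference, the sparse workaround described above could be wrapped roughly as follows. This is only a sketch: the helper names `sparse_pairs` and `mllib_sparse_to_ml` are hypothetical and not part of PySpark, and the only PySpark call assumed is `pyspark.ml.linalg.Vectors.sparse(size, pairs)`, imported lazily.

```python
def sparse_pairs(sv):
    """Return (size, [(index, value), ...]) for a sparse-vector-like object.

    Hypothetical helper: it only reads the `size`, `indices` and `values`
    attributes, so it also works on stand-in objects without pyspark.
    """
    # Convert to plain Python numbers so NumPy scalar types do not leak through.
    pairs = [(int(i), float(v)) for i, v in zip(sv.indices, sv.values)]
    return int(sv.size), pairs


def mllib_sparse_to_ml(mllib_sv):
    """Build a pyspark.ml.linalg sparse vector from a pyspark.mllib.linalg one.

    Hypothetical helper; assumes pyspark (2.0 or later) is installed.
    """
    # Imported lazily so sparse_pairs() stays usable without pyspark.
    from pyspark.ml.linalg import Vectors as NewVectors

    size, pairs = sparse_pairs(mllib_sv)
    return NewVectors.sparse(size, pairs)
```

With pyspark available, `mllib_sparse_to_ml(mllibSV)` would stand in for the one-liner quoted above; dedicated convenience methods, as in Scala/Java, would make a helper like this unnecessary.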
Created SPARK-16328 and #13997.
Test build #61464 has finished for PR 13924 at commit
The changes look good, so just the Python item remains. Thanks!
@jkbradley updated the Python example, assuming #13997 will get merged - see #13924 (comment).
Test build #61545 has finished for PR 13924 at commit
Test build #61546 has finished for PR 13924 at commit
LGTM
This PR adds the breaking changes from [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) to the migration guide.

## How was this patch tested?

Built docs locally.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #13924 from MLnick/SPARK-15643-migration-guide.

(cherry picked from commit 4a981dc)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>