[SPARK-9408] [PySpark] [MLlib] Refactor linalg.py to /linalg #7746
python/pyspark/sql/types.py
@davies If I don't add str(...), I got the following exception in ml/clustering.py (because it returns a unicode string):
File "/Users/meng/src/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 636, in fromJson
m = __import__(pyModule, globals(), locals(), [pyClass])
TypeError: ("Item in ``from list'' not a string", <function _parse_datatype_json_string at 0x10436a7d0>, (u'{"type":"struct","fields":[{"name":"features","type":{"type":"udt","class":"org.apache.spark.mllib.linalg.VectorUDT","pyClass":"pyspark.mllib.linalg.VectorUDT","sqlType":{"type":"struct","fields":[{"name":"type","type":"byte","nullable":false,"metadata":{}},{"name":"size","type":"integer","nullable":true,"metadata":{}},{"name":"indices","type":{"type":"array","elementType":"integer","containsNull":false},"nullable":true,"metadata":{}},{"name":"values","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}}]}},"nullable":true,"metadata":{}},{"name":"prediction","type":"integer","nullable":true,"metadata":{}}]}',))
That's strange. Is it possible that your pyspark.zip is outdated? (Try removing it.)
Tried again and saw the same error without str(...). This is the same issue as reported in https://bugs.python.org/issue21720.
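For readers unfamiliar with that issue, here is a minimal sketch of the workaround, assuming the class path arrives as a unicode string from parsed JSON. The helper name `import_class_by_path` and the stand-in class `collections.OrderedDict` are hypothetical and only for illustration; this is not the actual code in `types.py`.

```python
import json


def import_class_by_path(dotted_path):
    """Import and return the object named by a dotted path like 'pkg.mod.Class'."""
    # On Python 2, json.loads returns unicode strings, and __import__ rejects
    # unicode entries in its fromlist. Coercing to the native str type avoids
    # that; on Python 3 this call is a no-op.
    dotted_path = str(dotted_path)
    module_name, _, class_name = dotted_path.rpartition(".")
    # A non-empty fromlist makes __import__ return the leaf module itself.
    module = __import__(module_name, globals(), locals(), [class_name])
    return getattr(module, class_name)


# Hypothetical stand-in for the "pyClass" field of a UDT schema.
schema = json.loads('{"type": "udt", "pyClass": "collections.OrderedDict"}')
cls = import_class_by_path(schema["pyClass"])
```

The `str(...)` call is the only part that matters for the bug; the rest just shows where the unicode value comes from and how it reaches `__import__`.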
Oh, I see. LGTM.
Could you add a comment for it?
Test build #38817 has finished for PR 7746 at commit
Test build #38904 has finished for PR 7746 at commit
test this please
Test build #38918 has finished for PR 7746 at commit
LGTM
Great, but how were you able to figure this out?
I googled the error message.
Test build #39055 has finished for PR 7746 at commit
test this please
Test build #39059 has finished for PR 7746 at commit
test this please
Test build #39085 has finished for PR 7746 at commit
test this please
Test build #39105 has finished for PR 7746 at commit
test this please
All failed tests are unrelated to this PR. 1135551 passed Jenkins, and the only change in the latest commit is to add a comment to
Merged into master.
Test build #39124 has finished for PR 7746 at commit
mengxr This adds the `BlockMatrix` to PySpark. I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.

(cherry picked from commit 34dcf10)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
This is based on @MechCoder's PR #7731. Hopefully it could pass tests. @MechCoder I tried to make minimal changes. If this passes Jenkins, we can merge this one first and then try to move `__init__.py` to `local.py` in a separate PR.

Closes #7731