[SPARK-9408] [PySpark] [MLlib] Refactor linalg.py to /linalg #7746

mengxr · 2015-07-29T07:03:28Z

This is based on @MechCoder's PR #7731. Hopefully it passes the tests. @MechCoder, I tried to make minimal changes. If this passes Jenkins, we can merge this one first and then try to move __init__.py to local.py in a separate PR.

Closes #7731
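For readers unfamiliar with this kind of refactor, here is a minimal sketch of why moving `linalg.py` to `linalg/__init__.py` can leave existing imports untouched: Python resolves an import name the same way whether it points at a module file or at a package directory containing `__init__.py`. The package name `demo_linalg` and the empty `DenseVector` class are hypothetical stand-ins created in a temporary directory. They are not code from this PR.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package used only for illustration (not part of Spark).
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_linalg")
os.makedirs(pkg_dir)

# What would have been demo_linalg.py now lives in demo_linalg/__init__.py.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("class DenseVector:\n    pass\n")

# Make the temporary directory importable and import the package by name.
sys.path.insert(0, root)
importlib.invalidate_caches()
module = importlib.import_module("demo_linalg")

# The import path is the same as it would be for a single-file module.
from demo_linalg import DenseVector
```

Callers keep writing `from demo_linalg import DenseVector`, and the package directory then has room for additional submodules.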

mengxr · 2015-07-29T07:06:16Z

python/pyspark/sql/types.py

@davies If I don't add str(...), I got the following exception in ml/clustering.py (because it returns a unicode string):

    File "/Users/meng/src/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 636, in fromJson
        m = __import__(pyModule, globals(), locals(), [pyClass])
    TypeError: ("Item in ``from list'' not a string", <function _parse_datatype_json_string at 0x10436a7d0>, (u'{"type":"struct","fields":[{"name":"features","type":{"type":"udt","class":"org.apache.spark.mllib.linalg.VectorUDT","pyClass":"pyspark.mllib.linalg.VectorUDT","sqlType":{"type":"struct","fields":[{"name":"type","type":"byte","nullable":false,"metadata":{}},{"name":"size","type":"integer","nullable":true,"metadata":{}},{"name":"indices","type":{"type":"array","elementType":"integer","containsNull":false},"nullable":true,"metadata":{}},{"name":"values","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}}]}},"nullable":true,"metadata":{}},{"name":"prediction","type":"integer","nullable":true,"metadata":{}}]}',))

That's strange. Is it possible that your pyspark.zip is outdated? (If so, remove it.)

Tried again and saw the same error without str(...). This is the same issue as reported in https://bugs.python.org/issue21720.

Oh, I see. LGTM.

Could you add a comment for it?
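The workaround discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the exact code in `types.py`: the helper name `load_class` is hypothetical, and `collections.OrderedDict` stands in for `pyspark.mllib.linalg.VectorUDT` so the snippet runs without Spark. On Python 2, `json.loads` returns `unicode` strings and `__import__` rejects `unicode` entries in its from-list (https://bugs.python.org/issue21720). On Python 3 the `str(...)` call has no effect.

```python
import json


def load_class(py_class_path):
    """Import and return the class named by a dotted path like 'pkg.mod.Class'."""
    split = py_class_path.rfind(".")
    py_module = py_class_path[:split]
    py_class = py_class_path[split + 1:]
    # Coerce the class name with str(): on Python 2 a unicode item in the
    # from-list raises TypeError("Item in ``from list'' not a string").
    module = __import__(py_module, globals(), locals(), [str(py_class)])
    return getattr(module, py_class)


# Stand-in for the UDT JSON; the real schema names a PySpark class instead.
schema = json.loads('{"type": "udt", "pyClass": "collections.OrderedDict"}')
cls = load_class(schema["pyClass"])
```

Passing a non-empty from-list makes `__import__` return the innermost module rather than the top-level package, which is why the class can then be fetched with `getattr`.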

SparkQA · 2015-07-29T09:20:02Z

Test build #38817 has finished for PR 7746 at commit c48cae0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-07-29T23:17:21Z

Test build #38904 has finished for PR 7746 at commit 28b543f.

This patch fails Python style tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-07-30T00:04:47Z

test this please

SparkQA · 2015-07-30T02:15:18Z

Test build #38918 has finished for PR 7746 at commit 1135551.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2015-07-30T06:06:05Z

LGTM

MechCoder · 2015-07-30T09:48:05Z

Great, but how were you able to figure this out?

mengxr · 2015-07-30T14:48:08Z

I googled the error message.

SparkQA · 2015-07-30T15:26:37Z

Test build #39055 has finished for PR 7746 at commit 0e05a3b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-07-30T15:31:31Z

test this please

SparkQA · 2015-07-30T17:37:27Z

Test build #39059 has finished for PR 7746 at commit 0e05a3b.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-07-30T18:57:04Z

test this please

SparkQA · 2015-07-30T21:02:41Z

Test build #39085 has finished for PR 7746 at commit 0e05a3b.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-07-30T21:31:08Z

test this please

SparkQA · 2015-07-30T23:32:28Z

Test build #39105 has finished for PR 7746 at commit 0e05a3b.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-07-30T23:49:55Z

test this please

mengxr · 2015-07-30T23:56:49Z

All failed tests are unrelated to this PR. 1135551 passed Jenkins, and the only change in the latest commit is to add a comment to str(json[...]). So I'm going to merge this PR to unblock two follow-up PRs.

mengxr · 2015-07-30T23:58:14Z

Merged into master.

SparkQA · 2015-07-31T01:55:22Z

Test build #39124 has finished for PR 7746 at commit 0e05a3b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr

This adds the `BlockMatrix` to PySpark. I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.

(cherry picked from commit 34dcf10)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

mengxr added 2 commits July 28, 2015 23:10

move linalg.py to linalg/__init__.py

173a805

update tests

c48cae0

mengxr reviewed Jul 29, 2015

add a comment for str(...)

1135551

mengxr force-pushed the SPARK-9408 branch from 28b543f to 1135551 on July 29, 2015 23:42

dusenberrymw mentioned this pull request Jul 29, 2015

[SPARK-6486] [MLlib] [Python] Add BlockMatrix to PySpark. #7761

Closed

merge master

0e05a3b

mengxr mentioned this pull request Jul 30, 2015

[SPARK-9277] [MLLIB] SparseVector constructor must throw an error when declared number of elements less than array length #7794

Closed

asfgit closed this in ca71cc8 Jul 30, 2015

4 participants