Contributor

mengxr commented Jul 29, 2015

This is based on @MechCoder's PR #7731. Hopefully it will pass the tests. @MechCoder, I tried to make minimal changes. If this passes Jenkins, we can merge this one first and then try to move __init__.py to local.py in a separate PR.

Closes #7731

Contributor Author

@davies If I don't add str(...), I get the following exception in ml/clustering.py (because the parsed JSON value is a unicode string):

      File "/Users/meng/src/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 636, in fromJson
        m = __import__(pyModule, globals(), locals(), [pyClass])
    TypeError: ("Item in ``from list'' not a string", <function _parse_datatype_json_string at 0x10436a7d0>, (u'{"type":"struct","fields":[{"name":"features","type":{"type":"udt","class":"org.apache.spark.mllib.linalg.VectorUDT","pyClass":"pyspark.mllib.linalg.VectorUDT","sqlType":{"type":"struct","fields":[{"name":"type","type":"byte","nullable":false,"metadata":{}},{"name":"size","type":"integer","nullable":true,"metadata":{}},{"name":"indices","type":{"type":"array","elementType":"integer","containsNull":false},"nullable":true,"metadata":{}},{"name":"values","type":{"type":"array","elementType":"double","containsNull":false},"nullable":true,"metadata":{}}]}},"nullable":true,"metadata":{}},{"name":"prediction","type":"integer","nullable":true,"metadata":{}}]}',))

Contributor

That's strange. Is it possible that your pyspark.zip is outdated? (Try removing it.)

Contributor Author

Tried again and saw the same error without str(...). This is the same issue as reported in https://bugs.python.org/issue21720.
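For readers following along, here is a minimal sketch of the workaround being discussed. It is not the actual patch: `load_class` and the sample JSON are hypothetical, and only the `__import__(module, globals(), locals(), [cls])` call shape is taken from the traceback above. The idea is to coerce the name read from JSON to the native `str` type before it reaches the `fromlist` argument. On Python 3 the conversion is a no-op.

```python
import json


def load_class(qualified_name):
    """Resolve a dotted 'package.module.ClassName' string to the class object.

    Hypothetical helper for illustration; it mirrors the call in the traceback.
    """
    # Values parsed from JSON are `unicode` on Python 2. Coerce to the native
    # str type so that the fromlist entry below is a plain string.
    qualified_name = str(qualified_name)
    module_name, _, class_name = qualified_name.rpartition(".")
    # A non-empty fromlist makes __import__ return the innermost module.
    module = __import__(module_name, globals(), locals(), [class_name])
    return getattr(module, class_name)


# Assumed sample payload; a standard-library class stands in for the UDT class.
field = json.loads('{"pyClass": "collections.OrderedDict"}')
cls = load_class(field["pyClass"])
print(cls)
```

Doing the conversion at the point where the name is read from JSON keeps the fix local, so callers do not need to care which string type the parser produced.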

Contributor

Oh, I see. LGTM

Contributor

Could you add a comment for it?

SparkQA commented Jul 29, 2015

Test build #38817 has finished for PR 7746 at commit c48cae0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 29, 2015

Test build #38904 has finished for PR 7746 at commit 28b543f.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

mengxr commented Jul 30, 2015

test this please

SparkQA commented Jul 30, 2015

Test build #38918 has finished for PR 7746 at commit 1135551.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

davies commented Jul 30, 2015

LGTM

@MechCoder
Contributor

Great, but how were you able to figure this out?

Contributor Author

mengxr commented Jul 30, 2015

I googled the error message.

SparkQA commented Jul 30, 2015

Test build #39055 has finished for PR 7746 at commit 0e05a3b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

mengxr commented Jul 30, 2015

test this please

SparkQA commented Jul 30, 2015

Test build #39059 has finished for PR 7746 at commit 0e05a3b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

mengxr commented Jul 30, 2015

test this please

SparkQA commented Jul 30, 2015

Test build #39085 has finished for PR 7746 at commit 0e05a3b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

mengxr commented Jul 30, 2015

test this please

SparkQA commented Jul 30, 2015

Test build #39105 has finished for PR 7746 at commit 0e05a3b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

mengxr commented Jul 30, 2015

test this please

Contributor Author

mengxr commented Jul 30, 2015

All failed tests are unrelated to this PR. Commit 1135551 passed Jenkins, and the only change in the latest commit is a comment added to str(json[...]). So I'm going to merge this PR to unblock two follow-up PRs.

Contributor Author

mengxr commented Jul 30, 2015

Merged into master.

asfgit closed this in ca71cc8 Jul 30, 2015

SparkQA commented Jul 31, 2015

Test build #39124 has finished for PR 7746 at commit 0e05a3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 5, 2015
mengxr This adds the `BlockMatrix` to PySpark.  I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.
asfgit pushed a commit that referenced this pull request Aug 5, 2015
mengxr This adds the `BlockMatrix` to PySpark.  I have the conversions to `IndexedRowMatrix` and `CoordinateMatrix` ready as well, so once PR #7554 is completed (which relies on PR #7746), this PR can be finished.

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes #7761 from dusenberrymw/SPARK-6486_Add_BlockMatrix_to_PySpark and squashes the following commits:

27195c2 [Mike Dusenberry] Adding one more check to _convert_to_matrix_block_tuple, and a few minor documentation changes.
ae50883 [Mike Dusenberry] Minor update: BlockMatrix should inherit from DistributedMatrix.
b8acc1c [Mike Dusenberry] Moving BlockMatrix to pyspark.mllib.linalg.distributed, updating the logic to match that of the other distributed matrices, adding conversions, and adding documentation.
c014002 [Mike Dusenberry] Using properties for better documentation.
3bda6ab [Mike Dusenberry] Adding documentation.
8fb3095 [Mike Dusenberry] Small cleanup.
e17af2e [Mike Dusenberry] Adding BlockMatrix to PySpark.

(cherry picked from commit 34dcf10)
Signed-off-by: Xiangrui Meng <meng@databricks.com>