@MechCoder
Contributor

Should be self-explanatory.

@MechCoder
Contributor Author

cc: @mengxr @jkbradley

@SparkQA

SparkQA commented Mar 11, 2015

Test build #28481 has started for PR 4986 at commit 4898d57.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 11, 2015

Test build #28481 has finished for PR 4986 at commit 4898d57.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(weights: Array[Double], mus: Array[Vector], sigmas: Array[Array[Double]])

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28481/

Contributor

Should be private.

Contributor Author

Agreed it should be private, but then it should be private in all other files as well.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28583/

@shaneknapp
Contributor

jenkins, test this please

@MechCoder
Contributor Author

@mengxr I am not sure whether we should flatten it. Would it be worth it if the number of clusters is large? Also, I think it would be better to deal with MatrixUDT after this PR is done. wdyt?

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28584 has started for PR 4986 at commit 9aaa535.

  • This patch merges cleanly.

@mengxr
Contributor

mengxr commented Mar 13, 2015

The number of clusters won't be very large. Flattening an Array[Array[Double]] doesn't copy the data, so there is no overhead. The content of the Parquet file is easy to inspect if we list each center as a record. I think we should just use Array[Double] instead of being blocked by MatrixUDT. GMM models are usually dense.
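
One way to read "list each center as a record" is sketched below in plain Scala, with no Spark dependency. The names `ComponentRecords`, `ComponentRow`, and `toRows` are hypothetical and do not come from the patch; the sketch also assumes each covariance matrix arrives as a flat array of d * d values, where d is the length of the matching mean.

```scala
object ComponentRecords {
  // Hypothetical record type (not from the patch): one row per mixture
  // component, with the d x d covariance kept as a flat array of d * d values.
  final case class ComponentRow(weight: Double, mu: Array[Double], sigma: Array[Double])

  // Zip the per-model arrays into one record per component.
  def toRows(
      weights: Array[Double],
      mus: Array[Array[Double]],
      sigmas: Array[Array[Double]]): Array[ComponentRow] = {
    require(weights.length == mus.length && mus.length == sigmas.length,
      "weights, mus and sigmas must have one entry per component")
    weights.indices.map { i =>
      val d = mus(i).length
      // Assumption: a dense covariance for a d-dimensional mean has d * d entries.
      require(sigmas(i).length == d * d,
        s"component $i: expected ${d * d} covariance values")
      ComponentRow(weights(i), mus(i), sigmas(i))
    }.toArray
  }
}
```

With this layout, each saved row describes exactly one Gaussian, which is what makes the stored file easy to inspect by eye.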

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28584/

@shaneknapp
Contributor

jenkins, test this please

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28588 has started for PR 4986 at commit 9aaa535.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 13, 2015

Test build #28588 has finished for PR 4986 at commit 9aaa535.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(weights: Array[Double], mus: Array[Vector], sigmas: Array[Array[Double]])

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28588/

@MechCoder
Contributor Author

@mengxr I think I have addressed your comments. sigmas is now stored as an Array of Doubles. Do you have any more comments? Thanks!
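
For readers following along, storing sigmas as a single Array[Double] could look roughly like the round trip below. This is a sketch under stated assumptions, not the code from the commit: `FlattenSigmas`, `flatten`, and `unflatten` are hypothetical names, and it assumes every component contributes the same number of covariance values so the flat array can be split evenly on load.

```scala
object FlattenSigmas {
  // Concatenate the per-component covariance arrays into one flat array.
  def flatten(sigmas: Array[Array[Double]]): Array[Double] = {
    require(sigmas.nonEmpty, "need at least one component")
    require(sigmas.forall(_.length == sigmas.head.length),
      "all components must have the same number of covariance values")
    sigmas.flatten
  }

  // Split the flat array back into k equally sized chunks, one per component.
  def unflatten(flat: Array[Double], k: Int): Array[Array[Double]] = {
    require(k > 0 && flat.nonEmpty && flat.length % k == 0,
      "flat length must be a positive multiple of k")
    flat.grouped(flat.length / k).toArray
  }
}
```

The number of components k has to be known when loading; in this sketch it could be taken from the length of the weights array.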

@SparkQA

SparkQA commented Mar 14, 2015

Test build #28607 has started for PR 4986 at commit 4321743.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 14, 2015

Test build #28607 has finished for PR 4986 at commit 4321743.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(weights: Array[Double], mus: Array[Vector], sigmas: Array[Double])

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28607/

@MechCoder
Contributor Author

@mengxr I rebased over master and used MatrixUDT. Please review! :)

@SparkQA

SparkQA commented Mar 21, 2015

Test build #28937 has started for PR 4986 at commit 23d707e.

  • This patch merges cleanly.

Contributor

This is not efficient because it may trigger multiple passes over the Parquet file. Let's call collect() directly.
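
The reviewed lines are not shown in this thread, so the following is only an illustration of the concern, written in plain Scala without Spark. `CountingSource` is a hypothetical stand-in for a Parquet-backed dataset in which every `collect()` call costs one full scan; the sketch assumes the original code ran a separate action per field.

```scala
object SinglePass {
  // Hypothetical row type: one record per mixture component.
  final case class Component(weight: Double, mu: Array[Double], sigma: Array[Double])

  // Hypothetical stand-in for a file-backed dataset: each collect()
  // is counted as one full scan of the underlying file.
  final class CountingSource(rows: Seq[Component]) {
    var scans = 0
    def collect(): Array[Component] = { scans += 1; rows.toArray }
  }

  type Fields = (Array[Double], Array[Array[Double]], Array[Array[Double]])

  // One action per field: the source is scanned three times.
  def loadMultiPass(src: CountingSource): Fields = {
    val weights = src.collect().map(_.weight)
    val mus = src.collect().map(_.mu)
    val sigmas = src.collect().map(_.sigma)
    (weights, mus, sigmas)
  }

  // Collect once, then split the rows locally: a single scan.
  def loadSinglePass(src: CountingSource): Fields = {
    val rows = src.collect()
    (rows.map(_.weight), rows.map(_.mu), rows.map(_.sigma))
  }
}
```

Because the model has only a handful of components, bringing all rows to the driver in one call and unpacking them there is cheap.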

@MechCoder
Contributor Author

@mengxr fixed!

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29101 has started for PR 4986 at commit e7a14cb.

  • This patch merges cleanly.

Contributor

Please also update the Java example.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29101 has finished for PR 4986 at commit e7a14cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(weight: Double, mu: Vector, sigma: Matrix)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29101/

@MechCoder
Contributor Author

@mengxr I have addressed your comments. Please have a look!

@SparkQA

SparkQA commented Mar 25, 2015

Test build #29148 has started for PR 4986 at commit 7d2cd56.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Mar 25, 2015

Test build #29148 has finished for PR 4986 at commit 7d2cd56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class Data(weight: Double, mu: Vector, sigma: Matrix)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29148/

@mengxr
Contributor

mengxr commented Mar 25, 2015

LGTM. Merged into master. Thanks!!

@asfgit closed this in 4fc4d03 on Mar 25, 2015
@MechCoder deleted the spark-5987 branch on March 26, 2015 03:01

@MechCoder
Contributor Author

@mengxr thanks for the merge! To support this in PySpark, we would need support for MatrixUDT, which in turn would need support for sparse matrices, right? I could not find an existing JIRA for sparse matrix support; if you know of one, please link me to it.

6 participants