Skip to content

Conversation

@yinxusen
Copy link
Contributor

@yinxusen yinxusen commented Mar 9, 2015

This PR adds save/load for K-means as described in SPARK-5986. Python version will be added in another PR.

@SparkQA
Copy link

SparkQA commented Mar 9, 2015

Test build #28393 has started for PR 4951 at commit dce7055.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 9, 2015

Test build #28394 has started for PR 4951 at commit b144216.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 9, 2015

Test build #28393 has finished for PR 4951 at commit dce7055.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28393/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Mar 9, 2015

Test build #28394 has finished for PR 4951 at commit b144216.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28394/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We don't need wrapper.serialize for vectors.
  • We need to store cluster indices. If the centers are saved to more than one partitions, we cannot easily load them back in the original order.

@SparkQA
Copy link

SparkQA commented Mar 9, 2015

Test build #28408 has started for PR 4951 at commit cd390fd.

  • This patch merges cleanly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using Int here to represent indexes? I think there is no need to use Long.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Int is sufficient. This class should be private. Please also check the scope of other classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this private. The class name could be changed to case class Cluster(id: Int, center: Vector). IndexedPoint is too general.

@SparkQA
Copy link

SparkQA commented Mar 9, 2015

Test build #28408 has finished for PR 4951 at commit cd390fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable with Serializable
    • case class IndexedPoint(id: Int, point: Vector)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28408/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be specific about the imports. You only need Vector.

@SparkQA
Copy link

SparkQA commented Mar 11, 2015

Test build #28453 has started for PR 4951 at commit 6dd74a0.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 11, 2015

Test build #28453 has finished for PR 4951 at commit 6dd74a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28453/
Test PASSed.

@mengxr
Copy link
Contributor

mengxr commented Mar 11, 2015

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 2d4e00e Mar 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants