@zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented May 5, 2016

What changes were proposed in this pull request?

A Python example for ml.kmeans already exists, but it is not included in the user guide.
1. Small changes, such as example_on / example_off markers
2. Add it to the user guide
3. Update the examples to read the data file directly

How was this patch tested?

Manual tests:
`./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`

@SparkQA

SparkQA commented May 5, 2016

Test build #57857 has finished for PR 12925 at commit 221ea4d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 5, 2016

Test build #57859 has finished for PR 12925 at commit 5fa7f5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

@dongjoon-hyun This one as well. Do you mind if I ask your thoughts on the component in the title? Making good examples for PRs will help all other contributors.

@zhengruifeng
Contributor Author

zhengruifeng commented May 5, 2016

@HyukjinKwon ok. I will change them to [EXAMPLE]

@zhengruifeng zhengruifeng changed the title [SPARK-15149][DOC] include python example for kmeans [SPARK-15149][EXAMPLE] include python example for kmeans May 5, 2016
@MLnick
Contributor

MLnick commented May 5, 2016

@zhengruifeng I prefer the style of bisecting_k_means_example.py, i.e. working with data = spark.read.text("data/mllib/kmeans_data.txt"). Could we harmonize this with that one?

I will comment on #11844 too about harmonizing the Scala examples.
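
For readers following along, the data-file style referred to here might look roughly like the sketch below. It is not the code from this PR: the helper names `parse_point` and `load_features` are hypothetical, `spark` is assumed to be an existing SparkSession, and the file is assumed to hold one point per line as whitespace-separated numbers.

```python
def parse_point(line):
    """Turn one whitespace-separated line such as "0.1 0.1 0.1" into floats."""
    return [float(x) for x in line.split()]


def load_features(spark, path="data/mllib/kmeans_data.txt"):
    """Read the text file and return a DataFrame with a `features` column.

    Hypothetical helper: `spark` is assumed to be an existing SparkSession.
    PySpark is imported lazily so parse_point can be used without it.
    """
    from pyspark.ml.linalg import Vectors
    from pyspark.sql import Row

    # spark.read.text yields one row per line, in a column named "value".
    lines = spark.read.text(path).rdd
    rows = lines.map(
        lambda row: Row(features=Vectors.dense(parse_point(row.value)))
    )
    return spark.createDataFrame(rows)
```

With a session available, the resulting DataFrame could then be passed to something like `KMeans().setK(2).setSeed(1).fit(load_features(spark))`; the values of `k` and the seed here are illustrative only.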

@zhengruifeng
Contributor Author

zhengruifeng commented May 5, 2016

@MLnick OK. I will update these examples to read the data file.

@zhengruifeng
Contributor Author

@MLnick updated. Thanks for your comments.

@zhengruifeng
Contributor Author

Oh, I need to update the JavaKMeansExample and KMeansExample

@SparkQA

SparkQA commented May 5, 2016

Test build #57885 has finished for PR 12925 at commit 059a739.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 5, 2016

Test build #57888 has finished for PR 12925 at commit f9ff25a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Contributor

sethah commented May 5, 2016

Ah, I had a PR ready for this but didn't see you had created a Jira for it. I can review.

Contributor

Can you see my comments about this on #11844 and let me know?

Contributor Author

@zhengruifeng zhengruifeng May 6, 2016

I see it. I will bring the KMeans examples in line with the BisectingKMeans ones.

@zhengruifeng zhengruifeng changed the title [SPARK-15149][EXAMPLE] include python example for kmeans [SPARK-15149][EXAMPLE][DOC] update kmeans example May 7, 2016
@zhengruifeng
Contributor Author

data/mllib/sample_kmeans_data.txt was created in the BisectingKMeans examples.

@SparkQA

SparkQA commented May 8, 2016

Test build #58084 has finished for PR 12925 at commit d91cbe9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 9, 2016

Test build #58145 has finished for PR 12925 at commit d5f02c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


import numpy as np
# $example on$
from pyspark.ml.clustering import KMeans, KMeansModel
Contributor

Don't need to import KMeansModel here.

@sethah
Contributor

sethah commented May 9, 2016

LGTM other than one minor comment and pending #11844

Run with:
bin/spark-submit examples/src/main/python/ml/kmeans_example.py <input> <k>
This example requires NumPy (http://www.numpy.org/).
Contributor

@holdenk holdenk May 9, 2016

nit: I believe this example still requires NumPy even though it isn't explicitly imported (see def toArray, which is called inside clusterCenters and says it returns a NumPy array).

Contributor Author

Thanks. I will revert this removal.
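
To illustrate why the note matters, here is a minimal sketch of how a script could surface such an implicit dependency early rather than failing deep inside a library call. It is not part of this PR; the function names `numpy_available` and `require_numpy` are hypothetical.

```python
import importlib.util


def numpy_available():
    """Return True if NumPy can be found, without actually importing it."""
    return importlib.util.find_spec("numpy") is not None


def require_numpy():
    """Raise a clear error up front if the implicit NumPy dependency is missing."""
    if not numpy_available():
        raise ImportError("This example requires NumPy (http://www.numpy.org/).")
```

Calling `require_numpy()` at the top of a script would be one way to make the requirement stated in the docstring explicit at run time.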

@SparkQA

SparkQA commented May 11, 2016

Test build #58307 has finished for PR 12925 at commit 5020773.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Contributor Author

@MLnick Thanks. Updated

@SparkQA

SparkQA commented May 11, 2016

Test build #58309 has finished for PR 12925 at commit f2ff8d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Contributor

MLnick commented May 11, 2016

LGTM. I'll merge this once #11844 is merged.

@MLnick
Contributor

MLnick commented May 11, 2016

Merged to master and branch-2.0. Thanks!

asfgit pushed a commit that referenced this pull request May 11, 2016
## What changes were proposed in this pull request?
A Python example for ml.kmeans already exists, but it is not included in the user guide.
1. Small changes, such as `example_on` / `example_off` markers
2. Add it to the user guide
3. Update the examples to read the data file directly

## How was this patch tested?
Manual tests:
`./bin/spark-submit examples/src/main/python/ml/kmeans_example.py`

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12925 from zhengruifeng/km_pe.

(cherry picked from commit 8beae59)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
@asfgit asfgit closed this in 8beae59 May 11, 2016
@zhengruifeng zhengruifeng deleted the km_pe branch May 11, 2016 08:05