@harsha2010

Includes the Iris dataset (after shuffling and relabeling 3 -> 0 to conform to the 0 -> numClasses-1 labeling). I could not find an existing dataset in data/mllib for multiclass classification.
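
For readers unfamiliar with the relabeling step, here is a minimal sketch of what "3 -> 0" means for a LIBSVM-format file. The helper names and the three sample rows are hypothetical (they are not taken from the PR or from the actual Iris file), and the shuffling step is left out.

```python
def relabel_libsvm_line(line, mapping):
    """Rewrite the leading label of one LIBSVM-format line using `mapping`.

    Labels that do not appear in `mapping` are kept unchanged.
    """
    label, sep, features = line.strip().partition(" ")
    old = int(float(label))
    return f"{mapping.get(old, old)}{sep}{features}"


def labels_are_zero_based(lines, num_classes):
    """Return True if the labels are exactly 0, 1, ..., num_classes - 1."""
    labels = {int(float(line.split(" ", 1)[0])) for line in lines}
    return labels == set(range(num_classes))


# Hypothetical rows with labels 1, 2, 3 (illustrative values only).
sample = ["1 1:-0.55 2:0.25", "3 1:0.11 2:-0.33", "2 1:0.0 2:-0.16"]

# Map label 3 to 0 so the label set becomes {0, 1, 2}.
relabeled = [relabel_libsvm_line(line, {3: 0}) for line in sample]
```

Only the label token changes; the feature indices and values pass through untouched.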

@SparkQA

SparkQA commented May 20, 2015

Test build #33176 has finished for PR 6296 at commit bb9dbfa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • [OneVsRest](api/scala/index.html#org.apache.spark.ml.classifier.OneVsRest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently.
    • `OneVsRest` is an `Estimator` that takes as base classifier instances of `Classifier` and creates a binary classification problem for each of the k classes. The classifier for class i is trained to predict whether the label is i or not, distinguishing class i from all other classes.
    • [LIBSVM data file](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/), parse it as an RDD of `LabeledPoint` and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy.
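
The reduction described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the Spark implementation: `CentroidBinaryClassifier` is a toy stand-in for a real base classifier, `OneVsRestSketch` and all data points are hypothetical, and picking the class whose binary model gives the highest score is the usual one-vs-rest decision rule rather than something stated in this thread.

```python
def _mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class CentroidBinaryClassifier:
    """Toy binary classifier: a higher score means closer to the positive centroid."""

    def fit(self, xs, ys):
        # Assumes both the positive (1) and negative (0) class are present.
        self.pos = _mean([x for x, y in zip(xs, ys) if y == 1])
        self.neg = _mean([x for x, y in zip(xs, ys) if y == 0])
        return self

    def score(self, x):
        return _sq_dist(x, self.neg) - _sq_dist(x, self.pos)


class OneVsRestSketch:
    """Trains one binary model per class on the relabeled problem 'is the label i or not?'."""

    def __init__(self, make_base):
        self.make_base = make_base

    def fit(self, xs, ys):
        # Labels are assumed to be 0 .. numClasses-1.
        self.num_classes = max(ys) + 1
        self.models = [
            self.make_base().fit(xs, [1 if y == i else 0 for y in ys])
            for i in range(self.num_classes)
        ]
        return self

    def predict(self, x):
        # Pick the class whose binary model is most confident.
        scores = [m.score(x) for m in self.models]
        return max(range(self.num_classes), key=scores.__getitem__)


# Hypothetical, well-separated 2-D data with three classes.
xs = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
ys = [0, 0, 1, 1, 2, 2]

model = OneVsRestSketch(CentroidBinaryClassifier).fit(xs, ys)
predictions = [model.predict(x) for x in xs]

# Test error as the fraction of misclassified points (here measured on the training data).
test_error = sum(p != y for p, y in zip(predictions, ys)) / len(ys)
```

The base classifier is passed in as a factory so that each of the k binary problems gets its own fresh model, which mirrors the idea of handing `OneVsRest` a base `Classifier`.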

@SparkQA

SparkQA commented May 20, 2015

Test build #33179 has finished for PR 6296 at commit 13bed9c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

Please match ml-features.md:

**Table of Contents**

* This will become a table of contents (this text will be scraped).
{:toc}

@jkbradley
Member

I like adding the multiclass dataset.

Could you please add a Java example?

I'm testing the Scala example now...

Member

Please indent

@harsha2010
Author

@jkbradley Thanks for the review. I'll add the Java example, fix the indentation issues, address the review comments, and update the PR tonight.

@SparkQA

SparkQA commented May 21, 2015

Test build #33180 timed out for PR 6296 at commit 4b7d1a6 after a configured wait of 150m.

@jkbradley
Member

Could you please remove the use of MetadataUtils? That will no longer be public because of this PR: #6322

@harsha2010
Author

Thanks for noticing this, will do

On May 21, 2015, at 5:14 PM, jkbradley notifications@github.com wrote:

Could you please remove the use of MetadataUtils? That will no longer be public because of this PR: [#6322]



@SparkQA

SparkQA commented May 22, 2015

Test build #33313 has finished for PR 6296 at commit c026613.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • [OneVsRest](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently.
    • [LIBSVM data file](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/), parse it as a DataFrame and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy.

Member

Sorry, I should have commented before: This import is not needed.

Member

Line too long. Can it be split? For an example, a foreach with a println inside the loop might be easiest to understand.

Member

Could you please use DataFrame.randomSplit here too to make the Scala & Java examples as similar as possible?

@jkbradley
Member

@harsha2010 Thanks for adding the Java example. I just had a couple of small comments about clarity.

Ram Sriharsha added 2 commits May 22, 2015 11:42
@jkbradley
Member

@harsha2010 Thank you! LGTM pending tests

@SparkQA

SparkQA commented May 22, 2015

Test build #33346 has finished for PR 6296 at commit ebdf103.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • [OneVsRest](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently.
    • [LIBSVM data file](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/), parse it as a DataFrame and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy.

@SparkQA

SparkQA commented May 22, 2015

Test build #33347 has finished for PR 6296 at commit 2f76295.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • [OneVsRest](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently.
    • [Iris dataset](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale), parse it as a DataFrame and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy.

@jkbradley
Member

Merging with master and branch-1.4

asfgit pushed a commit that referenced this pull request May 22, 2015
Including Iris Dataset (after shuffling and relabeling 3 -> 0 to conform to 0 -> numClasses-1 labeling). Could not find an existing dataset in data/mllib for multiclass classification.

Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #6296 from harsha2010/SPARK-7574 and squashes the following commits:

645427c [Ram Sriharsha] cleanup
46c41b1 [Ram Sriharsha] cleanup
2f76295 [Ram Sriharsha] Code Review Fixes
ebdf103 [Ram Sriharsha] Java Example
c026613 [Ram Sriharsha] Code Review fixes
4b7d1a6 [Ram Sriharsha] minor cleanup
13bed9c [Ram Sriharsha] add wikipedia link
bb9dbfa [Ram Sriharsha] Clean up naming
6f90db1 [Ram Sriharsha] [SPARK-7574][ml][doc] User guide for OneVsRest

(cherry picked from commit 509d55a)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in 509d55a May 22, 2015
@SparkQA

SparkQA commented May 22, 2015

Test build #33355 has finished for PR 6296 at commit 46c41b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • [OneVsRest](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently.
    • [Iris dataset](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale), parse it as a DataFrame and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy.

@SparkQA

SparkQA commented May 22, 2015

Test build #33356 has finished for PR 6296 at commit 645427c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • [OneVsRest](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently.
    • [Iris dataset](http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale), parse it as a DataFrame and perform multiclass classification using `OneVsRest`. The test error is calculated to measure the algorithm accuracy.

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015