Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

@zhengruifeng zhengruifeng commented May 5, 2016

What changes were proposed in this pull request?

1, Add python example for OneVsRest
2, remove args-parsing

How was this patch tested?

manual tests
./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py

@zhengruifeng
Copy link
Contributor Author

There seems no similar API to directly get numClass like scala
val numClasses = MetadataUtils.getNumClasses(predictionColSchema).get

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57841 has finished for PR 12920 at commit 474e252.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57842 has finished for PR 12920 at commit 44875ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 5, 2016

Test build #57843 has finished for PR 12920 at commit 63d90ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

I am not sure if fixing examples can have the component [DOC] in the title. I saw [EXAMPLE] component was used by @dongjoon-hyun. This is a pretty minor but making good examples of PRs will help all other contributors.

@dongjoon-hyun Do you mind If I ask your thoughts please?

@zhengruifeng zhengruifeng changed the title [SPARK-15141][DOC] Add python example for OneVsRest [SPARK-15141][EXAMPLE] Add python example for OneVsRest May 5, 2016
@MLnick
Copy link
Contributor

MLnick commented May 5, 2016

@HyukjinKwon I get your concern, but these examples are actually directly included in the HTML documentation for Spark, via include_example, so they are both examples and docs. For this kind of thing that touches the HTML doc I think [DOC] is important to include.

@MLnick
Copy link
Contributor

MLnick commented May 5, 2016

@zhengruifeng I think the OneVsRest examples are a little bloated. Could we turn them into simpler versions (essentially just the run part of the example) similar to say GradientBoostedTreeClassifierExample. Same applies for Java.

Frankly, the way the example looks now in HTML doc is very confusing with the reference to params that makes the self-contained example harder to follow.

@zhengruifeng
Copy link
Contributor Author

@MLnick Agreed. I will remove the args-parsing blocks in the three example files.

@yanboliang
Copy link
Contributor

yanboliang commented May 5, 2016

+1 @MLnick

There are two types of example code under examples/:

  • Example applications (the ones with command-line parsing). Those are served as template code for users to build their own standalone applications. We should create those only for important algorithms.
  • Tutorial code. Those are for the user guide, usually without options to play with. We should keep one example per algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use SparkSession instead of SQLContext. See #12809 for details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@zhengruifeng zhengruifeng changed the title [SPARK-15141][EXAMPLE] Add python example for OneVsRest [SPARK-15141][EXAMPLE][DOC] Add python example for OneVsRest May 6, 2016
@SparkQA
Copy link

SparkQA commented May 6, 2016

Test build #57948 has finished for PR 12920 at commit a8d2681.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented May 6, 2016

Test build #57952 has finished for PR 12920 at commit 06cab7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

@MLnick Args-Parsing was removed in those examples

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should prefer to use ml.evaluation.MulticlassClassificationEvaluator here as it's a DataFrame API example. You may have to change the metric used - see DecisionTreeClassificationExample for example

@MLnick
Copy link
Contributor

MLnick commented May 7, 2016

@zhengruifeng made a few comments - most importantly it's better to use the ml evaluators throughout as these are DataFrame API examples.

@zhengruifeng
Copy link
Contributor Author

zhengruifeng commented May 8, 2016

@MLnick

I have updated codes according to your comments.
Because MulticlassClassificationEvaluator do not support confusionMatrix, I just remove the computaion of confusionMatrix, and use Test Error instead.

@SparkQA
Copy link

SparkQA commented May 8, 2016

Test build #58082 has finished for PR 12920 at commit af95019.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng zhengruifeng changed the title [SPARK-15141][EXAMPLE][DOC] Add python example for OneVsRest [SPARK-15141][EXAMPLE][DOC] Update OneVsRest Examples May 8, 2016
@MLnick
Copy link
Contributor

MLnick commented May 8, 2016

Just use accuracy I think similar to the decision tree example

On Sun, 8 May 2016 at 04:21, Ruifeng Zheng notifications@github.com wrote:

@MLnick https://github.com/MLnick MulticlassClassificationEvaluator do
not support confusionMatrix not. So I will just remove the computaion of
confusionMatrix.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#12920 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: println

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@SparkQA
Copy link

SparkQA commented May 8, 2016

Test build #58100 has finished for PR 12920 at commit 16c2e74.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import is no longer required I believe

@MLnick
Copy link
Contributor

MLnick commented May 10, 2016

@zhengruifeng made a few comments. Pending those I think this is ready

@zhengruifeng
Copy link
Contributor Author

@MLnick Thanks. Updated

@SparkQA
Copy link

SparkQA commented May 11, 2016

Test build #58306 has finished for PR 12920 at commit 77ff733.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

OneVsRest ovr = new OneVsRest().setClassifier(classifier);

// train the multiclass model
// train the multiclass model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry one last thing - here we use train.cache() but we don't do that in the other examples. Actually in general we don't seem to do that in any other examples from a quick look. So perhaps remove that and just do ovr.fit(train);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I will fix it

@SparkQA
Copy link

SparkQA commented May 11, 2016

Test build #58344 has finished for PR 12920 at commit 9049002.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Copy link
Contributor

MLnick commented May 11, 2016

LGTM, merged to master and branch-2.0. Thanks!

asfgit pushed a commit that referenced this pull request May 11, 2016
## What changes were proposed in this pull request?
1, Add python example for OneVsRest
2, remove args-parsing

## How was this patch tested?
manual tests
`./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py`

Author: Zheng RuiFeng <ruifengz@foxmail.com>

Closes #12920 from zhengruifeng/ovr_pe.

(cherry picked from commit ad1a846)
Signed-off-by: Nick Pentreath <nickp@za.ibm.com>
@asfgit asfgit closed this in ad1a846 May 11, 2016
@zhengruifeng zhengruifeng deleted the ovr_pe branch May 11, 2016 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants