[SPARK-19460][SparkR]:Update dataset used in R documentation, examples to reduce warning noise and confusions #17032

wangmiao1981 · 2017-02-23T00:50:55Z

What changes were proposed in this pull request?

Replace the iris dataset with Titanic or other datasets in the examples and documentation.

How was this patch tested?

Manual testing and existing tests.

SparkQA · 2017-02-23T01:25:58Z

Test build #73304 has finished for PR 17032 at commit 1f57467.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-02-23T04:59:57Z

cc @felixcheung

felixcheung · 2017-02-23T18:57:42Z

R/pkg/R/mllib_tree.R

felixcheung · 2017-02-23T18:58:27Z

R/pkg/R/mllib_tree.R

Hmm, does it make sense to model this with Sex as the label? That seems a bit strange.

I just wanted to demonstrate with a categorical variable. I can change it to Survived. Is that OK?

Right - there are still a few examples with Sex ~ - do you think we should change them too?

I will change them all to Survived. Thanks!

felixcheung · 2017-02-23T19:00:28Z

R/pkg/vignettes/sparkr-vignettes.Rmd

It would be good to check whether the regParam value makes sense in the generated output.

I will check.

felixcheung · 2017-02-23T19:01:53Z

examples/src/main/r/ml/bisectingKmeans.R

Shouldn't summary print without needing a head call here?

In this case, summary returns a DataFrame. It won't print out the contents of the DataFrame.

ok :) so we could add a print.summary.bisectingKMeansModel like other models :)

Will do in a follow-up PR.
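The suggestion in this exchange relies on R's S3 dispatch: auto-printing calls print(), which picks a method based on the object's class. A minimal base R sketch of that mechanism follows. The class name `summary.demoModel`, the helper `make_demo_summary`, and the fields are hypothetical and do not mirror SparkR's actual bisecting k-means summary.

```r
# Hypothetical summary object; the class name and fields are illustrative
# only and are not taken from SparkR.
make_demo_summary <- function() {
  structure(
    list(k = 2L, size = c(3L, 5L)),
    class = "summary.demoModel"
  )
}

# S3 print method: print() dispatches here for objects of class
# "summary.demoModel", so typing the object at the prompt shows this output.
print.summary.demoModel <- function(x, ...) {
  cat("Number of clusters:", x$k, "\n")
  cat("Cluster sizes:", paste(x$size, collapse = ", "), "\n")
  invisible(x)  # return the object invisibly, as print methods usually do
}

s <- make_demo_summary()
print(s)
```

With a method like this in place, a caller would not need to wrap the summary in head() just to see its contents.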

felixcheung · 2017-02-23T19:06:16Z

examples/src/main/r/ml/glm.R

I think this example is a bit odd - it uses the same data to build the model and then to predict with it.
I suspect we are limited by how little data we have here, but we should consider building a better example that includes a randomSplit into training and test sets, etc.

Ditto for the binomial model here, and for kmeans.R and ml.R.

Yes, I agree. I saw other examples that test on the same dataset they train on. How about fixing them all in another follow-up PR? This PR focuses only on replacing the iris dataset. Thanks!

We could, but as I've mentioned, Titanic is really small - it might not work properly if we split it further, so we might need to change it again to add the split.

Since this is an example, not the vignettes, we can use datasets in the data/mllib directory.

wangmiao1981 · 2017-02-24T00:47:48Z

examples/src/main/r/ml/kmeans.R

kmeans_data.txt and sample_kmeans_data.txt have fewer data points than Titanic. So in this case, I am still using the Titanic dataset.

SparkQA · 2017-02-24T00:54:12Z

Test build #73373 has finished for PR 17032 at commit 233ebec.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-02-24T00:58:11Z

R/pkg/vignettes/sparkr-vignettes.Rmd

cv.glmnet tested.

SparkQA · 2017-02-24T01:19:46Z

Test build #73375 has finished for PR 17032 at commit 0c05309.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-02-24T08:46:06Z

examples/src/main/r/ml/glm.R

sample could end up putting the same row in both the training and test sets.
I think we should use randomSplit instead.

OK. I will change it. Thanks!
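To illustrate the concern in this exchange: two independent sampling passes can select the same row twice, whereas a single partition cannot. The following is a minimal base R sketch of that idea using the built-in Titanic table. It mimics the semantics rather than calling SparkR, and the 70/30 ratio, the seed, and the variable names are assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

titanic <- as.data.frame(Titanic)  # 32 rows: Class, Sex, Age, Survived, Freq
n <- nrow(titanic)

# Approach 1: two independent draws. Nothing stops a row index from
# appearing in both, so "training" and "test" may overlap.
train_indep <- sample(n, size = round(0.7 * n))
test_indep <- sample(n, size = round(0.3 * n))
overlap <- intersect(train_indep, test_indep)
cat("Rows in both independent samples:", length(overlap), "\n")

# Approach 2: one shuffle, then cut. Every row lands in exactly one set,
# which is the guarantee a split-style API provides.
shuffled <- sample(n)
n_train <- round(0.7 * n)
train_split <- shuffled[seq_len(n_train)]
test_split <- shuffled[-seq_len(n_train)]

training <- titanic[train_split, ]
test <- titanic[test_split, ]
```

Note that this sketch cuts at an exact row count, while a distributed split such as SparkR's randomSplit assigns rows randomly by weight, so its set sizes are only approximate.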

SparkQA · 2017-02-24T19:50:31Z

Test build #73440 has finished for PR 17032 at commit 5beca69.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

wangmiao1981 · 2017-02-27T06:24:02Z

@felixcheung I have made the changes per our review discussion. Thanks!

SparkQA · 2017-02-27T17:26:27Z

Test build #73509 has finished for PR 17032 at commit b0585aa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-02-28T21:02:12Z

Test build #73608 has finished for PR 17032 at commit 905ffde.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

felixcheung · 2017-03-01T06:31:51Z

Merged to master.

srowen approved these changes Feb 23, 2017

felixcheung reviewed Feb 23, 2017

felixcheung requested changes Feb 24, 2017

wangmiao1981 force-pushed the example branch from 5beca69 to b0585aa Compare February 27, 2017 06:29

wangmiao1981 added 5 commits February 28, 2017 11:41

985ab1b remove iris dataset in example and document
5a69ac6 address review comments
0204d7e split dataset
5e2a3ab change to randomSplit
905ffde address review comments

wangmiao1981 force-pushed the example branch from b0585aa to 905ffde Compare February 28, 2017 20:24

asfgit closed this in 89cd384 Mar 1, 2017

4 participants
