-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-19460][SparkR]:Update dataset used in R documentation, examples to reduce warning noise and confusions #17032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #73304 has finished for PR 17032 at commit
|
|
cc @felixcheung |
R/pkg/R/mllib_tree.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gener?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gender
R/pkg/R/mllib_tree.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, is it make sense to model this with Sex as the label? that seems a bit strange
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to demonstrate with a category variable. I can change it to Survived. Is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - there are still a few examples with Sex ~ - do you think we should change them too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change them all to survived. Thanks!
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to check if the regParam value make sense in the generated output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
summary should print without having to do a head here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, summary returns a DataFrame. It won't print out the contents of the DataFrame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok :) so we could add a print.summary.bisectingKMeansModel like other models :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do in follow-up PR.
examples/src/main/r/ml/glm.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this example is a bit weird - it takes the same data to build the model and then predict with it.
I suspect we are really limited in terms of how much data we have here, but we should consider building a better example which include doing a randomSplit into training and test set etc..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for binomial here or kmeans.R, ml.R
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I agree. I saw other examples using the same dataset as testing. How about fixing them all in another follow-up PR? We only focus on fixing the iris dataset replacement in this PR. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could but as I've mentioned, Titanic is really small - it might not work properly if we are split that further, so it might be something we need to change again to add the split
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is the example not the vignettes. We can use datasets in the data/mllib directory.
examples/src/main/r/ml/kmeans.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kmeans_data.txt and sample_kmeans_data.txt have fewer data points than Titanic. So in this case, I am still using the Titanic dataset.
|
Test build #73373 has finished for PR 17032 at commit
|
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cv.glmnet tested.
|
Test build #73375 has finished for PR 17032 at commit
|
examples/src/main/r/ml/glm.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample could end up having the same row in both training and test set.
I think we should use randomSplit instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I will change it. Thanks!
|
Test build #73440 has finished for PR 17032 at commit
|
|
@felixcheung I have made the changes per our review discussion. Thanks! |
5beca69 to
b0585aa
Compare
|
Test build #73509 has finished for PR 17032 at commit
|
b0585aa to
905ffde
Compare
|
Test build #73608 has finished for PR 17032 at commit
|
|
merged to master. |
What changes were proposed in this pull request?
Replace
irisdataset withTitanicor other dataset in example and document.How was this patch tested?
Manual and existing test