@wangmiao1981
Contributor

What changes were proposed in this pull request?

Replace the iris dataset with Titanic or another dataset in the examples and documentation.

How was this patch tested?

Manual testing and existing tests.

@SparkQA

SparkQA commented Feb 23, 2017

Test build #73304 has finished for PR 17032 at commit 1f57467.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

cc @felixcheung

Member

Gener?

Contributor Author

Gender

Member

Hmm, does it make sense to model this with Sex as the label? That seems a bit strange.

Contributor Author

I just wanted to demonstrate with a categorical variable. I can change it to Survived. Is that OK?

Member

great!

Member

Right. There are still a few examples with Sex ~; do you think we should change them too?

Contributor Author

I will change them all to Survived. Thanks!

Member

It would be good to check whether the regParam value makes sense in the generated output.

Contributor Author

I will check.

Member

Shouldn't summary print without having to call head here?

Contributor Author

In this case, summary returns a DataFrame. It won't print out the contents of the DataFrame.

Member

ok :) so we could add a print.summary.bisectingKMeansModel like other models :)

Contributor Author

Will do in follow-up PR.

Member

I think this example is a bit weird: it uses the same data to build the model and then to predict with it.
I suspect we are really limited in how much data we have here, but we should consider building a better example that includes doing a randomSplit into training and test sets, etc.

Member

Ditto for binomial here, and for kmeans.R and ml.R.

Contributor Author

Yes, I agree. I saw other examples using the same dataset for testing. How about fixing them all in another follow-up PR? Let's focus only on the iris dataset replacement in this PR. Thanks!

Member

We could, but as I've mentioned, Titanic is really small. It might not work properly if we split it further, so it might be something we need to change again to add the split.

Contributor Author

Since this is the example, not the vignettes, we can use datasets in the data/mllib directory.

Contributor Author

kmeans_data.txt and sample_kmeans_data.txt have fewer data points than Titanic. So in this case, I am still using the Titanic dataset.

@SparkQA

SparkQA commented Feb 24, 2017

Test build #73373 has finished for PR 17032 at commit 233ebec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor Author

cv.glmnet tested.

@SparkQA

SparkQA commented Feb 24, 2017

Test build #73375 has finished for PR 17032 at commit 0c05309.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@felixcheung Feb 24, 2017

sample could end up putting the same row in both the training and test sets.
I think we should use randomSplit instead.
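
To illustrate the concern above, here is a minimal plain-Python sketch (not SparkR code) of why two independent sampling passes can share rows while a split cannot. The helper names `independent_samples` and `random_split`, the stand-in row ids, the fractions, and the seeds are all assumptions made for illustration; they only mimic the general behaviour of sampling versus splitting and are not the Spark implementation.

```python
import random


def independent_samples(rows, fraction, seed):
    """Hypothetical helper: draw 'training' and 'test' sets as two
    independent sampling passes over the same rows."""
    rng = random.Random(seed)
    # Each pass decides per row, independently of the other pass,
    # so a row may be picked by both passes or by neither.
    train = [r for r in rows if rng.random() < fraction]
    test = [r for r in rows if rng.random() < fraction]
    return train, test


def random_split(rows, weights, seed):
    """Hypothetical helper: assign every row to exactly one partition,
    with probability proportional to the given weights."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative boundaries in [0, 1], e.g. [0.7, 1.0] for weights [0.7, 0.3].
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    parts = [[] for _ in weights]
    for r in rows:
        u = rng.random()
        for i, b in enumerate(bounds):
            # The last partition catches any floating-point remainder.
            if u < b or i == len(bounds) - 1:
                parts[i].append(r)
                break
    return parts


rows = list(range(32))  # stand-in row ids for a small dataset (assumption)

train_a, test_a = independent_samples(rows, 0.5, seed=1)
overlap_a = set(train_a) & set(test_a)  # may be non-empty

train_b, test_b = random_split(rows, [0.7, 0.3], seed=1)
overlap_b = set(train_b) & set(test_b)  # empty by construction

print("independent samples share rows:", sorted(overlap_a))
print("random split shares rows:", sorted(overlap_b))
```

With the split, every row lands in exactly one partition, so the test set never contains a row the model was trained on; with two sampling passes, nothing prevents that.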

Contributor Author

OK. I will change it. Thanks!

@SparkQA

SparkQA commented Feb 24, 2017

Test build #73440 has finished for PR 17032 at commit 5beca69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

@felixcheung I have made the changes per our review discussion. Thanks!

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73509 has finished for PR 17032 at commit b0585aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 28, 2017

Test build #73608 has finished for PR 17032 at commit 905ffde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Member

merged to master.

@asfgit asfgit closed this in 89cd384 Mar 1, 2017