-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-18797][SparkR]:Update spark.logit in sparkr-vignettes #16222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @felixcheung |
|
Test build #69892 has finished for PR 16222 at commit
|
|
Test build #69895 has finished for PR 16222 at commit
|
|
Thanks! |
|
@wangmiao1981 Sorry, I was actually suggesting removing the math part. Those are standard logistic regression formulations, which could be found in many other places. We don't really need to repeat them here. Just providing some pointers should be sufficient. The second half overlaps with the API doc. It would cause maintenance overhead, e.g. how to keep the content in sync. Take dplyr for example, the reference manual contains the API doc while the vignettes are more tutorial-like. We should move towards this direction and having less duplicate content in our code base. For this one in particular, I suggest doing a quick introduction followed by some example code. |
|
Got it! I will make a new pass. Thanks! |
mengxr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made one pass
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
|
|
||
| (Added in 2.1.0) | ||
|
|
||
| [Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
model -> predictive model
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| (Added in 2.1.0) | ||
|
|
||
| [Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model). | ||
| There are two types of logistic regression models, namely binomial logistic regression (i.e., response is binary) and multinomial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is not very necessary, which is basically explaining what logistic regression models are.
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| [Logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) is a widely-used model when the response is categorical. It can be seen as a special case of the [Generalized Linear Model](https://en.wikipedia.org/wiki/Generalized_linear_model). | ||
| There are two types of logistic regression models, namely binomial logistic regression (i.e., response is binary) and multinomial | ||
| logistic regression (i.e., response falls into multiple classes). We provide `spark.logit` on top of `spark.glm` to support logistic regression with advanced hyper-parameters. | ||
| It supports both binary and multiclass classification, elastic-net regularization, and feature standardization, similar to `glmnet`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be consistent with the text, we can say "both binomial and multinomial logistic regression models with elastic-net regularization and feature standardization, similar ..."
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| It supports both binary and multiclass classification, elastic-net regularization, and feature standardization, similar to `glmnet`. | ||
|
|
||
|
|
||
| `spark.logit` fits an Logistic Regression Model against a Spark DataFrame. The `family` parameter can be used to select between the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is not necessary. We can continue with the examples directly.
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| Binomial logistic regression | ||
| ```{r, warning=FALSE} | ||
| df <- createDataFrame(iris) | ||
| # Create a dataframe containing two classes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataframe -> DataFrame
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| # Create a dataframe containing two classes | ||
| training <- df[df$Species %in% c("versicolor", "virginica"), ] | ||
| model <- spark.logit(training, Species ~ ., regParam = 0.5) | ||
| summary <- summary(model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we didn't implement print.summary. If summary(model) is still somewhat human-readable, we should use it. Once we implemented print.summary, we don't need to change code here.
| df <- createDataFrame(iris) | ||
| # Create a dataframe containing two classes | ||
| training <- df[df$Species %in% c("versicolor", "virginica"), ] | ||
| model <- spark.logit(training, Species ~ ., regParam = 0.5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, did you check whether regParam = 0.5 returns a good model or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the test as an example. I didn't check whether regParam = 0.5 returns good model or not. I can do some experiments to check it out.
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| df <- createDataFrame(iris) | ||
| # Note family = "multinomial" is optional in this case since the dataset has multiple classes. | ||
| model <- spark.logit(df, Species ~ ., regParam = 0.5) | ||
| summary <- summary(model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
R/pkg/vignettes/sparkr-vignettes.Rmd
Outdated
| Multinomial logistic regression against three classes | ||
| ```{r, warning=FALSE} | ||
| df <- createDataFrame(iris) | ||
| # Note family = "multinomial" is optional in this case since the dataset has multiple classes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads like family = "binomial" is required if the dataset has only two classes.
|
Test build #69947 has finished for PR 16222 at commit
|
|
Test build #69948 has finished for PR 16222 at commit
|
|
I used the following R code and glmnet to check whether
multinomial:
If I understand correctly, |
|
Test build #69954 has finished for PR 16222 at commit
|
## What changes were proposed in this pull request? spark.logit is added in 2.1. We need to update spark-vignettes to reflect the changes. This is part of SparkR QA work. ## How was this patch tested? Manual build html. Please see attached image for the result.  Author: wm624@hotmail.com <wm624@hotmail.com> Closes #16222 from wangmiao1981/veg. (cherry picked from commit 2aa16d0) Signed-off-by: Xiangrui Meng <meng@databricks.com>
|
LGTM. Merged into master and branch-2.1. I will change the |
## What changes were proposed in this pull request? When do the QA work, I found that the following issues: 1). `spark.mlp` doesn't include an example; 2). `spark.mlp` and `spark.lda` have redundant parameter explanations; 3). `spark.lda` document misses default values for some parameters. I also changed the `spark.logit` regParam in the examples, as we discussed in #16222. ## How was this patch tested? Manual test Author: wm624@hotmail.com <wm624@hotmail.com> Closes #16284 from wangmiao1981/ks. (cherry picked from commit 3243885) Signed-off-by: Felix Cheung <felixcheung@apache.org>
## What changes were proposed in this pull request? When do the QA work, I found that the following issues: 1). `spark.mlp` doesn't include an example; 2). `spark.mlp` and `spark.lda` have redundant parameter explanations; 3). `spark.lda` document misses default values for some parameters. I also changed the `spark.logit` regParam in the examples, as we discussed in #16222. ## How was this patch tested? Manual test Author: wm624@hotmail.com <wm624@hotmail.com> Closes #16284 from wangmiao1981/ks.
## What changes were proposed in this pull request? spark.logit is added in 2.1. We need to update spark-vignettes to reflect the changes. This is part of SparkR QA work. ## How was this patch tested? Manual build html. Please see attached image for the result.  Author: wm624@hotmail.com <wm624@hotmail.com> Closes apache#16222 from wangmiao1981/veg.
## What changes were proposed in this pull request? When do the QA work, I found that the following issues: 1). `spark.mlp` doesn't include an example; 2). `spark.mlp` and `spark.lda` have redundant parameter explanations; 3). `spark.lda` document misses default values for some parameters. I also changed the `spark.logit` regParam in the examples, as we discussed in apache#16222. ## How was this patch tested? Manual test Author: wm624@hotmail.com <wm624@hotmail.com> Closes apache#16284 from wangmiao1981/ks.
## What changes were proposed in this pull request? spark.logit is added in 2.1. We need to update spark-vignettes to reflect the changes. This is part of SparkR QA work. ## How was this patch tested? Manual build html. Please see attached image for the result.  Author: wm624@hotmail.com <wm624@hotmail.com> Closes apache#16222 from wangmiao1981/veg.
## What changes were proposed in this pull request? When do the QA work, I found that the following issues: 1). `spark.mlp` doesn't include an example; 2). `spark.mlp` and `spark.lda` have redundant parameter explanations; 3). `spark.lda` document misses default values for some parameters. I also changed the `spark.logit` regParam in the examples, as we discussed in apache#16222. ## How was this patch tested? Manual test Author: wm624@hotmail.com <wm624@hotmail.com> Closes apache#16284 from wangmiao1981/ks.
What changes were proposed in this pull request?
spark.logit is added in 2.1. We need to update spark-vignettes to reflect the changes. This is part of SparkR QA work.
How was this patch tested?
Manual build html. Please see attached image for the result.
