Skip to content

[SPARK-17157][SPARKR][WIP]: Add multiclass logistic regression SparkR Wrapper#14818

Closed
wangmiao1981 wants to merge 4 commits intoapache:masterfrom
wangmiao1981:gllm
Closed

[SPARK-17157][SPARKR][WIP]: Add multiclass logistic regression SparkR Wrapper#14818
wangmiao1981 wants to merge 4 commits intoapache:masterfrom
wangmiao1981:gllm

Conversation

@wangmiao1981
Copy link
Contributor

What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)
Add SparkR wrapper for multicalss logistic regression.

Not ready for review now. Only R part code is added.

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

@SparkQA
Copy link

SparkQA commented Aug 25, 2016

Test build #64441 has finished for PR 14818 at commit d6dbff8.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@junyangq
Copy link
Contributor

We may want to use a different name? glmnet related name could be confusing if it is actually only multiclass logistic.

@sethah
Copy link
Contributor

sethah commented Aug 26, 2016

This is going to have to wait. We are changing the interface completely. See https://issues.apache.org/jira/browse/SPARK-17163.

@felixcheung
Copy link
Member

cool, thanks for the heads up @sethah - please loop us in for the R side changes.

@sethah
Copy link
Contributor

sethah commented Aug 26, 2016

In fact, we are actually just eliminating the MultinomialLogisticRegression interface and merging into the existing LogisticRegression estimator. So, maybe we won't need a change after all? I'm not very familiar with the R side, but basically the existing logistic regression will now support multiclass.

My apologies for the confusion, I hadn't seen this Jira until just now.

@felixcheung
Copy link
Member

Right - I suspect we do need to change to R interface though which is why I mention it. Thanks!

@wangmiao1981
Copy link
Contributor Author

OK. I will wait for the interface refactoring to continue this JIRA. Now, I just leave it as a placeholder. Later on, I will change the name of the JIRA. Thanks!

@SparkQA
Copy link

SparkQA commented Aug 26, 2016

Test build #64489 has finished for PR 14818 at commit 87ae15a.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultinomialLogisticRegressionWrapperWriter(instance: MultinomialLogisticRegressionWrapper)

@sethah
Copy link
Contributor

sethah commented Sep 20, 2016

#14834 is merged now :)

@wangmiao1981
Copy link
Contributor Author

Thanks! I will update the PR accordingly.

@felixcheung
Copy link
Member

thanks - let's review if the name glmnet should be used or should this be part of our support for logit

@wangmiao1981
Copy link
Contributor Author

Yes. I am reviewing the code in #14834 to check if we just have to modify existing wrapper to be consistent with Scala side API.

@wangmiao1981
Copy link
Contributor Author

@felixcheung https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
glmnet has more functionalities than Logistic Regression (BLOR and MLOR). So logit could be a better name. Based on my search R doesn't have logit. What do you think?

@felixcheung
Copy link
Member

felixcheung commented Sep 23, 2016

glm has a link=logit parameter? not sure if it maps to the Spark version of things
http://www.statmethods.net/advstats/glm.html

@wangmiao1981
Copy link
Contributor Author

@felixcheung For Spark, it supports both binomial and multinomial logistic regression. The link function logit under glm only supports binomial. I found that glmnet supports both. There is also a package mlogit, https://cran.r-project.org/web/packages/mlogit/vignettes/mlogit.pdf. for multinomial only. What do you think? cc @junyangq

Thanks!

@SparkQA
Copy link

SparkQA commented Sep 23, 2016

Test build #65833 has finished for PR 14818 at commit ed1a0fb.

  • This patch fails some tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member

If we are closely matching the capability of glmnet then we could name it glmnet or spark.glmnet, if this fits in the existing spark.glm implementation (which is different from the SparkR glm function) then we should add it there, otherwise our convention would be to add a new function that is named for what it does, such as "spark.multiclassLogisticRegression"

From what you have described perhaps it makes sense to add to spark.glm?

@wangmiao1981
Copy link
Contributor Author

@felixcheung If I added it to spark.glm, it will break the current spark.glm interface.

setMethod("spark.glm", signature(data = "SparkDataFrame", formula = "formula"), function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL, regParam = 0.0)

With this interface, I can't add LogisticRegressionModel. I think it is better to add it to a separate Class, instead of the GeneralizedLinearRegressionModel.

@felixcheung
Copy link
Member

Could you elaborate what's incompatible?

@wangmiao1981
Copy link
Contributor Author

For example:
LogisticRegression has Threshold, Thresholds and AggregationDepth etc as parameters.

GeneralizedLinearRegression doesn't have such parameters. So the interface
function(data, formula, family = gaussian, tol = 1e-6, maxIter = 25, weightCol = NULL,
regParam = 0.0)

should be modified to accept these parameters.

@felixcheung
Copy link
Member

make sense, I think then we should have a separate function

@wangmiao1981
Copy link
Contributor Author

Thanks for your reply! Now, I am working in a fresh branch. Will submit another PR and close this old one later.

@wangmiao1981
Copy link
Contributor Author

@felixcheung I opened #15365 based on our discussion in this PR. I close this PR now and please review #15365 . Thanks!

asfgit pushed a commit that referenced this pull request Oct 26, 2016
## What changes were proposed in this pull request?

As we discussed in #14818, I added a separate R wrapper spark.logit for logistic regression.

This single interface supports both binary and multinomial logistic regression. It also has "predict" and "summary" for binary logistic regression.

## How was this patch tested?

New unit tests are added.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #15365 from wangmiao1981/glm.
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
## What changes were proposed in this pull request?

As we discussed in apache#14818, I added a separate R wrapper spark.logit for logistic regression.

This single interface supports both binary and multinomial logistic regression. It also has "predict" and "summary" for binary logistic regression.

## How was this patch tested?

New unit tests are added.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes apache#15365 from wangmiao1981/glm.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
## What changes were proposed in this pull request?

As we discussed in apache#14818, I added a separate R wrapper spark.logit for logistic regression.

This single interface supports both binary and multinomial logistic regression. It also has "predict" and "summary" for binary logistic regression.

## How was this patch tested?

New unit tests are added.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes apache#15365 from wangmiao1981/glm.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants