-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-15113][PySpark][ML] Add missing num features num classes #12889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-15113][PySpark][ML] Add missing num features num classes #12889
Conversation
…tion.py and mark experimental class to match scala class
|
Test build #57726 has finished for PR 12889 at commit
|
|
cc @yanboliang |
|
|
||
| @inherit_doc | ||
| class DecisionTreeClassificationModel(DecisionTreeModel, JavaMLWritable, JavaMLReadable): | ||
| class DecisionTreeClassificationModel(DecisionTreeModel, JavaClassificationModel, JavaMLWritable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holdenk are we not missing out GBTClassificationModel, RandomForestClassificationModel in classification? I think GBT should just be JavaPredictionModel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RandomForestClassificationModel and NaiveBayesModel should be extended from JavaClassificationModel, GBTClassificationModel should be JavaPredictionModel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious to know why we don't expose numClasses in GBTClassificationModel. Do we not support multiclass currently, or is there some other reason?
|
|
|
Test build #58023 has finished for PR 12889 at commit
|
|
Updated the classification models that do the mixing in based on the current inheritance in Scala side. I can follow up with more regression changes if no one takes over updating regression. |
|
Test build #58594 has finished for PR 12889 at commit
|
|
Test build #59181 has finished for PR 12889 at commit
|
…es in scala GeneralizedLinearRegressionModel
|
Test build #59184 has finished for PR 12889 at commit
|
|
ping? |
|
ping @yanboliang ? |
|
Test build #59966 has finished for PR 12889 at commit
|
|
Test build #61783 has finished for PR 12889 at commit
|
|
Just |
|
Test build #62895 has finished for PR 12889 at commit
|
|
Do we have interest in merging now that 2.0 is out? (cc @davies @yanboliang @MLnick )? Would be nice to do before we start adding more models to the Python ML API. |
|
ping @MLnick ? |
| To be mixed in with class:`pyspark.ml.JavaModel` | ||
| """ | ||
|
|
||
| @property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be 2.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes good point, this PR has been around - I'll update it here and in the other new method too.
|
Test build #63376 has finished for PR 12889 at commit
|
| override def write: MLWriter = | ||
| new GeneralizedLinearRegressionModel.GeneralizedLinearRegressionModelWriter(this) | ||
|
|
||
| override val numFeatures: Int = coefficients.size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'll need a @Since("2.1.0") on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numFeatures has always been here its just been the default implementation - but I guess the since wouldn't be too confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a Since might actually be somewhat counter intuitive - how about a Javadoc note which says that it is now defined for this model starting in 2.1.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok fair point - I don't feel super strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need to add this don't we? Otherwise it is the only public method in this class that doesn't have it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base class has @Since("1.6.0") on the method - so it has been public since 1.6 already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that reflected in the documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
YUp, e.g. LogisticRegressionModel, RandomForestClassificationModel etc
|
LGTM pending small comment on adding a since annotation. |
|
Test build #64016 has finished for PR 12889 at commit
|
|
jenkins retest this please |
|
Test build #64185 has finished for PR 12889 at commit
|
|
Merged to master. Thanks! |
## What changes were proposed in this pull request? Add missing `numFeatures` and `numClasses` to the wrapped Java models in PySpark ML pipelines. Also tag `DecisionTreeClassificationModel` as Expiremental to match Scala doc. ## How was this patch tested? Extended doctests Author: Holden Karau <holden@us.ibm.com> Closes apache#12889 from holdenk/SPARK-15113-add-missing-numFeatures-numClasses.
What changes were proposed in this pull request?
Add missing
numFeaturesandnumClassesto the wrapped Java models in PySpark ML pipelines. Also tagDecisionTreeClassificationModelas Expiremental to match Scala doc.How was this patch tested?
Extended doctests