Conversation

@holdenk
Contributor

@holdenk holdenk commented May 4, 2016

What changes were proposed in this pull request?

Add missing numFeatures and numClasses to the wrapped Java models in PySpark ML pipelines. Also tag DecisionTreeClassificationModel as Experimental to match the Scala doc.

How was this patch tested?

Extended doctests
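
As an illustration of what such a doctest can look like, here is a minimal sketch that runs without Spark. `StubLinearModel` is a hypothetical stand-in, not a PySpark class, and the check is driven through the standard `doctest` module rather than the project's test runner.

```python
import doctest


class StubLinearModel(object):
    """Hypothetical stand-in for a fitted model (not a PySpark class).

    >>> model = StubLinearModel([0.5, -1.0, 2.0])
    >>> model.numFeatures
    3
    """

    def __init__(self, coefficients):
        self._coefficients = list(coefficients)

    @property
    def numFeatures(self):
        """Number of features the model was trained on."""
        return len(self._coefficients)


# Parse the examples out of the class docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    StubLinearModel.__doc__,               # text containing the >>> examples
    {"StubLinearModel": StubLinearModel},  # names visible to the examples
    "StubLinearModel",                     # test name used in failure reports
    None,                                  # no source file
    0,                                     # starting line number
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)  # TestResults with .failed and .attempted
print("failed=%d attempted=%d" % (results.failed, results.attempted))
```

In the real doctests the model would come from fitting an estimator on a small DataFrame; the stub only shows the shape of the assertion.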

@SparkQA

SparkQA commented May 4, 2016

Test build #57726 has finished for PR 12889 at commit c1961ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class JavaClassificationModel(JavaPredictionModel):
    • class LogisticRegressionModel(JavaModel, JavaClassificationModel, JavaMLWritable, JavaMLReadable):
    • class DecisionTreeClassificationModel(DecisionTreeModel, JavaClassificationModel, JavaMLWritable,
    • class NaiveBayesModel(JavaModel, JavaClassificationModel, JavaMLWritable, JavaMLReadable):
    • class MultilayerPerceptronClassificationModel(JavaModel, JavaClassificationModel, JavaMLWritable,

@holdenk
Contributor Author

holdenk commented May 5, 2016

cc @yanboliang


 @inherit_doc
-class DecisionTreeClassificationModel(DecisionTreeModel, JavaMLWritable, JavaMLReadable):
+class DecisionTreeClassificationModel(DecisionTreeModel, JavaClassificationModel, JavaMLWritable,
Contributor

@holdenk aren't we missing GBTClassificationModel and RandomForestClassificationModel in classification? I think GBT should just be a JavaPredictionModel.

Contributor

RandomForestClassificationModel and NaiveBayesModel should extend JavaClassificationModel; GBTClassificationModel should extend JavaPredictionModel.

Contributor

Just curious to know why we don't expose numClasses in GBTClassificationModel. Do we not support multiclass currently, or is there some other reason?

@yanboliang
Contributor

numFeatures is defined in PredictionModel on the Scala side, so every Python class whose Scala counterpart extends PredictionModel should mix it in.
numClasses is defined in ClassificationModel on the Scala side, so every Python class whose Scala counterpart extends ClassificationModel should mix it in.
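
The rule above can be sketched in plain Python. The mixin names and `_call_java` mirror names used in this PR and in PySpark's Java wrapper, but everything here is a simplified stand-in: `FakeJavaModel`, `JavaModelStub`, and the `*Sketch` classes are hypothetical and exist only so the example runs without a JVM.

```python
class FakeJavaModel(object):
    """Hypothetical stand-in for the Py4J handle to a Scala model."""

    def __init__(self, num_features, num_classes=None):
        self._num_features = num_features
        self._num_classes = num_classes

    def numFeatures(self):
        return self._num_features

    def numClasses(self):
        return self._num_classes


class JavaModelStub(object):
    """Simplified stand-in for the Python wrapper around a Java model."""

    def __init__(self, java_obj):
        self._java_obj = java_obj

    def _call_java(self, name, *args):
        # Forward the call to the wrapped (here: fake) Java object.
        return getattr(self._java_obj, name)(*args)


class JavaPredictionModel(object):
    """Mixin for models whose Scala counterpart extends PredictionModel."""

    @property
    def numFeatures(self):
        return self._call_java("numFeatures")


class JavaClassificationModel(JavaPredictionModel):
    """Mixin for models whose Scala counterpart extends ClassificationModel."""

    @property
    def numClasses(self):
        return self._call_java("numClasses")


# Per the review discussion, GBT only gets the prediction mixin.
class GBTClassificationModelSketch(JavaModelStub, JavaPredictionModel):
    pass


class RandomForestClassificationModelSketch(JavaModelStub, JavaClassificationModel):
    pass


gbt = GBTClassificationModelSketch(FakeJavaModel(num_features=4))
rf = RandomForestClassificationModelSketch(FakeJavaModel(num_features=4, num_classes=3))

print(gbt.numFeatures, hasattr(gbt, "numClasses"))
print(rf.numFeatures, rf.numClasses)
```

Because `JavaClassificationModel` inherits from `JavaPredictionModel`, a classifier that mixes in the former gets both attributes, while a model that mixes in only `JavaPredictionModel` exposes `numFeatures` alone.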

@SparkQA

SparkQA commented May 6, 2016

Test build #58023 has finished for PR 12889 at commit e3b01f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RandomForestClassificationModel(TreeEnsembleModels, JavaClassificationModel, JavaMLWritable,
    • class GBTClassificationModel(TreeEnsembleModels, JavaPredictionModel, JavaMLWritable,
    • class MultilayerPerceptronClassificationModel(JavaModel, JavaPredictionModel, JavaMLWritable,

@holdenk
Contributor Author

holdenk commented May 9, 2016

Updated the classification models that mix these in, based on the current inheritance on the Scala side. I can follow up with more regression changes if no one takes over updating regression.

@SparkQA

SparkQA commented May 13, 2016

Test build #58594 has finished for PR 12889 at commit f5c69f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 24, 2016

Test build #59181 has finished for PR 12889 at commit 6e35559.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 24, 2016

Test build #59184 has finished for PR 12889 at commit 020c096.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RandomForestRegressionModel(TreeEnsembleModels, JavaPredictionModel, JavaMLWritable,

@holdenk
Contributor Author

holdenk commented May 26, 2016

ping?

@holdenk
Contributor Author

holdenk commented May 27, 2016

ping @yanboliang ?

@SparkQA

SparkQA commented Jun 3, 2016

Test build #59966 has finished for PR 12889 at commit 45570f5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 5, 2016

Test build #61783 has finished for PR 12889 at commit 0d7defa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MechCoder
Contributor

Just LinearRegressionModel seems missing to me. LGTM otherwise.

@SparkQA

SparkQA commented Jul 26, 2016

Test build #62895 has finished for PR 12889 at commit 8a30c7a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class LinearRegressionModel(JavaModel, JavaPredictionModel, JavaMLWritable, JavaMLReadable):

@holdenk
Contributor Author

holdenk commented Aug 3, 2016

Is there interest in merging this now that 2.0 is out (cc @davies @yanboliang @MLnick)? It would be nice to do this before we start adding more models to the Python ML API.

@holdenk
Contributor Author

holdenk commented Aug 8, 2016

ping @MLnick ?

To be mixed in with class:`pyspark.ml.JavaModel`
"""

@property
Contributor

Should be 2.1?

Contributor Author

Ah yes, good point; this PR has been around for a while. I'll update it here and in the other new method too.
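
For context on what the version tag on a Python property does, here is a rough sketch. PySpark ships its own `since` decorator; the version below is a simplified stand-in written for illustration, not the library's implementation, and `PredictionModelSketch` is a hypothetical class.

```python
def since(version):
    """Simplified stand-in: append a Sphinx versionadded note to a docstring."""
    def decorator(func):
        doc = (func.__doc__ or "").rstrip()
        func.__doc__ = doc + "\n\n.. versionadded:: %s" % version
        return func
    return decorator


class PredictionModelSketch(object):
    """Hypothetical model class used only to show the decorator order."""

    def __init__(self, num_features):
        self._num_features = num_features

    @property
    @since("2.1.0")
    def numFeatures(self):
        """Returns the number of features the model was trained on."""
        return self._num_features


# property() copies the getter's docstring, so the note shows up in the docs.
print(PredictionModelSketch.numFeatures.__doc__)
```

The decorator is applied beneath `@property` so that the docstring is already annotated when the property object copies it.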

@SparkQA

SparkQA commented Aug 8, 2016

Test build #63376 has finished for PR 12889 at commit 9283e3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

override def write: MLWriter =
  new GeneralizedLinearRegressionModel.GeneralizedLinearRegressionModelWriter(this)

override val numFeatures: Int = coefficients.size
Contributor

I think we'll need a @Since("2.1.0") on this

Contributor Author

numFeatures has always been here; it's just been the default implementation. But I guess the Since wouldn't be too confusing.

Contributor Author

I think adding a Since might actually be somewhat counterintuitive. How about a Javadoc note saying that it is defined for this model starting in 2.1.0?

Contributor

Ok fair point - I don't feel super strongly about it.

Contributor

We still need to add this, don't we? Otherwise it is the only public method in this class that doesn't have it.

Contributor

The base class has @Since("1.6.0") on the method - so it has been public since 1.6 already.

Contributor

Is that reflected in the documentation?

@MLnick
Contributor

MLnick commented Aug 10, 2016

LGTM pending small comment on adding a since annotation.

@SparkQA

SparkQA commented Aug 18, 2016

Test build #64016 has finished for PR 12889 at commit 5045f7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Contributor

MLnick commented Aug 22, 2016

jenkins retest this please

@SparkQA

SparkQA commented Aug 22, 2016

Test build #64185 has finished for PR 12889 at commit 5045f7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Contributor

MLnick commented Aug 22, 2016

Merged to master. Thanks!

@asfgit asfgit closed this in b264cbb Aug 22, 2016
clockfly pushed a commit to clockfly/spark that referenced this pull request Aug 23, 2016
## What changes were proposed in this pull request?

Add missing `numFeatures` and `numClasses` to the wrapped Java models in PySpark ML pipelines. Also tag `DecisionTreeClassificationModel` as Experimental to match the Scala doc.

## How was this patch tested?

Extended doctests

Author: Holden Karau <holden@us.ibm.com>

Closes apache#12889 from holdenk/SPARK-15113-add-missing-numFeatures-numClasses.