Implement new analysis type: classification #46537

przemekwitek · 2019-09-10T12:07:04Z

Implement new analysis type: Classification.
Also, extract the common parameters between Classification and Regression to a separate class: BoostedTreeParams.

This PR will not be fully functional until the corresponding changes on the C++ side are made (WIP).
However, I'm sending it for review now to gather feedback on the Java part.
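The following is a minimal sketch of the shape the extraction produces. It is not the PR's actual code. The shared hyperparameters live in one value class, and both analyses hold an instance of it. Only the three fields that appear in the Regression diff further down (eta, maximumNumberTrees, featureBagFraction) are modelled. The constructors and getters are simplified assumptions.

```java
import java.util.Objects;

// Simplified, hypothetical stand-in for BoostedTreeParams. Only the three
// fields visible in the Regression diff are modelled; null means "not set".
final class BoostedTreeParams {
    private final Double eta;
    private final Integer maximumNumberTrees;
    private final Double featureBagFraction;

    BoostedTreeParams(Double eta, Integer maximumNumberTrees, Double featureBagFraction) {
        this.eta = eta;
        this.maximumNumberTrees = maximumNumberTrees;
        this.featureBagFraction = featureBagFraction;
    }

    Double getEta() { return eta; }
    Integer getMaximumNumberTrees() { return maximumNumberTrees; }
    Double getFeatureBagFraction() { return featureBagFraction; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BoostedTreeParams)) return false;
        BoostedTreeParams that = (BoostedTreeParams) o;
        return Objects.equals(eta, that.eta)
            && Objects.equals(maximumNumberTrees, that.maximumNumberTrees)
            && Objects.equals(featureBagFraction, that.featureBagFraction);
    }

    @Override
    public int hashCode() {
        return Objects.hash(eta, maximumNumberTrees, featureBagFraction);
    }
}

// Both supervised analyses hold a BoostedTreeParams instance instead of
// repeating the same fields (and their parsing/serialisation code).
final class Regression {
    private final String dependentVariable;
    private final BoostedTreeParams boostedTreeParams;

    Regression(String dependentVariable, BoostedTreeParams boostedTreeParams) {
        this.dependentVariable = Objects.requireNonNull(dependentVariable, "dependent_variable");
        this.boostedTreeParams = Objects.requireNonNull(boostedTreeParams, "boosted_tree_params");
    }

    String getDependentVariable() { return dependentVariable; }
    BoostedTreeParams getBoostedTreeParams() { return boostedTreeParams; }
}

final class Classification {
    private final String dependentVariable;
    private final BoostedTreeParams boostedTreeParams;

    Classification(String dependentVariable, BoostedTreeParams boostedTreeParams) {
        this.dependentVariable = Objects.requireNonNull(dependentVariable, "dependent_variable");
        this.boostedTreeParams = Objects.requireNonNull(boostedTreeParams, "boosted_tree_params");
    }

    String getDependentVariable() { return dependentVariable; }
    BoostedTreeParams getBoostedTreeParams() { return boostedTreeParams; }
}
```

Any new shared parameter then only needs to be added in one place.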

Relates #46735

elasticmachine · 2019-09-26T14:03:50Z

Pinging @elastic/ml-core

przemekwitek · 2019-09-26T14:23:43Z

run elasticsearch-ci/bwc
run elasticsearch-ci/default-distro

...in/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Classification.java

...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java

benwtrent · 2019-09-27T12:33:45Z

...va/org/elasticsearch/xpack/ml/dataframe/process/customprocessing/CustomProcessorFactory.java

+        if (analysis instanceof Classification) {
+            Classification classification = (Classification) analysis;
+            return new DatasetSplittingCustomProcessor(
+                fieldNames, classification.getDependentVariable(), classification.getTrainingPercent());


It almost seems like we need a new interface for the different analyses.

Unsupervised vs. supervised... but that might be a future refactoring.

> It almost seems like we need a new interface for the different analyses.

Yes, we may end up doing that.

> But that might be a future refactoring

Agreed, let's not add more interfaces too early.
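The supervised-analysis interface floated in this exchange, and deliberately deferred, could look roughly like the following sketch. SupervisedAnalysis, DataFrameAnalysis, CustomProcessor and NoopCustomProcessor are hypothetical names used for illustration. Only DatasetSplittingCustomProcessor and its three constructor arguments come from the diff, and its splitting logic is left out here.

```java
import java.util.List;

// Hypothetical hierarchy: the PR itself keeps instanceof checks on the
// concrete classes, and this refactoring was deferred in the review.
interface DataFrameAnalysis {
    String name();
}

// Supervised analyses predict a dependent variable and need a training split.
interface SupervisedAnalysis extends DataFrameAnalysis {
    String getDependentVariable();
    double getTrainingPercent();
}

final class OutlierDetection implements DataFrameAnalysis {
    @Override public String name() { return "outlier_detection"; }
}

final class Regression implements SupervisedAnalysis {
    private final String dependentVariable;
    private final double trainingPercent;

    Regression(String dependentVariable, double trainingPercent) {
        this.dependentVariable = dependentVariable;
        this.trainingPercent = trainingPercent;
    }

    @Override public String name() { return "regression"; }
    @Override public String getDependentVariable() { return dependentVariable; }
    @Override public double getTrainingPercent() { return trainingPercent; }
}

final class Classification implements SupervisedAnalysis {
    private final String dependentVariable;
    private final double trainingPercent;

    Classification(String dependentVariable, double trainingPercent) {
        this.dependentVariable = dependentVariable;
        this.trainingPercent = trainingPercent;
    }

    @Override public String name() { return "classification"; }
    @Override public String getDependentVariable() { return dependentVariable; }
    @Override public double getTrainingPercent() { return trainingPercent; }
}

interface CustomProcessor {
    void process(String[] row);
}

// Used when an analysis needs no row preprocessing.
final class NoopCustomProcessor implements CustomProcessor {
    @Override public void process(String[] row) { }
}

// Only the configuration is modelled; the per-row splitting is omitted here.
final class DatasetSplittingCustomProcessor implements CustomProcessor {
    final List<String> fieldNames;
    final String dependentVariable;
    final double trainingPercent;

    DatasetSplittingCustomProcessor(List<String> fieldNames, String dependentVariable, double trainingPercent) {
        this.fieldNames = fieldNames;
        this.dependentVariable = dependentVariable;
        this.trainingPercent = trainingPercent;
    }

    @Override public void process(String[] row) {
        // Splitting logic intentionally left out of this sketch.
    }
}

final class CustomProcessorFactory {
    private final List<String> fieldNames;

    CustomProcessorFactory(List<String> fieldNames) { this.fieldNames = fieldNames; }

    CustomProcessor create(DataFrameAnalysis analysis) {
        // One branch covers every supervised analysis, instead of one
        // instanceof check (and cast) per concrete class.
        if (analysis instanceof SupervisedAnalysis) {
            SupervisedAnalysis supervised = (SupervisedAnalysis) analysis;
            return new DatasetSplittingCustomProcessor(
                fieldNames, supervised.getDependentVariable(), supervised.getTrainingPercent());
        }
        return new NoopCustomProcessor();
    }
}
```

The trade-off the reviewers point to is that an interface adds surface area before there is a second consumer that needs it. With only one factory, per-class checks remain manageable.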

...asticsearch/xpack/ml/dataframe/process/customprocessing/DatasetSplittingCustomProcessor.java
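For context, here is a minimal sketch of what a dataset-splitting processor configured with field names, a dependent variable and a training percentage might do for each row. TrainingSetSplitter is a hypothetical name. The approach is an assumption, not a description of DatasetSplittingCustomProcessor: rows held out from training have their dependent variable blanked.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a per-row training/test split. Assumption: the native
// process treats a row with an empty dependent variable as a non-training row,
// so excluding a row from training means blanking that column.
final class TrainingSetSplitter {
    static final String EMPTY = "";

    private final int dependentVariableIndex;
    private final double trainingPercent;
    private final Random random;

    TrainingSetSplitter(List<String> fieldNames, String dependentVariable, double trainingPercent, long seed) {
        int index = fieldNames.indexOf(dependentVariable);
        if (index < 0) {
            throw new IllegalArgumentException("unknown dependent variable [" + dependentVariable + "]");
        }
        if (trainingPercent < 0.0 || trainingPercent > 100.0) {
            throw new IllegalArgumentException("training percent must be in [0, 100]");
        }
        this.dependentVariableIndex = index;
        this.trainingPercent = trainingPercent;
        // A fixed seed makes the split reproducible across runs.
        this.random = new Random(seed);
    }

    // Mutates the row in place.
    void process(String[] row) {
        String target = row[dependentVariableIndex];
        // Rows without a target value cannot be used for training anyway.
        if (target == null || target.isEmpty()) {
            return;
        }
        // Keep the target with probability trainingPercent / 100; otherwise
        // blank it so the row is only used for testing/prediction.
        if (random.nextDouble() * 100.0 >= trainingPercent) {
            row[dependentVariableIndex] = EMPTY;
        }
    }
}
```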

benwtrent

Might be good to have @dimitris-athanasiou give it a once-over :). I don't see any major problems.

dimitris-athanasiou

Looks good! Left a few minor comments.

dimitris-athanasiou · 2019-10-03T09:43:15Z

...in/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Classification.java

+                (Integer) a[7],
+                (Double) a[8]));
+        parser.declareString(constructorArg(), DEPENDENT_VARIABLE);
+        BoostedTreeParams.declareFields(parser);


Clever trick for reusing code.

However, this made me wonder whether those params should be in a nested object. It'd be ugly though, wouldn't it?

It's a matter of taste ;)
Parsing code would actually become a bit cleaner, as I could just declare the BoostedTreeParams field here and it would have its own parser.

However, with a nested object:

- we would need to double-check which parameters to move there. I just picked the obvious ones, but maybe, e.g., dependentVariable should live there as well?
- we would need to add BWC handling.

WDYT?

Yeah, I agree we can leave it as is.
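The flat layout kept here relies on one helper that registers the shared fields into each analysis's parser. Here is a heavily simplified, self-contained sketch of that pattern. FieldRegistry, HasBoostedTreeParams, BoostedTreeParamsFields and ClassificationBuilder are hypothetical stand-ins. The real code uses Elasticsearch's ConstructingObjectParser through parser.declareString(constructorArg(), ...) and BoostedTreeParams.declareFields(parser). The JSON field names below are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical, heavily simplified stand-in for an object parser: it maps
// each top-level field name to a setter on a mutable builder.
final class FieldRegistry<B> {
    private final Map<String, BiConsumer<B, Object>> setters = new HashMap<>();

    void declare(String field, BiConsumer<B, Object> setter) {
        if (setters.putIfAbsent(field, setter) != null) {
            throw new IllegalStateException("field [" + field + "] declared twice");
        }
    }

    B parse(Map<String, Object> source, B builder) {
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            BiConsumer<B, Object> setter = setters.get(entry.getKey());
            if (setter == null) {
                throw new IllegalArgumentException("unknown field [" + entry.getKey() + "]");
            }
            setter.accept(builder, entry.getValue());
        }
        return builder;
    }
}

// Anything that carries the shared boosted-tree parameters.
interface HasBoostedTreeParams {
    void setEta(double eta);
    void setMaximumNumberTrees(int maximumNumberTrees);
}

final class BoostedTreeParamsFields {
    private BoostedTreeParamsFields() { }

    // Declares the shared fields once; every supervised analysis calls this on
    // its own parser, so the fields stay at the top level of the request.
    static <B extends HasBoostedTreeParams> void declareFields(FieldRegistry<B> registry) {
        registry.declare("eta", (b, v) -> b.setEta(((Number) v).doubleValue()));
        registry.declare("maximum_number_trees", (b, v) -> b.setMaximumNumberTrees(((Number) v).intValue()));
    }
}

// Mutable builder for a simplified Classification.
final class ClassificationBuilder implements HasBoostedTreeParams {
    String dependentVariable;
    Double eta;
    Integer maximumNumberTrees;

    static final FieldRegistry<ClassificationBuilder> PARSER = new FieldRegistry<>();

    static {
        PARSER.declare("dependent_variable", (b, v) -> b.dependentVariable = (String) v);
        BoostedTreeParamsFields.declareFields(PARSER);
    }

    @Override
    public void setEta(double eta) { this.eta = eta; }

    @Override
    public void setMaximumNumberTrees(int maximumNumberTrees) { this.maximumNumberTrees = maximumNumberTrees; }
}
```

Because the shared fields are registered directly on each top-level parser, a request stays flat, with eta sitting next to dependent_variable. A nested object would instead need its own parser, plus the BWC handling mentioned above.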

dimitris-athanasiou · 2019-10-03T09:43:57Z

...in/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Classification.java

+    }
+
+    public Classification(String dependentVariable) {
+        this(dependentVariable, new BoostedTreeParams(null, null, null, null, null), null, null, null);


Perhaps add a default constructor for BoostedTreeParams to avoid those nulls?
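One way to apply this suggestion, sketched with simplified hypothetical classes, is a no-argument constructor that delegates to the full one, so callers never have to spell out a row of nulls. The real constructor takes five arguments (note the five nulls in the diff). This sketch models only the three fields visible in the Regression diff.

```java
import java.util.Objects;

// Simplified, hypothetical BoostedTreeParams: only three parameters are
// modelled; null means "not set".
final class BoostedTreeParams {
    private final Double eta;
    private final Integer maximumNumberTrees;
    private final Double featureBagFraction;

    // Default constructor: leaves every parameter unset.
    BoostedTreeParams() {
        this(null, null, null);
    }

    BoostedTreeParams(Double eta, Integer maximumNumberTrees, Double featureBagFraction) {
        this.eta = eta;
        this.maximumNumberTrees = maximumNumberTrees;
        this.featureBagFraction = featureBagFraction;
    }

    boolean isAllUnset() {
        return eta == null && maximumNumberTrees == null && featureBagFraction == null;
    }

    Double getEta() { return eta; }
}

final class Classification {
    private final String dependentVariable;
    private final BoostedTreeParams boostedTreeParams;

    // Convenience constructor: no row of nulls at the call site.
    Classification(String dependentVariable) {
        this(dependentVariable, new BoostedTreeParams());
    }

    Classification(String dependentVariable, BoostedTreeParams boostedTreeParams) {
        this.dependentVariable = Objects.requireNonNull(dependentVariable, "dependent_variable");
        this.boostedTreeParams = Objects.requireNonNull(boostedTreeParams, "boosted_tree_params");
    }

    String getDependentVariable() { return dependentVariable; }
    BoostedTreeParams getBoostedTreeParams() { return boostedTreeParams; }
}
```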

dimitris-athanasiou · 2019-10-03T09:45:21Z

...in/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Classification.java

+        this.dependentVariable = ExceptionsHelper.requireNonNull(dependentVariable, DEPENDENT_VARIABLE);
+        this.boostedTreeParams = ExceptionsHelper.requireNonNull(boostedTreeParams, BoostedTreeParams.NAME);
+        this.predictionFieldName = predictionFieldName;
+        this.numTopClasses = numTopClasses;


Does num_top_classes have a fixed default value? If so, we should set it explicitly.

Done.

I think the default value should be "0".
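Here is a minimal sketch of resolving the optional value to an explicit default in the constructor, so the object never carries null. The default of 0 is taken from the reply in this thread. The class is a simplified, hypothetical version of Classification.

```java
import java.util.Objects;

// Simplified, hypothetical Classification showing only the defaulting logic.
final class Classification {
    // Default taken from the review reply ("0"); an assumption in this sketch.
    static final int DEFAULT_NUM_TOP_CLASSES = 0;

    private final String dependentVariable;
    private final int numTopClasses;

    // numTopClasses is optional in the request, hence the nullable Integer.
    Classification(String dependentVariable, Integer numTopClasses) {
        this.dependentVariable = Objects.requireNonNull(dependentVariable, "dependent_variable");
        // Resolve the default here, so callers of getNumTopClasses() never see null.
        this.numTopClasses = numTopClasses == null ? DEFAULT_NUM_TOP_CLASSES : numTopClasses;
    }

    String getDependentVariable() { return dependentVariable; }
    int getNumTopClasses() { return numTopClasses; }
}
```

Setting the default explicitly means a request that omits num_top_classes and one that passes the default value end up in the same state. A nullable field would keep them distinct.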

dimitris-athanasiou · 2019-10-03T09:48:11Z

...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java

-        eta = in.readOptionalDouble();
-        maximumNumberTrees = in.readOptionalVInt();
-        featureBagFraction = in.readOptionalDouble();
+        boostedTreeParams = new BoostedTreeParams(in);


We need to add BWC handling here.

I think the code (as it is written right now) is backward-compatible, because the sequence of StreamInput reads is the same in the old and new versions (the new version just wraps the reads in the new BoostedTreeParams(in) constructor).
It would change, however, if I introduced a nested object.

dimitris-athanasiou · 2019-10-03T09:48:21Z

...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java

-        out.writeOptionalDouble(eta);
-        out.writeOptionalVInt(maximumNumberTrees);
-        out.writeOptionalDouble(featureBagFraction);
+        boostedTreeParams.writeTo(out);


BWC handling.

See my other comment.
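The backward-compatibility argument can be checked with a small experiment. If the extracted class reads and writes the same fields in the same order as the old inline code, the bytes on the wire are identical. This sketch uses java.io.DataOutputStream and DataInputStream as hypothetical stand-ins for StreamOutput and StreamInput. It encodes an optional value as a presence flag followed by the value, and it writes integers as fixed 4-byte ints instead of the variable-length encoding a real VInt uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helpers mimicking optional-value encoding: a boolean presence
// flag, followed by the value only when it is present.
final class Wire {
    private Wire() { }

    static void writeOptionalDouble(DataOutputStream out, Double value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeDouble(value);
        }
    }

    static Double readOptionalDouble(DataInputStream in) throws IOException {
        return in.readBoolean() ? Double.valueOf(in.readDouble()) : null;
    }

    static void writeOptionalInt(DataOutputStream out, Integer value) throws IOException {
        out.writeBoolean(value != null);
        if (value != null) {
            out.writeInt(value);
        }
    }

    static Integer readOptionalInt(DataInputStream in) throws IOException {
        return in.readBoolean() ? Integer.valueOf(in.readInt()) : null;
    }
}

// Old layout: Regression wrote its three fields inline.
final class OldRegressionWire {
    private OldRegressionWire() { }

    static byte[] write(Double eta, Integer maximumNumberTrees, Double featureBagFraction) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            Wire.writeOptionalDouble(out, eta);
            Wire.writeOptionalInt(out, maximumNumberTrees);
            Wire.writeOptionalDouble(out, featureBagFraction);
        }
        return bytes.toByteArray();
    }
}

// New layout: the same fields, read and written by the extracted class.
final class BoostedTreeParams {
    final Double eta;
    final Integer maximumNumberTrees;
    final Double featureBagFraction;

    BoostedTreeParams(Double eta, Integer maximumNumberTrees, Double featureBagFraction) {
        this.eta = eta;
        this.maximumNumberTrees = maximumNumberTrees;
        this.featureBagFraction = featureBagFraction;
    }

    // Same read order as the old inline code (arguments are evaluated left to
    // right), so bytes written by the old layout parse correctly.
    BoostedTreeParams(DataInputStream in) throws IOException {
        this(Wire.readOptionalDouble(in), Wire.readOptionalInt(in), Wire.readOptionalDouble(in));
    }

    void writeTo(DataOutputStream out) throws IOException {
        Wire.writeOptionalDouble(out, eta);
        Wire.writeOptionalInt(out, maximumNumberTrees);
        Wire.writeOptionalDouble(out, featureBagFraction);
    }

    byte[] toBytes() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeTo(out);
        }
        return bytes.toByteArray();
    }
}
```

Wrapping the fields in a nested object that writes extra data of its own would change this byte sequence, and older nodes would then misread the stream. That is why a nested layout would need explicit version checks.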

dimitris-athanasiou

LGTM

przemekwitek · 2019-10-03T15:10:25Z

run elasticsearch-ci/bwc

przemekwitek · 2019-10-04T07:40:22Z

run elasticsearch-ci/bwc

przemekwitek · 2019-10-04T07:50:39Z

run elasticsearch-ci/default-distro

przemekwitek added the WIP label Sep 10, 2019

przemekwitek force-pushed the classification branch 2 times, most recently from 0193e12 to 6f92943 on September 17, 2019 05:58

przemekwitek force-pushed the classification branch 3 times, most recently from 0f6518c to 1429840 on September 20, 2019 08:20

przemekwitek force-pushed the classification branch 4 times, most recently from ede5e3d to 07ffd52 on September 26, 2019 13:38

przemekwitek removed the WIP label Sep 26, 2019

przemekwitek marked this pull request as ready for review September 26, 2019 13:42

przemekwitek force-pushed the classification branch 3 times, most recently from e40d5a3 to 1d71028 on September 26, 2019 14:01

przemekwitek added :ml Machine learning >feature v7.5.0 v8.0.0 labels Sep 26, 2019

przemekwitek mentioned this pull request Sep 27, 2019

[ML] Introduce classification analysis type #46735

Closed

przemekwitek force-pushed the classification branch from 747d4bd to 96c48a1 on September 27, 2019 09:23

benwtrent reviewed Sep 27, 2019

benwtrent approved these changes Sep 27, 2019

przemekwitek force-pushed the classification branch 5 times, most recently from dde7a9f to 4a2583f on October 2, 2019 09:47

przemekwitek force-pushed the classification branch 2 times, most recently from c35932e to ba8bd1d on October 2, 2019 10:09

dimitris-athanasiou self-requested a review October 3, 2019 08:01

dimitris-athanasiou reviewed Oct 3, 2019

dimitris-athanasiou approved these changes Oct 3, 2019

przemekwitek force-pushed the classification branch 2 times, most recently from 374f275 to 18ee05b on October 4, 2019 07:26

przemekwitek added 3 commits October 4, 2019 09:58

Implement new analysis type: classification

eef788d

Implement HLRC

d258b1e

Apply review comments

f321d4f

przemekwitek force-pushed the classification branch from 18ee05b to 08a9fc1 on October 4, 2019 07:58

Apply review comments

00541d9

przemekwitek force-pushed the classification branch from 08a9fc1 to 00541d9 on October 4, 2019 08:40

przemekwitek merged commit 1fc8dd2 into elastic:master Oct 4, 2019

przemekwitek deleted the classification branch October 4, 2019 09:46

przemekwitek added a commit to przemekwitek/elasticsearch that referenced this pull request Oct 4, 2019

Implement new analysis type: classification (elastic#46537)

efcc4d1

przemekwitek mentioned this pull request Oct 4, 2019

[7.x] Implement new analysis type: classification (#46537) #47559

Merged

przemekwitek added a commit that referenced this pull request Oct 4, 2019

[7.x] Implement new analysis type: classification (#46537) (#47559)

ec9b77d

Mpdreamz mentioned this pull request Nov 19, 2019

[meta] 7.5 release elastic/elasticsearch-net#4232

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
