-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement new analysis type: classification #46537
Conversation
0193e12
to
6f92943
Compare
0f6518c
to
1429840
Compare
ede5e3d
to
07ffd52
Compare
e40d5a3
to
1d71028
Compare
Pinging @elastic/ml-core |
run elasticsearch-ci/bwc |
747d4bd
to
96c48a1
Compare
...in/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Classification.java
Outdated
Show resolved
Hide resolved
...in/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Classification.java
Outdated
Show resolved
Hide resolved
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java
Outdated
Show resolved
Hide resolved
...plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/dataframe/analyses/Regression.java
Outdated
Show resolved
Hide resolved
if (analysis instanceof Classification) { | ||
Classification classification = (Classification) analysis; | ||
return new DatasetSplittingCustomProcessor( | ||
fieldNames, classification.getDependentVariable(), classification.getTrainingPercent()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It almost seems like we need a new interface for the different analysis.
Unsupervised vs supervised... But that might be a future refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It almost seems like we need a new interface for the different analysis.
Yes, we may end up doing that.
But that might be a future refactoring
Agree, let's not add more interfaces too early.
...asticsearch/xpack/ml/dataframe/process/customprocessing/DatasetSplittingCustomProcessor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to have @dimitris-athanasiou give it a once over :). I don't see any major problems
dde7a9f
to
4a2583f
Compare
c35932e
to
ba8bd1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Left a few minor comments.
(Integer) a[7], | ||
(Double) a[8])); | ||
parser.declareString(constructorArg(), DEPENDENT_VARIABLE); | ||
BoostedTreeParams.declareFields(parser); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever trick for reusing code.
However, this made me wonder whether those params should be in a nested object. It'd be ugly though, wouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a matter of taste ;)
Parsing code would actually become a bit cleaner as I could just declare the BoostedTreeParams
field here and it would have its own parser.
However, with nested object:
- we need to double-check which parameters we want to move there. I just picked the obvious ones but maybe e.g.
dependentVariable
should live there as well? - we need to add BWC handling
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree we can leave it as is.
} | ||
|
||
public Classification(String dependentVariable) { | ||
this(dependentVariable, new BoostedTreeParams(null, null, null, null, null), null, null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add a default constructor for BoostedTreeParams
to avoid those nulls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
this.dependentVariable = ExceptionsHelper.requireNonNull(dependentVariable, DEPENDENT_VARIABLE); | ||
this.boostedTreeParams = ExceptionsHelper.requireNonNull(boostedTreeParams, BoostedTreeParams.NAME); | ||
this.predictionFieldName = predictionFieldName; | ||
this.numTopClasses = numTopClasses; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does num_top_classes
have a fixed default value? If so we should set it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I think the default value should be "0".
eta = in.readOptionalDouble(); | ||
maximumNumberTrees = in.readOptionalVInt(); | ||
featureBagFraction = in.readOptionalDouble(); | ||
boostedTreeParams = new BoostedTreeParams(in); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add BWC handling here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code (as it is written right now) is backward-compatible as the sequence of StreamInput reads in the old version is the same as in the new version (the new version has the reads wrapped in the new BoostedTreeParams(in)
constructor.
It would change, however, if I introduced a nested object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, true.
out.writeOptionalDouble(eta); | ||
out.writeOptionalVInt(maximumNumberTrees); | ||
out.writeOptionalDouble(featureBagFraction); | ||
boostedTreeParams.writeTo(out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BWC handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
run elasticsearch-ci/bwc |
374f275
to
18ee05b
Compare
run elasticsearch-ci/bwc |
run elasticsearch-ci/default-distro |
18ee05b
to
08a9fc1
Compare
08a9fc1
to
00541d9
Compare
Implement new analysis type:
Classification
.Also, extract the common parameters between
Classification
andRegression
to a separate class:BoostedTreeParams
.This PR is not fully functional until changes on C++ are made (WIP).
However, I've sent it to review to gather feedback about the Java part.
Relates #46735