Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Validate classification dependent_variable cardinality is at lea… #51232

Conversation

dimitris-athanasiou
Copy link
Contributor

…st two

Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml)

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just a few minor comments

@@ -37,9 +37,9 @@
List<RequiredField> getRequiredFields();

/**
* @return {@link Map} containing cardinality limits for the selected (analysis-specific) fields
* @return {@link List} containing cardinality limits for the selected (analysis-specific) fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/limits/constraints
?

*/
Map<String, Long> getFieldCardinalityLimits();
List<FieldCardinalityConstraint> getFieldCardinalityLimits();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Limits/Constraints
?

@@ -89,7 +89,7 @@ public void testRequiredFieldsIsEmpty() {
}

public void testFieldCardinalityLimitsIsEmpty() {
assertThat(createTestInstance().getFieldCardinalityLimits(), is(anEmptyMap()));
assertThat(createTestInstance().getFieldCardinalityLimits().isEmpty(), is(true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a "Matchers.empty()" matcher that could be used here.

@@ -106,7 +106,7 @@ public void testRequiredFieldsIsNonEmpty() {
}

public void testFieldCardinalityLimitsIsEmpty() {
assertThat(createTestInstance().getFieldCardinalityLimits(), is(anEmptyMap()));
assertThat(createTestInstance().getFieldCardinalityLimits().isEmpty(), is(true));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a "Matchers.empty()" matcher that could be used here.

@@ -177,6 +177,7 @@ private void reindexDataframeAndStartAnalysis(DataFrameAnalyticsTask task, DataF
ActionListener<CreateIndexResponse> copyIndexCreatedListener = ActionListener.wrap(
createIndexResponse -> {
ReindexRequest reindexRequest = new ReindexRequest();
reindexRequest.setRefresh(true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is now necessary as we are checking the field cardinality before we call startAnalytics which refreshes the dest index.

@dimitris-athanasiou
Copy link
Contributor Author

@przemekwitek I have addressed all your points plus fixes a bug regarding refreshing of the dest index which was caught by the tests.

@@ -492,7 +492,7 @@ private void initialize(String jobId) {
this.jobId = jobId;
this.sourceIndex = jobId + "_source_index";
this.destIndex = sourceIndex + "_results";
this.analysisUsesExistingDestIndex = randomBoolean();
this.analysisUsesExistingDestIndex = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be reverted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes of course.

@przemekwitek
Copy link
Contributor

@przemekwitek I have addressed all your points plus fixes a bug regarding refreshing of the dest index which was caught by the tests.

Please see my comment in ClassificationIT. Otherwise the PR is good to go.

…st two

Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.
@dimitris-athanasiou dimitris-athanasiou force-pushed the validate-classification-dep-var-cardinality-at-least-two branch from e51393e to eab85d5 Compare January 22, 2020 12:03
@dimitris-athanasiou dimitris-athanasiou merged commit a6fa577 into elastic:master Jan 22, 2020
@dimitris-athanasiou dimitris-athanasiou deleted the validate-classification-dep-var-cardinality-at-least-two branch January 22, 2020 13:47
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Jan 22, 2020
…t lea… (elastic#51232)

Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.

Backport of elastic#51232
dimitris-athanasiou added a commit to dimitris-athanasiou/elasticsearch that referenced this pull request Jan 22, 2020
…t lea… (elastic#51232)

Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.

Backport of elastic#51232
dimitris-athanasiou added a commit that referenced this pull request Jan 22, 2020
…t lea… (#51232) (#51309)

Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.

Backport of #51232
dimitris-athanasiou added a commit that referenced this pull request Jan 22, 2020
…t lea… (#51232) (#51310)

Data frame analytics classification currently only supports 2 classes for the
dependent variable. We were checking that the field's cardinality is not higher
than 2 but we should also check it is not less than that as otherwise the process
fails.

Backport of #51232
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants