-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ML] Explain data frame analytics API #49455
[ML] Explain data frame analytics API #49455
Conversation
Pinging @elastic/ml-core (:ml) |
---- | ||
{ | ||
"field_selection": [ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll come back to add this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now added this in
@alvarezmelissa87 pinging you as this is the PR that adds field selection explanation |
@szabosteve I'm updating documentation too in this PR, could you please take a look at the docs changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! Some minor comments.
Also, I think you might need to black list the failure focused yaml tests for the ml with security tests
...high-level/src/test/java/org/elasticsearch/client/documentation/MlClientDocumentationIT.java
Outdated
Show resolved
Hide resolved
...high-level/src/test/java/org/elasticsearch/client/documentation/MlClientDocumentationIT.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/elasticsearch/xpack/ml/action/TransportDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
...ml/src/main/java/org/elasticsearch/xpack/ml/dataframe/extractor/ExtractedFieldsDetector.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/ml/rest/dataframe/RestDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @dimitris-athanasiou for documenting this! I left a couple of minor comments, but it LGTM.
x-pack/plugin/src/test/resources/rest-api-spec/test/ml/data_frame_analytics_info.yml
Outdated
Show resolved
Hide resolved
@@ -100,9 +100,9 @@ private MemoryUsageEstimationResult runJob(String jobId, | |||
} finally { | |||
process.consumeAndCloseOutputStream(); | |||
try { | |||
LOGGER.info("[{}] Closing process", jobId); | |||
LOGGER.debug("[{}] Closing process", jobId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be kept in sync with log-levels in AnalyticsProcessManager.java
(which are currently info
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this API is not part of the life-cycle of the job, I'd rather it stays quiet in the info level unless something goes wrong.
...rc/main/java/org/elasticsearch/xpack/ml/rest/dataframe/RestDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/elasticsearch/xpack/ml/rest/dataframe/RestDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
...l/src/main/java/org/elasticsearch/xpack/ml/action/TransportDataFrameAnalyticsInfoAction.java
Outdated
Show resolved
Hide resolved
FieldSelection(String name, Set<String> mappingTypes, boolean isIncluded, boolean isRequired, @Nullable FeatureType featureType, | ||
@Nullable String reason) { | ||
this.name = Objects.requireNonNull(name); | ||
this.mappingTypes = Collections.unmodifiableSet(mappingTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects.requireNonNull(mappingTypes)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collections.unmodifiableSet
throws if it's passed null
.stream().collect(Collectors.toSet()); | ||
FieldSelection.FeatureType featureType = randomBoolean() ? null : randomFrom(FieldSelection.FeatureType.values()); | ||
String reason = randomBoolean() ? null : randomAlphaOfLength(20); | ||
return new FieldSelection(randomAlphaOfLength(10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new FieldSelection(randomAlphaOfLength(10), | |
return new FieldSelection( | |
randomAlphaOfLength(10), |
FieldSelection(String name, Set<String> mappingTypes, boolean isIncluded, boolean isRequired, @Nullable FeatureType featureType, | ||
@Nullable String reason) { | ||
this.name = Objects.requireNonNull(name); | ||
this.mappingTypes = Collections.unmodifiableSet(mappingTypes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Objects.requireNonNull
?
Also, should we use ExceptionHelper.requireNonNull
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExceptionsHelper.requireNonNull
makes sense for user facing code. We thus use it when we parse request objects. In this case we create this ourselves Objects.requireNonNull
should be enough.
.stream().collect(Collectors.toSet()); | ||
FieldSelection.FeatureType featureType = randomBoolean() ? null : randomFrom(FieldSelection.FeatureType.values()); | ||
String reason = randomBoolean() ? null : randomAlphaOfLength(20); | ||
return new FieldSelection(randomAlphaOfLength(10), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return new FieldSelection(randomAlphaOfLength(10), | |
return new FieldSelection( | |
randomAlphaOfLength(10), |
This commit replaces the _estimate_memory_usage API with a new API, the _info API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why.
…documentation/MlClientDocumentationIT.java Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
…documentation/MlClientDocumentationIT.java Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
…frame/extractor/ExtractedFieldsDetector.java Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
…/dataframe/RestDataFrameAnalyticsInfoAction.java Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
666e764
to
0096136
Compare
run elasticsearch-ci/1 |
This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of elastic#49455
This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why. Backport of #49455
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.
The API consolidates information that is useful before
creating a data frame analytics job.
It includes:
Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.
Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.