
[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines #56020


Conversation

benwtrent
Member

If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam.

It might lead folks to think there is an issue with the inference processor, and they would see these logs even if they are not using the inference processor, leading to more confusion.

Additionally, pipelines might not be parseable in this method, as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor).

closes #55985

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@danhermann danhermann added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Apr 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 30, 2020
Comment on lines 203 to 212
+            Map<String, Object> configMap = configuration.getConfigAsMap();
-            try {
-                Pipeline pipeline = Pipeline.create(configuration.getId(),
-                    configuration.getConfigAsMap(),
-                    ingestService.getProcessorFactories(),
-                    ingestService.getScriptService());
-                count += pipeline.getProcessors().stream().filter(processor -> processor instanceof InferenceProcessor).count();
+            List<Map<String, Object>> processorConfigs = ConfigurationUtils.readList(null, null, configMap, PROCESSORS_KEY);
+            for (Map<String, Object> processorConfigWithKey : processorConfigs) {
+                for (Map.Entry<String, Object> entry : processorConfigWithKey.entrySet()) {
+                    if (TYPE.equals(entry.getKey())) {
+                        count++;
+                    }
+                }
+            }
Contributor

++ on this approach to getting a count of inference processors. It should fit more nicely into the ingest pipeline initialization process and be lighter weight than fully instantiating each pipeline.

One thing you might consider is that the config map for an ingest pipeline is a tree structure in which some nodes such as ForEach processors may contain child nodes. I do not know how inference processors are typically configured, but the code above will count them only if they're at the top level of the pipeline tree.
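To make the concern concrete, here is a minimal sketch (not the PR's code; the processor configs and `countTopLevel` helper are illustrative) showing that a flat scan over the top-level processor keys misses an inference processor wrapped in a `foreach`:

```java
import java.util.List;
import java.util.Map;

public class FlatScanExample {
    static final String TYPE = "inference"; // the inference processor's type key

    // Flat scan, as in the snippet above: only top-level processor keys are inspected.
    static int countTopLevel(List<Map<String, Object>> processorConfigs) {
        int count = 0;
        for (Map<String, Object> processorConfigWithKey : processorConfigs) {
            for (String key : processorConfigWithKey.keySet()) {
                if (TYPE.equals(key)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two inference processors: one top-level, one nested inside a foreach.
        List<Map<String, Object>> processorConfigs = List.of(
            Map.of("inference", Map.of("model_id", "my-model")),
            Map.of("foreach", Map.of(
                "field", "docs",
                "processor", Map.of("inference", Map.of("model_id", "my-model")))));
        // The flat scan sees only the top-level one.
        System.out.println(countTopLevel(processorConfigs)); // prints 1, not 2
    }
}
```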

Member Author

@danhermann good point...I will have to traverse the tree.

@benwtrent benwtrent requested a review from danhermann April 30, 2020 15:18
@@ -165,6 +165,7 @@ public String getType() {

public static final class Factory implements Processor.Factory, Consumer<ClusterState> {

private static final String FOREACH_PROCESSOR_NAME = "foreach";
Contributor

nit: you could use ForEachProcessor.TYPE

Member Author

This would require adding the ingest-common module as a dependency to the ML plugin. Did not want to do that for a single string :/

Contributor

Ah, right. Definitely not worth it for a string.

Comment on lines 212 to 221
// Special handling as `foreach` processors allow a `processor` to be defined
if (FOREACH_PROCESSOR_NAME.equals(entry.getKey())) {
    if (entry.getValue() instanceof Map<?, ?>) {
        Object processorDefinition = ((Map<?, ?>) entry.getValue()).get("processor");
        if (processorDefinition instanceof Map<?, ?>) {
            if (((Map<?, ?>) processorDefinition).keySet().contains(TYPE)) {
                ++count;
            }
        }
    }
Contributor

The for-each processor and the onFailure directive are the only scenarios I know of that result in child processors. Both of those can be nested to an indefinite number of levels. I'm not sure how far you want to go down that rabbit hole, though.

Member Author

Yeesh, yeah, that's right.

Ugh, I will do the recursion. But who on earth would have a foreach processor nested in a foreach processor :(

Contributor

Yep, hence the 🐇 hole. 😃

Member Author

Only allow recursion up to 10 layers, handling on_failure and foreach. Seems to work OK :D
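The recursive approach can be sketched roughly like this. This is an illustrative reimplementation, not the merged code; the `processor` and `on_failure` keys and the depth cap of 10 follow the discussion above, and the constant names are assumptions:

```java
import java.util.List;
import java.util.Map;

public class RecursiveCountSketch {
    static final String TYPE = "inference";
    static final String FOREACH_PROCESSOR_NAME = "foreach";
    static final int MAX_RECURSIONS = 10; // arbitrary cap, per the discussion above

    @SuppressWarnings("unchecked")
    static int numInferenceProcessors(String processorType, Object processorDefinition, int level) {
        // Stop on over-deep nesting or anything that is not a processor definition map.
        if (level > MAX_RECURSIONS || processorDefinition instanceof Map == false) {
            return 0;
        }
        Map<String, Object> definition = (Map<String, Object>) processorDefinition;
        int count = TYPE.equals(processorType) ? 1 : 0;

        // `foreach` wraps a single child under the singular `processor` key.
        if (FOREACH_PROCESSOR_NAME.equals(processorType)) {
            Object inner = definition.get("processor");
            if (inner instanceof Map) {
                for (Map.Entry<String, Object> e : ((Map<String, Object>) inner).entrySet()) {
                    count += numInferenceProcessors(e.getKey(), e.getValue(), level + 1);
                }
            }
        }

        // Any processor may declare nested processors under `on_failure`.
        Object onFailure = definition.get("on_failure");
        if (onFailure instanceof List) {
            for (Object child : (List<Object>) onFailure) {
                if (child instanceof Map) {
                    for (Map.Entry<String, Object> e : ((Map<String, Object>) child).entrySet()) {
                        count += numInferenceProcessors(e.getKey(), e.getValue(), level + 1);
                    }
                }
            }
        }
        return count;
    }
}
```

Since `foreach` processors and `on_failure` blocks can themselves be nested, the depth cap bounds the traversal on pathological configs.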

Contributor

Looks good to me. Thanks for making that change.

@benwtrent benwtrent requested a review from danhermann April 30, 2020 18:29
static int numInferenceProcessors(String processorType, Map<String, Object> processorDefinition, int level) {
    int count = 0;
    // arbitrary, but we must limit this somehow
    if (MAX_INFERENCE_PROCESSOR_SEARCH_RECURSIONS > 10) {
Member

??

Suggested change
-    if (MAX_INFERENCE_PROCESSOR_SEARCH_RECURSIONS > 10) {
+    if (level > 10) {

Member Author

lulz, yeah, I goofed.

if (FOREACH_PROCESSOR_NAME.equals(processorType)) {
    Map<String, Object> innerProcessor = (Map<String, Object>) processorDefinition.get("processor");
    if (innerProcessor != null) {
        for (Map.Entry<String, Object> innerProcessorWithName : innerProcessor.entrySet()) {
Member

foreach can only have 1 processor so there is no need to iterate here.

Member Author

The iteration is just for simplicity. Otherwise I would have to assert that the size is 1, get the first entry, etc. Iteration here is cleaner IMO.

Member

👍

It threw me because it is called the foreach processor, so the iteration looked like a misunderstanding. Perhaps leave a comment.

@benwtrent benwtrent requested a review from davidkyle May 4, 2020 13:21
Member
@davidkyle davidkyle left a comment

LGTM

@benwtrent benwtrent removed :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team labels May 4, 2020
@benwtrent benwtrent merged commit 134c0e8 into elastic:master May 4, 2020
@benwtrent benwtrent deleted the feature/inference-do-not-parse-pipelines-needlessly branch May 4, 2020 16:16
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 4, 2020
…nes (elastic#56020)

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 4, 2020
…nes (elastic#56020)

benwtrent added a commit that referenced this pull request May 4, 2020
…nes (#56020) (#56126)

benwtrent added a commit that referenced this pull request May 13, 2020
…pipelines (#56020) (#56127)

* [ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020)


* fixing for backport

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Successfully merging this pull request may close these issues.

[ML] InferenceProcessor.Factory creates unnecessary log spam