
[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines #56020


Conversation

benwtrent
Member

If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam.

It might lead folks to think there is an issue with the inference processor, and they would see these logs even if they are not using the inference processor, leading to more confusion.

Additionally, pipelines might not be parseable in this method, as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor).

closes #55985

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@danhermann danhermann added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Apr 30, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Apr 30, 2020
Comment on lines 203 to 212
+            Map<String, Object> configMap = configuration.getConfigAsMap();
-            try {
-                Pipeline pipeline = Pipeline.create(configuration.getId(),
-                    configuration.getConfigAsMap(),
-                    ingestService.getProcessorFactories(),
-                    ingestService.getScriptService());
-                count += pipeline.getProcessors().stream().filter(processor -> processor instanceof InferenceProcessor).count();
+            List<Map<String, Object>> processorConfigs = ConfigurationUtils.readList(null, null, configMap, PROCESSORS_KEY);
+            for (Map<String, Object> processorConfigWithKey : processorConfigs) {
+                for (Map.Entry<String, Object> entry : processorConfigWithKey.entrySet()) {
+                    if (TYPE.equals(entry.getKey())) {
+                        count++;
+                    }
+                }
+            }
Contributor

++ on this approach to getting a count of inference processors. It should fit more nicely into the ingest pipeline initialization process and be lighter weight than fully instantiating each pipeline.

One thing you might consider is that the config map for an ingest pipeline is a tree structure in which some nodes such as ForEach processors may contain child nodes. I do not know how inference processors are typically configured, but the code above will count them only if they're at the top level of the pipeline tree.
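To make the concern concrete, here is a minimal sketch (not the PR's code; the processor configs and `countTopLevel` helper are illustrative) showing that a flat scan over the top-level processor keys misses an inference processor wrapped in a `foreach`:

```java
import java.util.List;
import java.util.Map;

public class FlatScanExample {
    static final String TYPE = "inference"; // the inference processor's type key

    // Flat scan, as in the snippet above: only top-level processor keys are inspected.
    static int countTopLevel(List<Map<String, Object>> processorConfigs) {
        int count = 0;
        for (Map<String, Object> processorConfigWithKey : processorConfigs) {
            for (String key : processorConfigWithKey.keySet()) {
                if (TYPE.equals(key)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two inference processors: one top-level, one nested inside a foreach.
        List<Map<String, Object>> processorConfigs = List.of(
            Map.of("inference", Map.of("model_id", "my-model")),
            Map.of("foreach", Map.of(
                "field", "docs",
                "processor", Map.of("inference", Map.of("model_id", "my-model")))));
        // The flat scan sees only the top-level one.
        System.out.println(countTopLevel(processorConfigs)); // prints 1, not 2
    }
}
```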

Member Author

@danhermann good point...I will have to traverse the tree.

@benwtrent benwtrent requested a review from danhermann April 30, 2020 15:18
@@ -165,6 +165,7 @@ public String getType() {

public static final class Factory implements Processor.Factory, Consumer<ClusterState> {

private static final String FOREACH_PROCESSOR_NAME = "foreach";
Contributor

nit: you could use ForEachProcessor.TYPE

Member Author

This would require adding the ingest-common module as a dependency to the ML plugin. Did not want to do that for a single string :/

Contributor

Ah, right. Definitely not worth it for a string.

Comment on lines 212 to 221
// Special handling as `foreach` processors allow a `processor` to be defined
if (FOREACH_PROCESSOR_NAME.equals(entry.getKey())) {
    if (entry.getValue() instanceof Map<?, ?>) {
        Object processorDefinition = ((Map<?, ?>) entry.getValue()).get("processor");
        if (processorDefinition instanceof Map<?, ?>) {
            if (((Map<?, ?>) processorDefinition).keySet().contains(TYPE)) {
                ++count;
            }
        }
    }
Contributor

The for-each processor and the onFailure directive are the only scenarios I know of that result in child processors. Both of those can be nested to an indefinite number of levels. I'm not sure how far you want to go down that rabbit hole, though.

Member Author

Yeesh, yeah, that's right.

Ugh, I will do the recursion. But who on earth would have a foreach processor nested in a foreach processor :(

Contributor

Yep, hence the 🐇 hole. 😃

Member Author

Only allow recursion up to 10 layers, handling on_failure and foreach. Seems to work OK :D
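The recursive approach can be sketched roughly like this. This is an illustrative reimplementation, not the merged code; the `processor` and `on_failure` keys and the depth cap of 10 follow the discussion above, and the constant names are assumptions:

```java
import java.util.List;
import java.util.Map;

public class RecursiveCountSketch {
    static final String TYPE = "inference";
    static final String FOREACH_PROCESSOR_NAME = "foreach";
    static final int MAX_RECURSIONS = 10; // arbitrary cap, per the discussion above

    @SuppressWarnings("unchecked")
    static int numInferenceProcessors(String processorType, Object processorDefinition, int level) {
        // Stop on over-deep nesting or anything that is not a processor definition map.
        if (level > MAX_RECURSIONS || processorDefinition instanceof Map == false) {
            return 0;
        }
        Map<String, Object> definition = (Map<String, Object>) processorDefinition;
        int count = TYPE.equals(processorType) ? 1 : 0;

        // `foreach` wraps a single child under the singular `processor` key.
        if (FOREACH_PROCESSOR_NAME.equals(processorType)) {
            Object inner = definition.get("processor");
            if (inner instanceof Map) {
                for (Map.Entry<String, Object> e : ((Map<String, Object>) inner).entrySet()) {
                    count += numInferenceProcessors(e.getKey(), e.getValue(), level + 1);
                }
            }
        }

        // Any processor may declare nested processors under `on_failure`.
        Object onFailure = definition.get("on_failure");
        if (onFailure instanceof List) {
            for (Object child : (List<Object>) onFailure) {
                if (child instanceof Map) {
                    for (Map.Entry<String, Object> e : ((Map<String, Object>) child).entrySet()) {
                        count += numInferenceProcessors(e.getKey(), e.getValue(), level + 1);
                    }
                }
            }
        }
        return count;
    }
}
```

Since `foreach` processors and `on_failure` blocks can themselves be nested, the depth cap bounds the traversal on pathological configs.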

Contributor

Looks good to me. Thanks for making that change.

@benwtrent benwtrent requested a review from danhermann April 30, 2020 18:29
static int numInferenceProcessors(String processorType, Map<String, Object> processorDefinition, int level) {
    int count = 0;
    // arbitrary, but we must limit this somehow
    if (MAX_INFERENCE_PROCESSOR_SEARCH_RECURSIONS > 10) {
Member

??

Suggested change
-    if (MAX_INFERENCE_PROCESSOR_SEARCH_RECURSIONS > 10) {
+    if (level > 10) {

Member Author

lulz, yeah, I goofed.

if (FOREACH_PROCESSOR_NAME.equals(processorType)) {
    Map<String, Object> innerProcessor = (Map<String, Object>) processorDefinition.get("processor");
    if (innerProcessor != null) {
        for (Map.Entry<String, Object> innerProcessorWithName : innerProcessor.entrySet()) {
Member

foreach can only have 1 processor so there is no need to iterate here.

Member Author

The iteration is just for simplicity. Otherwise I would have to assert that the size is 1, get the first entry, etc. Iteration here is cleaner IMO.

Member

👍

It threw me because it is called the foreach processor, so the iteration looked like a misunderstanding. Perhaps leave a comment.

@benwtrent benwtrent requested a review from davidkyle May 4, 2020 13:21
Member
@davidkyle davidkyle left a comment

LGTM

@benwtrent benwtrent removed :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team labels May 4, 2020
@benwtrent benwtrent merged commit 134c0e8 into elastic:master May 4, 2020
@benwtrent benwtrent deleted the feature/inference-do-not-parse-pipelines-needlessly branch May 4, 2020 16:16
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 4, 2020
…nes (elastic#56020)

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request May 4, 2020
…nes (elastic#56020)

benwtrent added a commit that referenced this pull request May 4, 2020
…nes (#56020) (#56126)

benwtrent added a commit that referenced this pull request May 13, 2020
…pipelines (#56020) (#56127)

* [ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020)


* fixing for backport

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Successfully merging this pull request may close these issues.

[ML] InferenceProcessor.Factory creates unnecessary log spam