Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow pipeline processor to ignore missing pipelines #87354

Merged

Conversation

martijnvg
Copy link
Member

Add ignore_missing_pipeline option to pipeline processor.
This controls whether the pipeline processor should fail
with an error if no pipeline with a name specified in the
name option exists.

This enhancement is useful to setup a pipeline infrastructure
that lazily adds extension points for overwrites. So that
for specific cluster setups custom pre-processing can be added
at a later point in time.

Relates to #87323

Add `ignore_missing_pipeline` option to `pipeline` processor.
This controls whether the `pipeline` processor should fail
with an error if no pipeline with a name specified in the
`name` option exists.

This enhancement is useful to setup a pipeline infrastructure
that lazily adds extension points for overwrites. So that
for specific cluster setups custom pre-processing can be added
at a later point in time.

Relates to elastic#87323
@martijnvg martijnvg added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.4.0 labels Jun 3, 2022
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Jun 3, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@sethmlarson sethmlarson added the Team:Clients Meta label for clients team label Jun 3, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/clients-team (Team:Clients)

Copy link
Contributor

@joshdover joshdover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a Fleet perspective. I tested this with different combinations of these pipelines dev tools commands and everything worked as expected:

PUT _ingest/pipeline/test-pipeline
{
  "processors": [
    {
      "append": {
        "field": "hello",
        "value": "from test-pipeline"
      }
    },
    {
      "pipeline": {
        "name": "test-custom",
        "ignore_missing_pipeline": true
      }
    }
  ]
}

PUT _ingest/pipeline/test-custom
{
  "processors": [
    {
      "append": {
        "field": "hello",
        "value": "from test-custom"
      }
    },
    {
      "fail": {
        "message": "boom!"
      }
    }
  ]
}

DELETE _ingest/pipeline/test-custom

POST _ingest/pipeline/test-pipeline/_simulate
{
  "docs": [
    { "_index": "x", "_source": { "foo": "bar" } }  
  ]
}

@martijnvg martijnvg requested a review from dakrone June 3, 2022 12:57
Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding this Martijn!

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea.

@@ -33,10 +41,14 @@ public void execute(IngestDocument ingestDocument, BiConsumer<IngestDocument, Ex
if (pipeline != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance aspect the getPipeline() operation is basically a regular Map lookup, right?

This path will get executed a lot if this feature is used in all Fleet data stream pipelines. I just wanted to confirm that lookup failures are not particularly expensive 😄 (like lazy loading on lookup failure).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance aspect the getPipeline() operation is basically a regular Map lookup, right?

Yes, this method performs a map lookup. I don't think this map lookup will be noticeable during indexing (many other operations occur during indexing, so I don't think this will be visible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement Team:Clients Meta label for clients team Team:Data Management Meta label for data/management team v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to ignore a missing pipeline to the pipeline processor
8 participants