Ingest pipeline best practices #1381

philippkahr · 2025-05-07T16:45:09Z

Based on the discussions in #1052.

This is my first PR against the docs, and it adds a couple of new pages. I think it makes sense to split this content out, so I am putting it under https://www.elastic.co/docs/manage-data/ingest/transform-enrich/ingest-pipelines. The tips and tricks are generic and not specific to o11y or security.

philippkahr · 2025-05-07T17:52:28Z

There are a couple of things I need help with:

- Can someone proofread it and suggest ways to make it easier to understand?
- Does the file order make sense the way I put it? Should we add a subfolder?
- Can you read through it and point out where we can add links to other docs? For example, when I mention the remove processor, we should probably link to the remove processor page.
- I'm not 100% convinced that "common mistakes" is the correct heading.

kilfoyle · 2025-05-07T18:35:52Z

Thanks a lot for opening this Philipp! I've added the "Team:Obs" label since under the new docs organization that's where the ingest content will land.


colleenmcginnis

@philippkahr I started reviewing this PR, but I didn't get very far (yet!). There's a lot of content to get into! I'm just going to post the comments/questions/suggestions I have so far so I can see if I'm on the right track. I can jump back in next week.

Some themes in my early feedback include:

I see some opportunities to simplify the examples to really emphasize the point you're making in each section.
It might be helpful to write out in plain language what the example is trying to achieve before jumping into a code snippet. (I provided a couple suggestions below.)
There are probably opportunities to remove redundant information.

manage-data/ingest/transform-enrich/common-mistakes.md

colleenmcginnis · 2025-05-30T21:03:35Z

manage-data/ingest/transform-enrich/common-mistakes.md

+### Contains operation and null check
+
+This includes an initial null check, which is not necessary.
+
+```painless
+"if": "ctx.event?.action !=null 
+&& ['bandwidth','spoofed syn flood prevention','dns authentication','tls attack prevention',
+    'tcp syn flood detection','tcp connection limiting','http rate limiting',
+    'block malformed dns traffic','tcp connection reset','udp flood detection',
+    'dns rate limiting','malformed http filtering','icmp flood detection',
+    'dns nxdomain rate limiting','invalid packets'].contains(ctx.event.action)"
+```
+
+This behaves nearly the same:
+
+```painless
+"if": "['bandwidth','spoofed syn flood prevention','dns authentication','tls attack prevention',
+        'tcp syn flood detection','tcp connection limiting','http rate limiting',
+        'block malformed dns traffic','tcp connection reset','udp flood detection',
+        'dns rate limiting','malformed http filtering','icmp flood detection',
+        'dns nxdomain rate limiting','invalid packets'].contains(ctx.event?.action)"
+```
+
+The difference is in how the condition is evaluated, which should not matter in practice because it is Java under the hood and fast. In the first version, `ctx.event?.action != null` is checked first: if `action` is null, evaluation stops there and the `contains` operation never runs. In the second version, the `contains` operation always runs and compares each item in the array against the value, even when that value is null, and then returns false.


This example confuses me. Why would you want to run the contains operation n times if you already know `ctx.event.action` is null and it's going to return false?
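
For reference, a minimal sketch of the shorter form with a trimmed-down list. The `set` processor and the `event.kind` value are placeholders chosen for illustration, not taken from the PR. It relies on `contains(null)` returning false on a Painless list, so the null-safe `?.` is enough and no separate null check is needed.

```painless
{
  "set": {
    // Placeholder target field and value, for illustration only.
    "field": "event.kind",
    "value": "alert",
    // ctx.event?.action evaluates to null when event or action is missing;
    // contains(null) then returns false and the processor is skipped.
    "if": "['bandwidth', 'invalid packets', 'udp flood detection'].contains(ctx.event?.action)"
  }
}
```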

manage-data/ingest/transform-enrich/common-mistakes.md

colleenmcginnis · 2025-06-11T14:07:35Z

☝️ Updating with the latest on main to hopefully get rid of all the hints here.

eedugon · 2025-06-13T11:57:20Z

Cross linking with #1727, so we can ensure the new docs complement each other properly.

stefnestor · 2025-06-13T17:20:24Z

manage-data/ingest/transform-enrich/error-handling.md

+        }
+      }
+    ],
+    "on_failure": [


Feel free to ignore, but IME when users want global error handling they also want to reroute documents to a different failure index which might be nice to include.
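
A minimal sketch of what that could look like: a pipeline-level `on_failure` that records the error and redirects the document by overwriting `_index`. The `dissect` pattern, the `error.message` target field, and the `failed-` prefix are assumptions for illustration; the resulting index (or data stream) must be one the cluster can write to.

```painless
{
  "processors": [
    {
      "dissect": {
        // Hypothetical pattern; fails, for example, when message is missing.
        "field": "message",
        "pattern": "%{client_ip} %{action}"
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        // Keep the failure reason on the document.
        "field": "error.message",
        "value": "{{{ _ingest.on_failure_message }}}"
      }
    },
    {
      "set": {
        // Reroute the failed document to a separate index.
        "field": "_index",
        "value": "failed-{{{ _index }}}"
      }
    }
  ]
}
```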

stefnestor · 2025-06-13T17:35:54Z

manage-data/ingest/transform-enrich/general-tips.md

+
+There are various ways to handle data in ingest pipelines, and while they all produce similar results, some methods might be more suitable depending on the specific case. This section provides guidance to ensure that your ingest pipelines are consistent, readable, and maintainable. While we won't focus heavily on performance optimizations, the goal is to create pipelines that are easy to understand and manage.
+
+## Accessing Fields in `if` Statements


"`if` statements" are officially called conditionals.

I'd also like to request that this be fleshed out to include more context:

- Errors inside `if` conditions don't go to `ignore_failure`; see "Conditional ingest processors should respect ignore_failure for errors in if condition" (elasticsearch#126005).
- Cross-link to the topic with the top support volume: Painless exception handling for missing keys and missing values.

@colleenmcginnis can you help me cross-link the error-handling page (https://github.com/elastic/docs-content/pull/1381/files#diff-a70e1482d1c5914e741755809833f482be84ef173010439dae33fb194fd57a55) from here? I added a bit of fluff around it.
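
To illustrate the first point with a sketch (the `lowercase` processor and the field name are only examples): a condition that dereferences a missing parent object throws, and per the linked issue that error is not swallowed by `ignore_failure`. Using the null-safe operator `?.` in the condition avoids the exception in the first place.

```painless
{
  "lowercase": {
    "field": "user.name",
    "ignore_failure": true,
    // Unsafe: "ctx.user.name != null" throws if ctx.user is missing.
    // Null safe: the condition below evaluates to false instead of throwing.
    "if": "ctx.user?.name != null"
  }
}
```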

stefnestor · 2025-06-13T17:37:26Z

manage-data/ingest/transform-enrich/general-tips.md

+
+```painless
+if (ctx.user_name != null) {
+   ctx.user.name = ctx.user_name


Per the above, this `if` will error if the field doesn't exist (as opposed to the field existing with a null value).

I don't fully get it, because this:

POST _ingest/pipeline/_simulate { "docs": [ { "_source": { "user_name": "" } }, { "_source": { "user_name": "abc" } }, { "_source": { } }], "pipeline": { "processors": [ { "script": { "source": " ctx.user = new HashMap(); ctx.user.name = ctx.user_name; " } } ] } }

works as expected: the value of `user.name` is then actually null. Or am I missing something?
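
A sketch that may reconcile the two observations, assuming the same field names as the snippet under review: reading a missing top-level key such as `ctx.user_name` yields null because `ctx` behaves like a map, so the `if` itself does not throw. What can throw is the assignment to `ctx.user.name` when `ctx.user` does not exist yet, which the simulate call above sidesteps by creating the map first.

```painless
// Reading a missing top-level key returns null, so this check is safe.
if (ctx.user_name != null) {
  // Create the parent object only if it is absent, to keep existing user.* fields.
  if (ctx.user == null) {
    ctx.user = new HashMap();
  }
  ctx.user.name = ctx.user_name;
}
```

To tell a missing key apart from a key whose value is null, `ctx.containsKey('user_name')` can be used instead of the null comparison.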

github-actions · 2025-06-13T21:58:04Z

🔍 Preview links for changed docs:

🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.

manage-data/ingest/transform-enrich/common-mistakes.md


colleenmcginnis

Some comments on the remaining pages, manage-data/ingest/transform-enrich/error-handling.md and manage-data/ingest/transform-enrich/general-tips.md.

colleenmcginnis · 2025-06-16T15:03:07Z

manage-data/ingest/transform-enrich/error-handling.md

+
+Ingest pipelines in Elasticsearch are powerful tools for transforming and enriching data before indexing. However, errors can occur during processing. This guide outlines strategies for handling such errors effectively.
+
+**Important**: Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle the errors occurring while processing the document (i.e. transforming the json object) but not the errors triggered while indexing like mapping conflict. For this is the Elasticsearch Failure Store.


Suggested change

**Important**: Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle the errors occurring while processing the document (i.e. transforming the json object) but not the errors triggered while indexing like mapping conflict. For this is the Elasticsearch Failure Store.

:::{important}

Ingest pipelines are executed before the document is indexed by Elasticsearch. You can handle the errors occurring while processing the document (i.e. transforming the json object) but not the errors triggered while indexing like mapping conflict. For this is the Elasticsearch Failure Store.

:::

colleenmcginnis · 2025-06-24T01:56:35Z

manage-data/ingest/transform-enrich/error-handling.md

+- Parsing Errors: Occur when a processor fails to parse a field, such as a date or number.
+- Missing Fields: Happen when a required field is absent in the document.
+
+:::tip


This should fix the rendering issue (seen here).

Suggested change

:::tip

:::{tip}

colleenmcginnis · 2025-06-24T01:57:56Z

manage-data/ingest/transform-enrich/error-handling.md

+
+The `on_failure` parameter can be defined either for individual processors or at the pipeline level to catch exceptions that may occur during document processing. The `ignore_failure` option allows a specific processor to silently skip errors without affecting the rest of the pipeline.
+
+## Global vs. Processor-Specific


Suggested change

## Global vs. Processor-Specific

## Global vs. processor-specific

colleenmcginnis · 2025-06-24T02:00:19Z

manage-data/ingest/transform-enrich/error-handling.md

+
+We can restructure the pipeline by moving the `on_failure` handling directly into the processor itself. This allows the pipeline to continue execution. In this case, the `event.category` processor still runs. You can also retain the global `on_failure` to handle errors from other processors, while adding processor-specific error handling where needed.
+
+(While executing two `set` processors within the `dissect` error handler may not always be ideal, it serves as a demonstration.)


Maybe use a note here instead of a paragraph in parentheses?
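
For readers of this thread without the diff open, a sketch of the structure that paragraph describes. The pattern, field names, and values are illustrative assumptions, not the PR's actual example.

```painless
{
  "processors": [
    {
      "dissect": {
        "field": "message",
        // Hypothetical pattern, for illustration only.
        "pattern": "%{client_ip} %{action}",
        // Processor-level handler: runs only if this dissect fails,
        // after which the pipeline continues with the next processor.
        "on_failure": [
          {
            "set": {
              "field": "error.message",
              "value": "{{{ _ingest.on_failure_message }}}"
            }
          },
          {
            "set": {
              "field": "event.kind",
              "value": "pipeline_error"
            }
          }
        ]
      }
    },
    {
      // Still runs, whether or not the dissect above failed.
      "append": {
        "field": "event.category",
        "value": "network"
      }
    }
  ]
}
```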

colleenmcginnis · 2025-06-24T02:17:56Z

manage-data/ingest/transform-enrich/general-tips.md

This file needs to be significantly edited down. There's a lot of overlap with the content in the Create readable and maintainable ingest pipelines page (manage-data/ingest/transform-enrich/common-mistakes.md). We should avoid duplication to avoid confusion among users and make the docs easier to maintain. Let me know if you'd like me to take a pass at de-duplicating this content.

colleenmcginnis · 2025-06-24T02:18:13Z

manage-data/ingest/transform-enrich/general-tips.md

+  serverless: ga
+---
+
+# Tips and Tricks


I'm not really a fan of this as a title for technical documentation, but I can't think of a good alternative at the moment. 🙃

colleenmcginnis · 2025-06-24T02:18:59Z

manage-data/ingest/transform-enrich/general-tips.md

+
+There are various ways to handle data in ingest pipelines, and while they all produce similar results, some methods might be more suitable depending on the specific case. This section provides guidance to ensure that your ingest pipelines are consistent, readable, and maintainable. While we won't focus heavily on performance optimizations, the goal is to create pipelines that are easy to understand and manage.
+
+## Accessing fields in `if` statements / conditionals


This section seems to duplicate information from the Create readable and maintainable ingest pipelines page. Do we need this information in two places?

colleenmcginnis · 2025-06-24T02:19:56Z

manage-data/ingest/transform-enrich/general-tips.md

+
+This works fine, as you now check for null.
+
+However there is also an easier to write and maintain alternative available:


This also feels like it's aligned with the content in Create readable and maintainable ingest pipelines. Could we integrate this information into that file using the format established there?

colleenmcginnis · 2025-06-24T02:20:17Z

manage-data/ingest/transform-enrich/general-tips.md

+
+## Remove empty fields or remove empty fields that match a regular expression
+
+Alex and Honza created a [blog post](https://alexmarquardt.com/2020/11/06/using-elasticsearch-painless-scripting-to-iterate-through-fields/) presenting painless scripts that remove empty fields or fields that match a regular expression. We are already using this in a lot of places. Most of the time in the custom pipeline and in the final pipeline as well.


I don't think we should rely on a link to an external blog post in the official documentation. We don't have control over whether that information will continue to be available and will be kept up to date.
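
If the link is dropped, the page would need an inline script instead. A minimal, untested sketch of a `script` processor source that removes null and empty-string values recursively; it is not taken from the linked post, and the function name `dropEmpty` is made up for illustration.

```painless
// Returns true when the value should be removed by its parent.
boolean dropEmpty(Object value) {
  if (value == null || value == '') {
    return true;
  }
  if (value instanceof Map) {
    // Remove empty children first, then drop the map itself if nothing is left.
    ((Map) value).values().removeIf(v -> dropEmpty(v));
    return ((Map) value).isEmpty();
  }
  if (value instanceof List) {
    ((List) value).removeIf(v -> dropEmpty(v));
    return ((List) value).isEmpty();
  }
  return false;
}
// Start at the document root.
dropEmpty(ctx);
```

Removing fields that match a regular expression, the second half of the heading, is not covered by this sketch.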

colleenmcginnis · 2025-06-24T02:23:11Z

manage-data/ingest/transform-enrich/general-tips.md

+}
+```
+
+## Check if a value exists and is not null


Again, this feels like it's duplicating a lot of the same information in the Create readable and maintainable ingest pipelines page.

initial commit, let's see if it builds

a01a622

github-actions bot deployed to docs-preview May 7, 2025 16:45 View deployment

this should help

ce87300

github-actions bot deployed to docs-preview May 7, 2025 16:52 View deployment

reworked the md

e42cd64

github-actions bot deployed to docs-preview May 7, 2025 16:59 View deployment

Reworking the line breaks

4ef0b86

github-actions bot deployed to docs-preview May 7, 2025 17:15 View deployment

Reworked, grammar, whitespace, formatting

0c7ed69

github-actions bot deployed to docs-preview May 7, 2025 17:52 View deployment

wrong naming

545f91b

github-actions bot deployed to docs-preview May 7, 2025 17:57 View deployment

philippkahr added Team:Platform Issues owned by the Platform Docs Team documentation Improvements or additions to documentation enhancement New feature or request labels May 7, 2025

kilfoyle added Team:Obs Issues owned by the Observability Docs Team and removed Team:Platform Issues owned by the Platform Docs Team labels May 7, 2025

Marius suggested to use append which makes more sense since it is an …

bd76756

…array!

github-actions bot deployed to docs-preview May 9, 2025 10:47 View deployment

fix typo

ca4d379

philippkahr requested review from a team as code owners May 22, 2025 08:29

github-actions bot deployed to docs-preview May 22, 2025 08:33 View deployment

colleenmcginnis added the Team:Ingest Issues owned by the Ingest Docs Team label May 27, 2025

alexandra5000 self-assigned this May 28, 2025

colleenmcginnis reviewed May 30, 2025


Added Timestamp in Logstash config

afe8884

github-actions bot deployed to docs-preview June 5, 2025 09:35 View deployment

github-actions bot deployed to docs-preview June 10, 2025 09:52 View deployment

Update common mistakes

a10e333

github-actions bot deployed to docs-preview June 10, 2025 10:23 View deployment

philippkahr added 2 commits June 10, 2025 12:24

miny fixes

9498933

Remove ingest lag from this PR

3649496

github-actions bot deployed to docs-preview June 10, 2025 10:26 View deployment

philippkahr mentioned this pull request Jun 10, 2025

ingest pipeline docs "ingest lag" #1672

Open

Added tips

300fec7

github-actions bot deployed to docs-preview June 10, 2025 10:29 View deployment

Merge branch 'main' into best-practices-ingest-pipelines

9b9cc55

github-actions bot deployed to docs-preview June 11, 2025 14:07 View deployment

eedugon mentioned this pull request Jun 13, 2025

(new troubleshoot) Elasticsearch Ingest Pipelines #1727

Open

stefnestor reviewed Jun 13, 2025


fix build errors

bf922d8

github-actions bot deployed to docs-preview June 13, 2025 21:58 View deployment

copy edits

cebd954

colleenmcginnis mentioned this pull request Jun 16, 2025

Suggested copy edits for manage-data/ingest/transform-enrich/common-mistakes.md philippkahr/docs-content#1

Merged

colleenmcginnis reviewed Jun 16, 2025


Merge pull request #1 from colleenmcginnis/cmcg-best-practices-ingest…

1313996

…-pipelines Suggested copy edits for `manage-data/ingest/transform-enrich/common-mistakes.md`

github-actions bot deployed to docs-preview June 23, 2025 04:50 View deployment

Update general-tips.md

d304058

github-actions bot deployed to docs-preview June 23, 2025 14:02 View deployment

Update general-tips.md

6c393aa

github-actions bot deployed to docs-preview June 23, 2025 14:07 View deployment

colleenmcginnis reviewed Jun 24, 2025
