Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discuss] Usage of event.original #4733

Open
ruflin opened this issue Nov 29, 2022 · 14 comments
Open

[Discuss] Usage of event.original #4733

ruflin opened this issue Nov 29, 2022 · 14 comments
Labels
Stalled Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem] Team:Fleet Label for the Fleet team [elastic/fleet]

Comments

@ruflin
Copy link
Contributor

ruflin commented Nov 29, 2022

Over the past few months there have been several discussions around how integration package and in general our data should be using the event.original fields from ECS. More and more integrations add it by copying over the message field or similar.

The part I would like to discuss here: What is our general recommendation? Do all integrations needs event.original and in general do we recommend it to add it all log events? There are some advantages of having it but there are also disadvantage like storage overhead.

I would argue, in many observability cases the field might not useful. Should a user be able to opt in / opt out of the field? Should each integration have its own logic around how to handle event.original or is this something we should centralise?

Note: There have been previous discussion on the topic but I could not find the links right now. Please provide them in the comments / update issue to have all the info in one place.

Related links

@ruflin
Copy link
Contributor Author

ruflin commented Nov 29, 2022

@andrewkroh @jsvd @ebeahan @ishleenk17 I would really like to get your thoughts on this topic.

@ruflin ruflin self-assigned this Nov 29, 2022
@ruflin
Copy link
Contributor Author

ruflin commented Nov 29, 2022

I just found two additional related issues I linked above: #994 and #777 @P1llus Pinging you on this one too as you were driving this in the past.

@P1llus
Copy link
Member

P1llus commented Nov 29, 2022

@ruflin for security integrations its pretty straight forward, as that field is often a requirement from our users.

Currently all integrations uses the same logic (A boolean in the UI for the integrations), and it already defaults to off.

A bit unsure what you mean with overhead since its disabled by default?

I added that logic a longer time ago.

The reason we also store the data there, and all our ingest pipelines starts with event.original is to also support reindex operations from event.original.

@P1llus
Copy link
Member

P1llus commented Nov 29, 2022

If there was an issue with apache then its most likely a bug more than anything?

@ruflin
Copy link
Contributor Author

ruflin commented Nov 29, 2022

Thanks @P1llus for chiming in. When I filed the issue I could not find your initial change but I'm glad I have it now. It basically means, the user can decide to opt in.

Currently all integrations uses the same logic (A boolean in the UI for the integrations), and it already defaults to off.

Is my understanding correct, that for this to work any new integration / data stream has to add it. It is not a generic feature that gets added during build time or in Fleet?

@P1llus
Copy link
Member

P1llus commented Nov 29, 2022

@ruflin that is the correct assumption, I have wanted to add this to the package-spec, but it is not possibly to write spec for ingest pipelines I believe.

So we use it for all security integrations as a requirement during review, and at least all intergrations from observability from before I did the change has that feature.

@P1llus
Copy link
Member

P1llus commented Nov 29, 2022

The requirements are the following:

  1. UI elements to support event.original and custom beat processors
    https://github.com/elastic/integrations/blob/main/packages/cisco_duo/data_stream/admin/manifest.yml#L15

  2. Support for processors at the bottom of .yml.hbs file:
    https://github.com/elastic/integrations/blob/main/packages/cisco_duo/data_stream/admin/agent/stream/httpjson.yml.hbs#L32

  3. Never modify event.original, but use it as the baseline at the top of your ingest pipeline, to support reindexing operations:
    https://github.com/elastic/integrations/blob/main/packages/cisco_duo/data_stream/admin/elasticsearch/ingest_pipeline/default.yml#L7

  4. At the end of your ingest pipeline, remove event.original if opting out:
    https://github.com/elastic/integrations/blob/main/packages/cisco_duo/data_stream/admin/elasticsearch/ingest_pipeline/default.yml#L162

@ruflin
Copy link
Contributor Author

ruflin commented Nov 30, 2022

Lets pull in @kpollich from the Fleet team to this as I would expect some of this magic could happen in Fleet (in combination with package spec).

@kpollich kpollich added Team:Fleet Label for the Fleet team [elastic/fleet] Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem] labels Jan 30, 2023
@elasticmachine
Copy link

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Copy link
Member

Hi folks. First off, apologies for the delays in response here.

Taking a look at @P1llus's comment above I think these changes make sense, and we'll need to make a few changes in various components of the Fleet platform to support them:

UI elements to support event.original and custom beat processors

Fleet UI policy editor changes + design work for the elements if needed.

Support for processors at the bottom of .yml.hbs file

Package spec change. Is this the same as https://github.com/elastic/ingest-dev/issues/2442?

Never modify event.original, but use it as the baseline at the top of your ingest pipeline, to support reindexing operations
At the end of your ingest pipeline, remove event.original if opting out:

If I understand these points correctly, these will require changes to the ingest pipelines generated during Fleet's package installation process. Seems like from Marius's comments, we already manually adhere to these practices for many integrations (such as those he linked), but we don't globally apply these requirements today. Is that an accurate assessment of this requirement?

@nimarezainia - Could we take a look at placing this work on our roadmap? I think the processors piece is something we've already accounted for, but the other changes around event.original and its global usage across integrations is something we should handle in Fleet.

@P1llus
Copy link
Member

P1llus commented Jan 30, 2023

Thanks for the response @kpollich!

I am a bit unsure if some of these components would fit into fleet taking over the management of them (applying them globally), which is why I initially wanted to have them as part of the package-spec instead, though I am always open to ideas.

To cover some of the points:

For all the points it is important to think about backwards compatibility. For many/most of our integrations, we are currently unable to bump the minimum version of packages, unless strictly required to do so, especially for packages which are still supported by 7.x.
If we are to work on any of the below points and make fleet take over, I feel its important to let the package itself remain the same, so that they can both work on new/old versions, but we are always open to ideas.

  1. The UI elements for a boolean true/false button on event.original seems like a good candidate to something that could/should be managed inside fleet, I would say about 90% of packages from both teams have this option at this moment, but only for log, not metrics as that is less relevant for event.original I believe, at least we have not gotten any requests.

  2. I think this implementation might be a bit harder, because there is a few different components here, mainly the UI elements related to each package manifest.yml, but also the injection of these processors into the *.yml.hbs files.
    While the UI component could easily be managed by fleet, there is a few different implementation of the latter, mainly if there are already hardcoded beats processors in that template file or not, so it might be harder to automate.

  3. The concept of event.original in terms of modification is pretty straight forward. Once the data leaves the beat/agent, it is stored as a single string in the message field, we then rename the original message field to event.original, and base our ingest pipeline on event.original (and we are free to reuse the message field for other values again).

In theory if we would want to implement this on a global level, we would need to move the ingest pipeline logic (which always happens at the start of an ingest pipeline), to the initial_pipeline which I believe fleet already manages?
The issue there is what about integrations that do not want to do this, like APM, endpoint, metrics etc?

We do indeed manually implement them to this day, as it would allow to also ignore certain packages that would not be able to implement (or don't need) any of this functionality.

Let's pick up this discussion at some point, especially the fleet vs package-spec only approach, as initially I was looking for ways to "enforce" certain components, while still providing a way to "opt-out" when needed, which the package-spec and the elastic-package lint tooling is great for

@P1llus
Copy link
Member

P1llus commented Jan 30, 2023

It's also good to note that currently, event.original is a blocker for anything wanting to use synthetic sources, as they are not compatible (synthetic source will throw an error).

@ruflin
Copy link
Contributor Author

ruflin commented Oct 5, 2023

As a follow up, I filed elastic/elasticsearch#100320 in Elasticsearch. To simplify things and not having to modify all the integrations, a setting for this should exist in Elasticsearch. I'm proposing to close this issue here. @P1llus Would be great if you could take a look at the linked issue.

@botelastic
Copy link

botelastic bot commented Oct 4, 2024

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stalled Team:Ecosystem Label for the Packages Ecosystem team [elastic/ecosystem] Team:Fleet Label for the Fleet team [elastic/fleet]
Projects
None yet
Development

No branches or pull requests

4 participants