Apache ingest pipeline doesn't tolerate existing "event.original" field #3451
Comments
Hi @jsvd, is this issue specific to the "apache" integration? What happens with the other integrations that also rename "message" to "event.original"? |
@lalit-satapathy: This issue affects integrations that use rename but use neither of the following:
|
Logstash will resolve this issue with: https://github.com/logstash-plugins/logstash-input-elastic_agent/issues/3 |
Following up here: Logstash will work on ensuring that its elastic_agent input doesn't modify data coming from Elastic Agent. |
Some quick greps show that there are currently 213 pipelines with a rename processor that moves "message" to "event.original":
More than half of these skip the rename if the "message" field exists through the use of
Having only the A few already are protected against the existence of "event.original" through the use of a conditional checking if "event.original" isn't null:
@ishleenk17 do you think it's worth broadening the scope of this issue, or creating a new one, to standardize how integration pipelines handle "message" and "event.original"? |
We really need to find a way to have a standardised setting. We ran into this using Logstash, which adds the Maybe we can find a way to leverage some form of tests that make sure that a skeleton of the integrations looks the same? This also goes a long way to the settings that every integration should offer, such as tags, processors... |
Hi, Filebeat version 8.5 here. When will the problem be fixed? |
@jsvd can you share the ETA for https://github.com/logstash-plugins/logstash-input-elastic_agent/issues/3? This looks important to avoid any further SDHs. |
hi folks, on our side we're close to merging logstash-plugins/logstash-input-beats#464 (comment) which will allow disabling several kinds of enrichments, including the event.original. |
Starting with Logstash 8.7.0, the Agent/Beats inputs now support an
This closes the loop on the Logstash side, I believe there's still a goal to resolve the inconsistent |
I am confused by using Can Logstash, Ingest team and Integrations developers sync up in order to document what users should do to make data flow from Elastic Agents to Logstash to Elasticsearch properly? |
Logstash has always added metadata to the event it receives, and in turn (due to ECS), creates event.original. The While this was originally added to help with the lack of consistency in |
@joshdover can you track this to make sure that all ingest pipeline can deal with an already existing event.original? |
@mrodm What do you think the best way to handle this across all integrations would be? I think we could add a linting/validation rule to package-spec to avoid a rename processor without checking if I'm not 100% sure we can catch all cases this way, but it would be a good place to start. A foolproof option would be a test case that attempts to ingest data with a non-empty event.original and verifies there are no pipeline errors. This seems expensive to run and is probably not worth the effort right now? |
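A minimal sketch of what such a lint rule could check, written in Python against an already-parsed pipeline (a list of processor dicts). This is not the package-spec implementation: the function name `unguarded_renames` is hypothetical, the two required guards (`ignore_missing` and an `if` condition) are taken from later comments in this thread, and the example `if` expression is only an assumed illustration. The sketch checks that an `if` is present, not that its expression is correct.

```python
def unguarded_renames(processors):
    """Return (index, missing_guards) for each rename of message to
    event.original that lacks ignore_missing or an `if` condition.

    Hypothetical helper for illustration; it only inspects top-level
    processors and does not parse the `if` expression itself.
    """
    problems = []
    for index, processor in enumerate(processors):
        options = processor.get("rename")
        if not isinstance(options, dict):
            continue  # not a rename processor
        if options.get("field") != "message":
            continue
        if options.get("target_field") != "event.original":
            continue
        missing = []
        if options.get("ignore_missing") is not True:
            missing.append("ignore_missing")
        if not options.get("if"):
            missing.append("if")
        if missing:
            problems.append((index, missing))
    return problems


# Example pipelines as they might look after loading the YAML definition.
unguarded_pipeline = [
    {"rename": {"field": "message", "target_field": "event.original"}},
    {"grok": {"field": "event.original", "patterns": ["%{GREEDYDATA:rest}"]}},
]
guarded_pipeline = [
    {
        "rename": {
            "field": "message",
            "target_field": "event.original",
            "ignore_missing": True,
            # Assumed condition, shown only as an example of a guard.
            "if": "ctx.event?.original == null",
        }
    },
]

for index, missing in unguarded_renames(unguarded_pipeline):
    print(f"processor {index}: rename is missing {', '.join(missing)}")
```

A static check like this is cheap to run on every package, but it cannot prove the pipeline succeeds at ingest time, which is why the test-case option is described above as the foolproof one.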
@josh adding this new validation rule or lint to the package-spec would be a breaking change. It would cause every package that does not comply with it to fail.
So, something like it has been done in this PR: #7026
Another option would be adding "logstash" to the stack as an optional component. It should be optional since we should keep testing packages without Logstash. Doing that, it could be checked how packages behave with that. @elastic/ecosystem please chime in here if I missed anything. |
@mrodm Let's go with the first option. Can you open the appropriate package-spec issues? |
@josh sure! issues created
Thinking about the new validation rule in package-spec, should that issue be added as part of the Package Spec V3 elastic/package-spec#539 ? This is likely to introduce a breaking change for existing packages. @joshdover @jsoriano |
@joshdover I will plan the issues that Mario mentioned above. What should we do with this specific integration's issue? |
Thanks, @jlind23. @ishleenk17 Are you still planning to work on this? How should we get this prioritized on the Infra Obs team? |
@joshdover: I went through the above discussions. I am trying to articulate my understanding regarding the 3 aspects of the solution:
Logstash: On the Logstash side, there is an enrich flag, which is disabled by default, hence not making any changes to the data coming in from Beats (solving the actual issue).
Integrations: To fix the above-discussed issue (when enrich is enabled), changes are needed in the Integrations, where we have to add both of the 2 checks.
Elastic-Package: To ensure that there is no rename processor without either of these 2 checks, package-spec changes are being done as well, where a developer will get an error if the rename processor misses a check.
Is my understanding correct? |
Thanks @ishleenk17. That matches my understanding, thanks for putting the pieces together 🕵️ |
If we need both of these checks, we may end up changing most of the existing ingest pipelines for this change.
Will this check, when added, fail existing packages, which will then be forced to update? |
In which cases does a data stream require a rename processor for message to event.original? Is this applicable to all log data streams? If most data streams need to add such a rename processor, and in the same format (with ignore_missing and an if check), can we explore the option of adding it to a default ingest pipeline? |
Most of the log datastreams do use the rename processor from message to event.original as the grok pattern is then applied on top of event.original. In case we make it the default option, we need to handle these scenarios. |
Preserve original event will just not remove the event.original field in the end. @P1llus : do you have other thoughts on this since you implemented the preserve original event? |
The only thing that is mandatory, in my view, is that the preserve original event handling happens at the earliest stage (the first processor in the ingest pipeline), as the rename processor already does in various other integrations. If e.g. MongoDB does not have a |
The processor that checks for preserve original event looks like this:
If event.original is not present, then the ignore_missing flag will just skip the processor. |
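The behavior described in the last two comments can be modeled with a short Python sketch. It is a simplified stand-in for a remove processor, not the integration's actual pipeline code: the function name is hypothetical, and the tag name `preserve_original_event` is an assumption based on the "preserve original event" setting mentioned above.

```python
def remove_original_unless_preserved(doc, preserve_tag="preserve_original_event"):
    """Model of a final cleanup step: drop event.original unless the
    document carries the preserve tag. Mutates and returns doc.

    The tag name is an assumption made for this sketch.
    """
    tags = doc.get("tags") or []
    if preserve_tag in tags:
        return doc  # user asked to keep the original event
    event = doc.get("event")
    if isinstance(event, dict):
        # Equivalent of ignore_missing: an absent field is not an error.
        event.pop("original", None)
    return doc


kept = remove_original_unless_preserved(
    {"tags": ["preserve_original_event"], "event": {"original": "raw line"}}
)
dropped = remove_original_unless_preserved({"event": {"original": "raw line"}})
untouched = remove_original_unless_preserved({"message": "no original here"})
```

In this model a document without event.original passes through unchanged, which is the point made about the ignore_missing flag.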
But how does this work in the MongoDB case, when there is no As long as this toggle exists, doesn't that make it mandatory that every integration has the |
@joshdover @ishleenk17 @philippkahr There are a few pointers here that need to be considered, and we should provide both a short-term and a long-term solution.
Some more long term fixes/possibilities that I see we should consider:
|
Logstash has already made the change by adding the enrich flag and taking action accordingly. #3451 (comment)
Yes, this is what we need to do now for all Integrations missing this check.
Correct, we can have it as a long term plan |
I've added a note to an open PR that also updates the final pipeline to include this new processor too: elastic/kibana#167318 |
From the apache ingest pipeline (link):
The rename processor documentation states:
This means that any document that already has an "event.original" field (with or without a "message" field) will cause an ingestion error:
A suggestion is to tolerate the presence of both "event.original" and "message" fields by including an if condition in the rename processor.
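To make the failure mode and the suggested fix concrete, here is a small Python model of the rename semantics described above: the processor fails when the source field is missing (unless `ignore_missing` is set) or when the target field already exists, and an `if` condition can skip it entirely. This is a simplified sketch, not Elasticsearch code. All function names are hypothetical, the error messages are illustrative, and a null value is treated the same as a missing field.

```python
class IngestError(Exception):
    """Raised when a simulated processor fails."""


def get_path(doc, path):
    """Return the value at a dotted path, or None if any segment is absent."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def pop_path(doc, path):
    """Remove and return the value at a dotted path that is known to exist."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node[key]
    return node.pop(leaf)


def set_path(doc, path, value):
    """Set a dotted path, creating intermediate objects as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def rename(doc, field, target_field, ignore_missing=False, condition=None):
    """Simplified model of a rename processor; mutates and returns doc."""
    if condition is not None and not condition(doc):
        return doc  # the `if` evaluated to false: processor is skipped
    if get_path(doc, field) is None:
        if ignore_missing:
            return doc
        raise IngestError(f"field [{field}] doesn't exist")
    if get_path(doc, target_field) is not None:
        raise IngestError(f"field [{target_field}] already exists")
    set_path(doc, target_field, pop_path(doc, field))
    return doc


def original_is_absent(doc):
    """Guard: only rename when event.original was not set upstream."""
    return get_path(doc, "event.original") is None


# A document that already carries event.original, e.g. after passing
# through an upstream component that enriched it.
enriched = {"message": "GET / 200", "event": {"original": "GET / 200"}}

try:
    rename(dict(enriched), "message", "event.original")
except IngestError as err:
    print(f"unguarded rename failed: {err}")

# With both guards the same document passes through without an error.
rename(enriched, "message", "event.original",
       ignore_missing=True, condition=original_is_absent)
```

With the guard in place, a document that has both fields keeps its "message" field, so a pipeline may also want to drop the duplicate afterwards; that step is outside this sketch.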