
Filebeat should fail to start when multiple filestream inputs have the same input ID #40540

Closed
2 tasks done
cmacknz opened this issue Aug 15, 2024 · 14 comments
Labels: breaking change, Team:Elastic-Agent-Data-Plane

Comments

cmacknz (Member) commented Aug 15, 2024

When multiple filestream inputs exist in a Beat configuration without unique input IDs for each input, Filebeat should exit with an error.

Filestream inputs require unique IDs so that they can correctly track their state in the registry. Failing to provide unique IDs can lead to data loss and duplication.

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html

> Each filestream input must have a unique ID. Omitting or changing the filestream ID may cause data duplication. Without a unique ID, filestream is unable to correctly track the state of files.

Elastic Agent already requires each input in the policy to have a unique ID. We should mirror that behavior into Filebeat for filestream inputs.

  • Failure at Beats startup should log an error message telling users exactly why, including the specific duplicated input IDs.
  • Parse the whole configuration and report all duplicates at once, rather than failing on just the first one (see the example below).
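
For illustration, a minimal filebeat.yml showing the shape that should be rejected — the IDs and paths here are placeholders:

```yaml
filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    paths:
      - /var/log/app/*.log
  - type: filestream
    id: my-filestream-id   # same ID again — startup should fail with an error naming it
    paths:
      - /var/log/other/*.log
```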
cmacknz added the breaking change and Team:Elastic-Agent-Data-Plane labels on Aug 15, 2024
elasticmachine (Collaborator) commented:

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

AndersonQ (Member) commented:

@cmacknz is Filebeat also supposed to fail if it detects a duplicated input ID at runtime, for example when using autodiscover?

cmacknz (Member, Author) commented Nov 14, 2024

I would lean towards yes, but in general we should mirror what Elastic Agent does. If you use a variable provider in Elastic Agent to create two inputs where the ID is populated dynamically by a provider (like the local dynamic provider) and is not unique, what happens?

I would expect the agent to refuse to create the inputs in an obvious way, but I haven't specifically tested this in a while.

AndersonQ (Member) commented:

I was looking at the code, and filestream does not receive the whole configuration with all inputs at once. So having filestream check all its inputs and report all duplicated IDs might require more refactoring than it's worth.

Making filestream not start an input if it has a duplicated ID is an easy fix. Another alternative that seems reasonable would be to impose this restriction for all inputs (I'm not sure whether that's desired) on a higher layer that has access to all inputs and can therefore check the IDs.

What I believe we need to avoid is having filestream-specific code on a layer that shouldn't contain input-specific code.
It could even be a validation run just after loading the config from disk, but as I said, I'd rather do it for all inputs than just for filestream. A sketch of what I mean is below.
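
A minimal sketch of what that higher-layer check could look like, run once over the fully loaded config so it reports every duplicate in a single error. All names here are hypothetical stand-ins, not the real Beats types:

```go
package main

import (
	"fmt"
	"strings"
)

// inputConfig is a stand-in for the parsed input configuration;
// the real type lives in the Beats configuration layer.
type inputConfig struct {
	Type string
	ID   string
}

// checkDuplicateIDs is a hypothetical validation pass that runs once,
// after the whole config is loaded, and reports every duplicated ID
// in a single error instead of failing on the first one.
func checkDuplicateIDs(inputs []inputConfig) error {
	seen := map[string][]int{} // ID -> indexes of the inputs using it
	for i, in := range inputs {
		seen[in.ID] = append(seen[in.ID], i)
	}

	var dups []string
	for id, idxs := range seen {
		if len(idxs) > 1 {
			dups = append(dups, fmt.Sprintf("ID %q used by inputs %v", id, idxs))
		}
	}
	if len(dups) > 0 {
		return fmt.Errorf("duplicated input IDs: %s", strings.Join(dups, "; "))
	}
	return nil
}

func main() {
	inputs := []inputConfig{
		{Type: "filestream", ID: "my-filestream-id"},
		{Type: "filestream", ID: "my-filestream-id"}, // duplicate
	}
	if err := checkDuplicateIDs(inputs); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```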

What do you think, @cmacknz?

cmacknz (Member, Author) commented Nov 20, 2024

> Another alternative that seems reasonable would be to impose this restriction for all inputs (I'm not sure whether that's desired) on a higher layer that has access to all inputs and can therefore check the IDs.

Is putting the logic at a higher level but applying it only to filestream inputs an option? There is no requirement that the logic must live in filestream, only the goal that there can't be two filestream inputs with the same ID.
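
Sticking with the earlier sketch's hypothetical types, applying the check only to filestream would just mean filtering before counting:

```go
// checkFilestreamIDs applies the duplicate check from the previous
// sketch only to filestream inputs, leaving other input types alone.
func checkFilestreamIDs(inputs []inputConfig) error {
	var fs []inputConfig
	for _, in := range inputs {
		if in.Type == "filestream" {
			fs = append(fs, in)
		}
	}
	return checkDuplicateIDs(fs)
}
```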

AndersonQ (Member) commented:

> Is putting the logic at a higher level but applying it only to filestream inputs an option?

It's always an option; I just want to avoid adding input-specific logic on a layer that shouldn't have it.

cmacknz (Member, Author) commented Nov 21, 2024

It is possible other inputs may want to enforce uniqueness in the future, so from an extensibility perspective putting this outside of filestream could be better.

Had we taken a wider view on this originally, we probably would have required that all filebeat inputs have a unique ID.

AndersonQ (Member) commented:

Agreed. Also, we already do what I was trying to avoid, so there is no reason to try to solve it in a more generic way.

cmacknz (Member, Author) commented Dec 3, 2024

I created #41881 as a follow-up from #41731, which doesn't address autodiscover. It's not clear yet how much work it would be to fix this for autodiscover, so let's investigate what happens today and whether it can be improved.

AndersonQ (Member) commented:

@cmacknz one thing still isn't clear to me. The core issue is to prevent filestream inputs with duplicated IDs from running. The check at startup and any possible change to autodiscover only prevent filestream from receiving duplicated IDs; they do not prevent filestream itself from running inputs with duplicated IDs.

My take is that ideally there should be a change in filestream itself so that it refuses to run inputs with duplicated IDs.

No amount of work in other components will completely prevent it from happening. While it seems logical and reasonable that anything sending inputs to filestream (the standalone config, the managed config, or autodiscover) should not send duplicated IDs, ultimately the definitive prevention needs to live in filestream. So should we tackle both: ensure autodiscover does not send duplicated IDs, and make filestream reject duplicated IDs?

If memory serves, Beats autodiscover already does not send duplicated IDs to filestream. The last issue was a false positive due to the way the config is validated: currently this is done by creating an input without running it, and the validation happens before autodiscover ensures each new config is unique. That's why there were false positives regarding duplicated input IDs.

cmacknz (Member, Author) commented Dec 4, 2024

> My take is that ideally there should be a change in filestream itself so that it refuses to run inputs with duplicated IDs.

This makes sense to me. We flag it as an error where we can, and on paths where we can't, we don't let more than one instance of filestream run for the same ID.
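
A sketch of the kind of guard filestream itself could hold to enforce this last line of defense — every input would acquire its ID before running and release it when it stops, so a second instance with the same ID is rejected no matter which path delivered the config. The names here are hypothetical, not the actual Beats API:

```go
package filestreamguard

import (
	"fmt"
	"sync"
)

// idTracker is a process-wide registry of running filestream input IDs.
type idTracker struct {
	mu      sync.Mutex
	running map[string]struct{}
}

func newIDTracker() *idTracker {
	return &idTracker{running: make(map[string]struct{})}
}

// Acquire reserves the ID, failing if an input with it is already running.
func (t *idTracker) Acquire(id string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.running[id]; ok {
		return fmt.Errorf("filestream input with ID %q is already running", id)
	}
	t.running[id] = struct{}{}
	return nil
}

// Release frees the ID so a restarted input may reuse it.
func (t *idTracker) Release(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.running, id)
}
```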

AndersonQ (Member) commented:

> My take is that ideally there should be a change in filestream itself so that it refuses to run inputs with duplicated IDs.

> This makes sense to me. We flag it as an error where we can, and on paths where we can't, we don't let more than one instance of filestream run for the same ID.

OK, so shall we have another sub-issue to tackle that? I believe it should come before #41881. What do you think?

cmacknz (Member, Author) commented Dec 6, 2024

Makes sense to me, create the issue for tracking and link it here.

jlind23 assigned belimawr and unassigned AndersonQ on Dec 9, 2024
jlind23 (Collaborator) commented Jan 4, 2025

Both sub-issues have been tackled, closing this as done.

jlind23 closed this as completed on Jan 4, 2025