Filebeat should fail to start when multiple filestream inputs have the same input ID #40540
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@cmacknz is filebeat supposed to also crash if it detects a duplicated input at runtime? For example when using autodiscover?
I would lean towards yes, but in general we should mirror what Elastic Agent does. If you use a variable provider in Elastic Agent to create two inputs where the ID is populated dynamically by a provider (like the local dynamic provider) and is not unique, what happens? I would expect the agent to refuse to create the inputs in an obvious way, but I haven't specifically tested this in a while.
I was looking at the code, and filestream does not receive the whole configuration with all inputs at once. So having filestream check all its inputs and report all duplicated IDs might require more refactoring than it's worth. Making filestream refuse to start an input that has a duplicated ID is an easy fix. Another alternative that seems reasonable would be to impose this restriction for all inputs (I'm not sure if that's desired) on a higher layer that has access to all inputs, so it can check the IDs. What I believe we need to avoid is having filestream-specific code in a layer that should not be dealing with input-specific code. What do you think @cmacknz ?
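A minimal sketch of what such a higher-layer duplicate-ID check could look like, in Go. The type and function names here are hypothetical illustrations, not the actual Beats code, which uses more involved configuration types:

```go
package main

import "fmt"

// inputConfig is a hypothetical, simplified stand-in for one entry in
// the Beat's input configuration list.
type inputConfig struct {
	Type string // e.g. "filestream", "log"
	ID   string
}

// checkDuplicateIDs returns an error listing every filestream input ID
// that appears more than once, so the caller can refuse to start.
func checkDuplicateIDs(inputs []inputConfig) error {
	seen := map[string]int{}
	for _, in := range inputs {
		if in.Type == "filestream" {
			seen[in.ID]++
		}
	}
	var dups []string
	for id, n := range seen {
		if n > 1 {
			dups = append(dups, id)
		}
	}
	if len(dups) > 0 {
		return fmt.Errorf("filestream inputs with duplicated IDs: %v", dups)
	}
	return nil
}

func main() {
	inputs := []inputConfig{
		{Type: "filestream", ID: "logs-a"},
		{Type: "filestream", ID: "logs-a"}, // duplicate: should abort startup
		{Type: "log", ID: "legacy"},
	}
	if err := checkDuplicateIDs(inputs); err != nil {
		fmt.Println("refusing to start:", err)
	}
}
```

Because the check operates on the full list of inputs, it naturally lives in a layer that sees the whole configuration rather than inside filestream itself.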
Is putting the logic at a higher level but applying it only to filestream inputs an option? There is no requirement that the logic must live in filestream, only the goal that there can't be two filestream inputs with the same ID.
It's always an option; I just want to avoid adding input-specific logic in a layer that shouldn't have input-specific logic.
It is possible other inputs may want to enforce uniqueness in the future, so from an extensibility perspective putting this outside of filestream could be better. Had we taken a wider view on this originally, we probably would have required that all Filebeat inputs have a unique ID.
Agreed. Also, we already do what I was trying to avoid, so there's no reason to try to handle it in a more generic way.
@cmacknz one thing that isn't clear to me yet. The core issue is to prevent filestream inputs with duplicated IDs from running. The check at startup and any possible change to autodiscover only prevent filestream from receiving duplicated IDs; they do not prevent filestream itself from running inputs with duplicated IDs. My take is that ideally there should be a change in filestream itself so it refuses to run inputs with duplicated IDs. No amount of work in other components will completely prevent it from happening. While it seems logical and reasonable that anything sending inputs to filestream (the standalone config, managed config, or autodiscover) should not send duplicated IDs, ultimately the definitive prevention needs to be in filestream. So should we tackle both of them: ensure autodiscover does not send duplicated IDs, and make filestream reject duplicated IDs? If memory does not betray me, Beats autodiscover already does not send duplicated IDs to filestream. The last issue was a false positive due to the way configs are validated: currently it's done by creating an input without running it, and this validation happens before autodiscover ensures each new config is unique. That's why there were false positives regarding duplicated input IDs.
This makes sense to me. We flag it as an error where we can, and on paths where we can't, we don't let more than one instance of filestream run for the same ID.
Ok, so then, shall we have another sub-issue to tackle that? And I believe it should come before #41881. What do you think?
Makes sense to me, create the issue for tracking and link it here.
Both sub-issues have been tackled, closing this as done. |
When multiple filestream inputs exist in a Beat configuration without unique input IDs for each input, Filebeat should exit with an error.
Filestream inputs require unique IDs so that they can correctly track their state in the registry. Failing to provide unique IDs can lead to data loss and duplication.
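For example, a configuration like the following (with hypothetical IDs and paths) should make Filebeat exit with an error at startup rather than run both inputs against the same registry state:

```yaml
filebeat.inputs:
  - type: filestream
    id: my-app-logs          # duplicated ID
    paths:
      - /var/log/app/*.log
  - type: filestream
    id: my-app-logs          # duplicated ID
    paths:
      - /var/log/other/*.log
```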
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html
Elastic Agent already requires each input in the policy to have a unique ID. We should mirror that behavior into Filebeat for filestream inputs.