Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate if it is possible to prevent inputs with non-unique IDs from being templated by autodiscovery #41881

Closed
cmacknz opened this issue Dec 3, 2024 · 9 comments · Fixed by #41954
Assignees
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@cmacknz
Copy link
Member

cmacknz commented Dec 3, 2024

When using an autodiscovery provider with Filebeat, investigate what happens when the variable substitution would result in two filestream inputs with a non-unique ID. If Filebeat allows the creation of multiple filestream inputs with the same ID, propose a way to prevent this from happening.

Elastic Agent forces the IDs to be unique when using the local dynamic provider, below is a simple example of an agent policy creating this situation. If the my_file variable is omitted, only a single input is created. Creating inputs with the same ID is impossible in Elastic Agent and we want the same behaviour in Filebeat.

For an example of the problem this is trying to prevent, a common mistake was setting the input ID on Kubernetes to kubernetes-container-logs-${kubernetes.pod.name} when an input per running container is used instead of the unique kubernetes-container-logs-${kubernetes.pod.name}-${kubernetes.container.id}

outputs:
  default:
    type: elasticsearch
    hosts: [127.0.0.1:9200]
    api_key: "example-key"
    preset: balanced

inputs:
  - type: filestream
    id: ${local_dynamic.my_var}
    data_stream:
      dataset: generic
    paths:
      - /var/log/${local_dynamic.my_file}
  
providers:
 local_dynamic:
   enabled: true
   items:
     - vars:
         my_var: A
         my_file: a.log
     - vars:
         my_var: A
         my_file: b.log
     - vars:
         my_var: A
         my_file: c.log

This results in the following running inputs:

./elastic-agent status --output=full
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (HEALTHY) Running
   ├─ info
   │  ├─ id: 88d066e9-f099-4769-9416-f20f1792d081
   │  ├─ version: 8.15.2
   │  └─ commit: 621bbc685c2de8a2ad2c7c7f7cc1eb5fcf77d6d4
   ├─ filestream-default
   │  ├─ status: (HEALTHY) Healthy: communicating with pid '96292'
   │  ├─ filestream-default
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: OUTPUT
   │  ├─ filestream-default-A-local_dynamic-0
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: INPUT
   │  ├─ filestream-default-A-local_dynamic-1
   │  │  ├─ status: (HEALTHY) Healthy
   │  │  └─ type: INPUT
   │  └─ filestream-default-A-local_dynamic-2
   │     ├─ status: (HEALTHY) Healthy
   │     └─ type: INPUT
@cmacknz cmacknz added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Dec 3, 2024
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz cmacknz changed the title Confirm that duplicate inputs templated using autodiscover with non-unique IDs cannot be created Investigate if it is possible to prevent inputs with non-unique IDs from being templated by autodiscover Dec 3, 2024
@cmacknz cmacknz changed the title Investigate if it is possible to prevent inputs with non-unique IDs from being templated by autodiscover Investigate if it is possible to prevent inputs with non-unique IDs from being templated by autodiscovery Dec 3, 2024
@jlind23 jlind23 assigned belimawr and unassigned AndersonQ Dec 9, 2024
@belimawr
Copy link
Contributor

I've been looking on how the Elastic-Agent implements ID uniqueness for its inputs. Every provider has to add or update its variables into the dynamicProviderState via the AddOrUpdate method. This method requires an ID, that should be unique. Later when the coordinator render the inputs, it generates the final input ID by concatenating the current input ID (after rendering) with the provider name and the ID the provider gave to its set of variables.

Example:

inputs:
  - type: filestream
    id: user-defined-ID-${local_dynamic.my_var}
    data_stream:
      dataset: generic
    paths:
      - /var/log/foo.log
  
providers:
 local_dynamic:
   enabled: true
   items:
     - vars:
         my_var: A

The local provider used the index in the items array as its ID, so once the input is rendered we have:

id: user-defined-ID-A

This gets concatenated with the provider's name and the ID it gave to the variables (0 in this case), which leaves us with the final ID:

user-defined-ID-A-local_dynamic-0

For other providers like Kubernetes or Docker, they use an UID of the element being discovered, which is always unique.

I believe we can implement a similar logic into Beats autodiscover: whenever it renders a template, the provider will provide an unique ID alongside each set of variables, whenever they're used to render an input template, this appended at the end of the input ID together with the provider name. If there was no ID defined in the input template, it's considered as being an empty string and added to the final configuration.

It is important to keep in mind this will be a breaking change for the Filestream input because the final ID will be different than what is defined in the autodiscovery configuration, which will have the side effect of re-ingesting some files when upgrading from 8.x to 9.x.

@belimawr
Copy link
Contributor

When using an autodiscovery provider with Filebeat, investigate what happens when the variable substitution would result in two filestream inputs with a non-unique ID.

Once #41954 is merged,only the first input to be created will run, all others inputs with duplicated IDs will fail on creation.

@cmacknz
Copy link
Member Author

cmacknz commented Dec 17, 2024

I believe we can implement a similar logic into Beats autodiscover: whenever it renders a template, the provider will provide an unique ID alongside each set of variables, whenever they're used to render an input template, this appended at the end of the input ID together with the provider name. If there was no ID defined in the input template, it's considered as being an empty string and added to the final configuration.

It is important to keep in mind this will be a breaking change for the Filestream input because the final ID will be different than what is defined in the autodiscovery configuration, which will have the side effect of re-ingesting some files when upgrading from 8.x to 9.x.

Is there a way to do this only if there isn't an ID already defined for the filestream input? Having everybody's Kubernetes logs unconditionally re-ingested on upgrade to 9.0 would not be very nice. I think the "If there was no ID defined in the input template" might cover this but wanted to double check.

@belimawr
Copy link
Contributor

Is there a way to do this only if there isn't an ID already defined for the filestream input?

Well, even if there is an filestream input defined, it could not be unique :/ The example you put in the issue description is a good one, if the Filestream input ID is defined as

id: kubernetes-container-logs-${kubernetes.pod.name}

the ID will be present but it won't be unique.

Having everybody's Kubernetes logs unconditionally re-ingested on upgrade to 9.0 would not be very nice.

I definitely agree with this.

The biggest problem is that the autodiscover does not know anything about Filestream or which inputs are running/IDs are in use. So if we modify the ID to ensure uniqueness (like Elastic-Agent does), we will fall into the re-ingestion case upon upgrade.

Implementing something only for the case where IDs are not defines, won't work. Not having an ID defined for the Filestream input leads to data duplication, this has been solved for the Kubernetes autodiscover with documentation and examples, so at this point in time, I believe most, if not all users, already have unique IDs set.

There might be a hacky way out of this by validating the input config and re-trying. Let me investigate more.

@leehinman
Copy link
Contributor

Not sure if this helps or not, but the metrics namespace is global, so autodiscover could "check" to see if the metrics ID already exists for other "filestream" configed inputs. autodiscover would have to know that it is making a filestream id though.

@belimawr
Copy link
Contributor

Not sure if this helps or not, but the metrics namespace is global, so autodiscover could "check" to see if the metrics ID already exists for other "filestream" configed inputs. autodiscover would have to know that it is making a filestream id though.

That could work, but it feels odd to have the metrics registry as a source of truth for currently running inputs.

Anyways, we were talking during our team meeting today and decided that #41954 is enough, we don't need to be generating unique IDs inside the autodiscover. The goal is to make sure we're not running duplicated IDs and we correctly inform the user.

@belimawr
Copy link
Contributor

I looked how the autodiscover behaves when generating Filestream inputs with duplicated IDs and the changes from #41954.

The autodiscover will keep retrying to start inputs until all inputs are successfully started, with the default settings, this happens every 10s.

On a normal execution (all inputs are successfully started), Kubernetes events are gathered as they happen, inputs configurations are generated, however the inputs are only started every second (debounce period).

However when there is an error, we wait for 10s (10 * debouncePeriod). So having inputs that can never start will significantly increase the delay between an K8s event and the resulting input being started/stopped.

Because there is no way to know the specific inputs that failed, I can implement a logic similar to what #41954 does for inputs from inputs.d: if all errors are of type "not reloadable", then we don't try reloading. If at least one error is not of this type, then we retry with the 10*debouncePeirod delay (current behaviour).

What do you think @cmacknz? I could even add it as part of #41954, so once it is merged, everything is consistent. I'll also make sure to add a log error stating inputs won't be retried.

@cmacknz
Copy link
Member Author

cmacknz commented Dec 18, 2024

Yeah I think that makes sense, assuming the autodiscovery process will also log the "not reloadable" error that indicates filestreams are being duplicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants