Don't enable default processors if the environment does not require it #35244

Closed
rdner opened this issue Apr 27, 2023 · 4 comments
Labels: Filebeat, Stalled, Team:Elastic-Agent-Data-Plane

@rdner
Member

rdner commented Apr 27, 2023

Describe the enhancement:

Currently, we always add default processors. In Filebeat, for example:

func defaultProcessors() []mapstr.M {
	// processors:
	//   - add_host_metadata:
	//       when.not.contains.tags: forwarded
	//   - add_cloud_metadata: ~
	//   - add_docker_metadata: ~
	//   - add_kubernetes_metadata: ~
	return []mapstr.M{
		{
			"add_host_metadata": mapstr.M{
				"when.not.contains.tags": "forwarded",
			},
		},
		{"add_cloud_metadata": nil},
		{"add_docker_metadata": nil},
		{"add_kubernetes_metadata": nil},
	}
}

Auditbeat:

func defaultProcessors() []mapstr.M {
	// processors:
	//   - add_host_metadata: ~
	//   - add_cloud_metadata: ~
	//   - add_docker_metadata: ~
	return []mapstr.M{
		{"add_host_metadata": nil},
		{"add_cloud_metadata": nil},
		{"add_docker_metadata": nil},
	}
}

Packetbeat:

func defaultProcessors() []mapstr.M {
	// processors:
	//   - # Add forwarded to tags when processing data from a network tap or mirror.
	//     if.contains.tags: forwarded
	//     then:
	//       - drop_fields:
	//           fields: [host]
	//     else:
	//       - add_host_metadata: ~
	//   - add_cloud_metadata: ~
	//   - add_docker_metadata: ~
	//   - detect_mime_type:
	//       field: http.request.body.content
	//       target: http.request.mime_type
	//   - detect_mime_type:
	//       field: http.response.body.content
	//       target: http.response.mime_type
	return []mapstr.M{
		{
			"if.contains.tags": "forwarded",
			"then": []interface{}{
				mapstr.M{
					"drop_fields": mapstr.M{
						"fields": []interface{}{"host"},
					},
				},
			},
			"else": []interface{}{
				mapstr.M{
					"add_host_metadata": nil,
				},
			},
		},
		{"add_cloud_metadata": nil},
		{"add_docker_metadata": nil},
		{
			"detect_mime_type": mapstr.M{
				"field":  "http.request.body.content",
				"target": "http.request.mime_type",
			},
		},
		{
			"detect_mime_type": mapstr.M{
				"field":  "http.response.body.content",
				"target": "http.response.mime_type",
			},
		},
	}
}

Some of the processors, such as these three:

{"add_cloud_metadata": nil},
{"add_docker_metadata": nil},
{"add_kubernetes_metadata": nil}

depend on the environment in which the Beats are running. This means that if users run the Beats without Docker, outside the cloud, or outside Kubernetes, we end up spamming the debug logs and wasting time trying to attach metadata to every event.

We should be smarter about enabling these processors and do it only if the environment can provide the metadata that the processors are extracting.

For example, we could add a new function to these heavy processors (their packages) called Probe() bool that returns true when it makes sense to run the processor, and only then add it to the list.
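A minimal sketch of how such a probe could gate the default list. Everything here is an assumption made for illustration: the `candidate` type, `selectProcessors`, and both probe heuristics (checking for the default Docker socket path and for the `KUBERNETES_SERVICE_HOST` environment variable) are hypothetical and are not existing Beats APIs. `M` stands in for `mapstr.M` so the example compiles on its own. `add_cloud_metadata` is left out because detecting a cloud provider would need a different kind of check.

```go
package main

import (
	"fmt"
	"os"
)

// M stands in for mapstr.M so that this sketch compiles on its own.
type M map[string]interface{}

// candidate pairs a default processor config with an optional probe.
// A nil probe means the processor is always enabled.
// This type is hypothetical and does not exist in Beats.
type candidate struct {
	config M
	probe  func() bool
}

// dockerProbe is an assumed heuristic: the default Docker socket exists.
func dockerProbe() bool {
	_, err := os.Stat("/var/run/docker.sock")
	return err == nil
}

// kubernetesProbe is an assumed heuristic: containers running in a pod
// normally have KUBERNETES_SERVICE_HOST set.
func kubernetesProbe() bool {
	return os.Getenv("KUBERNETES_SERVICE_HOST") != ""
}

// selectProcessors keeps only the candidates whose probe passes,
// preserving the original order.
func selectProcessors(candidates []candidate) []M {
	enabled := make([]M, 0, len(candidates))
	for _, c := range candidates {
		if c.probe == nil || c.probe() {
			enabled = append(enabled, c.config)
		}
	}
	return enabled
}

func main() {
	processors := selectProcessors([]candidate{
		{config: M{"add_host_metadata": M{"when.not.contains.tags": "forwarded"}}},
		{config: M{"add_docker_metadata": nil}, probe: dockerProbe},
		{config: M{"add_kubernetes_metadata": nil}, probe: kubernetesProbe},
	})
	fmt.Printf("%d default processors enabled\n", len(processors))
}
```

Keeping the probe next to each config entry means the default list stays declarative, and a processor without a probe behaves exactly as it does today.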

Describe a specific use case for the enhancement or feature:

When running Elastic Agent on a VM (no Docker) with debug logs on, you'll see:

Docker Integration: Error while extracting container ID from source path: index is out of range for field 'log.file.path'

on every single event.

@rdner added the Filebeat and Team:Elastic-Agent-Data-Plane labels Apr 27, 2023
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz
Member

cmacknz commented Apr 27, 2023

For agent, the ability to configure which global processors are enabled by default will be provided by the shipper.

For example, we could add a new function to these heavy processors (their packages) called Probe() bool that returns true when it makes sense to run the processor, and only then add it to the list.

I like this idea. Generally, an agent that needs add_cloud_metadata is not suddenly going to stop needing it without being restarted or being a completely new installation.
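Since the environment is not expected to change while the process is running, a probe would only need to run once at startup. A rough sketch of that idea follows, assuming a probe is a plain `func() bool`; the `cachedProbe` helper is hypothetical and not part of Beats.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedProbe wraps a probe so that the underlying check runs at most once.
// Later calls return the stored result, including from other goroutines.
// This helper is hypothetical and only illustrates the idea.
func cachedProbe(check func() bool) func() bool {
	var (
		once   sync.Once
		result bool
	)
	return func() bool {
		once.Do(func() { result = check() })
		return result
	}
}

func main() {
	calls := 0
	probe := cachedProbe(func() bool {
		calls++ // stands in for an expensive environment check
		return true
	})

	for i := 0; i < 3; i++ {
		probe()
	}
	fmt.Println(calls) // prints 1
}
```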

In general, configuring the set of global processors used by default when running the agent will be made possible by the shipper, so I don't feel we need to prioritize this; the global processor support is probably good enough to start with. Standalone Beats already allow this: the default processors are just the ones that happen to be in the reference configuration file, and this is the situation agent will be in soon.

@belimawr
Contributor

Just a small detail @rdner, regarding:

When running Elastic Agent on a VM (no Docker) with debug logs on, you'll see:

I only saw that happening on a VM with Docker installed. I did not test it without Docker installed or with Docker not running.

@botelastic

botelastic bot commented Apr 27, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!
