Description
Today, if a Beat has an Elasticsearch output with an invalid SSL configuration, the Beat sub-process will fail to start. The elastic-agent status output and Fleet health report will indicate that the component (which always has only one output) has failed, allowing users to tell which output caused the problem.
For example, an elastic-agent.yml with the following configuration where "/etc/client/cert.pem" does not exist will fail to start:
outputs:
  broken:
    type: elasticsearch
    hosts: [127.0.0.1:9200]
    api_key: "example-key"
    ssl:
      certificate: "/etc/client/cert.pem"
      key: "/etc/client/cert.key"
inputs:
  - type: system/metrics
    id: unique-system-metrics-input
    use_output: broken
    streams:
      - metricsets:
        - cpu
agent.monitoring:
  enabled: false

The elastic-agent status output will show the system/metrics input with the broken output name as failed with a clear error.
❯ sudo elastic-development-agent status
┌─ fleet
│  └─ status: (STOPPED) Not enrolled into Fleet
└─ elastic-agent
   ├─ status: (DEGRADED) 1 or more components/units in a failed state
   └─ system/metrics-broken
      ├─ status: (HEALTHY) Healthy: communicating with pid '14069'
      ├─ system/metrics-broken
      │  └─ status: (FAILED) could not start output: failed to reload output: open /etc/client/cert.pem: no such file or directory /etc/client/cert.pem accessing 'elasticsearch'
      └─ system/metrics-broken-unique-system-metrics-input
         └─ status: (STARTING) Starting

With the switch in elastic/opentelemetry-collector-components#722 to a collector auth extension that provides the Beats HTTP transport to the Elasticsearch exporter, the collector will instead exit with the error associated with the failing extension.
An equivalent collector configuration that looks like the following will cause the collector to exit:
receivers:
  filelog:
    include_file_name: true
    include:
      - "./otlp-all.json"
extensions:
  beatsauth:
    ssl:
      enabled: true
      verification_mode: none
      certificate: "/etc/client/cert.pem"
      key: "/etc/client/cert.key"
    timeout: 9s
exporters:
  elasticsearch:
    endpoints:
      - https://localhost:9200
    password: testing
    user: admin
    auth:
      authenticator: beatsauth
service:
  extensions: [beatsauth]
  pipelines:
    logs:
      receivers: [filelog]
      processors: []
      exporters: [elasticsearch]

The error in this case is more vague and associated with the entire collector process:
2025-09-05T15:19:18.682-0400 error service@v0.130.0/service.go:187 error found during service initialization {"resource": {"service.instance.id": "12bc8224-d5a1-48e9-9422-2744a923b584", "service.name": "elastic-collector-components", "service.version": "0.0.1"}, "error": "failed to build extensions: failed to create extension \"beatsauth\": failed unpacking config: open /etc/client/cert.pem: no such file or directory /etc/client/cert.pem accessing config"}
go.opentelemetry.io/collector/service.New.func1
go.opentelemetry.io/collector/service@v0.130.0/service.go:187
go.opentelemetry.io/collector/service.New
go.opentelemetry.io/collector/service@v0.130.0/service.go:223
go.opentelemetry.io/collector/otelcol.(*Collector).setupConfigurationComponents
go.opentelemetry.io/collector/otelcol@v0.130.0/collector.go:197
go.opentelemetry.io/collector/otelcol.(*Collector).Run
go.opentelemetry.io/collector/otelcol@v0.130.0/collector.go:312
go.opentelemetry.io/collector/otelcol.NewCommand.func1
go.opentelemetry.io/collector/otelcol@v0.130.0/command.go:39
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.9.1/command.go:1015
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.9.1/command.go:1148
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.9.1/command.go:1071
main.runInteractive
github.com/elastic/opentelemetry-collector-components/main.go:58
main.run
github.com/elastic/opentelemetry-collector-components/main_others.go:10
main.main
github.com/elastic/opentelemetry-collector-components/main.go:51
runtime.main
runtime/proc.go:285
Error: failed to build extensions: failed to create extension "beatsauth": failed unpacking config: open /etc/client/cert.pem: no such file or directory /etc/client/cert.pem accessing config
2025/09/05 15:19:18 collector server run finished with error: failed to build extensions: failed to create extension "beatsauth": failed unpacking config: open /etc/client/cert.pem: no such file or directory /etc/client/cert.pem accessing config
When we execute the collector as a sub-process we will need some way to get this error to surface in the health report to Fleet, and associate it back to the originating output.
We discussed what to do about this in the beats receiver meeting today and concluded:
- Instead of the extension returning an error and causing the collector to exit, we should report the extension as failed via healthcheckv2 and the component status API.
- There needs to be an auth extension per output and we need to make sure there is a test for this. This was done in [beatreceivers] Integrate beatsauthextension #9257.
- The auth extension ID needs to be reported as part of the relevant output unit in the agent health report to Fleet.
- The current configuration of the auth extension should be reported in otel-merged.yml even when that configuration is failing per [beats receivers] Surface which output configuration caused the collector to exit #9771 (comment). This will probably be accomplished by the shift to reporting failures via the component status API.
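
To make the first and third conclusions concrete, here is a rough sketch of the intended flow. It is not the actual extension code: `Status`, `Event`, `Reporter`, and `authExtension` are simplified, hypothetical stand-ins for the collector's component status API, and the `beatsauth/<output name>` ID scheme is an assumption used only to show how a failed extension could be mapped back to its output.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Status is a hypothetical stand-in for the collector's component status values.
type Status string

const (
	StatusOK             Status = "OK"
	StatusPermanentError Status = "PermanentError"
)

// Event ties a status (and optional error) to the component that reported it.
type Event struct {
	ComponentID string
	Status      Status
	Err         error
}

// Reporter is a hypothetical sink for status events (in the real collector this
// role would be played by the component status API and healthcheckv2).
type Reporter func(Event)

// authExtension models one auth extension instance per output.
type authExtension struct {
	id       string // assumed scheme: "beatsauth/<output name>"
	certPath string
}

// Start checks the certificate and reports a failure as a status event
// instead of returning an error that would make the whole collector exit.
func (e *authExtension) Start(report Reporter) error {
	if _, err := os.Stat(e.certPath); err != nil {
		report(Event{
			ComponentID: e.id,
			Status:      StatusPermanentError,
			Err:         fmt.Errorf("loading certificate: %w", err),
		})
		return nil // keep the collector running; the failure lives in the status
	}
	report(Event{ComponentID: e.id, Status: StatusOK})
	return nil
}

// outputForExtension maps an extension ID of the assumed form
// "beatsauth/<output name>" back to the output name.
func outputForExtension(id string) (string, bool) {
	typ, name, found := strings.Cut(id, "/")
	if !found || typ != "beatsauth" || name == "" {
		return "", false
	}
	return name, true
}

func main() {
	var events []Event
	report := func(ev Event) { events = append(events, ev) }

	ext := &authExtension{id: "beatsauth/broken", certPath: "/etc/client/cert.pem"}
	if err := ext.Start(report); err != nil {
		fmt.Println("unexpected start error:", err)
		return
	}

	// The agent could attach each event to the matching output unit.
	for _, ev := range events {
		if output, ok := outputForExtension(ev.ComponentID); ok {
			fmt.Printf("output %q: %s (%v)\n", output, ev.Status, ev.Err)
		}
	}
}
```

The key design choice in this sketch is that `Start` returns `nil` even when the certificate is missing, so the failure stays scoped to one extension (and therefore one output) rather than taking down the collector process.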