[8.19] (backport #8248) feat: hybrid elastic agent invoke collector as subprocess #8713
* feat: enable support for healthcheckv2 extension
* feat: add edot-supervised agent sub-command
* feat: refactor otel manager to invoke collector as subprocess
* fix: staticcheck QF1003 in unmarshalLevel of log_writer
* doc: add changelog fragment
* fix: add a random UUID as name of the healthcheck extension to avoid conflicts
* feat: consolidate otel and otel-supervised subcommands
* fix: use http.DefaultClient directly
* fix: add require failure message
* fix: rename otelSetSupervised to otelSetSupervisedFlagName and improve description
* fix: improve documentation
* fix: extract otel settings preparation in separate function
* fix: allocate healthCheckExtensionID in idiomatic way
* fix: update NOTICE files
* fix: exclude extensions from otel to beats status processing
* fix: always emit statuses
* fix: emit statuses only if there is a change in the event and subcomponents status/error
* fix: denoise code
* feat: reintroduce running collector embedded
* fix: update NOTICE.txt
* fix: replace runtime with execution
* fix: clean up commented code from TestCompareAggregateStatuses
* fix: removed changelog
* fix: exclude extensions from getOtelRuntimePipelineStatuses
* fix: pass elastic-agent logging level to supervised collector
* fix: couple embedded collector context with parent one
* fix: increase interval and max failed attempts of healthcheck v2 polling
* fix: make exceeding failed attempts a recoverable error and don't give up
* feat: add recovery support for supervised edot
* feat: rework health check fail to connect threshold
* feat: add license headers in recovery_backoff.go and recovery_noop.go
* fix: handle races in otel manager tests
* feat: support resetting to initial backoff interval for recoveryBackoff
* fix: correct comments
* fix: make recovery backoff unit tests more robust on OS with lower time resolution
* fix: format code after resolving conflicts

(cherry picked from commit c56581d)

# Conflicts:
#	internal/pkg/otel/manager/manager.go
#	magefile.go
Cherry-pick of c56581d has failed. To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
💛 Build succeeded, but was flaky
What does this PR do?
This PR introduces an architectural enhancement to how the Elastic Agent launches and supervises its internal OpenTelemetry (OTEL) collector:
- Adds a new `edot-supervised` sub-command to the Elastic Agent, enabling delegated startup of the OTEL collector with direct integration to the supervisor. After review comments: consolidated with the existing `otel` command.
- Enables the `healthcheckv2` extension in the OTEL configuration to enhance lifecycle and health signal handling from collector components.
- Updates the `otelmanager` test logic to use time-based event checks, ensuring better guarantees.

This work sets the foundation for better fault isolation, process recovery, and alignment with the existing process invocation model of the Elastic Agent.
Why is it important?
Running the collector as a supervised subprocess provides improved fault isolation; a crash in the collector won't directly affect the main agent process.
Checklist
- `./changelog/fragments` using the changelog tool

Disruptive User Impact
No immediate user impact is expected. The functionality remains the same as when running the collector directly inside the agent process. The new supervision model is internal and gated behind the `edot-supervised` command.

How to test this PR locally
Related issues
N/A
This is an automatic backport of pull request #8248 done by [Mergify](https://mergify.com).