[Ingest Management] Report agent status during checkin #23058
Pinging @elastic/ingest-management (Team:Ingest Management)
/package
Looks good to me.
I think more work will be needed to link the status to each input in the configuration. For now this works for 7.11, since we do not have per-input status reporting.
        return errors.New(fmt.Sprintf("operator: received unexpected event '%s'", step.ID), errors.TypeConfig)
    }

    if err := handler(step); err != nil {
        o.statusReporter.Update(status.Failed)
It would be easier to do these in a deferred function.
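defer func() {
    // err must be a named return value of the enclosing function so that
    // this closure sees the final error; a deferred func() cannot return it.
    if err != nil {
        o.statusReporter.Update(status.Failed)
    }
}()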
yes indeed
[Ingest Management] Report agent status during checkin (elastic#23058)
What does this PR do?
This PR adds a basic status reporter and controller structures used to compute overall agent health.
This computed status is then used during checkin and reported to fleet.
With each new configuration, the status is reset to healthy to avoid reporting failures from unrelated configurations.
Discussed with @nchaulet: for now it is OK not to reset the long-polling request to shorten the delay between status changes. Doing so would require some backpressure or rate-limiter protection on one of the two sides to prevent harm from frequent changes.
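As a rough illustration of the behaviour described above, the sketch below aggregates per-component statuses into one overall status and resets everything to healthy when a new configuration arrives. All names here (`Controller`, `Update`, `Overall`, `Reset`) are assumptions made for this example and do not necessarily match the structures added in this PR.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a hypothetical health value; only two states are modelled here.
type Status int

const (
	Healthy Status = iota
	Failed
)

// Controller is a hypothetical aggregator of per-component statuses.
type Controller struct {
	mu       sync.Mutex
	statuses map[string]Status
}

// NewController returns an empty controller, which reports Healthy.
func NewController() *Controller {
	return &Controller{statuses: make(map[string]Status)}
}

// Update records the latest status reported by one component.
func (c *Controller) Update(component string, s Status) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.statuses[component] = s
}

// Overall returns Failed if any component has failed, Healthy otherwise.
// This is the value that would be sent along with a checkin.
func (c *Controller) Overall() Status {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, s := range c.statuses {
		if s == Failed {
			return Failed
		}
	}
	return Healthy
}

// Reset marks every component Healthy again; in this sketch it is called
// when a new configuration is applied so that old failures are not reported.
func (c *Controller) Reset() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for name := range c.statuses {
		c.statuses[name] = Healthy
	}
}

func main() {
	c := NewController()
	c.Update("operator", Failed)
	fmt.Println("failed before reset:", c.Overall() == Failed)
	c.Reset() // a new configuration arrived
	fmt.Println("healthy after reset:", c.Overall() == Healthy)
}
```

The mutex is there because, in this sketch, components may report from different goroutines while the checkin loop reads the overall status.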
Why is it important?
related elastic/elastic-agent#120
related elastic/kibana#71009
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.