[Fleet] Show remote es output error state on UI #172181

Merged
merged 14 commits into from
Dec 5, 2023

@juliaElastic juliaElastic commented Nov 29, 2023

Summary

Relates to elastic/fleet-server#3116

Relates to #104986

Reads the latest output health state from the logs-fleet_server.output_health-default data stream by output ID and displays the error state in the UI, in the Edit Output flyout.
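
A minimal sketch of the lookup described above, not the code from this PR. The data stream name comes from the description; the `output` and `@timestamp` field names, the `UNKNOWN` fallback, and both helper names are assumptions made for illustration. The request is built as a plain object so the example does not depend on any Elasticsearch client.

```typescript
// Data stream name as given in the PR description.
const OUTPUT_HEALTH_DATA_STREAM = 'logs-fleet_server.output_health-default';

// Assumed document shape; only `state` and `message` appear in the PR diff.
interface OutputHealthDoc {
  state: string;
  message?: string;
}

// Minimal stand-in for the part of a search response that is read here.
interface SearchResponseLike {
  hits: { hits: Array<{ _source?: OutputHealthDoc }> };
}

// Hypothetical helper: request the single newest health document for one output.
// The `output` and `@timestamp` field names are assumptions.
function buildLatestHealthRequest(outputId: string) {
  return {
    index: OUTPUT_HEALTH_DATA_STREAM,
    size: 1,
    sort: [{ '@timestamp': { order: 'desc' } }],
    query: { bool: { filter: [{ term: { output: outputId } }] } },
  };
}

// Hypothetical helper: read the newest hit, falling back to a placeholder
// state when nothing has been reported yet for this output.
function parseLatestHealth(response: SearchResponseLike): { state: string; message: string } {
  const latest = response.hits.hits[0]?._source;
  if (!latest) {
    return { state: 'UNKNOWN', message: '' };
  }
  return { state: latest.state, message: latest.message ?? '' };
}
```

Guarding the empty-hits case matters for the scenario in the notes below: no health document exists until an agent is enrolled and fleet-server has tried to use the output.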

Steps to verify:

  • enable feature flag remoteESOutput
  • add remote_elasticsearch output, can be a non-existent host for this test
  • add the output as monitoring output of an agent policy
  • run fleet-server with the changes here
  • enroll an agent
  • wait until fleet-server starts reporting degraded state in the output health data stream
  • open edit output flyout on UI and verify that the error state is visible
  • when the connection is back again (update host to a valid one, or remote es was temporarily down), the error state goes away
image

The UI was suggested in the design doc: https://docs.google.com/document/d/19D0bX7oURf0yms4qemfqDyisw_IYB-OVw4oU-t4lf18/edit#bookmark=id.595r8l91kaq8

Notes/suggestions:

  • We might want to add the output state to the output list as well (maybe as badges like agent health?) as it's not too visible in the flyout (have to scroll down).
  • Also, the error state is reported at the earliest when an agent is enrolled and fleet-server fails to create an API key, so not immediately after the output is added. It would be good to show the time of the last reported state (e.g. the way we display "last checkin x minutes ago" for agents).
  • I think it would be beneficial to display the healthy state too.

Added badges to output list:
image

Added healthy state UI to Edit output:
image

@juliaElastic juliaElastic self-assigned this Nov 29, 2023
@juliaElastic juliaElastic requested a review from a team as a code owner November 29, 2023 14:55
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 29, 2023
@elasticmachine

Pinging @elastic/fleet (Team:Fleet)

@juliaElastic juliaElastic added release_note:skip Skip the PR/issue when compiling release notes and removed Team:Fleet Team label for Observability Data Collection Fleet team labels Nov 29, 2023
@apmmachine

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

const latestHit = response.hits.hits[0]._source as any;
return {
  state: latestHit.state,
  message: latestHit.message ?? '',

We might want to add the timestamp so the UI can show when the state was last reported, in case health reporting has stopped and the state is stale.
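
A rough sketch of how a timestamp could be used to flag a stale state and produce a "last reported" label. This is an illustration, not the PR's implementation: the `timestamp` field, the five-minute threshold, and the helper names are all hypothetical.

```typescript
// Hypothetical shape: the health state plus the time it was reported.
interface HealthWithTimestamp {
  state: string;
  message: string;
  timestamp: string; // ISO 8601 string, assumed to come from the document's @timestamp
}

// Hypothetical threshold after which a state is treated as stale.
const STALE_AFTER_MS = 5 * 60 * 1000;

// True when the last report is older than the threshold or cannot be parsed.
function isStale(health: HealthWithTimestamp, now: Date, staleAfterMs: number = STALE_AFTER_MS): boolean {
  const reportedAt = Date.parse(health.timestamp);
  if (isNaN(reportedAt)) {
    return true;
  }
  return now.getTime() - reportedAt > staleAfterMs;
}

// Human-readable label suitable for a tooltip.
function lastReportedLabel(health: HealthWithTimestamp, now: Date): string {
  const reportedAt = Date.parse(health.timestamp);
  if (isNaN(reportedAt)) {
    return 'Last reported time unknown';
  }
  const minutes = Math.max(0, Math.floor((now.getTime() - reportedAt) / 60000));
  return `Last reported ${minutes} minute${minutes === 1 ? '' : 's'} ago`;
}
```

Passing `now` in explicitly keeps both helpers deterministic, which makes them easy to unit test.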

Added a tooltip with the last reported time:
image

@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Nov 29, 2023
@juliaElastic juliaElastic marked this pull request as draft November 29, 2023 14:59
@juliaElastic juliaElastic added the ci:cloud-deploy Create or update a Cloud deployment label Nov 29, 2023
@joshdover

Couple of questions:

  • How does this UX work when there are multiple Fleet Servers and some are having connectivity issues to a remote ES and others are not?
  • Could we show an overall status badge similar to the agent health badge on the output table in main Fleet Settings page? That way users can see the status problems without opening the flyout.

@juliaElastic

Couple of questions:

  • How does this UX work when there are multiple Fleet Servers and some are having connectivity issues to a remote ES and others are not?

Is it possible that the same agent is checking in to multiple fleet servers? If so, then it's possible that two fleet servers start pinging the same remote ES. If they report different states to the data stream, the UI might show the output oscillating between healthy and degraded.
What would be the scenario where two Fleet Servers report different statuses? Something like an air-gapped FS and a public one?

  • Could we show an overall status badge similar to the agent health badge on the output table in main Fleet Settings page? That way users can see the status problems without opening the flyout.

Yes, this would be nice, I was thinking about this. I'll add it.
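
One way the badge for the output table could be derived from the reported state, as an illustrative sketch only. The state strings, labels, and color names are assumptions, and the real UI would render a badge component rather than return a plain object.

```typescript
// Hypothetical badge colors, loosely modeled on typical UI badge palettes.
type BadgeColor = 'success' | 'danger';

interface OutputHealthBadge {
  label: string;
  color: BadgeColor;
}

// Map a reported state to a badge. Returns undefined when there is no
// recognized state, so the table can render no badge at all.
function toHealthBadge(state: string | undefined): OutputHealthBadge | undefined {
  switch (state) {
    case 'HEALTHY':
      return { label: 'Healthy', color: 'success' };
    case 'DEGRADED':
      return { label: 'Unhealthy', color: 'danger' };
    default:
      return undefined;
  }
}
```

Returning `undefined` for missing or unrecognized states avoids showing a misleading badge for outputs that have not reported any health yet.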

@juliaElastic juliaElastic marked this pull request as ready for review November 30, 2023 15:53
@kpollich kpollich left a comment

LGTM 🚀

@juliaElastic juliaElastic requested a review from a team as a code owner December 4, 2023 08:00
@juliaElastic juliaElastic requested a review from kilfoyle December 4, 2023 08:01
@kibana-ci

kibana-ci commented Dec 4, 2023

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #3 / endpoint Response Actions Responder from alerts should show Responder from alert details under alerts list page

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id     before  after  diff
fleet  949     950    +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id     before  after  diff
fleet  1.2MB   1.2MB  +2.5KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id     before   after    diff
fleet  151.9KB  152.3KB  +353.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

juliaElastic added a commit that referenced this pull request Dec 5, 2023
## Summary

Closes #104986

Enable feature flags for `remoteESOutput` and `outputSecretsStorage`.

The feature is ready when #172181
and elastic/fleet-server#3127 are merged.

Output secret storage
[issues](#157458) are closed, so
I think the feature flag for that should be enabled too. cc
@jillguyonnet

@kilfoyle kilfoyle left a comment

LGTM! 🚀

@juliaElastic juliaElastic merged commit ae5e2fd into elastic:main Dec 5, 2023
@kibanamachine kibanamachine added v8.12.0 backport:skip This commit does not require backporting labels Dec 5, 2023
Labels
backport:skip This commit does not require backporting ci:cloud-deploy Create or update a Cloud deployment release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.12.0

8 participants