[ResponseOps] Visualize alerting metrics in Stack Monitoring #123726

Merged Jun 9, 2022 (125 commits)

chrisronline commented Jan 25, 2022

Relates to #122367

This is the final PR for the initial deliverable of the alerting observability (o11y) initiative.

It involves using the indexed data from the previous work to show meaningful visualizations to the user within the Stack Monitoring UI. For now, we are adhering to existing design mechanisms within Stack Monitoring and simply adding to existing pages.

The metrics being added are:

  • Rule Failures Rate
  • Rule Executions Rate
  • Action Failures Rate
  • Action Executions Rate
  • Rule Overdue Count
  • Average Rule Overdue Delay
  • Worst Rule Overdue Delay
  • Action Overdue Count
  • Average Action Overdue Delay
  • Worst Action Overdue Delay

Testing

These metrics are only available through Metricbeat-based collection, so testers will need to stand up a local Metricbeat instance configured to talk to their locally running Kibana.

I don't know how the Metricbeat nightly binaries work, so I just build from source. Instructions for doing this are available in the Metricbeat docs.

Once you are able to build locally, you'll want to ensure you have the proper modules enabled:

CBR-MBP:metricbeat chris$ ./metricbeat modules list
Enabled:
elasticsearch-xpack
kibana-xpack

with configuration that points to your local environment:

- module: kibana
  xpack.enabled: true
  period: 10s
  hosts: ["https://localhost:5601"]
  username: "elastic"
  password: "changeme"
  ssl.verification_mode: none

(Note: You also need to run the Elasticsearch module, because the Stack Monitoring UI only shows Kibana data if there is also Elasticsearch data.)
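
A minimal sketch of a matching Elasticsearch module configuration (for example in `modules.d/elasticsearch-xpack.yml`), mirroring the Kibana block above. The host URL, port, and credentials here are assumptions for a default local setup, not values taken from this PR, so adjust them to your environment.

```yaml
# Assumed local defaults; none of these values come from this PR.
- module: elasticsearch
  xpack.enabled: true
  period: 10s
  # Assumption: local Elasticsearch with TLS on the default port.
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "changeme"
  # Skips certificate checks; only appropriate for local testing.
  ssl.verification_mode: none
```

If your local Elasticsearch runs without TLS, use an `http://` host and drop the `ssl.verification_mode` line.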

Start Metricbeat after this and visit the Stack Monitoring UI to verify data is properly flowing.

Screenshots

Screen Shot 2022-03-16 at 3 12 31 PM

Screen Shot 2022-05-24 at 2 50 24 PM

Screen Shot 2022-05-24 at 2 50 48 PM

cc katefarrar (in the future)

chrisronline and others added 30 commits January 18, 2022 10:29
@neptunian

@elastic/response-ops Tests were unskipped and updated, and are now passing. Your approval is needed to merge.

@neptunian

@elasticmachine merge upstream

neptunian disabled auto-merge June 9, 2022 12:26
neptunian added skip-ci and removed skip-ci labels Jun 9, 2022

gmmorris commented Jun 9, 2022

@neptunian @XavierM looking at the original PR description, I get the feeling something is out of date.
Am I right in understanding that the metrics exposed by this PR in SM are:

  • count of rule executions over time
  • count of rule failures over time

Rather than the metrics displayed in the screenshot above?
If so, can we update the PR description to match what is actually being released? 🙏

Thanks again, I know this has been a bit of a mess due to the reshuffle of this work across the teams.

@brianseeders

Buildkite test this

@neptunian

Buildkite test this

@neptunian

@gmmorris The screenshots are correct. I've updated the PR description to list the metrics being added.

@neptunian

Buildkite test this

neptunian enabled auto-merge (squash) June 9, 2022 17:44
@neptunian

Buildkite test this

@neptunian

Buildkite test this

neptunian merged commit b75d964 into elastic:main Jun 9, 2022
@kibana-ci

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| monitoring | 476.5KB | 478.9KB | +2.4KB |

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @chrisronline

kibanamachine added the backport:skip label Jun 9, 2022
gmmorris added the Feature:Alerting/RulesManagement label Jul 29, 2022
Labels

  • backport:skip (This commit does not require backporting)
  • Feature:Alerting/RulesManagement (Issues related to the Rules Management UX)
  • release_note:enhancement
  • review
  • Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams)
  • v8.4.0