-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metrics UI] Add integration tests for Metric Threshold Rule and refactor to fire correctly #109971
[Metrics UI] Add integration tests for Metric Threshold Rule and refactor to fire correctly #109971
Conversation
…to fire correctly
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui) |
x-pack/plugins/infra/server/lib/alerting/metric_threshold/lib/evaluate_alert.ts
Outdated
Show resolved
Hide resolved
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for writing tests
Tested the branch and was able to get metric threshold alerts to trigger for both metric types. |
…ctor to fire correctly (elastic#109971) * [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly * Removing unused variables * Fixing tests for metric_threshold_executor * Fixing test for metric_query * fixing test * Changing type guard
…ctor to fire correctly (elastic#109971) * [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly * Removing unused variables * Fixing tests for metric_threshold_executor * Fixing test for metric_query * fixing test * Changing type guard
…ctor to fire correctly (#109971) (#111012) * [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly * Removing unused variables * Fixing tests for metric_threshold_executor * Fixing test for metric_query * fixing test * Changing type guard Co-authored-by: Chris Cowan <chris@chriscowan.us>
…eporting-to-v2 * 'master' of github.com:elastic/kibana: (65 commits) Move to vis_types folder part 2 (elastic#110574) [SOR] use initialNamespaces when checking for conflict for `create` and `bulkCreate` (elastic#111023) [Discover] Remove export* syntax (elastic#110934) [Event log][7.x] Updated event log client to search across legacy IDs (elastic#109365) [Security Solution][Detection Rules] Changes 'activated' text on rule details page (elastic#111044) [Metrics UI] Filter out APM nodes from the inventory view (elastic#110300) [package testing] Update logging and pid configuration (elastic#111059) [Dashboard] Read App State from URL on Soft Refresh (elastic#109354) Add correct roles to test user for functional tests in dashboard (elastic#110880) [DOCS] Adds Lens Inspector and minor edits (elastic#109736) [DOCS] Updates Spaces page (elastic#111005) normalize initialNamespaces (elastic#110936) [Reporting] Clean up `any` usage, reorganize server route files (elastic#110740) [Security Solution] [CTI] Fixes bug that caused Threshold and Indicator Match rules to ignore custom rule filters if a saved query was used in the rule definition. (elastic#109253) skip flaky suites: elastic#111001, elastic#111022 [Security Solution][RAC] - Update reason field text (elastic#110308) [RAC][Security Solution] Make analyzer work with EuiDataGrid full screen (elastic#110913) [Metrics UI] Add integration tests for Metric Threshold Rule and refactor to fire correctly (elastic#109971) [DOCS] Updates Discover docs (elastic#110346) [RAC] Persistent timeline fields fix (elastic#110685) ...
Looks like this PR has backport PRs but they still haven't been merged. Please merge them ASAP to keep the branches relatively in sync. |
…ctor to fire correctly (#109971) (#111011) * [Metrics UI] Add integration tests for Metric Threshold and refactor to fire correctly * Removing unused variables * Fixing tests for metric_threshold_executor * Fixing test for metric_query * fixing test * Changing type guard Co-authored-by: Chris Cowan <chris@chriscowan.us>
Summary
This PR adds integration tests along with a refactor to the Metric Threshold Alerts so that they fire as expected. Fixes #110912
Data and Expectations
For this PR I've added a new data set under
x-pack/test/functional/es_archives/infra/alerts_test_data
that contains 2 real world examples. The first is agauge
style example where the data is indexed every 5 minutes for two different environments,dev
andprod
. The values fordev
are all set to0
and the values forprod
range from0 to 3
randomly. The very last event looks like this:The expectation is that if the rule ran at
2021-01-01T01:00:00Z
with the following criteria:It should trigger a
shouldFire
event for theprod
environment. Here is the same data in a chart with the threshold line.This failed on the current implementation because it was adding a
timeUnit
to theend
value of the time frame, then moving theend
to the start of thetimeUnit
. Also, thedate_range
bucket was being offset by60s
which also contributed the situation where it would miss the bucket.This PR removed the
end
modification for bothgauge
andcounter
style queries. Forgauge
style queries, thedate_range
bucketing has been removed. Thedate_range
bucketing was problematic because it uses the following logic@timestamp >= from && @timestamp < to
which excluded the last event (which is included in therange
query), which we expect to trigger the rule.The implementation in this PR now relies on the
range
in thequery
along with a basic metric aggregation. Forgauge
style rules, it will only query the same amount of data that is expressed withtimeUnit
andtimeSize
; for the example above, that would bethe last 5 minutes
.For
counter
style rules that usesAggregators.RATE
, everything should work the same as before and only trigger on the last "full" bucket. The chart below is from the second data set and has been confirmed with an integration test.Checklist
Glossary
gauge
style metrics are queries that use basic metric aggregations likesum
,avg
,max
,min
,percentile
to aggregate values across multiple documents. What makes a metric agauge
is that the value is a snapshot in time. For examplesystem.cpu.total.pct
is the total CPU usage being used at that point in time.counter
style metrics are monotonically increasing numbers where the value is a cumulative sum over time. To find the value at a point in time you have to typically use derivative pipeline aggregation or rate (which isderivative(max(value))
).