feat(crons): Refactor to record_clock_tick_volume_metric #80605
Merged: evanpurkhiser merged 1 commit into master from evanpurkhiser/feat-crons-refactor-to-record-clock-tick-volume-metric on Nov 13, 2024
github-actions bot added the Scope: Backend label (automatically applied to PRs that change backend components) on Nov 12, 2024
Codecov Report
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #80605      +/-   ##
==========================================
- Coverage   78.36%   78.36%   -0.01%
==========================================
  Files        7207     7207
  Lines      318744   318737       -7
  Branches    43909    43908       -1
==========================================
- Hits       249782   249770      -12
- Misses      62600    62603       +3
- Partials     6362     6364       +2
This changes how we use the volume history. Previously we intended to use the volume history to make a decision for a specific tick. We cannot do that, since we actually need to look at historic volume metrics to determine whether we've entered an incident or just had an abnormality in mean deviation.

- The clock_dispatch no longer includes a volume_anomaly_result. I will remove this from the sentry-kafka-schema package in an upcoming PR.
- Instead of evaluating a tick decision during dispatch, we now record the metrics for the timestamp we just ticked past into redis. This happens while processing the clock tick in the clock_tick_consumer.
- The clock_tick_consumer no longer reads the volume_anomaly_result into a TickVolumeAnomolyResult. Something will need to return here later, since in the future we'll evaluate a tick decision based on the tick metrics and will need to dispatch mark_unknown when entering an incident. For now I've removed this logic.
- I've also updated the pct_deviation metric (the one recorded into the redis key) to no longer be an absolute value. We want to know which direction we've deviated in, since we do not want to produce an incident when volume has increased.
- I've removed safe_evaluate_tick_decision rather than replacing it with a safe_record_clock_tick_volume_metric, since this logic now runs in a consumer, which can backlog if we do hit some kind of issue. The wrapper only existed because the code was in a hot path that could fail in an unrecoverable way. We've also had this code running for a while with no problems, so it's safe to not be overly cautious.
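To make the "record instead of decide" flow concrete, here is a minimal sketch of recording a signed deviation for the minute that was just ticked past. It is not the code from this PR: the function name `record_volume_metric_sketch`, the `metric_key` layout, and the plain `dict` standing in for redis are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import MutableMapping, Optional, Sequence


def metric_key(ts: datetime) -> str:
    # Hypothetical key layout; the real redis key format is not shown here.
    minute = ts.replace(second=0, microsecond=0)
    return f"tick-metric:{int(minute.timestamp())}"


def record_volume_metric_sketch(
    store: MutableMapping[str, str],
    tick: datetime,
    past_minute_volume: int,
    historic_volumes: Sequence[int],
) -> Optional[float]:
    """Store the signed pct deviation for the minute before `tick`.

    `store` is a dict standing in for redis (an assumption for this sketch).
    Returns the recorded value, or None when there is no usable history.
    """
    if not historic_volumes:
        return None
    historic_mean = mean(historic_volumes)
    if historic_mean == 0:
        # Avoid dividing by zero when there was no historic volume at all.
        return None

    # Signed, not absolute: negative means volume dropped below the mean.
    pct_deviation = (past_minute_volume - historic_mean) / historic_mean * 100

    # The metric belongs to the minute we just ticked past, not the tick itself.
    past_ts = tick.replace(second=0, microsecond=0) - timedelta(minutes=1)
    store[metric_key(past_ts)] = str(pct_deviation)
    return pct_deviation


store: dict = {}
tick = datetime(2024, 11, 12, 18, 3, tzinfo=timezone.utc)
recorded = record_volume_metric_sketch(store, tick, 50, [100, 100, 100])
```

Because nothing is decided at record time, a later step can read several of these per-minute values before concluding that an incident has started.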
evanpurkhiser force-pushed the evanpurkhiser/feat-crons-refactor-to-record-clock-tick-volume-metric branch from 777ce1d to e5cd6b5 (November 12, 2024 18:03)
evanpurkhiser commented Nov 12, 2024
Comment on lines -126 to +128
- pct_deviation = (abs(past_minute_volume - historic_mean) / historic_mean) * 100
+ pct_deviation = (past_minute_volume - historic_mean) / historic_mean * 100
Here's where we adjust the pct mean deviation to allow negative values
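A small worked comparison of the two forms, as a sketch: the helper names and the sample volumes are made up for illustration, and `historic_mean` is assumed to be non-zero.

```python
def abs_pct_deviation(past_minute_volume: float, historic_mean: float) -> float:
    # Previous form: magnitude only, so the direction of the change is lost.
    return (abs(past_minute_volume - historic_mean) / historic_mean) * 100


def signed_pct_deviation(past_minute_volume: float, historic_mean: float) -> float:
    # New form: negative when volume fell below the mean, positive when it rose.
    return (past_minute_volume - historic_mean) / historic_mean * 100


historic_mean = 200
drop, spike = 100, 300  # illustrative volumes, not real data

print(abs_pct_deviation(drop, historic_mean), abs_pct_deviation(spike, historic_mean))
# 50.0 50.0
print(signed_pct_deviation(drop, historic_mean), signed_pct_deviation(spike, historic_mean))
# -50.0 50.0
```

With the absolute form, a drop and a spike of the same size are indistinguishable. The signed form keeps them apart, so only the drop needs to be considered when deciding on an incident.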
wedamija approved these changes Nov 13, 2024
evanpurkhiser deleted the evanpurkhiser/feat-crons-refactor-to-record-clock-tick-volume-metric branch November 13, 2024 01:19
Related: ref(crons): Remove unused volume_anomaly_result from clock_tick (sentry-kafka-schemas#349)
Part of GH-79328