Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "Continue escalation if >X alerts per Y minutes" escalation step #2636

Merged
merged 7 commits into from
Jul 26, 2023

Conversation

vstpme
Copy link
Member

@vstpme vstpme commented Jul 25, 2023

What this PR does

Fixes a faulty escalation step "Continue escalation if >X alerts per Y minutes".

Which issue(s) this PR fixes

#895

Checklist

  • Unit, integration, and e2e (if applicable) tests updated
  • Documentation added (or pr:no public docs PR label added if not required)
  • CHANGELOG.md updated (or pr:no changelog PR label added if not required)

@vstpme vstpme added the pr:no public docs Added to a PR that does not require public documentation updates label Jul 25, 2023
Comment on lines -201 to -202
if self.pause_escalation:
return False
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escalation_chain_exists should not depend on pause_escalation, these are two separate properties that are not dependent on each other. For example, an alert group that has a proper escalation chain could be paused, and it would be a totally valid state.

if (
self.is_restricted
or is_on_maintenace_or_debug_mode
or self.pause_escalation
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escalations for paused alert groups should not be skipped. Paused alert groups should be escalated further and get handled by the "Continue escalation if >X alerts per Y minutes" step.

Comment on lines +29 to +30
if alert.is_the_first_alert_in_group or alert.group.pause_escalation:
alert.group.start_escalation_if_needed(countdown=TASK_DELAY_SECONDS)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change of the PR: run start_escalation_if_needed not only when receiving the first alert, but also when the group is paused (see comment above).

Comment on lines +80 to +81
# Check that setting pause_escalation to True doesn't make the snapshot empty
assert alert_group.build_raw_escalation_snapshot() != EMPTY_RAW_ESCALATION_SNAPSHOT
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updating the test to reflect this change.

@vstpme vstpme changed the title Fix ">X alerts per Y minutes" escalation step Fix "Continue escalation if >X alerts per Y minutes" escalation step Jul 25, 2023


@pytest.mark.django_db
def test_distribute_alert_escalate_alert_group_when_escalation_paused(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a test to check that escalate_alert_group task is called for paused alert groups (i.e. paused alert groups get handled by the escalation step). There's already a test that checks the actual step behaviour here.

@vstpme vstpme marked this pull request as ready for review July 25, 2023 16:58
@vstpme vstpme requested a review from a team July 25, 2023 16:58
@vstpme vstpme merged commit 5529999 into dev Jul 26, 2023
@vstpme vstpme deleted the vadimkerr/fix-more-x-in-y-minutes branch July 26, 2023 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:no public docs Added to a PR that does not require public documentation updates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants