Add escalation step to notify all members from a team (#3908)
Based on #3477

---------

Co-authored-by: xssfox <xss@sprocketfox.io>
matiasb and xssfox authored Feb 20, 2024
1 parent 6da36b3 commit d6467e9
Showing 19 changed files with 448 additions and 6 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -27,6 +27,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Update OnCall Insights dashboard @Ferril ([#3875](https://github.com/grafana/oncall/pull/3875))
- Do not delete webhook if its team is deleted @mderynck ([#3873](https://github.com/grafana/oncall/pull/3873))
- Update user details internal API perms ([#3900](https://github.com/grafana/oncall/pull/3900))
- Add escalation to notify entire Grafana team @xssfox ([#3477](https://github.com/grafana/oncall/pull/3477))

## v1.3.105 (2024-02-13)

@@ -80,6 +80,7 @@ need a larger time interval, use multiple wait steps in a row.
* `Notify users` - send a notification to a user or a group of users.
* `Notify users from on-call schedule` - send a notification to a user or a group of users
from an on-call schedule.
* `Notify all users from a team` - send a notification to all users in a team.
* `Resolve incident automatically` - resolve the alert group right now with status
`Resolved automatically`.
* `Notify whole slack channel` - send a notification to the users in the slack channel.
32 changes: 31 additions & 1 deletion docs/sources/oncall-api-reference/escalation_policies.md
@@ -35,7 +35,7 @@ The above command returns JSON structured in the following way:
| `escalation_chain_id` | Yes | Each escalation policy is assigned to a specific escalation chain. |
| `position` | Optional | Escalation policies execute one after another starting from `position=0`. `Position=-1` will put the escalation policy to the end of the list. A new escalation policy created with a position of an existing escalation policy will move the old one (and all following) down in the list. |
| `type` | Yes | One of: `wait`, `notify_persons`, `notify_person_next_each_time`, `notify_on_call_from_schedule`, `notify_user_group`, `trigger_action`, `resolve`, `notify_whole_channel`, `notify_if_time_from_to`. |
| `important` | Optional | Default is `false`. Will assign "important" to personal notification rules if `true`. This can be used to distinguish alerts on which you want to be notified immediately by phone. Applicable for types `notify_persons`, `notify_on_call_from_schedule`, and `notify_user_group`. |
| `important` | Optional | Default is `false`. Will assign "important" to personal notification rules if `true`. This can be used to distinguish alerts on which you want to be notified immediately by phone. Applicable for types `notify_persons`, `notify_team_members`, `notify_on_call_from_schedule`, and `notify_user_group`. |
| `duration` | If type = `wait` | The duration, in seconds, when type `wait` is chosen. Valid values are: `60`, `300`, `900`, `1800`, `3600`. |
| `action_to_trigger` | If type = `trigger_action` | ID of a webhook. |
| `group_to_notify` | If type = `notify_user_group` | ID of a `User Group`. |
@@ -44,6 +44,7 @@ The above command returns JSON structured in the following way:
| `notify_on_call_from_schedule` | If type = `notify_on_call_from_schedule` | ID of a Schedule. |
| `notify_if_time_from` | If type = `notify_if_time_from_to` | UTC time represents the beginning of the time period, for example `09:00:00Z`. |
| `notify_if_time_to` | If type = `notify_if_time_from_to` | UTC time represents the end of the time period, for example `18:00:00Z`. |
| `team_to_notify` | If type = `notify_team_members` | ID of a team. |
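
To illustrate how `team_to_notify` combines with `type` and `important`, the following sketch builds (but does not send) a `PUT` request for a `notify_team_members` step. The helper name `build_team_step_request`, the host `oncall.example.com`, and the team ID are hypothetical placeholders, and it is an assumption that an existing policy can be switched to this type through the update endpoint.

```python
import json
import urllib.request


def build_team_step_request(api_url, token, policy_id, team_id, important=False):
    """Build, without sending, a PUT request for a notify_team_members step.

    Hypothetical helper: only the field names come from the parameter table.
    """
    body = {
        "type": "notify_team_members",
        "team_to_notify": team_id,  # ID of the team whose members are notified
        "important": important,     # selects the "important" notification rules
    }
    return urllib.request.Request(
        url=f"{api_url}/api/v1/escalation_policies/{policy_id}/",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Authorization": token, "Content-Type": "application/json"},
    )


request = build_team_step_request(
    "https://oncall.example.com",  # placeholder for {{API_URL}}
    "meowmeowmeow",
    "E3GA6SJETWWJS",
    "TEAM_ID_PLACEHOLDER",         # placeholder team ID
    important=True,
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be run without a server.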

**HTTP request**

@@ -70,6 +71,35 @@ The above command returns JSON structured in the following way:
}
```

# Update an escalation policy

```shell
curl "{{API_URL}}/api/v1/escalation_policies/E3GA6SJETWWJS/" \
--request PUT \
--header "Authorization: meowmeowmeow" \
--header "Content-Type: application/json" \
--data '{
"type": "wait",
"duration": 300
}'
```

The above command returns JSON structured in the following way:

```json
{
"id": "E3GA6SJETWWJS",
"escalation_chain_id": "F5JU6KJET33FE",
"position": 0,
"type": "wait",
"duration": 300
}
```

**HTTP request**

`PUT {{API_URL}}/api/v1/escalation_policies/<ESCALATION_POLICY_ID>/`

**HTTP request**

`GET {{API_URL}}/api/v1/escalation_policies/<ESCALATION_POLICY_ID>/`
@@ -67,6 +67,7 @@ def build_raw_escalation_snapshot(self) -> dict:
'wait_delay': None,
'notify_schedule': None,
'notify_to_group': None,
'notify_to_team_members': None,
'passed_last_time': None,
'escalation_counter': 0,
'last_notified_user': None,
@@ -84,6 +85,7 @@ def build_raw_escalation_snapshot(self) -> dict:
'wait_delay': '00:05:00',
'notify_schedule': None,
'notify_to_group': None,
'notify_to_team_members': None,
'passed_last_time': None,
'escalation_counter': 0,
'last_notified_user': None,
@@ -83,6 +83,7 @@ class Meta:
"custom_webhook",
"notify_schedule",
"notify_to_group",
"notify_to_team_members",
"escalation_counter",
"passed_last_time",
"pause_escalation",
@@ -41,6 +41,7 @@ class EscalationPolicySnapshot:
"custom_webhook",
"notify_schedule",
"notify_to_group",
"notify_to_team_members",
"escalation_counter",
"passed_last_time",
"pause_escalation",
@@ -72,6 +73,7 @@ def __init__(
escalation_counter,
passed_last_time,
pause_escalation,
notify_to_team_members=None,
):
self.id = id
self.order = order
@@ -87,6 +89,7 @@ def __init__(
self.custom_webhook = custom_webhook
self.notify_schedule = notify_schedule
self.notify_to_group = notify_to_group
self.notify_to_team_members = notify_to_team_members
self.escalation_counter = escalation_counter # used for STEP_REPEAT_ESCALATION_N_TIMES
self.passed_last_time = passed_last_time # used for building escalation plan
self.pause_escalation = pause_escalation # used for STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW
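
The new `notify_to_team_members` argument is the only one with a default value. One plausible effect, sketched here with a simplified stand-in class, is that snapshot dictionaries stored before this change (which lack the key) can still be turned into objects. `PolicySnapshotSketch` and its reduced field list are illustrative assumptions, not the project's class.

```python
class PolicySnapshotSketch:
    """Simplified stand-in for EscalationPolicySnapshot (three fields only)."""

    def __init__(self, id, notify_to_group, pause_escalation, notify_to_team_members=None):
        self.id = id
        self.notify_to_group = notify_to_group
        self.pause_escalation = pause_escalation
        # Missing in older serialized snapshots, so it falls back to None.
        self.notify_to_team_members = notify_to_team_members


# A dictionary saved before the field existed, and one saved afterwards.
old_raw = {"id": 1, "notify_to_group": None, "pause_escalation": False}
new_raw = {**old_raw, "notify_to_team_members": "team-a"}

old_snapshot = PolicySnapshotSketch(**old_raw)
new_snapshot = PolicySnapshotSketch(**new_raw)
print(old_snapshot.notify_to_team_members)
print(new_snapshot.notify_to_team_members)
```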
@@ -124,6 +127,8 @@ def execute(self, alert_group: "AlertGroup", reason) -> StepExecutionResultData:
EscalationPolicy.STEP_FINAL_RESOLVE: self._escalation_step_resolve,
EscalationPolicy.STEP_NOTIFY_GROUP: self._escalation_step_notify_user_group,
EscalationPolicy.STEP_NOTIFY_GROUP_IMPORTANT: self._escalation_step_notify_user_group,
EscalationPolicy.STEP_NOTIFY_TEAM_MEMBERS: self._escalation_step_notify_team_members,
EscalationPolicy.STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT: self._escalation_step_notify_team_members,
EscalationPolicy.STEP_NOTIFY_SCHEDULE: self._escalation_step_notify_on_call_schedule,
EscalationPolicy.STEP_NOTIFY_SCHEDULE_IMPORTANT: self._escalation_step_notify_on_call_schedule,
EscalationPolicy.STEP_TRIGGER_CUSTOM_BUTTON: self._escalation_step_trigger_custom_button,
@@ -358,6 +363,55 @@ def _escalation_step_notify_user_group(self, alert_group: "AlertGroup", reason:
tasks.append(notify_group)
self._execute_tasks(tasks)

    def _escalation_step_notify_team_members(self, alert_group: "AlertGroup", reason: str) -> None:
        tasks = []

        if self.notify_to_team_members is None:
            log_record = AlertGroupLogRecord(
                type=AlertGroupLogRecord.TYPE_ESCALATION_FAILED,
                alert_group=alert_group,
                reason=reason,
                escalation_policy=self.escalation_policy,
                escalation_error_code=AlertGroupLogRecord.ERROR_ESCALATION_NOTIFY_TEAM_MEMBERS_STEP_IS_NOT_CONFIGURED,
                escalation_policy_step=self.step,
            )
            log_record.save()
        else:
            log_record = AlertGroupLogRecord(
                type=AlertGroupLogRecord.TYPE_ESCALATION_TRIGGERED,
                alert_group=alert_group,
                reason=reason,
                escalation_policy=self.escalation_policy,
                escalation_policy_step=self.step,
                step_specific_info={"team": self.notify_to_team_members.name},
            )
            log_record.save()
            self.notify_to_users_queue = self.notify_to_team_members.users.all()
            reason = "user belongs to team {}".format(self.notify_to_team_members.name)
            for notify_to_user in self.notify_to_users_queue:
                notify_task = notify_user_task.signature(
                    (
                        notify_to_user.pk,
                        alert_group.pk,
                    ),
                    {
                        "reason": reason,
                        "important": self.step == EscalationPolicy.STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT,
                    },
                    immutable=True,
                )
                tasks.append(notify_task)
                AlertGroupLogRecord.objects.create(
                    type=AlertGroupLogRecord.TYPE_ESCALATION_TRIGGERED,
                    author=notify_to_user,
                    alert_group=alert_group,
                    reason=reason,
                    escalation_policy=self.escalation_policy,
                    escalation_policy_step=self.step,
                )

        self._execute_tasks(tasks)
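
The control flow of the step can be sketched without Django or Celery: an unconfigured team produces a single "skipped" log entry and no tasks, while a configured team produces one task per member, flagged as important only for the important variant. `TeamSketch` and `plan_team_notifications` are hypothetical names, plain dictionaries stand in for task signatures and log records, and the step numbers 17 and 18 are taken from the migration's choices list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Step numbers as listed in the migration's choices.
STEP_NOTIFY_TEAM_MEMBERS = 17
STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT = 18


@dataclass
class TeamSketch:
    """Stand-in for a team: a name plus the IDs of its members."""
    name: str
    user_ids: List[int] = field(default_factory=list)


def plan_team_notifications(step: int, team: Optional[TeamSketch], alert_group_id: int):
    """Return (log_lines, tasks) for one execution of the step."""
    if team is None:
        # Mirrors the "not configured" branch: log the failure, schedule nothing.
        return (['skipped escalation step "Notify Team Members" because it is not configured'], [])

    reason = "user belongs to team {}".format(team.name)
    important = step == STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT
    logs = ["triggered step for team {}".format(team.name)]
    tasks = []
    for user_id in team.user_ids:
        # One notification task and one log line per team member.
        tasks.append(
            {"user_id": user_id, "alert_group_id": alert_group_id, "reason": reason, "important": important}
        )
        logs.append("notifying user {}".format(user_id))
    return logs, tasks


logs, tasks = plan_team_notifications(STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT, TeamSketch("sre", [1, 2]), 42)
print(len(tasks), tasks[0]["important"])
```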

def _escalation_step_notify_if_time(self, alert_group: "AlertGroup", _reason: str) -> StepExecutionResultData:
eta = None

@@ -0,0 +1,25 @@
# Generated by Django 4.2.10 on 2024-02-16 17:24

from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    dependencies = [
        ('user_management', '0020_organization_is_grafana_labels_enabled'),
        ('alerts', '0044_alertreceivechannel_alertmanager_v2_backup_templates_and_more'),
    ]

    operations = [
        migrations.AddField(
            model_name='escalationpolicy',
            name='notify_to_team_members',
            field=models.ForeignKey(default=None, null=True, on_delete=django.db.models.deletion.SET_NULL, related_name='escalation_policies', to='user_management.team'),
        ),
        migrations.AlterField(
            model_name='escalationpolicy',
            name='step',
            field=models.IntegerField(choices=[(0, 'Wait'), (1, 'Notify User'), (2, 'Notify Whole Channel'), (3, 'Repeat Escalation (5 times max)'), (4, 'Resolve'), (5, 'Notify Group'), (6, 'Notify Schedule'), (7, 'Notify User (Important)'), (8, 'Notify Group (Important)'), (9, 'Notify Schedule (Important)'), (10, 'Trigger Outgoing Webhook'), (11, 'Notify User (next each time)'), (12, 'Continue escalation only if time is from'), (13, 'Notify multiple Users'), (14, 'Notify multiple Users (Important)'), (15, 'Continue escalation if >X alerts per Y minutes'), (16, 'Trigger Webhook'), (17, 'Notify all users in a Team'), (18, 'Notify all users in a Team (Important)')], default=None, null=True),
        ),
    ]
8 changes: 7 additions & 1 deletion engine/apps/alerts/models/alert_group_log_record.py
@@ -159,7 +159,8 @@ class AlertGroupLogRecord(models.Model):
ERROR_ESCALATION_NOTIFY_IN_SLACK,
ERROR_ESCALATION_NOTIFY_IF_NUM_ALERTS_IN_WINDOW_STEP_IS_NOT_CONFIGURED,
ERROR_ESCALATION_TRIGGER_CUSTOM_WEBHOOK_ERROR,
) = range(18)
ERROR_ESCALATION_NOTIFY_TEAM_MEMBERS_STEP_IS_NOT_CONFIGURED,
) = range(19)

type = models.IntegerField(choices=TYPE_CHOICES)

@@ -519,6 +520,11 @@ def rendered_log_line_action(self, for_slack=False, html=False, substitute_autho
result += 'skipped escalation step "Notify Schedule" because it is not configured'
elif self.escalation_error_code == AlertGroupLogRecord.ERROR_ESCALATION_NOTIFY_GROUP_STEP_IS_NOT_CONFIGURED:
result += 'skipped escalation step "Notify Group" because it is not configured'
elif (
self.escalation_error_code
== AlertGroupLogRecord.ERROR_ESCALATION_NOTIFY_TEAM_MEMBERS_STEP_IS_NOT_CONFIGURED
):
result += 'skipped escalation step "Notify Team Members" because it is not configured'
elif (
self.escalation_error_code
== AlertGroupLogRecord.ERROR_ESCALATION_TRIGGER_CUSTOM_BUTTON_STEP_IS_NOT_CONFIGURED
35 changes: 34 additions & 1 deletion engine/apps/alerts/models/escalation_policy.py
@@ -45,7 +45,9 @@ class EscalationPolicy(OrderedModel):
STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT,
STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW,
STEP_TRIGGER_CUSTOM_WEBHOOK,
) = range(17)
STEP_NOTIFY_TEAM_MEMBERS,
STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT,
) = range(19)
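
Both this hunk and the `AlertGroupLogRecord` one bump the `range(...)` argument by exactly the number of names added, because tuple unpacking requires the two sides to have the same length. The sketch below shows the idiom and what happens on a mismatch; the three constants are shortened, illustrative stand-ins rather than the model's full list.

```python
# Each name receives its position in the tuple as an integer value.
(STEP_WAIT, STEP_NOTIFY, STEP_RESOLVE) = range(3)


def unpack_with_wrong_count():
    """Add a fourth name without bumping range(3) and report the error text."""
    try:
        (a, b, c, d) = range(3)
    except ValueError as error:
        return str(error)
    return None


print(STEP_WAIT, STEP_NOTIFY, STEP_RESOLVE)
print(unpack_with_wrong_count())
```

Appending new names at the end, as the commit does, keeps the integer values of all existing steps unchanged, which matters because those integers are stored in the database.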

# Must be the same order as previous
STEP_CHOICES = (
@@ -66,6 +68,8 @@ class EscalationPolicy(OrderedModel):
(STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT, "Notify multiple Users (Important)"),
(STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW, "Continue escalation if >X alerts per Y minutes"),
(STEP_TRIGGER_CUSTOM_WEBHOOK, "Trigger Webhook"),
(STEP_NOTIFY_TEAM_MEMBERS, "Notify all users in a Team"),
(STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT, "Notify all users in a Team (Important)"),
)

# Ordered step choices available for internal api.
@@ -74,6 +78,7 @@ class EscalationPolicy(OrderedModel):
# Common
STEP_WAIT,
STEP_NOTIFY_MULTIPLE_USERS,
STEP_NOTIFY_TEAM_MEMBERS,
STEP_NOTIFY_SCHEDULE,
STEP_FINAL_RESOLVE,
# Slack
@@ -100,6 +105,8 @@ class EscalationPolicy(OrderedModel):
STEP_NOTIFY_USERS_QUEUE,
STEP_NOTIFY_IF_TIME,
STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW,
STEP_NOTIFY_TEAM_MEMBERS,
STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT,
STEP_NOTIFY_MULTIPLE_USERS,
STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT,
STEP_TRIGGER_CUSTOM_BUTTON,
@@ -113,6 +120,10 @@ class EscalationPolicy(OrderedModel):
# Common steps
STEP_WAIT: ("Wait {{wait_delay}} minute(s)", "Wait"),
STEP_NOTIFY_MULTIPLE_USERS: ("Start {{importance}} notification for {{users}}", "Notify users"),
STEP_NOTIFY_TEAM_MEMBERS: (
"Start {{importance}} notification for {{team}} team members",
"Notify all team members",
),
STEP_NOTIFY_SCHEDULE: (
"Start {{importance}} notification for schedule {{schedule}}",
"Notify users from on-call schedule",
@@ -157,24 +168,28 @@ class EscalationPolicy(OrderedModel):
STEP_NOTIFY_GROUP: STEP_NOTIFY_GROUP_IMPORTANT,
STEP_NOTIFY_SCHEDULE: STEP_NOTIFY_SCHEDULE_IMPORTANT,
STEP_NOTIFY_MULTIPLE_USERS: STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT,
STEP_NOTIFY_TEAM_MEMBERS: STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT,
}
IMPORTANT_TO_DEFAULT_STEP_MAPPING = {
STEP_NOTIFY_GROUP_IMPORTANT: STEP_NOTIFY_GROUP,
STEP_NOTIFY_SCHEDULE_IMPORTANT: STEP_NOTIFY_SCHEDULE,
STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT: STEP_NOTIFY_MULTIPLE_USERS,
STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT: STEP_NOTIFY_TEAM_MEMBERS,
}

# Default steps are just usual version of important steps. E.g. notify group - notify group important
DEFAULT_STEPS_SET = {
STEP_NOTIFY_GROUP,
STEP_NOTIFY_SCHEDULE,
STEP_NOTIFY_MULTIPLE_USERS,
STEP_NOTIFY_TEAM_MEMBERS,
}

IMPORTANT_STEPS_SET = {
STEP_NOTIFY_GROUP_IMPORTANT,
STEP_NOTIFY_SCHEDULE_IMPORTANT,
STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT,
STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT,
}
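
The four constants touched in this hunk have to stay in agreement: the two mappings are inverses of each other, and the two sets are their key sets. A minimal sketch of that invariant follows, deriving three of the structures from the first. The model spells each one out by hand, so the derivation is only an illustration. The integers are the step values from the migration's choices list.

```python
# default step -> important step (values from the migration's choices list)
DEFAULT_TO_IMPORTANT = {
    5: 8,    # Notify Group -> Notify Group (Important)
    6: 9,    # Notify Schedule -> Notify Schedule (Important)
    13: 14,  # Notify multiple Users -> Notify multiple Users (Important)
    17: 18,  # Notify all users in a Team -> ... (Important)
}

# The reverse mapping and both sets follow mechanically from the forward mapping.
IMPORTANT_TO_DEFAULT = {important: default for default, important in DEFAULT_TO_IMPORTANT.items()}
DEFAULT_STEPS = set(DEFAULT_TO_IMPORTANT)
IMPORTANT_STEPS = set(IMPORTANT_TO_DEFAULT)

print(IMPORTANT_TO_DEFAULT[18], sorted(IMPORTANT_STEPS))
```

A check like `DEFAULT_STEPS.isdisjoint(IMPORTANT_STEPS)` in a unit test would catch a new step being added to only some of the four structures.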

SLACK_INTEGRATION_REQUIRED_STEPS = [
@@ -187,6 +202,7 @@ class EscalationPolicy(OrderedModel):
STEP_WAIT,
STEP_NOTIFY_SCHEDULE,
STEP_NOTIFY_MULTIPLE_USERS,
STEP_NOTIFY_TEAM_MEMBERS,
STEP_NOTIFY_USERS_QUEUE,
STEP_NOTIFY_GROUP,
STEP_FINAL_RESOLVE,
@@ -213,6 +229,8 @@ class EscalationPolicy(OrderedModel):
STEP_NOTIFY_USERS_QUEUE: "notify_person_next_each_time",
STEP_NOTIFY_MULTIPLE_USERS: "notify_persons",
STEP_NOTIFY_MULTIPLE_USERS_IMPORTANT: "notify_persons",
STEP_NOTIFY_TEAM_MEMBERS: "notify_team_members",
STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT: "notify_team_members",
STEP_NOTIFY_IF_TIME: "notify_if_time_from_to",
STEP_NOTIFY_IF_NUM_ALERTS_IN_TIME_WINDOW: "notify_if_num_alerts_in_window",
STEP_REPEAT_ESCALATION_N_TIMES: "repeat_escalation",
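
Because the default and important variants share one public name, the public `type` alone cannot identify the internal step; the `important` flag supplies the missing bit. The sketch below shows one way such a lookup could work. `to_internal_step` and the trimmed-down maps are hypothetical, and the serializer that performs the real conversion is not part of this diff.

```python
# Trimmed-down copies of the maps, using step values from the migration.
PUBLIC_NAME_BY_STEP = {
    13: "notify_persons",
    14: "notify_persons",
    17: "notify_team_members",
    18: "notify_team_members",
}
DEFAULT_TO_IMPORTANT = {13: 14, 17: 18}


def to_internal_step(public_type: str, important: bool) -> int:
    """Resolve a public type plus the important flag to one internal step."""
    for step, name in PUBLIC_NAME_BY_STEP.items():
        # Only default steps are candidates; the flag picks the variant afterwards.
        if name == public_type and step in DEFAULT_TO_IMPORTANT:
            return DEFAULT_TO_IMPORTANT[step] if important else step
    raise ValueError("unknown public type: {}".format(public_type))


print(to_internal_step("notify_team_members", False))
print(to_internal_step("notify_team_members", True))
```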
@@ -244,6 +262,14 @@ class EscalationPolicy(OrderedModel):

step = models.IntegerField(choices=STEP_CHOICES, default=None, null=True)

notify_to_team_members = models.ForeignKey(
"user_management.Team",
on_delete=models.SET_NULL,
related_name="escalation_policies",
default=None,
null=True,
)

notify_to_group = models.ForeignKey(
"slack.SlackUserGroup",
on_delete=models.SET_NULL,
@@ -368,6 +394,13 @@ def insight_logs_serialized(self):
if self.notify_to_group:
result["user_group"] = self.notify_to_group.name
result["user_group_id"] = self.notify_to_group.public_primary_key
elif self.step in [
EscalationPolicy.STEP_NOTIFY_TEAM_MEMBERS,
EscalationPolicy.STEP_NOTIFY_TEAM_MEMBERS_IMPORTANT,
]:
if self.notify_to_team_members:
result["team"] = self.notify_to_team_members.name
result["team_id"] = self.notify_to_team_members.public_primary_key
elif self.step in [EscalationPolicy.STEP_NOTIFY_SCHEDULE, EscalationPolicy.STEP_NOTIFY_SCHEDULE_IMPORTANT]:
if self.notify_schedule:
result["on-call_schedule"] = self.notify_schedule.insight_logs_verbal