Add transaction on_commit before signals for alert group actions #3731

mderynck · 2024-01-23T01:11:08Z

What this PR does

Adds transactions around log record creation and waits for transaction on_commit before sending signals that pass the DB ID of alert group log records. For deletes, we can then assume that any IDs missing when a task runs belong to intentionally deleted alert groups, and we can stop those tasks from retrying endlessly.
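
The ordering described above can be sketched without Django as a toy model. `ToyConnection` is a hypothetical stand-in, not Django's implementation; it only mimics the documented `transaction.on_commit` behaviour: callbacks registered inside a transaction run after the commit, are discarded on rollback, and run immediately when no transaction is open. `send_alert_group_signal` here is a plain function that records its argument, standing in for the `.delay` call on the real task. Nested transactions are not modelled.

```python
from contextlib import contextmanager
from functools import partial


class ToyConnection:
    """Hypothetical stand-in for a DB connection with on_commit hooks."""

    def __init__(self):
        self.in_transaction = False
        self._callbacks = []

    def on_commit(self, callback):
        if self.in_transaction:
            # Defer until the surrounding transaction commits.
            self._callbacks.append(callback)
        else:
            # No open transaction (autocommit): run right away.
            callback()

    @contextmanager
    def atomic(self):
        self.in_transaction = True
        try:
            yield
        except Exception:
            # Rollback: the rows were never written, so drop the callbacks.
            self._callbacks.clear()
            raise
        else:
            # Commit: the rows are visible, so it is safe to notify workers.
            callbacks, self._callbacks = self._callbacks, []
            self.in_transaction = False
            for callback in callbacks:
                callback()
        finally:
            self.in_transaction = False


sent = []  # records which log record IDs were "sent" to the task queue


def send_alert_group_signal(log_record_id):
    sent.append(log_record_id)


connection = ToyConnection()

with connection.atomic():
    log_record_id = 42  # pretend a log record row was just inserted
    connection.on_commit(partial(send_alert_group_signal, log_record_id))
    queued_before_commit = list(sent)  # nothing has been sent yet

queued_after_commit = list(sent)  # the callback ran once the block exited
```

Because the ID is only handed to the worker after the commit, a worker that cannot find the row can treat it as deleted rather than as not yet written.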

Which issue(s) this PR fixes

Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not required)
- [x] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required)

joeyorlando · 2024-01-24T15:23:46Z

this may also close #3729

iskhakov

I like the idea, and I see that it worked in the past, but do we have transaction blocks in these places?

iskhakov · 2024-01-25T06:49:22Z

engine/apps/alerts/models/alert_group.py

@@ -686,11 +687,7 @@ def acknowledge_by_source(self):
            f"log record {log_record.pk} with type '{log_record.get_type_display()}', action source: alert"


logging is incorrect

iskhakov · 2024-01-25T06:59:41Z

engine/apps/alerts/models/alert_group.py

-            log_record=log_record.pk,
-            action_source=action_source,
-        )
+        transaction.on_commit(partial(send_alert_group_signal.delay, log_record.pk))


Not sure if it is called inside a transaction block.

matiasb · 2024-01-25T15:47:26Z

engine/apps/alerts/models/alert_group.py

-        alert_group_action_triggered_signal.send(
-            sender=self.acknowledge_by_user,
-            log_record=log_record.pk,
-            action_source=action_source,


To be sure, are you dropping action_source on purpose here (and below)?
Also checking, we are ok making this async now, right? (I guess so, and it makes sense; will take a look)

I'll double-check this. I think only Slack uses action_source from the signal, but force_sync was used instead (for the delete case). In all cases it should be possible to take it from the log record instead of passing it on the signal; I wanted to avoid changing the signature of send_alert_group_signal or having to add another task with a different signature.
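
A minimal sketch of that idea, assuming the log record itself stores the action source. `LogRecord`, `LOG_RECORDS`, and the string value `"slack"` are hypothetical stand-ins for illustration and do not reflect the real model or task:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LogRecord:
    """Hypothetical stand-in for an alert group log record row."""

    pk: int
    action_source: Optional[str]


# Hypothetical in-memory "table" keyed by primary key.
LOG_RECORDS: Dict[int, LogRecord] = {1: LogRecord(pk=1, action_source="slack")}


def send_alert_group_signal(log_record_id: int) -> Optional[str]:
    """Takes only the ID; everything else is read from the stored record."""
    record = LOG_RECORDS.get(log_record_id)
    if record is None:
        # The record was never created or has been deleted.
        return None
    return record.action_source
```

The task keeps a single-argument signature, so callers only pass the ID.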

matiasb · 2024-01-25T16:02:33Z

engine/apps/telegram/tasks.py

+        logger.warning(
+            f"AlertGroupTelegramRepresentative: log record {log_record_id} never created or has been deleted"
+        )
+        return


matiasb · 2024-01-25T16:09:45Z

engine/apps/alerts/tasks/delete_alert_group.py

+            log_record=log_record_pk,
+            action_source=None,
+            force_sync=True,
+        )


So, we trigger the action triggered signal and queue the cleanup task after that. Makes sense 👍

matiasb · 2024-01-25T16:13:10Z

engine/apps/alerts/tasks/delete_alert_group.py

@@ -10,7 +11,7 @@
 @shared_dedicated_queue_retry_task(
    autoretry_for=(Exception,), retry_backoff=True, max_retries=1 if settings.DEBUG else None
 )
-def delete_alert_group(alert_group_pk, user_pk):
+def delete_alert_group(alert_group_pk: int, user_pk: int) -> None:


Maybe not for this PR, but wondering if this still needs to be a task?

I think we can keep it for now. Something could go wrong in the stop-escalation part of delete, and we could retry here.

mderynck · 2024-01-25T19:58:38Z

> I like the idea, and I see that it worked in the past, but do we have transaction blocks in these places?

We do not; I am going to test this on dev to see if it improves the issue. In principle autocommit should already cover this, but we keep finding cases where the objects are missing.

mderynck · 2024-01-31T21:49:43Z

Added back raising exceptions. In most cases they will not retry, and we will get a log message and a task failure. In the delete cases only a log message is written; there is no reason to raise an exception if the object was already deleted. There are a couple of cases for Slack and Telegram where the code runs inside a signal call; these will still retry so as not to change the behavior at the top level. If we do not want to retry the signal cases, we will need to add an intermediate task to isolate the one handler from the others.
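
The "raise but do not retry" policy can be illustrated with a small pure-Python sketch. `run_with_retries` and `RecordDoesNotExist` are hypothetical names made up for this example; the real code relies on the task queue's retry machinery rather than a loop like this one.

```python
import logging

logger = logging.getLogger(__name__)


class RecordDoesNotExist(Exception):
    """Hypothetical stand-in for a model's DoesNotExist exception."""


def run_with_retries(task, *args, max_retries=3, dont_retry_for=(RecordDoesNotExist,)):
    """Run task, retrying on errors except those listed in dont_retry_for."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(*args)
        except dont_retry_for:
            # A missing object will not reappear: log, fail, and stop.
            logger.warning("task failed on a missing object; not retrying")
            raise
        except Exception:
            if attempts > max_retries:
                raise
            logger.warning("attempt %s failed; retrying", attempts)


calls = []  # records every invocation of the task


def notify(log_record_id):
    calls.append(log_record_id)
    raise RecordDoesNotExist(f"log record {log_record_id} not found")


try:
    run_with_retries(notify, 99)
except RecordDoesNotExist:
    pass  # the failure surfaces after a single attempt
```

The exception still propagates, so the failure is visible, but the task is attempted only once.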

matiasb

LGTM

Add transaction oncommit before signals for alert group actions

cb7aa98

mderynck added 2 commits January 24, 2024 17:51

For delete make sure alert group still exists when signal is processed

5244da3

Split up delete tasks so slack message delete can retry separately

0fe9b3e

mderynck added the pr:no public docs label Jan 25, 2024

mderynck added 3 commits January 24, 2024 19:22

Fix tests

51b320d

Merge branch 'dev' into mderynck/oncommit-action-triggered-signal

54a3fed

Fix log message

86fcd20

mderynck marked this pull request as ready for review January 25, 2024 02:34

mderynck requested a review from a team January 25, 2024 02:34

iskhakov reviewed Jan 25, 2024

matiasb reviewed Jan 25, 2024

mderynck added 14 commits January 25, 2024 13:47

Remove action_source from signal

068284f

Merge branch 'dev' into mderynck/oncommit-action-triggered-signal

abae499

Add debug log statements

914ff15

Add logging for delete alert group log record

08ae20c

Add transaction in acknowledge reminder

0b1ded2

Merge branch 'dev' into mderynck/oncall-dev-debugging

76ae688

Add transactions

c422c55

Merge branch 'dev' into mderynck/oncommit-action-triggered-signal

74171e8

Merge dev

19beec9

Update changelog

807943f

Merge dev

0c234ec

Raise exceptions but do not retry on object does not exist

d9e7d40

Raise exception for missing in webhook trigger

9c84d26

Raise exception

f57b756

matiasb approved these changes Jan 31, 2024

mderynck merged commit 2a466a0 into dev Jan 31, 2024
20 of 21 checks passed

mderynck deleted the mderynck/oncommit-action-triggered-signal branch January 31, 2024 22:54

iskhakov mentioned this pull request Feb 6, 2024

fix "AlertGroup not found" exception in apps.metrics_exporter.tasks.update_metrics_for_alert_group task #3824

Closed

3 tasks
