Create a foundation for an "alert" system #1255
Replies: 3 comments 12 replies
-
Thanks. This is a great write-up and exactly the conversation we need to be having. My thoughts below: notifications_subscriptions
Would you please unpack this more? Are we unable to send an email to
Given that we do intend to allow for arbitrary emails, why not just store the desired email directly, and avoid the coupling to our Then, we could have trigger on BEFORE INSERT/UPDATE that the email is actually contained in Would this also avoid the extra notification_templatesHaving a CMS of some sort in the loop makes plenty of sense. But, can we not use Strapi ? It's the CMS system we already have in place, that people who might be editing this content know how to use, and has a reasonable GUI. There's also an email plugin for templated email. There may be issues with it, but we should be sure it isn't fit for purpose before introducing a new thing. Having an internal One other comment is that we should try and stick to established industry terminology, which is:
pagerduty docs have a good writeup, but Prometheus and Grafana use identical terminology. data_processing_notifications
There's a key question we're circling around here that I want to get at: what is an alert, exactly? We have one kind that we're defining today -- no data processed in an $interval -- and we expect to have others, like telling uses that they've gone over free tier limits and are now in a free trial, or that payment failed, and other things that I can't foresee right now. We should explicitly model and get consensus on what an alert is. I think it's defined as:
The natural "key" of an alert is Whatever the exact definition is -- and we may want to model it as an explicit DB type -- an observation is that the role of the This is a really nice pattern and I'm thinking we should rally around it. To put it in slightly different words, the concrete table
Side note: individual views are likely to get slow, depending on the business logic, but we can accelerate them as-needed by having select alert types use materialized views which are refreshed using a type-specific The one place where this ☝️ approach is incompatible with the current design, is the I think that "have I sent an email yet?" is separate state that should be tracked internally by the mechanism that's responsible for sending emails. This is a convergence problem -- this email-sending mechanism is running scheduled iterations of:
Questions
I think "yes". Any user with
I think this falls out of the question on Strapi. What next?Please poke holes or push back on what I've written where there's disagreement. Also, while a lot of this feedback is broadly compatible with the model you've proposed, some of it is probably a significant departure from current implementation. I think it's acceptable to take measured short-cuts to get this feature out the door, but I do want there to be a consensus design outcome from this discussion that we're moving towards. |
Beta Was this translation helpful? Give feedback.
-
On verification & authorizationIn some contexts, like our dashboard, the ability to demonstrate ownership of an email address bestows the bearer with a capability to act as that account and do things. Verification is the proof that "you" (as a browser) are the owner of the email (equivalent: account), and your browser can thus act as an agent for that account with all of its capabilities. This is not the situation we have here. An email to which we send alerts is given no capabilities to an account within our system. The only capability is an implicit one, to read the contents of alerts we're sending out. But, that flow has already been authorized by an account in our system having From a AuthN / AuthZ perspective, I see this as a non-issue. The only role of verification in this context is to prevent spamming of email that doesn't want it (or equivalently: to prevent problems with email reputation). I definitely think we can punt on this. On terminologyThis discussion is mixing up concepts of "alert", "notification", and "subscription". My mental picture has been that "alert" and "notification" are the exact same thing -- a system condition that can be firing or resolved, that we're telling the user about. I honestly don't know where / how "notification" entered the scene -- it doesn't appear in the PRD and isn't a separable product concept in any discussion I've been part of -- and I've been rolling with it as a synonym of "alert" to date, but it's getting confusing and we need to pick one and be consistent (some of this discussion now seems to mix a "notification" and "subscriber/subscription" concept). Proposal: "alert" and "alert subscriber/subscription" are the technical terms we use. "notification" is not. Another one is "classification" vs "type" for the taxonomy of specific kinds of alerts. IMO classification is a long word to type and easily misspelled, and it also emotes a notion of triaging of firing alerts that isn't actually intended. For example, I might expect a classification of alerts to be a higher-order taxonomy applied to a set of concrete types (akin to log levels applying to the concrete places where logs are emitted within a codebase). On coupling to the
|
Beta Was this translation helpful? Give feedback.
-
The backend of the alerts functionality has been deployed and working more or less as expected, with one caveat. Alert emails are being sent in duplicate. How it works todayAs implemented, the edge function just receives a request, and then queries the database for the alerts for which it will send emails. The edge function does this by just querying for all alerts that have either started firing or resolved within the last 5 minutes (a hard-coded constant). The edge function is called by a trigger on the
Because there's two separate statements for How to solveA possible short term solution is to just update the above code to be a single statement, with separate CTEs for the inserts and updates. This can probably be good enough for a while. But I think we'll eventually need to make this more robust. Can we always count on being able to modify
|
Beta Was this translation helpful? Give feedback.
-
Last Updated: Nov. 9, 2023
The inaugural alert that will be supported by the system has the following requirements:
Provide a means for a user to subscribe to any alert defined under a specific prefix (of which they admin). The user will only be presented with the option to receive alerts via email at this time, but Slack channels are a notification delivery method that will need to be supported in the future.
Provide a means for the user to select a period of time over which a task's data processing should be evaluated from a predefined set of options. Accepted time periods are: two hours, four hours, eight hours, 12 hours, 24 hours, and two days.
Send one alert email when the condition for the aforementioned alert is valid.
Send a confirmation email when the condition for the aforementioned alert is no longer valid.
Allow a user to enter an email that is not associated with an account in the system to receive alerts.
Design
At a high level, the alert system breaks down into three, core components: a table containing subscription-related information, a table containing the state relevant to a specific alert type, and a table containing alert activity records.
Alert Subscriptions Table
The
alert_subscriptions
table stores the set of active, alert subscriptions. Each row corresponds to an individual user's subscription. Initially, a user will only be able to subscribe to alerts on a tenant-basis. That said, the prefixed scope may be narrowed to allow more fine-grained control over subscriptions in the future.id
uuid
detail
text
ornull
created_at
timestampz
updated_at
timestampz
catalog_prefix
catalog_prefix
email
text
ornull
Alert Data Processing Table
The
alert_data_processing
table stores state relevant to the'data_not_processed_in_interval'
alert type. Each row of this table corresponds to the alert state for a specific task.catalog_name
catalog_name
evaluation_interval
interval
There is a view associated with this table,
alert_data_processing_firing
, whose main purpose is to relay the set of alerts (of this type) that need to be sent.Alert History Table
The
alert_history
table provides historical context for all alert activity.alert_type
text
catalog_name
catalog_name
fired_at
timestampz
resolved_at
timestampz
ornull
arguments
json
Notes
There are two, behavioral quirks of the implementation that are considered acceptable:
Disabling or deleting a task for which an alert is firing (i.e., there is a row in the
alert_history
table for the task whereresolved_at
isnull
) will result in a confirmation email sent.Unsetting the
evaluation_interval
of a task for which an alert is firing (i.e., there is a row in thealert_history
table for the task whereresolved_at
isnull
but the row in thealert_data_processing
table for the task has been deleted) will result in a confirmation email sent.Questions
Would we like to track errors encountered when attempting to send an alert? Or are we comfortable relying on the error handling capabilities built into the marketing (and transactional) email provider?
How long should data in the
alert_history
table be retained?Additional Comments
Given the vulnerabilities identified with Resend late last week, searching for a new marketing (and transactional) email provider is advised for future notification work. Noting this here for record.
A proper, CMS solution is a pending item that will be addressed in a future adaptation of this alert system. A proposal was made in the
notification_templates
section of this comment. It is my understanding that the team is comfortable employing different solutions for email alerts and in-app alerts.In the event external email verification is addressed in the future, the current modeling does not allow for the UI to reliably surface verified, external emails as alert delivery method options.
Beta Was this translation helpful? Give feedback.
All reactions