Decision records
This page is a record of some of the decisions we've made.
We also have separate documents for individual decisions. If a decision is long and complex or there are lots of aspects to discuss, a separate document may be more appropriate. Otherwise, document the key points here.
Note: this is retrospective documentation of a past decision. More details can be found in this discussion PR.
We use Cronitor to monitor that infrequent but critical tasks have run as expected, e.g. collate-letter-pdfs-to-be-sent must run on schedule. While we usually get an alert if a task logs an exception, Cronitor covers us if a task fails silently, which is usually caused by a deployment or by an instance being recycled.
We don't add Cronitor to more frequent tasks. Each monitored task requires extra config in -credentials and costs more. Alerting on all tasks indiscriminately would also dilute the value of Cronitor alerts, since more frequent tasks naturally recur without needing any action from us.
We did consider using Cronitor to catch schedule / scheduler bugs, but we should get alerts about these alongside other errors. We also don't add Cronitor to daily -alert- tasks (e.g. check-if-letters-still-in-created): although these are critical, they are really temporary until we have more timely alerts.
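To make the mechanism concrete, here is a minimal sketch of how a scheduled task might report to Cronitor via its telemetry ping API. The monitor key, API key, and the `send` hook (injectable so the pings can be tested without a network) are all illustrative assumptions, not our real setup.

```python
# Hedged sketch: pinging Cronitor around a scheduled task so that a task
# which fails silently (never pings "complete") still triggers an alert.
import functools
import urllib.request

# Cronitor's telemetry endpoint; API key and monitor name are placeholders.
CRONITOR_URL = "https://cronitor.link/p/{api_key}/{monitor}"


def _ping(url, send=None):
    """Fire one telemetry ping; `send` is injectable for testing."""
    (send or (lambda u: urllib.request.urlopen(u, timeout=5)))(url)


def monitored(monitor, api_key="API_KEY", send=None):
    """Decorator: tell Cronitor when the task starts, completes, or fails."""
    base = CRONITOR_URL.format(api_key=api_key, monitor=monitor)

    def decorator(task):
        @functools.wraps(task)
        def wrapper(*args, **kwargs):
            _ping(base + "?state=run", send)
            try:
                result = task(*args, **kwargs)
            except Exception:
                _ping(base + "?state=fail", send)
                raise
            _ping(base + "?state=complete", send)
            return result

        return wrapper

    return decorator
```

If the "complete" ping never arrives within the monitor's schedule window, Cronitor alerts, which is exactly the silent-failure case the logs would miss.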
We have decided to stop using Dockerhub in favour of ghcr for the following reasons:
- Cost: Notify has a Dockerhub "Pro" account ($60/year), which used to allow up to 50,000 pulls/day [link]. Back in May, Dockerhub notified us that we were averaging ~56,000 pulls/day and that we should either reduce our usage or pay $18,000 for a "service account", which would increase our allowance to 150,000 pulls/day. In contrast, ghcr is already provided as part of our enterprise agreement with Github, with no limit on the number of pulls.
- Security: Since ghcr is part of Github, we can take advantage of the same access controls as for the rest of our repos, meaning we will have one less thing to manage.
As a result of this decision, we should keep an eye out for any images that we still pull from Dockerhub and use this pipeline to copy them to ghcr instead.
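For illustration only, mirroring an image boils down to a pull/retag/push sequence like the one below. The image, org, and credential names are hypothetical; the real work happens in the pipeline linked above, not by hand.

```shell
# Hypothetical example: copying an upstream Dockerhub image to ghcr.
# "example-org", "example-user" and GHCR_TOKEN are placeholders.
docker pull redis:6.2
docker tag redis:6.2 ghcr.io/example-org/redis:6.2
echo "$GHCR_TOKEN" | docker login ghcr.io -u example-user --password-stdin
docker push ghcr.io/example-org/redis:6.2
```

Services would then reference `ghcr.io/example-org/redis:6.2` instead of the Dockerhub image, so pulls count against our Github agreement rather than the Dockerhub quota.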
We have decided we will not have cookies on gov.uk/alerts for the following reasons:
- We don't need Google Analytics. We think we can get per-second request stats, broken down by URL, through Fastly, which will be sufficient
- We don't have any other use cases that would require cookies
- We don't want to damage users' trust in the emergency alerts by making them wonder whether we are tracking them
- We don't want users to see a big cookie banner before the important content telling them what to do to potentially save their life
This decision could be reversed if we discover new needs for cookies and feel their benefits outweigh the downsides of having them.
We have decided, for the interim, to host gov.uk/alerts as a static website in S3 for the following reasons:
- We think this is one of the quickest ways to get a website created
- We think it will be relatively easy to build and run
- We think S3 will offer us the ability to handle high load (although Fastly will protect us from most of it)
- We think it will be more reliable than GOV.UK PaaS
- We get built-in security permissions and access logging in AWS that match the rest of our broadcast infrastructure for free, without needing to replicate them in a different environment like the PaaS
This decision is likely to last a few months, until we find that we outgrow this solution. When we do more exploration into publishing alerts in real time, our infrastructure needs will become more complex, and we should be open to completely changing this decision if we see fit. The team has already discussed some of the many alternatives we may have at that point.
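For context on how little setup this involves, here is a minimal sketch of enabling static website hosting on an S3 bucket with boto3. The bucket name, document keys, and region are illustrative assumptions; the real configuration (and the Fastly layer in front of it) is not shown.

```python
# Hedged sketch: turning on S3 static website hosting for a bucket.
# The client is passed in so this can be exercised without real AWS calls.
def enable_static_site(s3_client, bucket, index="index.html", error="404.html"):
    """Configure `bucket` to serve a static website and return its endpoint."""
    s3_client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": index},
            "ErrorDocument": {"Key": error},
        },
    )
    # Website endpoint format is fixed by AWS; eu-west-2 is an assumed region.
    return f"http://{bucket}.s3-website.eu-west-2.amazonaws.com"
```

In real use this would be called with something like `enable_static_site(boto3.client("s3"), "govuk-alerts")` (a hypothetical bucket name), with Fastly pointed at the returned endpoint as its origin.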