Adjust reserved concurrency limit for fetch Lambda #71
What does this change?
tl;dr
This PR allows the Lambda which fetches saved articles to scale more liberally in order to avoid unreliable deployments.
Longer version
AWS Lambda provides a couple of different ways of controlling concurrency. In the Mobile account, it is common to use reserved concurrency.
It looks like we introduced reserved concurrency to prevent one of these lambdas from impacting on other services (see #42 and #43 for history). These limits were likely correct at the time, but they are now regularly causing this service to be throttled (N.B. throttling results in us serving 5XXs to clients).
This problem is particularly noticeable when the project is deployed. I presume this is because the execution times immediately after deployment are much slower (due to cold starts), so we need more concurrent executions for a short period in order to deal with the volume of requests.
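To make the cold-start reasoning concrete, here is a rough sketch based on Little's law (concurrent executions ≈ request rate × average execution time). The request rate and durations below are hypothetical, chosen only for illustration; they are not measurements from this service.

```python
import math


def required_concurrency(requests_per_second: int, avg_duration_ms: int) -> int:
    """Estimate the concurrent executions needed to serve a steady request rate.

    Little's law: in-flight requests = arrival rate * time each request takes.
    """
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


# Hypothetical figures, for illustration only.
warm = required_concurrency(requests_per_second=50, avg_duration_ms=200)
cold = required_concurrency(requests_per_second=50, avg_duration_ms=2000)

print(f"warm: {warm} concurrent executions, cold: {cold} concurrent executions")
```

Under these assumed numbers, a tenfold increase in execution time needs ten times the concurrency for the same traffic, which is how a limit that is comfortable in steady state could still be exceeded for a short period after a deploy.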
I'm adjusting the limit in an attempt to avoid these problems.
How to test
There isn't a great way to test this; we'll need to ship to PROD and monitor things.
How can we measure success?
Hopefully this new limit is sufficient to allow deployments to run without impacting on reliability.
Have we considered potential risks?
There are a couple of risks here:
1. As we increase the amount of reserved concurrency used by this lambda, we shrink the pool of resources which is available for lambdas which use unreserved concurrency.
However:
other core Lambda-based services (e.g. notifications) use reserved concurrency anyway, so this change does not affect them
our max unreserved concurrent executions metric suggests that we are well below the throttling limit for this category
2. We are hardcoding limits and they are likely to go out of date again.
We have a couple of options to mitigate this:
a) Swap to using unreserved concurrency
b) Configure alerting so that we notice if this type of problem happens again (N.B. this was also discussed here)
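As a sketch of what option b) could look like, the snippet below builds the parameters for a CloudWatch alarm on the `Throttles` metric that Lambda publishes in the `AWS/Lambda` namespace. The function name, SNS topic ARN, period and threshold are all hypothetical placeholders, and this is not necessarily how alerting is configured in this project; the same alarm could equally be declared in the stack's infrastructure template.

```python
from typing import Any, Dict


def throttle_alarm_params(function_name: str, sns_topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm call.

    The alarm fires when the function records any throttles in each of
    five consecutive one-minute periods (illustrative values, not tuned).
    """
    return {
        "AlarmName": f"{function_name}-throttles",
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods with no invocations publish no data; treat them as healthy.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical names, for illustration only.
params = throttle_alarm_params(
    function_name="fetch-articles-PROD",
    sns_topic_arn="arn:aws:sns:eu-west-1:123456789012:alerts",
)
print(params["AlarmName"])
```

With AWS credentials available, these parameters could be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`. Alerting on throttles rather than on a fixed concurrency number means the alarm stays relevant even if the hardcoded limit goes out of date again.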