
Adjust reserved concurrency limit for fetch Lambda #71

Merged 1 commit into master from jw-lambda-concurrency on Jun 16, 2022

Conversation

@jacobwinch (Contributor) commented on Jun 14, 2022

What does this change?

tl;dr

This PR allows the Lambda which fetches saved articles to scale more liberally in order to avoid unreliable deployments.

Longer version

AWS Lambda provides a couple of different ways of controlling concurrency. In the Mobile account, it is common to use reserved concurrency.

Reserving concurrency has the following effects.

Other functions can't prevent your function from scaling – All of your account's functions in the same Region without reserved concurrency share the pool of unreserved concurrency. Without reserved concurrency, other functions can use up all of the available concurrency. This prevents your function from scaling up when needed.

Your function can't scale out of control – Reserved concurrency also limits your function from using concurrency from the unreserved pool, which caps its maximum concurrency. You can reserve concurrency to prevent your function from using all the available concurrency in the Region, or from overloading downstream resources.
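
For context, this is roughly how a reserved concurrency limit is declared when a stack is defined with AWS CDK. This is only a sketch: the actual infrastructure in this repo may be defined differently, and the construct name, runtime and limit below are placeholders rather than the real values.

```ts
import { App, Duration, Stack } from "aws-cdk-lib";
import { Code, Function as LambdaFunction, Runtime } from "aws-cdk-lib/aws-lambda";

class FetchStack extends Stack {
  constructor(scope: App, id: string) {
    super(scope, id);

    // 'reservedConcurrentExecutions' guarantees this much concurrency for the
    // function and, at the same time, caps the function at that value.
    new LambdaFunction(this, "FetchSavedArticlesLambda", {
      runtime: Runtime.NODEJS_18_X,
      handler: "index.handler",
      code: Code.fromAsset("dist"),
      timeout: Duration.seconds(30),
      reservedConcurrentExecutions: 100, // illustrative value, not the real limit
    });
  }
}

new FetchStack(new App(), "FetchSavedArticles");
```

The reserved concurrency value is the setting this PR raises, however it is expressed in the actual stack definition; everything else about the function is unchanged.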

It looks like we introduced reserved concurrency to prevent one of these Lambdas from impacting other services (see #42 and #43 for history). These limits were likely correct at the time, but they are now regularly causing this service to be throttled (N.B. throttling results in us serving 5XXs to clients):

[image: CloudWatch graph showing the fetch Lambda being throttled]

This problem is particularly noticeable when the project is deployed. I presume this is because the execution times immediately after deployment are much slower (due to cold starts), so we need more concurrent executions for a short period in order to deal with the volume of requests.

I'm adjusting the limit in an attempt to avoid these problems.

How to test

There isn't a great way to test this; we'll need to ship to PROD and monitor things.

How can we measure success?

Hopefully this new limit is sufficient to allow deployments to run without impacting reliability.
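
One concrete signal is the function's `Throttles` metric in the `AWS/Lambda` CloudWatch namespace, which should stay at zero during and after deployments. Below is a minimal sketch of pulling that metric with the AWS SDK for JavaScript v3; the region and function name are placeholders, not the real ones.

```ts
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

// Sum the Lambda throttles over the last 24 hours, one datapoint per hour.
const client = new CloudWatchClient({ region: "eu-west-1" }); // placeholder region

async function throttlesLast24h(functionName: string): Promise<void> {
  const now = new Date();
  const result = await client.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/Lambda",
      MetricName: "Throttles",
      Dimensions: [{ Name: "FunctionName", Value: functionName }],
      StartTime: new Date(now.getTime() - 24 * 60 * 60 * 1000),
      EndTime: now,
      Period: 3600,
      Statistics: ["Sum"],
    })
  );
  for (const dp of result.Datapoints ?? []) {
    console.log(dp.Timestamp, dp.Sum);
  }
}

throttlesLast24h("fetch-saved-articles-PROD"); // placeholder function name
```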

Have we considered potential risks?

There are a couple of risks here:

1. As we increase the amount of reserved concurrency used by this Lambda, we shrink the pool of concurrency that is available to Lambdas which use unreserved concurrency.

However:

  • our unreserved concurrency currently accounts for a substantial proportion of the account limit:

[image: account concurrency breakdown showing the unreserved pool as a substantial proportion of the account limit]

  • other core Lambda-based services (e.g. notifications) use reserved concurrency anyway, so this change does not affect them

  • our max unreserved concurrent executions metric suggests that we are well below the throttling limit for this category:

[image: max unreserved concurrent executions metric, well below the throttling limit]

2. We are hardcoding limits and they are likely to go out of date again.

We have a couple of options to mitigate this:

a) Swap to using unreserved concurrency
b) Configure alerting so that we notice if this type of problem happens again (N.B. this was also discussed here); a sketch of one possible alarm follows below
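
For option b), one possibility is a CloudWatch alarm on the function's `Throttles` metric. This is a sketch under the assumption that the stack is defined with AWS CDK and that `fetchLambda` is the `Function` construct for this Lambda; the alarm name, period and threshold are placeholders.

```ts
import { Duration } from "aws-cdk-lib";
import { Function as LambdaFunction } from "aws-cdk-lib/aws-lambda";
import {
  Alarm,
  ComparisonOperator,
  TreatMissingData,
} from "aws-cdk-lib/aws-cloudwatch";

// Alarm as soon as the fetch Lambda records any throttles in a 5-minute window.
function addThrottleAlarm(fetchLambda: LambdaFunction): Alarm {
  return new Alarm(fetchLambda, "FetchLambdaThrottleAlarm", {
    metric: fetchLambda.metricThrottles({
      period: Duration.minutes(5),
      statistic: "Sum",
    }),
    threshold: 0,
    comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
    evaluationPeriods: 1,
    treatMissingData: TreatMissingData.NOT_BREACHING,
  });
}
```

An alarm on the client-facing 5XX rate would catch the symptom even more directly; either way, the alarm needs to be wired to a notification target (e.g. an SNS topic) to be useful.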

@DavidLawes (Contributor) left a comment

Thanks for all the information you've provided here Jacob :)

At the end of the PR description there were 2 mitigations we could put in place to reduce/control the risk (move all lambdas to unreserved concurrency and put monitoring in place). Do you think it's worth reviewing and/or implementing these mitigations? If yes, we'll create tickets for the MSS team to look at them.

@jacobwinch (Contributor, Author)

Thanks for the review @DavidLawes!

Do you think it's worth reviewing and/or implementing these mitigations? If yes, we'll create tickets for the MSS team to look at them.

I'd suggest creating a card for 2b (and I'd be happy to pair on that when it is prioritised, if helpful). On reflection I'm not sure that 2a is advisable anyway because it'd allow this Lambda to stop other services from operating (by using up the whole pool), so it is probably worse for reliability overall.

@jacobwinch jacobwinch merged commit cf95e09 into master Jun 16, 2022
@jacobwinch jacobwinch deleted the jw-lambda-concurrency branch June 16, 2022 08:46
@DavidLawes (Contributor)

Thanks Jacob :) I've created a task to add the 5XX alerting; MSS will reach out for pairing/assistance on this. Thanks again!
