(CLI): hotswapping should wait for Lambda's updateFunctionCode operation to complete #18386

Closed
1 of 2 tasks
skinny85 opened this issue Jan 12, 2022 · 6 comments · Fixed by #18536
Assignees
Labels
effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 package/tools Related to AWS CDK Tools or CLI

Comments

@skinny85
Contributor

skinny85 commented Jan 12, 2022

Description

Right now, when hotswapping Lambda code, we don't wait for the updateFunctionCode API call we make to complete.

However, according to the documentation, this operation is actually eventually consistent today for Lambda functions that use Docker images, or are present in a VPC, and, even more importantly, this API will become eventually consistent for all Functions starting February 1, 2022.

We need to change our logic to wait for the operation to complete.

While implementing this, we should wait as efficiently as possible, so that Lambda hotswap times are not affected too adversely. The simplest solution might be to use the standard Lambda waiter that we already use when we need to publish a new Version. Alternatively, we could use a custom Waiter for this purpose, like we do for ECS service hotswapping.

Thanks a lot to @tmokmss who figured out all of this while working on Docker image Function hotswapping.

Acknowledge

  • I may be able to implement this feature request
  • This feature might incur a breaking change
@skinny85 skinny85 added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jan 12, 2022
@github-actions github-actions bot added the package/tools Related to AWS CDK Tools or CLI label Jan 12, 2022
@skinny85 skinny85 added effort/medium Medium work item – several days of effort p1 and removed needs-triage This issue or PR still needs to be triaged. labels Jan 12, 2022
@skinny85 skinny85 changed the title (CLI): hotswapping should wait for Lambda updateFunctionCode operation to complete (CLI): hotswapping should wait for Lambda's updateFunctionCode operation to complete Jan 13, 2022
@corymhall corymhall assigned corymhall and unassigned rix0rrr Jan 13, 2022
@corymhall
Contributor

corymhall commented Jan 14, 2022

After looking into this, there are some nuances to account for. Here is what I've found.

There are a couple pieces of information that we get from Lambda after updating the function code.

State

Controls whether we can update the function and whether we can invoke it:

  • State=Active: Function can be invoked. Whether it can be updated is determined by LastUpdateStatus (see below)
  • State=Pending: Function cannot be invoked or updated

For our purposes the State can be either Active or Pending. When updating a function (not creating one) there is only one scenario in which the State can be Pending (StateReasonCode=Restoring), which I describe later on.

LastUpdateStatus

Controls whether the function can be updated:

  • LastUpdateStatus=Successful: The update is complete and the function can be updated again
  • LastUpdateStatus=InProgress: The update is in progress. If State=Active, the function can still be invoked, but the invocation runs against the old config. No further updates can be made while the status is InProgress

StateReasonCode

The reason for the current State. For our purposes the only value we care about is:

  • StateReasonCode=Restoring: This will only occur on a VPC function that has been idle for a couple of weeks and Lambda has reclaimed the ENI resources. If you make an update to an idle function the State will transition to Pending with the StateReasonCode=Restoring. During this time the function cannot be invoked or updated.
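
To make the rules above concrete, here is a minimal sketch that maps the two status fields to "can invoke" and "can update". The `FunctionStatus` and `Capabilities` types and the `describeCapabilities` helper are hypothetical names made up for illustration, not SDK types, and only the values discussed in this comment are modeled (the real API may return others).

```typescript
// Hypothetical shape mirroring the fields discussed above; not the SDK's own type.
interface FunctionStatus {
  State: 'Active' | 'Pending';
  LastUpdateStatus: 'Successful' | 'InProgress';
  StateReasonCode?: string;
}

// Hypothetical summary of what a caller may do with the function right now.
interface Capabilities {
  canInvoke: boolean;      // State=Active
  canUpdate: boolean;      // State=Active and the last update has finished
  invokesOldConfig: boolean; // invocable, but the update has not landed yet
}

function describeCapabilities(fn: FunctionStatus): Capabilities {
  const active = fn.State === 'Active';
  const updateDone = fn.LastUpdateStatus === 'Successful';
  return {
    canInvoke: active,
    canUpdate: active && updateDone,
    invokesOldConfig: active && !updateDone,
  };
}
```

For example, `State=Active` with `LastUpdateStatus=InProgress` yields a function that can be invoked (against the old config) but not updated, while `State=Pending` allows neither.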

Update Time

In my testing I found that the time it took for a function to go from LastUpdateStatus=InProgress to LastUpdateStatus=Successful was:

  • ~1 second for a zip Function not in a VPC
  • ~25 seconds for a container Function or a Function in a VPC
  • ~2 minutes to deploy a VPC function (best proxy for StateReasonCode=Restoring)

Proposal

Based on the above I think we have to account for 2-3 different scenarios.

Scenario 1: zip function not in a VPC
If the function is a zip function not in a VPC (we get this info back from updateFunctionCode) then we can poll (quickly) for LastUpdateStatus=Successful. We probably don't want to use Lambda waiters for this since those wait at 5 second intervals.

Scenario 2: StateReasonCode=Restoring
In this scenario we need to wait until State=Active & LastUpdateStatus=Successful. I think it makes sense to not complete the hotswap operation until this state is reached since the function cannot be invoked or updated. We can poll more slowly here since we know it will take at least 60 seconds.

Scenario 3: container function or function in a VPC
In this scenario we can immediately invoke the function, but cannot make updates (for ~25 seconds). We could try to handle this differently than scenario 2; for example, the hotswap operation could complete, but we could block/queue future updates until LastUpdateStatus=Successful. Since this is a hotswap, though, I would assume that the user would care more about being able to invoke the "new" code, so they probably wouldn't consider the hotswap complete until LastUpdateStatus=Successful. If this is the case, we would just treat this scenario the same as scenario 2.
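
The three scenarios could be told apart roughly as sketched below. This is only an illustration under assumptions: `UpdateResult`, `inVpc`, `classify`, and `pollIntervalSeconds` are hypothetical names, and the 1 second and 5 second intervals are placeholder values for "poll quickly" and "poll more slowly", not measured recommendations.

```typescript
// Hypothetical subset of what we learn from the update call.
interface UpdateResult {
  PackageType: 'Zip' | 'Image';
  inVpc: boolean;            // assumed to be derived by the caller from the VPC config
  StateReasonCode?: string;
}

type Scenario = 'zip-no-vpc' | 'restoring' | 'container-or-vpc';

function classify(result: UpdateResult): Scenario {
  // Scenario 2 takes priority: a restoring function can be neither invoked nor updated.
  if (result.StateReasonCode === 'Restoring') return 'restoring';
  // Scenario 1: the fast path, expected to finish in about a second.
  if (result.PackageType === 'Zip' && !result.inVpc) return 'zip-no-vpc';
  // Scenario 3: container image or VPC function, treated like scenario 2 for waiting.
  return 'container-or-vpc';
}

function pollIntervalSeconds(scenario: Scenario): number {
  // Placeholder values: poll quickly only when the update is expected to be fast.
  return scenario === 'zip-no-vpc' ? 1 : 5;
}
```

Keeping the classification separate from the interval choice makes it easy to later split scenario 3 from scenario 2 if we decide to handle them differently.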

@skinny85
Contributor Author

Awesome work @corymhall researching this, super thorough! I agree with all of your recommendations 🙂.

Given that, do you have enough information to start working on implementing this?

@corymhall
Contributor

@skinny85 yep I've already started working on it :)

@skinny85
Contributor Author

@corymhall one last question. I understand your comment gives a great summary of the current state.

Do you know how that description will be affected by the changes outlined in the blog post linked in the issue description, which are scheduled to go out on February 1?

@corymhall
Contributor

@skinny85 my comment describes the state with those changes in place. Today, before the rollout, the State is always Active and the LastUpdateStatus is always Successful.

@mergify mergify bot closed this as completed in #18536 Jan 21, 2022
mergify bot pushed a commit that referenced this issue Jan 21, 2022
…mplete (#18536)

There are [upcoming changes](https://aws.amazon.com/blogs/compute/coming-soon-expansion-of-aws-lambda-states-to-all-functions/)
that will roll out Lambda states to all Lambda Functions. Prior to
this update (current functionality) when you made an
`updateFunctionCode` request the function was immediately available for
both invocation and future updates. Once this change is rolled out this
will no longer be the case. With Lambda states, when you make an update
to a Lambda Function, it will not be available for future updates until
the `LastUpdateStatus` returns `Successful`.

This PR introduces a custom waiter that will wait for the update to
complete before proceeding. The waiter will wait until the
`State=Active` and the `LastUpdateStatus=Successful`.

The `State` controls whether or not the function can be invoked, and the
`LastUpdateStatus` controls whether the function can be updated. Based
on this, I am not considering a deployment complete until both are
successful. For a more in-depth analysis of the different values, see #18386.
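
As a rough illustration of the waiting condition described above (not the code from the PR), a polling loop could look like the sketch below. `StatusSnapshot`, `waitForUpdate`, and the injected `getStatus` and `sleep` callbacks are hypothetical names; a real caller would back `getStatus` with a Lambda API call.

```typescript
// Hypothetical snapshot of the two fields the waiter cares about.
interface StatusSnapshot {
  State: string;
  LastUpdateStatus: string;
}

// Polls until State=Active and LastUpdateStatus=Successful.
// Returns the number of polls that were needed; throws if maxAttempts is exhausted.
async function waitForUpdate(
  getStatus: () => Promise<StatusSnapshot>,
  intervalMs: number,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (status.State === 'Active' && status.LastUpdateStatus === 'Successful') {
      return attempt;
    }
    // Do not sleep after the final attempt; fail right away instead.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`update did not complete after ${maxAttempts} attempts`);
}
```

Passing `getStatus` and `sleep` in as parameters keeps the loop independent of any SDK client and lets it be exercised in tests without real delays.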

In my testing I found that the time it took for a function to go from
`LastUpdateStatus=InProgress` to `LastUpdateStatus=Successful` was:

- ~1 second for a zip Function not in a VPC
- ~25 seconds for a container Function or a Function in a VPC
- ~2 minutes to deploy a VPC function (best proxy for StateReasonCode=Restoring)

There are a couple of built-in waiters that could have been used for
this, namely
[functionUpdated](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda.html#functionUpdated-waiter).
This waiter uses `getFunctionConfiguration` which has a quota of 15
requests/second. In addition the waiter polls every 5 seconds and this
cannot be configured. Because hotswapping is sensitive to any latency
that is introduced, I created a custom waiter that uses `getFunction`.
`getFunction` has a quota of 100 requests/second and the custom waiter
can be configured to poll every 1 second or every 5 seconds depending on
what type of function is being updated.
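
To illustrate why the polling interval matters, here is a small back-of-the-envelope sketch. It assumes a simplified model in which polls happen at whole multiples of the interval after the update call; `observedCompletionSeconds` is a hypothetical helper, not part of the PR.

```typescript
// Simplified model: polls happen at 0, interval, 2 * interval, ... seconds after
// the update call. Returns the time of the first poll at or after the moment
// the update actually finished.
function observedCompletionSeconds(updateSeconds: number, intervalSeconds: number): number {
  return Math.ceil(updateSeconds / intervalSeconds) * intervalSeconds;
}

// Figures taken from the measurements quoted above.
const zipWithFiveSecondPolling = observedCompletionSeconds(1, 5);  // 5
const zipWithOneSecondPolling = observedCompletionSeconds(1, 1);   // 1
const containerWithFiveSecondPolling = observedCompletionSeconds(25, 5); // 25
```

Under this model a fixed 5 second interval turns a ~1 second update into a ~5 second wait, while a ~25 second update is observed at about the same time with either interval.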

fixes #18386


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

LukvonStrom pushed a commit to LukvonStrom/aws-cdk that referenced this issue Jan 26, 2022
…mplete (aws#18536)

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this issue Feb 21, 2022
…mplete (aws#18536)

3 participants