Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

waitUntilFunctionUpdated (all waiters?) times out instead of throwing permissions error when permissions are missing #6699

Closed
3 of 4 tasks
rix0rrr opened this issue Nov 27, 2024 · 4 comments
Assignees
Labels
bug This issue is a bug. closed-for-staleness feature-request New feature or enhancement. May require GitHub community feedback. p2 This is a standard priority issue

Comments

@rix0rrr
Copy link

rix0rrr commented Nov 27, 2024

Checkboxes for prior research

Describe the bug

We were using waitUntilFunctionUpdated to wait for a function to stabilize after updating it. We have since moved to waitUntilFunctionUpdatedV2, I don't know if this matters.

Since then, we have gotten reports from the Amplify team that tests were failing with the error:

TimeoutError: Resource is not in the expected state due to waiter status: TIMEOUT. Waiter has timed out.

We spent days looking over timings, comparing code between v2 and v3 waitiers, and theorizing what might be the problem that caused the waiter to hit the timeout. We ultimately had to give up because we couldn't think of anything.

Then, 2 weeks later, at the least opportune time (blocked days for re:Invent), we figured out what the issue was: the policy they were using had permissions for lambda:GetFunction, but it needed permissions for lambda:GetFunctionConfiguration.

We could have known this immediately, but the waiter swallowed the permissions error and instead reported an "oh well the service doesn't seem to stabilize ¯\(ツ)/¯" error.

This is extremely unexpected behavior, and this poor error reporting has costs us many days and sweat and stress.

I understand you're probably doing this to proceed in the face of transient errors, but I would ask for one of the following behaviors, in order of preference:

  • If you catch a non-retryable error, throw it immediately instead of continuing to wait.
  • Upon throwing the TimedOut error, if you notice that you've been consistently getting the same error again and again over the period of the wait (not a single success), just throw that error instead.
  • Upon throwing the TimedOut error, include the (unique?) error messages of the errors you've seen in the error description.

Regression Issue

  • Select this option if this issue appears to be a regression.

SDK version number

@aws-sdk/client-lambda@3.699.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v22.11.0

Reproduction Steps

Create and assume a role that has the statement:

{
  "Effect": "Deny",
  "Action": "lambda:GetFunctionConfiguration",
  "Resource": "*"
}

Run the following code:

import { Lambda, waitUntilFunctionUpdated } from '@aws-sdk/client-lambda';

async function main() {
  const client = new Lambda({ region: 'eu-west-1' });
  await waitUntilFunctionUpdated({ client, maxWaitTime: 30 }, {
    FunctionName: 'some-function-in-your-account',
  });
  console.log('OK');
}

main().catch(e => {
  console.error(e);
  process.exitCode = 1;
});

Observed Behavior

After 10 seconds, see:

Error [TimeoutError]: {"state":"TIMEOUT","reason":"Waiter has timed out"}

Expected Behavior

Immediately, see:

An error occurred (AccessDeniedException) when calling the GetFunctionConfiguration operation: ...etc...

Possible Solution

No response

Additional Information/Context

No response

@rix0rrr rix0rrr added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 27, 2024
@kuhe kuhe self-assigned this Dec 2, 2024
@kuhe kuhe added p2 This is a standard priority issue queued This issues is on the AWS team's backlog feature-request New feature or enhancement. May require GitHub community feedback. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 2, 2024
@kuhe
Copy link
Contributor

kuhe commented Dec 6, 2024

If you catch a non-retryable error, throw it immediately instead of continuing to wait.

This is reasonable for most scenarios, but we can't do this in the absence of formal specification updates.

  • The waiter is designed by the service via its model, and they choose what constitutes a non-retryable error. In the case of functionUpdated, it is only when LastUpdateStatus == Failed.
  • Any other error may be something that the user is actively correcting. Just because authentication failed doesn't always mean a user intends for the waiter to abort. I realize this is unlikely, but still possible.

Upon throwing the TimedOut error, if you notice that you've been consistently getting the same error again and again over the period of the wait (not a single success), just throw that error instead.

This changes error handling unpredictably. We will attach a count of the observed responses instead.

Upon throwing the TimedOut error, include the (unique?) error messages of the errors you've seen in the error description.

I created smithy-lang/smithy-typescript#1467 to do this.

@kuhe kuhe added pending-release This issue will be fixed by an approved PR that hasn't been released yet. and removed queued This issues is on the AWS team's backlog labels Dec 6, 2024
@kuhe
Copy link
Contributor

kuhe commented Dec 9, 2024

This is available in https://www.npmjs.com/package/@smithy/util-waiter/v/3.2.0. A lockfile update would bring this in, until a future version of SDK clients requires it as the minimum version.

@kuhe kuhe added closing-soon This issue will automatically close in 4 days unless further comments are made. and removed pending-release This issue will be fixed by an approved PR that hasn't been released yet. labels Dec 9, 2024
@github-actions github-actions bot added closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Dec 14, 2024
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue is a bug. closed-for-staleness feature-request New feature or enhancement. May require GitHub community feedback. p2 This is a standard priority issue
Projects
None yet
Development

No branches or pull requests

2 participants