Stack unable to delete ServiceLinkedRoles when upgraded to v1.4.3 #237

Open
silkyroadsilk opened this issue Aug 15, 2023 · 12 comments
Labels
bug Something isn't working

Comments

@silkyroadsilk

Describe the bug
After updating from version 1.4.1 to 1.4.3, the pipeline failed because it could not delete the existing service-linked roles.

2023-08-14 10:38:42.275 | error | toolkit | Stack Deployments Failed: Error: The stack named AWSAccelerator-AccountsStack-123456789-us-east-1 failed to deploy: UPDATE_ROLLBACK_FAILED (The following resource(s) failed to update: [DenyOnSecurityOUsF05B383A, GuardDutyServiceLinkedRoleCreateServiceLinkedRoleResourceD5FE1FBD, DenyOnMigrated7312F37B, SecurityHubServiceLinkedRoleCreateServiceLinkedRoleResource4CC7EFAA, DenyOnProduction26D683DC, DenyOnSandboxD0F93382, DenyOnDevelopmentC81CE8A0]. ): Received response status [FAILED] from custom resource. Message returned: AccessDeniedException: Resource is not in the state functionActive
AWSAccelerator-AccountsStack-1234567891234-us-east-1 |  0/32 | 10:38:23 AM | UPDATE_FAILED        | Custom::CreateServiceLinkedRole | GuardDutyServiceLinkedRole/CreateServiceLinkedRoleResource/Default (GuardDutyServiceLinkedRoleCreateServiceLinkedRoleResourceD5FE1FBD) Received response status [FAILED] from custom resource. Message returned: AccessDeniedException: Resource is not in the state functionActive
        at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:61:27)
        at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:61:8)
        at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
        at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
        at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
        at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
        at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
        at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
        at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
        at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 9ec79f9b-e8d9-49f3-a973-8f6d44b96d2c)
        new CustomResource (/codebuild/output/src2727/src/s3/00/source/node_modules/aws-cdk-lib/core/lib/custom-resource.js:1:823)
        \_ new ServiceLinkedRole (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/constructs/lib/aws-iam/service-linked-role.ts:87:22)
        \_ AccountsStack.createServiceLinkedRole (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/lib/stacks/accelerator-stack.ts:1210:9)
        \_ AccountsStack.createGuardDutyServiceLinkedRole (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/lib/stacks/accelerator-stack.ts:901:12)
        \_ new AccountsStack (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/lib/stacks/accounts-stack.ts:258:14)
        \_ main (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/bin/app.ts:543:29)
        \_ processTicksAndRejections (node:internal/process/task_queues:96:5)
        \_ async /codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/bin/app.ts:1017:5
AWSAccelerator-AccountsStack-1234567891234-us-east-1 |  0/32 | 10:38:23 AM | UPDATE_FAILED        | Custom::CreateServiceLinkedRole | SecurityHubServiceLinkedRole/CreateServiceLinkedRoleResource/Default (SecurityHubServiceLinkedRoleCreateServiceLinkedRoleResource4CC7EFAA) Received response status [FAILED] from custom resource. Message returned: AccessDeniedException: Resource is not in the state functionActive
        at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:61:27)
        at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:61:8)
        at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
        at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
        at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
        at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
        at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
        at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
        at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
        at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 4d9cd5cd-895c-433a-8444-823324098955)
        new CustomResource (/codebuild/output/src2727/src/s3/00/source/node_modules/aws-cdk-lib/core/lib/custom-resource.js:1:823)
        \_ new ServiceLinkedRole (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/constructs/lib/aws-iam/service-linked-role.ts:87:22)
        \_ AccountsStack.createServiceLinkedRole (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/lib/stacks/accelerator-stack.ts:1226:11)
        \_ AccountsStack.createSecurityHubServiceLinkedRole (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/lib/stacks/accelerator-stack.ts:957:12)
        \_ new AccountsStack (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/lib/stacks/accounts-stack.ts:261:14)
        \_ main (/codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/bin/app.ts:543:29)
        \_ processTicksAndRejections (node:internal/process/task_queues:96:5)
        \_ async /codebuild/output/src2727/src/s3/00/source/packages/@aws-accelerator/accelerator/bin/app.ts:1017:5

To Reproduce
I re-ran the AWSAccelerator-Pipeline after upgrading landing-zone-accelerator-on-aws to version 1.4.3. The pipeline was unable to delete the roles 'AWSServiceRoleForSecurityHub', 'AWSServiceRoleForAccessAnalyzer' and 'AWSServiceRoleForAmazonGuardDuty', failing with AccessDeniedException.

Expected behavior
When the pipeline runs and the roles already exist, I expect it to be able to delete the existing roles and replace them with new ones.

Additional context
I also tried to delete one of the roles manually in the AWS console and got the following error:
IAM Access Analyzer is enabled in one or more regions in your AWS organization. Ask your administrator to delete all analyzers in all regions for your organization before attempting to delete this role.
After seeing this message, I made sure that no Access Analyzer analyzers exist in any region and tried the deletion again some time later. The same error persists even though no analyzers exist.
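Because the console message refers to the whole AWS organization, the "no analyzers in any region" check may need to cover more than the current account (for example, an organization analyzer created in the management or delegated administrator account). The TypeScript sketch below collects every remaining analyzer across a list of accounts and regions. listAnalyzerNames is a hypothetical synchronous callback; a real version would wrap the asynchronous, paginated Access Analyzer ListAnalyzers API, and the account IDs in the usage are placeholders.

```typescript
// Hypothetical callback: returns analyzer names for one account/region pair.
// A real version would wrap the (async, paginated) Access Analyzer ListAnalyzers API.
type ListAnalyzerNames = (accountId: string, region: string) => string[];

interface RemainingAnalyzer {
  accountId: string;
  region: string;
  name: string;
}

// Collects every analyzer still present, i.e. everything that, according to the
// console message, would keep AWSServiceRoleForAccessAnalyzer from being deleted.
function findRemainingAnalyzers(
  accountIds: string[],
  regions: string[],
  listAnalyzerNames: ListAnalyzerNames,
): RemainingAnalyzer[] {
  const remaining: RemainingAnalyzer[] = [];
  for (const accountId of accountIds) {
    for (const region of regions) {
      for (const name of listAnalyzerNames(accountId, region)) {
        remaining.push({ accountId, region, name });
      }
    }
  }
  return remaining;
}

// Usage with a stub: placeholder account IDs, one organization analyzer left in eu-west-1.
const stubLister: ListAnalyzerNames = (accountId, region) =>
  accountId === "111111111111" && region === "eu-west-1" ? ["org-analyzer"] : [];

const blockers = findRemainingAnalyzers(
  ["111111111111", "222222222222"],
  ["us-east-1", "eu-west-1"],
  stubLister,
);
console.log(blockers.length === 0 ? "no analyzers left" : blockers);
```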

Here is an extract of the CloudWatch Logs:

2023-08-15T09:47:14.528Z	5c61f759-c077-40db-90ca-772b14b6cdb6	INFO	[provider-framework] executing user function arn:aws:lambda:us-east-1:123456789123:function:AWSAccelerator-AccountsSt-AccessAnalyzerServiceLin-knBwOCWbfBtn with payload 
{
    "RequestType": "Update",
    "ServiceToken": "arn:aws:lambda:us-east-1:123456789123:function:AWSAccelerator-AccountsSt-AccessAnalyzerServiceLin-HGbb5TyW6yG6",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:us-east-1:123456789123:stack/AWSAccelerator-AccountsStack-123456789123-us-east-1/f7e419b0-1fef-11ee-847f-1284cfa3114f",
    "RequestId": "e2c90ef4-4b21-4e5f-b43e-1266babd0e9a",
    "LogicalResourceId": "AccessAnalyzerServiceLinkedRoleCreateServiceLinkedRoleResource7C0C5637",
    "PhysicalResourceId": "1ac3cae3-239d-42bb-b4cc-7d68dae0f523",
    "ResourceType": "Custom::CreateServiceLinkedRole",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:us-east-1:123456789123:function:AWSAccelerator-AccountsSt-AccessAnalyzerServiceLin-HGbb5TyW6yG6",
        "roleName": "AWSServiceRoleForAccessAnalyzer",
        "serviceName": "access-analyzer.amazonaws.com",
        "uuid": "9bf3a309-8b93-4ef4-b772-2d3120e2c7b8"
    },
    "OldResourceProperties": {
        "ServiceToken": "arn:aws:lambda:us-east-1:123456789123:function:AWSAccelerator-AccountsSt-AccessAnalyzerServiceLin-HGbb5TyW6yG6",
        "roleName": "AWSServiceRoleForAccessAnalyzer",
        "serviceName": "access-analyzer.amazonaws.com",
        "uuid": "3cec1420-e77d-4d38-bfa4-6cf13b2c2e01"
    }
}
2023-08-15T09:47:15.692Z	5c61f759-c077-40db-90ca-772b14b6cdb6	INFO	[provider-framework] submit response to cloudformation 
{
    "Status": "FAILED",
    "Reason": "AccessDeniedException: Resource is not in the state functionActive\n    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:61:27)\n    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:61:8)\n    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\n    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\n    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)\n    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)\n    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)\n    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10\n    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)\n    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)",
    "StackId": "arn:aws:cloudformation:us-east-1:123456789123:stack/AWSAccelerator-AccountsStack-123456789123-us-east-1/f7e419b0-1fef-11ee-847f-1284cfa3114f",
    "RequestId": "e2c90ef4-4b21-4e5f-b43e-1266babd0e9a",
    "PhysicalResourceId": "1ac3cae3-239d-42bb-b4cc-7d68dae0f523",
    "LogicalResourceId": "AccessAnalyzerServiceLinkedRoleCreateServiceLinkedRoleResource7C0C5637"
}
@KashifSaadat

KashifSaadat commented Aug 15, 2023

Looking at the CloudWatch Logs, my guess is that this issue was introduced in v1.4.3 by the following commit: 5854321

The uuid resource property changes on every deployment, so the custom resource sees an update every time and attempts to delete and recreate the ServiceLinkedRole. That won't work for the GuardDuty, AccessAnalyzer and SecurityHub SLRs while those features are enabled in the solution. There is no PR or discussion associated with the commit above. Can someone confirm whether this is the cause, and explain why the UUID change was introduced for SLRs? What is the recommended procedure to recover from this issue? We cannot progress with any changes to the solution.

Edit: We rolled back to v1.4.2 and the pipeline succeeded. So I suspect it was the change linked above that caused the problem.
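This reading can be checked against the Update payload in the CloudWatch extract: roleName and serviceName are identical in ResourceProperties and OldResourceProperties, and only uuid differs. As a hedged illustration, a custom resource handler could compare the two property sets while ignoring such a per-deployment value before deciding whether an update needs any action. The TypeScript sketch below assumes uuid is the only volatile key; hasMeaningfulChange and VOLATILE_KEYS are hypothetical names, not part of the Accelerator's code.

```typescript
type Props = Record<string, unknown>;

// Keys assumed to change on every deployment without describing the role itself.
// Only "uuid" is listed, based on the logged Update payload.
const VOLATILE_KEYS: string[] = ["uuid"];

// Returns true only if some property other than the volatile keys differs.
function hasMeaningfulChange(oldProps: Props, newProps: Props): boolean {
  const keys = Object.keys(oldProps).concat(Object.keys(newProps));
  for (const key of keys) {
    if (VOLATILE_KEYS.indexOf(key) >= 0) continue;
    // The properties in this payload are plain strings, so strict equality suffices.
    if (oldProps[key] !== newProps[key]) return true;
  }
  return false;
}

// Values copied from the logged Update event (ServiceToken omitted for brevity).
const oldProps: Props = {
  roleName: "AWSServiceRoleForAccessAnalyzer",
  serviceName: "access-analyzer.amazonaws.com",
  uuid: "3cec1420-e77d-4d38-bfa4-6cf13b2c2e01",
};
const newProps: Props = { ...oldProps, uuid: "9bf3a309-8b93-4ef4-b772-2d3120e2c7b8" };

console.log(hasMeaningfulChange(oldProps, newProps)); // false: only uuid changed
```

With a guard like this, an Update whose only difference is the uuid could be answered as a no-op instead of triggering a delete and recreate.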

@atte-hemminki

We have experienced the same issue in our environment. The pipeline seems to get through the Accounts step successfully only at random: it might take one retry of the Accounts step, sometimes up to four.

@bo1984

bo1984 commented Aug 21, 2023

Thank you for bringing this to our attention, @silkyroadsilk. We're aware of this issue and should be addressing it in our next release. As @atte-hemminki mentioned, a workaround is to retry the stage. As this is already being tracked, I will keep this issue open and update you once it has been addressed in a later release.
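The workaround suggested here retries the whole stage. If you maintain your own custom resource handlers, the same idea can be applied around an individual call. The TypeScript sketch below is a generic bounded retry with exponential backoff; withRetries, RetryOptions and the choice to treat the two error messages reported in this thread as transient are assumptions for illustration, not Accelerator behaviour.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts including the first; should be >= 1
  baseDelayMs: number; // wait before the second attempt; doubles after each failure
  isRetryable: (err: unknown) => boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Runs fn until it succeeds, throws a non-retryable error, or attempts run out.
async function withRetries<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!opts.isRetryable(err) || attempt === opts.maxAttempts) break;
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Assumption: the two messages reported in this issue are transient.
const isTransient = (err: unknown): boolean =>
  err instanceof Error && /not in the state functionActive|Waiter has timed out/.test(err.message);

// Usage: a stub that fails twice with the reported message, then succeeds.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error("AccessDeniedException: Resource is not in the state functionActive");
  return "ok";
};
withRetries(flaky, { maxAttempts: 4, baseDelayMs: 10, isRetryable: isTransient }).then((result) =>
  console.log(`${result} after ${calls} attempts`),
);
```

Restricting retries to a known set of messages keeps genuine failures, such as a real permissions problem, from being retried silently.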

@de-cx-cloud

I have encountered the same issue consistently across all releases of the pipeline, specifically with the CloudFormation stack in us-east-1: the stack is unable to delete the AWSAccelerator ServiceLinkedRoles. As a result, I have to manually destroy the stack multiple times until the roles are successfully deleted.

This issue is reproducible with the following specifications:
LZA Version: 1.6.2
Template: TSE-SE

❌ Deployment failed: Error: The stack named AWSAccelerator-AccountsStack-635719067474-us-east-1 failed to deploy: DELETE_FAILED (The following resource(s) failed to delete: [SecurityHubServiceLinkedRoleCreateServiceLinkedRoleResource4CC7EFAA]. ): Received response status [FAILED] from custom resource. Message returned: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"}
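The TimeoutError text above appears to match what an AWS SDK for JavaScript v3 waiter throws when it stops polling before its condition is met. The simplified TypeScript sketch below imitates that pattern to show how such an error can surface even when the underlying operation would complete a little later. waitUntil and its maxWaitMs and delayMs parameters are hypothetical, and the real SDK waiter's delay strategy differs.

```typescript
// Polls check() every delayMs until it returns true or maxWaitMs has elapsed.
// On timeout, throws an error whose name and message mirror the TimeoutError quoted in this issue.
async function waitUntil(
  check: () => Promise<boolean>,
  maxWaitMs: number,
  delayMs: number,
): Promise<void> {
  const deadline = Date.now() + maxWaitMs;
  do {
    if (await check()) return;
    await new Promise<void>((resolve) => setTimeout(() => resolve(), delayMs));
  } while (Date.now() < deadline);

  const timeoutError = new Error(JSON.stringify({ state: "TIMEOUT", reason: "Waiter has timed out" }));
  timeoutError.name = "TimeoutError";
  throw timeoutError;
}

// Usage: a condition that never becomes true within the 50 ms budget.
waitUntil(async () => false, 50, 10).catch((e: Error) => console.log(`${e.name}: ${e.message}`));
// prints: TimeoutError: {"state":"TIMEOUT","reason":"Waiter has timed out"}
```

If a role operation simply takes longer than the wait budget, a later retry would find it finished, which would be consistent with retries of the stage eventually succeeding.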

@alexhaycock

alexhaycock commented Apr 26, 2024

@de-cx-cloud I'm seeing exactly the same issue as you: after multiple retries of that stage it finally works. We have another LZA deployment that doesn't use the default 'AWSAccelerator' prefix, and we have never really seen this error there.

@spyoungtech

spyoungtech commented May 24, 2024

I have a similar issue with a custom resource Lambda. According to CloudFormation, it randomly times out. The message returned ("waiter timed out") clearly comes from the framework code, not from my Lambda itself.

For example, the custom resource Lambda does nothing on resource deletions (because it effectively always retains the underlying resource), so it's not clear to me why it times out even when its action is essentially a no-op. Retrying several times resolves the issue, but it's really frustrating, especially during stack creation, where this error causes the entire stack creation to roll back.
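A handler like the one described here can be made idempotent for every request type. The TypeScript sketch below follows the shape of an AWS CDK Provider framework onEvent handler (it returns a PhysicalResourceId), treats Delete as a no-op that retains the role, and on Create or Update only creates the service-linked role if it is missing. SlrClient and its methods are hypothetical stand-ins for IAM calls such as GetRole and CreateServiceLinkedRole; this is a sketch under those assumptions, not the Accelerator's implementation.

```typescript
type RequestType = "Create" | "Update" | "Delete";

interface OnEventRequest {
  RequestType: RequestType;
  PhysicalResourceId?: string; // absent on Create
  ResourceProperties: { roleName: string; serviceName: string };
}

interface OnEventResponse {
  PhysicalResourceId: string;
}

// Hypothetical client; a real handler might back these with IAM GetRole / CreateServiceLinkedRole.
interface SlrClient {
  roleExists(roleName: string): Promise<boolean>;
  createServiceLinkedRole(serviceName: string): Promise<void>;
}

async function onEvent(event: OnEventRequest, iam: SlrClient): Promise<OnEventResponse> {
  const { roleName, serviceName } = event.ResourceProperties;
  // Reuse the existing physical ID on Update/Delete; use the role name on Create.
  const physicalId = event.PhysicalResourceId ?? roleName;

  if (event.RequestType === "Delete") {
    // Retain: services such as GuardDuty or Security Hub may still depend on the role.
    return { PhysicalResourceId: physicalId };
  }

  // Create and Update: only create the role if it does not exist yet.
  if (!(await iam.roleExists(roleName))) {
    await iam.createServiceLinkedRole(serviceName);
  }
  return { PhysicalResourceId: physicalId };
}

// Usage with an in-memory stub in which the GuardDuty role already exists.
const existingRoles = ["AWSServiceRoleForAmazonGuardDuty"];
const stubIam: SlrClient = {
  roleExists: async (name) => existingRoles.indexOf(name) >= 0,
  createServiceLinkedRole: async (service) => {
    console.log(`would create a service-linked role for ${service}`);
  },
};
onEvent(
  {
    RequestType: "Update",
    PhysicalResourceId: "existing-physical-id",
    ResourceProperties: { roleName: "AWSServiceRoleForAmazonGuardDuty", serviceName: "guardduty.amazonaws.com" },
  },
  stubIam,
).then((res) => console.log(res.PhysicalResourceId));
```

Keeping the PhysicalResourceId unchanged on Update matters: CloudFormation treats a changed physical ID from a custom resource as a replacement and then sends a Delete for the old one.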

@itmustbejj

itmustbejj commented Aug 9, 2024

It's been a year. Is there any movement on this? I waste so much time retrying the Accounts stage because of this error. I had to retry this 4 times before it would finally work today, which is a typical experience with this bug.

@gustavo-guerra-compasso

I have the same problem.

@richardkeit
Contributor

@itmustbejj, @gustavo-guerra-compasso - what versions are you on?

Providing as much detail as possible helps us prioritise. For example, a post above says that a deployment not using the default prefix doesn't see this issue.

@gustavo-guerra-compasso

I'm using version 1.9.1 and have the same problem as @de-cx-cloud. The Accounts stage sometimes times out and I have to retry it.

@mbevc1

mbevc1 commented Aug 31, 2024

Similar here with
UPDATE_FAILED | Custom::CreateServiceLinkedRole | MacieServiceLinkedRole/CreateServiceLinkedRoleResource/Default

Using v1.9.2

@adielLevyAllcloud

Same issue
Similar here with
UPDATE_FAILED | Custom::CreateServiceLinkedRole | MacieServiceLinkedRole/CreateServiceLinkedRoleResource/Default

Using v1.9.2
