[aws-dynamodb] Fail to rollback if global table creation is failed #10256

Closed
sungbokang opened this issue Sep 9, 2020 · 2 comments

@sungbokang

sungbokang commented Sep 9, 2020

Hi,

CloudFormation's rollback sequence appears to have a bug: the DynamoDB replication rollback fails because of a missing IAM permission. When rolling back a failed DynamoDB global table creation, CloudFormation tries to delete the global table only after the IAM role has already been reverted, so the delete ends in DELETE_FAILED status with the status reason below.

(masked confidential data)

Failed to delete resource. Error: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-***/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:us-west-2:***:table/***
    at invokeUserFunction (/var/task/framework.js:85:19)
    at process._tickCallback (internal/process/next_tick.js:68:7)
Remote function error: AccessDeniedException: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-***/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:us-west-2:***:table/***
    at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/json.js:51:27)
    at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/tmp/nod

We found this while creating an additional DynamoDB replica (global table).

Reproduction Steps

To reproduce this issue, follow the same reproduction scenario as in issue #10249.

What did you expect to happen?

If creating a DynamoDB replica (global table) fails for whatever reason, the stack should revert to its previous state without any failures.

What actually happened?

The CloudFormation events are listed below:

  1. CREATE_IN_PROGRESS - Resource creation Initiated
    DynamoDB Table replication starts.

  2. CREATE_FAILED - Failed to create resource. Operation timed out
    It fails because of the 30-minute timeout limit.

  3. UPDATE_ROLLBACK_IN_PROGRESS - The following resource(s) failed to create: [***]
    The rollback process initiates.

  4. UPDATE_IN_PROGRESS
    The awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource*** nested stack resource and ***DataLambda***, which contain the IAM policy for creating the global table, are reverted.

  5. UPDATE_COMPLETE
    The resources that are necessary for deleting the global table have been reverted.

  6. UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS

  7. DELETE_IN_PROGRESS
    CloudFormation tries to remove the global table whose creation was interrupted by the 30-minute default totalTimeout limit of CustomResource.

  8. DELETE_FAILED
    Deleting the global table fails.

Please have a look at the CloudFormation log that I partially captured from the console (for confidentiality reasons, I cannot copy the whole log):

***DataResourcesStack-LOCAL: creating CloudFormation changeset...
 0/4 | 1:28:44 AM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack      | @aws-cdk--aws-dynamodb.ReplicaProvider.NestedStack/@aws-cdk--aws-dynamodb.ReplicaProvider.NestedStackResource (awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource***) 
 1/4 | 1:29:19 AM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack      | @aws-cdk--aws-dynamodb.ReplicaProvider.NestedStack/@aws-cdk--aws-dynamodb.ReplicaProvider.NestedStackResource (awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource***) 
 1/4 | 1:29:23 AM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack      | ***DataResourcesStack-LOCAL 
 1/4 | 1:29:24 AM | DELETE_IN_PROGRESS   | AWS::CloudFormation::CustomResource | ***DataTableReplica****** 
1/4 Currently in progress: ***DataResourcesStack-LOCAL, ***DataTableReplica******
 2/4 | 1:30:19 AM | DELETE_FAILED        | AWS::CloudFormation::CustomResource | ***DataTableReplica****** Failed to delete resource. Error: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-12IORSG2MLAGL/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***
    at invokeUserFunction (/var/task/framework.js:85:19)
    at process._tickCallback (internal/process/next_tick.js:68:7)
Remote function error: AccessDeniedException: User: arn:aws:sts::***:assumed-role/***DataResour-OnEventHandlerServiceRol-12IORSG2MLAGL/***DataResource-OnEventHandler***-*** is not authorized to perform: dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***
    at Request.extractError (/tmp/node_modules/aws-sdk/lib/protocol/json.js:51:27)
    at Request.callListeners (/tmp/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/tmp/n
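
When triaging a failure like this, it can help to pull the denied IAM action and the resource out of the error text. The following is only a minimal sketch: `parseAccessDenied` and `DeniedCall` are hypothetical names, not part of the CDK or the AWS SDK, and the sample message is a shortened, masked version of the one in the log above.

```typescript
// Hypothetical helper (not a CDK or AWS SDK API): extracts the denied IAM
// action and the resource ARN from an AccessDeniedException message.
interface DeniedCall {
  action: string;
  resource: string;
}

function parseAccessDenied(message: string): DeniedCall | undefined {
  // Matches "... is not authorized to perform: <action> on resource: <arn>".
  const match = /is not authorized to perform: (\S+) on resource: (\S+)/.exec(message);
  if (!match) {
    return undefined;
  }
  return { action: String(match[1]), resource: String(match[2]) };
}

// Shortened, masked sample shaped like the message in the log.
const sample =
  "User: arn:aws:sts::***:assumed-role/***/*** is not authorized to perform: " +
  "dynamodb:DeleteTableReplica on resource: arn:aws:dynamodb:***:***:table/***";

const denied = parseAccessDenied(sample);
console.log(denied ? denied.action : "no match"); // prints "dynamodb:DeleteTableReplica"
```

Here the extracted action is the permission that the OnEventHandler role no longer has at the time of the delete.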

Environment

  • CLI Version : 1.32
  • Framework Version: 1.32
  • Node.js Version: NodeJS 12.x
  • OS : Amazon Linux 2012
  • Language (Version): TypeScript 3.9.2

Other

  • Related issues : We found this issue when we hit an operation timeout while creating an additional global table. The timeout itself is reported in a separate issue: [aws-dynamodb] Fail to create a global table due to replication time-out #10249
  • Suggestions on how to fix : This happens because the IAM policy used by the replica provider's onEventHandler is reverted before the global table is deleted. To fix this, the DynamoDB replica provider nested stack should be reverted only after the deletion has succeeded.
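
The ordering problem can be sketched as a small model. This is plain TypeScript, not CloudFormation or CDK code; `Step`, `RollbackResult`, and `runRollback` are hypothetical names, and the model assumes the only relevant state is whether the handler role still holds `dynamodb:DeleteTableReplica`.

```typescript
// Hypothetical model of the rollback ordering; not CloudFormation or CDK code.
type Step = "revertProviderPolicy" | "deleteReplica";

interface RollbackResult {
  status: "ROLLBACK_OK" | "DELETE_FAILED";
  reason?: string;
}

const DELETE_ACTION = "dynamodb:DeleteTableReplica";

function runRollback(steps: Step[]): RollbackResult {
  // Assumed starting state: the failed update had already granted the handler
  // role the permission it needs for the new replica.
  const rolePermissions = new Set<string>([DELETE_ACTION]);

  for (const step of steps) {
    if (step === "revertProviderPolicy") {
      // Reverting the provider nested stack removes the newly added permission.
      rolePermissions.delete(DELETE_ACTION);
    } else if (!rolePermissions.has(DELETE_ACTION)) {
      // The delete runs with the already-reverted role and is denied.
      return {
        status: "DELETE_FAILED",
        reason: `not authorized to perform: ${DELETE_ACTION}`,
      };
    }
  }
  return { status: "ROLLBACK_OK" };
}

// Order observed in the events above: the policy is reverted first.
const observed = runRollback(["revertProviderPolicy", "deleteReplica"]);
// Order suggested in this report: delete the replica first.
const suggested = runRollback(["deleteReplica", "revertProviderPolicy"]);

console.log(observed.status, suggested.status); // prints "DELETE_FAILED ROLLBACK_OK"
```

The model only illustrates why the observed order fails; it does not show whether CloudFormation allows the order to be changed.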

This is 🐛 Bug Report

@sungbokang sungbokang added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 9, 2020
@github-actions github-actions bot added the @aws-cdk/aws-dynamodb Related to Amazon DynamoDB label Sep 9, 2020
@skinny85
Contributor

Thanks for opening the issue, @sungbokang. But I wonder whether there's much we can do here. The order of events that you spelled out is driven by CloudFormation, and it's not something we can control in the CDK.

Any thoughts on that?

Thanks,
Adam

@skinny85 skinny85 added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Oct 16, 2020
@github-actions

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Oct 23, 2020