[aws-dynamodb] Fail to rollback if global table creation is failed #10256
Labels
@aws-cdk/aws-dynamodb
Related to Amazon DynamoDB
bug
This issue is a bug.
closed-for-staleness
This issue was automatically closed because it hadn't received any attention in a while.
needs-triage
This issue or PR still needs to be triaged.
response-requested
Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.
Hi,
CloudFormation's rollback runs its steps in an order that makes the DynamoDB replication rollback fail because of a missing IAM permission. While rolling back the creation of a DynamoDB global table, CloudFormation tries to delete the global table using the already-reverted IAM role, so the deletion ends in a DELETE_FAILED status with the status reason below (confidential data masked).
This was found while creating an additional DynamoDB replica (global table).
Reproduction Steps
To reproduce this issue, follow the same reproduction scenario as in issue #10249.
What did you expect to happen?
If creating a DynamoDB replica (global table) fails for whatever reason, the stack should revert to its previous state without any failures.
What actually happened?
The CloudFormation events are listed below:
CREATE_IN_PROGRESS - Resource creation Initiated
DynamoDB table replication starts.
CREATE_FAILED - Failed to create resource. Operation timed out
It fails because of the 30-minute timeout limit.
UPDATE_ROLLBACK_IN_PROGRESS - The following resource(s) failed to create: [***]
The rollback process starts.
UPDATE_IN_PROGRESS
The awscdkawsdynamodbReplicaProviderNestedStackawscdkawsdynamodbReplicaProviderNestedStackResource*** nested stack resource and ***DataLambda***, which contain the IAM policy for creating the global table, are reverted.
UPDATE_COMPLETE
The resources that are necessary for deleting the global table have been reverted.
UPDATE_ROLLBACK_COMPLETE_CLEANUP_IN_PROGRESS
DELETE_IN_PROGRESS
It tries to remove the global table whose creation was interrupted by the 30-minute default totalTimeout limit of CustomResource.
DELETE_FAILED
Deleting the global table fails.
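To make the CREATE_FAILED step above more concrete, here is a minimal sketch of how a polling waiter with a total timeout behaves. It is only a toy model with a simulated clock, not the actual CDK provider framework code; the names simulateWait, WaiterOptions, queryIntervalMinutes and totalTimeoutMinutes are hypothetical and chosen for illustration.

```typescript
// Toy model (assumption): a waiter that checks for completion at a fixed
// interval and gives up once the total timeout budget is exhausted.
interface WaiterOptions {
  queryIntervalMinutes: number; // how often the completion check runs
  totalTimeoutMinutes: number;  // budget after which the operation is abandoned
}

type WaitResult =
  | { status: "COMPLETE"; elapsedMinutes: number }
  | { status: "TIMED_OUT"; elapsedMinutes: number };

function simulateWait(replicationMinutes: number, options: WaiterOptions): WaitResult {
  let elapsed = 0;
  // Keep checking while we are still inside the timeout budget.
  while (elapsed <= options.totalTimeoutMinutes) {
    if (elapsed >= replicationMinutes) {
      return { status: "COMPLETE", elapsedMinutes: elapsed };
    }
    elapsed += options.queryIntervalMinutes;
  }
  // The replica was still being created when the budget ran out.
  return { status: "TIMED_OUT", elapsedMinutes: options.totalTimeoutMinutes };
}

const options: WaiterOptions = { queryIntervalMinutes: 5, totalTimeoutMinutes: 30 };
console.log(simulateWait(20, options).status); // COMPLETE
console.log(simulateWait(45, options).status); // TIMED_OUT
```

In this model, a replication that needs 45 minutes can never finish inside a 30-minute budget, which corresponds to the "Operation timed out" event that starts the rollback.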
Please have a look at the CloudFormation log that I partially captured from the console (for confidentiality reasons, I cannot copy the whole log).
Environment
Other
This happened after an operation timeout issue while creating an additional global table. This global table timeout issue is reported in a separate issue: [aws-dynamodb] Fail to create a global table due to replication time-out #10249
The IAM role that replica-provider's onEventHandler uses is reverted before the global table is deleted. To fix this, the DynamoDB replica provider nested stack should be reverted only after the deletion has succeeded.
This is 🐛 Bug Report
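The ordering problem and the suggested fix can be illustrated with a small model. This is only a sketch under the assumption, taken from the description above, that the reverted role no longer has the permission needed to delete the new replica; Step, RollbackOutcome and runRollback are hypothetical names and do not correspond to any CloudFormation or CDK API.

```typescript
// Toy model (assumption): the rollback is a list of steps, and deleting the
// replica only works while the provider role still has the updated policy.
type Step = "REVERT_PROVIDER_ROLE" | "DELETE_REPLICA";
type RollbackOutcome = "DELETE_COMPLETE" | "DELETE_FAILED" | "NOT_ATTEMPTED";

function runRollback(steps: Step[]): RollbackOutcome {
  let roleCanDeleteReplica = true; // the updated role grants the permission
  let outcome: RollbackOutcome = "NOT_ATTEMPTED";

  for (const step of steps) {
    if (step === "REVERT_PROVIDER_ROLE") {
      // Reverting the nested stack removes the permission added by the update.
      roleCanDeleteReplica = false;
    } else {
      // The delete succeeds only if the permission is still in place.
      outcome = roleCanDeleteReplica ? "DELETE_COMPLETE" : "DELETE_FAILED";
    }
  }
  return outcome;
}

// Order observed in this issue: the role is reverted first.
console.log(runRollback(["REVERT_PROVIDER_ROLE", "DELETE_REPLICA"])); // DELETE_FAILED
// Order suggested as a fix: delete first, then revert the role.
console.log(runRollback(["DELETE_REPLICA", "REVERT_PROVIDER_ROLE"])); // DELETE_COMPLETE
```

The model only captures the dependency between the two steps; how CloudFormation would actually be made to honor that order is outside what this sketch shows.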