cli: --no-rollback and a changeset that includes replacement type updates causes a failed deployment #30546

Closed
1 of 2 tasks
tmokmss opened this issue Jun 13, 2024 · 7 comments · Fixed by #31920 or softwaremill/tapir#4137 · May be fixed by NOUIY/aws-solutions-constructs#135 or NOUIY/aws-solutions-constructs#136
Assignees
Labels
@aws-cdk/core Related to core CDK functionality cli Issues related to the CDK CLI effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 package/tools Related to AWS CDK Tools or CLI

Comments

@tmokmss
Contributor

tmokmss commented Jun 13, 2024

Describe the feature

The CDK CLI should automatically disable the --no-rollback flag when the change set to deploy contains replacement-type updates. Otherwise the deployment fails and leaves the stack in a non-terminal state, which blocks any further stack updates until the state is reset.

Use Case

I'm always frustrated when I accidentally deploy a change with the --no-rollback flag enabled, and the deployment fails after a while because it contains replacement-type updates.

Proposed Solution

The CDK CLI uses the CloudFormation (CFn) change set feature to deploy a change, so it knows whether the change contains any replacements. If it does, the CLI should automatically disable the --no-rollback flag.

Or we should at least validate the flag before actually executing a CFn deployment.
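
A minimal sketch of such a pre-flight check, assuming the change set has already been fetched. The `Change` and `ResourceChange` interfaces are a simplified model of the entries CloudFormation returns from DescribeChangeSet; `findReplacements` and `validateNoRollback` are hypothetical helper names, not part of the CDK CLI. Treating `"Conditional"` as a replacement is a conservative assumption.

```typescript
// Simplified model of the change entries returned by DescribeChangeSet.
// Only the fields needed for this check are included.
type Replacement = "True" | "False" | "Conditional";

interface ResourceChange {
  LogicalResourceId: string;
  Action: string; // e.g. "Add", "Modify", "Remove"
  Replacement?: Replacement;
}

interface Change {
  Type: string;
  ResourceChange?: ResourceChange;
}

// Returns the logical IDs of resources that will (or may) be replaced.
// Assumption: "Conditional" is counted as a replacement to stay on the safe side.
function findReplacements(changes: Change[]): string[] {
  return changes
    .map((c) => c.ResourceChange)
    .filter((rc): rc is ResourceChange => rc !== undefined)
    .filter((rc) => rc.Replacement === "True" || rc.Replacement === "Conditional")
    .map((rc) => rc.LogicalResourceId);
}

// Hypothetical validation step: fail before executing the change set
// instead of letting CloudFormation reject the update halfway through.
function validateNoRollback(changes: Change[], noRollback: boolean): void {
  const replaced = findReplacements(changes);
  if (noRollback && replaced.length > 0) {
    throw new Error(
      `--no-rollback cannot be used: the change set replaces ${replaced.join(", ")}`
    );
  }
}
```

With this shape, a caller would run `validateNoRollback(changes, true)` right after creating the change set and before executing it.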

Other Information

The error we get when deploying a change with the --no-rollback flag and a replacement change:

Replacement type updates not supported on stack with disable-rollback.

The error we get when deploying a change after a deployment has failed due to the above error:

This stack is currently in a non-terminal [UPDATE_FAILED] state. To update the stack from this state, please use the disable-rollback parameter with update-stack API. To rollback to the last known good state, use the rollback-stack API

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CDK version used

2.145.0

Environment details (OS name and version, etc.)

macOS

@tmokmss tmokmss added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jun 13, 2024
@github-actions github-actions bot added the package/tools Related to AWS CDK Tools or CLI label Jun 13, 2024
@khushail
Contributor

Thanks for reaching out with this request @tmokmss . Marking it appropriate for team's input.

@khushail khushail added p3 effort/medium Medium work item – several days of effort cli Issues related to the CDK CLI @aws-cdk/core Related to core CDK functionality and removed needs-triage This issue or PR still needs to be triaged. labels Jun 13, 2024
@comcalvi comcalvi changed the title cli: disable no-rollback automatically when a changeset includes replacement type updates cli: throw when --no-rollback is specified a changeset includes replacement type updates Aug 23, 2024
@comcalvi comcalvi added p2 p1 and removed p3 p2 labels Aug 23, 2024
@comcalvi comcalvi changed the title cli: throw when --no-rollback is specified a changeset includes replacement type updates cli: --no-rollback and a changeset that includes replacement type updates causes a failed deployment Aug 23, 2024
@rix0rrr rix0rrr self-assigned this Sep 6, 2024
@rix0rrr
Contributor

rix0rrr commented Sep 6, 2024

To confirm: you're requesting that as a user, you pass in --no-rollback, but the tool silently ignores that flag because the change set contains a replacement?

@tmokmss
Contributor Author

tmokmss commented Sep 7, 2024

Hi @rix0rrr, thank you for taking care of the FR. I think there are several options to improve the current behavior when there is a replacement-type change and the --no-rollback flag is set:

  1. silently ignore --no-rollback, and deploy it with rollback
  2. throw an error before actually deploying the changeset
  3. CDK CLI automatically recovers from UPDATE_FAILED state

The 3rd option does not directly resolve the issue, but it reduces the manual work needed to recover from the following state:

This stack is currently in a non-terminal [UPDATE_FAILED] state. To update the stack from this state, please use the disable-rollback parameter with update-stack API. To rollback to the last known good state, use the rollback-stack API

@rix0rrr
Contributor

rix0rrr commented Sep 10, 2024

I think I would prefer doing something with handling the UPDATE_FAILED state. To that end, I'm going to do the following:

  • Add a cdk rollback command to be able to get to a stable position from one of the paused fail states.
  • For cdk deploy:
    • If the stack is currently in a paused fail state AND (--no-rollback is not specified OR a replacement is detected): prompt the user to roll back first before starting the deployment
    • If the stack is not currently in a paused fail state AND --no-rollback is specified AND a replacement is detected: notify the user that --no-rollback won't work when there are replacements to do, and prompt them to continue with a regular deployment.

None of this replacement detection will work with --deployment-method=direct, only with change sets.
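
The branching described above can be sketched as a small decision function. This is only an illustration of the stated rules: `DeployContext`, `DeployDecision`, and `decideDeploy` are hypothetical names, and the real CLI may structure this differently.

```typescript
// Hypothetical names; only the branching mirrors the plan in this comment.
type DeployDecision =
  | "deploy-as-requested"
  | "prompt-rollback-then-deploy"
  | "prompt-deploy-with-rollback";

interface DeployContext {
  inPausedFailState: boolean; // e.g. the stack is in UPDATE_FAILED
  noRollback: boolean;        // --no-rollback was passed
  hasReplacement: boolean;    // the change set contains a replacement
}

function decideDeploy(ctx: DeployContext): DeployDecision {
  if (ctx.inPausedFailState) {
    // Paused fail state AND (rollback enabled OR replacement detected):
    // the stack must be rolled back before deploying.
    if (!ctx.noRollback || ctx.hasReplacement) {
      return "prompt-rollback-then-deploy";
    }
    return "deploy-as-requested";
  }
  // Healthy stack, --no-rollback requested, but a replacement is present:
  // offer a regular deployment (with rollback) instead.
  if (ctx.noRollback && ctx.hasReplacement) {
    return "prompt-deploy-with-rollback";
  }
  return "deploy-as-requested";
}
```

Because `hasReplacement` comes from the change set, it would always be `false` in this sketch when deploying with `--deployment-method=direct`.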

@tmokmss
Contributor Author

tmokmss commented Sep 10, 2024

@rix0rrr That makes sense, thanks! I think the last one is the highest priority, because we should not get into UPDATE_FAILED state very often if CDK CLI can successfully prevent users from deploying a stack with --no-rollback and replacement changes. (I assume the number of people using --deployment-method=direct is small).

mergify bot pushed a commit that referenced this issue Oct 2, 2024
Add a CLI feature to roll a stuck change back.

This is mostly useful for deployments performed using `--no-rollback`: if a failure occurs, the stack gets stuck in an `UPDATE_FAILED` state from which there are 2 options:

- Try again using a new template
- Roll back to the last stable state

There used to be no way to perform the second operation using the CDK CLI, but there now is.

`cdk rollback` works in 2 situations:

- A paused fail state; it will initiate a fresh rollback (on `CREATE_FAILED`, `UPDATE_FAILED`).
- A paused rollback state; it will retry the rollback, optionally skipping some resources (on `UPDATE_ROLLBACK_FAILED` -- it seems there is no way to continue a rollback in `ROLLBACK_FAILED` state).

`cdk rollback --orphan <logicalid>` can be used to skip resource rollbacks that are causing problems.

`cdk rollback --force` will look up all failed resources and continue skipping them until the rollback has finished.

This change requires new bootstrap permissions, so the bootstrap stack is updated to add the following IAM permissions to the `deploy-action` role:

```
                  - cloudformation:RollbackStack
                  - cloudformation:ContinueUpdateRollback
```

These are necessary to call the 2 CloudFormation APIs that start and continue a rollback. 
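
As an illustration of how the two situations map onto the two APIs, here is a minimal sketch. `rollbackApiFor` is a hypothetical helper, not the function used in the CLI; it only encodes the state-to-API mapping described in this message.

```typescript
type RollbackApi = "RollbackStack" | "ContinueUpdateRollback";

// Picks the CloudFormation API that matches the stack's current status,
// or undefined when the status is not one that `cdk rollback` handles.
function rollbackApiFor(stackStatus: string): RollbackApi | undefined {
  switch (stackStatus) {
    // Paused fail states: start a fresh rollback.
    case "CREATE_FAILED":
    case "UPDATE_FAILED":
      return "RollbackStack";
    // Paused rollback state: retry the rollback that got stuck.
    case "UPDATE_ROLLBACK_FAILED":
      return "ContinueUpdateRollback";
    // Includes ROLLBACK_FAILED, which cannot be continued.
    default:
      return undefined;
  }
}
```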

Relates to (but does not close yet) #30546.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
rix0rrr added a commit that referenced this issue Oct 7, 2024
rix0rrr added a commit that referenced this issue Oct 28, 2024
If a user is deploying with `--no-rollback`, and the stack contains
replacements (or the `--no-rollback` flag is dropped), then a rollback
needs to be performed before a regular deployment can happen again.

In this PR, we add a prompt where we ask the user to confirm that
they are okay with performing a rollback and then a normal deployment.

Closes #30546.
@mergify mergify bot closed this as completed in #31920 Nov 5, 2024
@mergify mergify bot closed this as completed in 2f9fb1e Nov 5, 2024

github-actions bot commented Nov 5, 2024

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 5, 2024
Leo10Gama pushed a commit to Leo10Gama/aws-cdk that referenced this issue Nov 13, 2024
If a user is deploying with `--no-rollback`, and the stack contains replacements (or the `--no-rollback` flag is dropped), then a rollback needs to be performed before a regular deployment can happen again.

In this PR, we add a prompt where we ask the user to confirm that they are okay with performing a rollback and then a normal deployment.

The way this works is that `deployStack` detects a disallowed combination (replacement and no-rollback, or being in a stuck state and not being called with no-rollback), and returns a special status code. The driver of the calls, `CdkToolkit`, will see those special return codes, prompt the user, and retry.
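
The "special status code, prompt, retry" flow might look roughly like the sketch below. Every name here (`DeployResult`, `Driver`, `deployWithPrompts`, the `kind` strings) is a hypothetical stand-in for illustration; the actual `deployStack` and `CdkToolkit` signatures are not shown in this thread.

```typescript
// Hypothetical result type: the deploy step reports why it refused to proceed.
type DeployResult =
  | { kind: "did-deploy" }
  | { kind: "replacement-needs-rollback" } // replacement + --no-rollback
  | { kind: "needs-rollback-first" };      // stuck state, rollback enabled

interface DeployOptions {
  rollback: boolean;
}

// Hypothetical collaborators, injected so the loop stays easy to test.
interface Driver {
  deployStack(opts: DeployOptions): DeployResult;
  rollbackStack(): void;
  confirm(question: string): boolean;
}

// Calls deployStack, reacts to the special results by prompting, then retries.
// Returns true once a deployment went through, false if the user declined
// or the attempt budget ran out.
function deployWithPrompts(driver: Driver, initial: DeployOptions, maxAttempts = 3): boolean {
  let opts = initial;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = driver.deployStack(opts);
    switch (result.kind) {
      case "did-deploy":
        return true;
      case "needs-rollback-first":
        if (!driver.confirm("Stack is in a paused fail state. Roll back first?")) return false;
        driver.rollbackStack();
        break;
      case "replacement-needs-rollback":
        if (!driver.confirm("Change includes a replacement. Deploy with rollback enabled?")) return false;
        opts = { ...opts, rollback: true };
        break;
    }
  }
  return false;
}
```

Keeping the prompt in the driver rather than in the deploy step means the deploy step never has to interact with the user; it only reports what it found.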

Also get rid of a stray `Stack undefined` that gets printed to the console.

Closes aws#30546, Closes aws#31685

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
4 participants