Rollback windows for fast-tracked rollbacks #574

Closed
jessesuen opened this issue Jul 8, 2020 · 6 comments · Fixed by #2394
Labels
enhancement New feature or request

Comments

@jessesuen
Member

jessesuen commented Jul 8, 2020

Spawned from #557.

Currently we perform a "fast-tracked rollback" (which skips pauses, steps, analysis) in two circumstances:

  1. we are moving back to a blue-green ReplicaSet which still exists and is still scaled up (within its scaleDownDelay)
  2. we are moving back to the canary's "stable" ReplicaSet and the upgrade has not yet completed.
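
As a rough illustration of the two rules above, here is a minimal Python sketch. It is not the controller's actual code: the `ReplicaSetState` type, its fields, and the `is_fast_tracked` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSetState:
    """Hypothetical snapshot of the ReplicaSet we are rolling back to."""
    exists: bool
    replicas: int                  # current scale of the ReplicaSet
    within_scale_down_delay: bool  # blue-green: scaleDownDelay has not elapsed
    is_stable: bool                # canary: this is the "stable" ReplicaSet


def is_fast_tracked(strategy: str, target: ReplicaSetState,
                    upgrade_completed: bool) -> bool:
    """Return True when the rollback would skip pauses, steps and analysis."""
    if strategy == "blueGreen":
        # Rule 1: the old ReplicaSet exists and is still scaled up.
        return (target.exists and target.replicas > 0
                and target.within_scale_down_delay)
    if strategy == "canary":
        # Rule 2: back to the stable ReplicaSet before the upgrade finished.
        return target.is_stable and not upgrade_completed
    return False
```

Under these assumptions, a blue-green ReplicaSet that has already been scaled to zero is never fast-tracked, which is the gap this issue is about.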

I think we should add more controls for intelligent, fast-tracked rollbacks for both the blue-green and canary strategies. For example, a use case is to have a fast-tracked rollback happen when moving to an older blue-green ReplicaSet even if it's scaled down (and not just if it's in the scaleDownDelay).

I think this can either be time based:

spec:
  rollbackWindow:
    duration: 24h
  strategy:
    blueGreen:
...
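
A time-based check could look roughly like the Python sketch below. The issue does not say which timestamp the window is measured from, so this assumes it is the moment the target ReplicaSet was last active; `within_duration_window` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def within_duration_window(last_active: datetime, now: datetime,
                           window: timedelta) -> bool:
    """Return True if the target ReplicaSet was active within `window` of now.

    Assumption: the window is measured from `last_active`, a hypothetical
    timestamp recording when the target ReplicaSet last served traffic.
    """
    age = now - last_active
    # A timestamp in the future (negative age) is treated as outside the window.
    return timedelta(0) <= age <= window
```

With `duration: 24h`, a ReplicaSet last active 23 hours ago would qualify for a fast-tracked rollback and one last active 25 hours ago would not.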

Or it could possibly be revision based (e.g. fast-rollback if we are moving to an n-2 revision):

spec:
  rollbackWindow:
    revisions: 2
  strategy:
    canary:
...
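
A revision-based check could be sketched in Python as follows. This assumes the window counts plain revision numbers back from the current one; a real implementation may need to count ReplicaSets differently (for example, skipping some of them). `within_revision_window` is a hypothetical name.

```python
def within_revision_window(current_revision: int, target_revision: int,
                           window: int) -> bool:
    """Return True if the target is at most `window` revisions behind current.

    Assumption: revisions are consecutive integers, so with window=2 both
    the n-1 and n-2 revisions qualify for a fast-tracked rollback.
    """
    distance = current_revision - target_revision
    # distance <= 0 means the target is not an older revision at all.
    return 0 < distance <= window
```

With `revisions: 2` and a current revision of 10, revisions 9 and 8 would be fast-tracked, while revision 7 would go through the normal steps.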

I would even propose placing the rollbackWindow stanza outside strategy, since both blue-green and canary would benefit from it.

@jessesuen jessesuen changed the title More controls on fast-tracked rollbacks More controls (rollback windows) for fast-tracked rollbacks Jul 8, 2020
@jessesuen jessesuen changed the title More controls (rollback windows) for fast-tracked rollbacks Rollback windows for fast-tracked rollbacks Jul 8, 2020
@jessesuen jessesuen added the enhancement New feature or request label Jul 8, 2020
@bysph

bysph commented Jan 6, 2022

Have you considered the opposite situation? For example, an application may need to avoid starting multiple instances at the same time because the surrounding system cannot handle the load, yet we sometimes need to roll back during the canary. In that case, can we choose not to use a "fast-tracked rollback"?

@svissarapu

svissarapu commented Jan 19, 2022

@jessesuen this would be a good addition to Argo Rollouts. Are we tracking this against any release? Thanks!

@pragmaticivan

Howdy! Is anyone working on this feature already? Will it be part of the next release?

@sys-ops

sys-ops commented Jul 28, 2022

Hi y'all, is anyone working on this feature? I see it's been open for 2 years now.

It is in high demand in production environments, where during incidents we need to roll back quickly to the previous stable version.
Pre-analysis and post-analysis runs were already performed during the previous rollout, so we do not need to verify once again that the previous version is stable.

Could you add a --full flag to "kubectl argo rollouts undo"?

$ kubectl argo rollouts promote --help | grep full
To skip analysis, pauses and steps entirely, use '--full' to fully promote the rollout
        kubectl argo rollouts promote guestbook --full
      --full   Perform a full promotion, skipping analysis, pauses, and steps

$ kubectl argo rollouts undo --help | grep full

The only way I can speed up the rollback now is to terminate the prePromotionAnalysis and postPromotionAnalysis runs.
But that costs valuable time during an incident.

I think I could automate the three steps (undo, terminate pre, terminate post) based on the output of "kubectl argo rollouts get rollout"; however, the feature would be beneficial for all users of Argo Rollouts.

@bpoland
Contributor

bpoland commented Jul 29, 2022

One workaround we are using right now is to leave a timed pause step at the end of the Rollout. We manually terminate the analysis run so that the only way we will roll back is if someone manually triggers it.

This works, but it means that the "new" version is still marked as canary during that period, even though it is receiving 100% of traffic. I like the original idea proposed above, but even just being able to set the scaleDownDelay to a long period after the rollout is complete, and allowing a quick rollback to that ReplicaSet, would be helpful.

@alexef
Member

alexef commented Oct 27, 2022

@jessesuen we're also interested in this feature. I can take a stab at implementing it. Can you give me some hints on where this should go?

alexef added a commit to alexef/argo-rollouts that referenced this issue Nov 3, 2022
fixes: argoproj#574
Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>
alexef added a commit to alexef/argo-rollouts that referenced this issue Nov 3, 2022
fixes: argoproj#574
Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>
zachaller pushed a commit that referenced this issue Nov 25, 2022
* feature: introduce rollback windows

fixes: #574
Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* feature: introduce rollback windows - generated files

fixes: #574
Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* ran codegen again

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* More unit tests. New e2e

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* More tests to make codecov happy

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* increas lint timeout

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* Exclude Experiment RS when computing rollback window

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* Add documentation around new feature rollbackWindow

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* Fix rollback window; cancel pauses and abort and skip to the end of analysis when the window is detected

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>
jandersen-plaid pushed a commit to jandersen-plaid/argo-rollouts that referenced this issue Nov 26, 2022
* feature: introduce rollback windows

fixes: argoproj#574
Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* feature: introduce rollback windows - generated files

fixes: argoproj#574
Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* ran codegen again

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* More unit tests. New e2e

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* More tests to make codecov happy

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* increas lint timeout

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* Exclude Experiment RS when computing rollback window

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* Add documentation around new feature rollbackWindow

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

* Fix rollback window; cancel pauses and abort and skip to the end of analysis when the window is detected

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>

Signed-off-by: Alex Eftimie <alex.eftimie@getyourguide.com>