Bugfix: auto-promote canary taskgroups when mixed with non-canary taskgroups #11878

Merged

Conversation

kainoaseto

When using the auto promote feature with canary deployments that include task groups without canaries, the deployment never auto promotes and hangs, even when all canaries are healthy for the task groups that have auto promote enabled.

This PR fixes this bug by skipping task groups that have no canaries set during the auto promote validation and adds a test to catch this case specifically. To see what occurs when this fix is not implemented (as has been observed in mainstream Nomad):

  • comment out/delete L292-L294 of nomad/deploymentwatcher/deployment_watcher.go
  • run tests for this package
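
A minimal sketch of the check described above, not the actual code in deployment_watcher.go. The `groupState` struct, its fields, and `canAutoPromote` are hypothetical, simplified stand-ins chosen for illustration. The sketch only shows why skipping task groups with no canaries lets a mixed deployment promote:

```go
package main

// groupState is a hypothetical, simplified stand-in for the per-task-group
// deployment state. It is not Nomad's real struct.
type groupState struct {
	AutoPromote     bool // auto promote is enabled for this task group
	DesiredCanaries int  // 0 means the group does a plain rolling update
	PlacedCanaries  int  // canaries that have been placed so far
	HealthyCanaries int  // placed canaries that are reported healthy
}

// canAutoPromote reports whether the deployment may be promoted: every task
// group that uses canaries must have auto promote enabled and all of its
// canaries placed and healthy.
func canAutoPromote(groups map[string]groupState) bool {
	sawCanaries := false
	for _, g := range groups {
		// The fix: a task group without canaries takes no part in the
		// decision. Without this skip, its AutoPromote == false would make
		// the function return false on every evaluation.
		if g.DesiredCanaries < 1 {
			continue
		}
		sawCanaries = true

		if !g.AutoPromote || g.PlacedCanaries != g.DesiredCanaries {
			return false
		}
		if g.HealthyCanaries < g.DesiredCanaries {
			return false
		}
	}
	// Nothing to promote if no task group used canaries at all.
	return sawCanaries
}
```

With a canary group (two desired, placed, and healthy canaries, auto promote on) next to a rolling-only group with a zero-value state, `canAutoPromote` returns true. If the skip is removed, the rolling-only group fails the `AutoPromote` check and the result stays false, which matches the hang described above.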

Let me know if there's anything I can do to help shepherd this through into a release. Currently some of our deployments need manual intervention when this occurs, and it would be great to alleviate that.

…rolling deploy taskgroups that do not use the canary deployment system
Member

@schmichael schmichael left a comment

Looks great, thanks! I added a changelog entry. Feel free to update the wording if you can think of better phrasing.

Looking back at the original PR for this code it seems likely we just forgot this case. Great work tracking it down and adjusting a test to cover it.

Comment on lines 599 to 604
a := canaryAlloc()
b := canaryAlloc()

// Api taskgroup (1)
c := rollingAlloc()
e := rollingAlloc()
Member

Might want to give up on 1 letter variable names at this point (ca1, ca2, ra1, and ra2 perhaps), but not a blocker.

Author

Thank you for taking a look at this @schmichael and updating the changelog! Nice, the single-letter names were feeling a bit weird, and I'll make that change shortly.

@schmichael schmichael added this to the 1.3.0 milestone Jan 31, 2022
@schmichael
Member

Merged! Will be released in 1.3.0 and backported to 1.1.x and 1.2.x at that time (or sooner).

Thanks again. Deployments are often a source of confusion for users, so if you spot any other bugs or improvements please don't hesitate to open issues or PRs.

@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 19, 2022
Labels
backport/1.1.x backport to 1.1.x release line
backport/1.2.x backport to 1.2.x release line

3 participants