
Non-HPA Canary rollouts look to violate GitOps in the "canary" Deployment object when .spec.replicas is set #1157

Closed
antaloala opened this issue Mar 23, 2022 · 11 comments


@antaloala

First of all: Flagger is amazing!
Now the issue (bug?):

I have just tested a non-HPA canary rollout using Flagger 1.18 with Contour (a test based on https://docs.flagger.app/tutorials/contour-progressive-delivery after removing the HPA object and the reference to it from the Canary manifest).
Since I used Flagger 1.18, the problem reported in #574 (and solved by PR #1106) does not occur ... but I am still missing two things:

  1. even when the Canary object has previously been posted to the Kubernetes API server, the .spec.replicas setting in the "canary" Deployment object does not reach the "primary" Deployment object; you need some patch in the podTemplate of the "canary" Deployment object for that to happen (I think that should not be needed, so this looks like a missing part of PR #1106, "Set primary deployment replicas when autoscaler isn't used")

  2. once the canary rollout procedure finishes, the Flagger controller resets the .spec.replicas field of the "canary" Deployment object (thereby "undoing" the desired setting). I think a GitOps-based deployment would expect the .spec.replicas setting to remain; otherwise the GitOps agent will detect the diff and set it again in the next reconciliation loop, right?
    Yes, the pods kept in the "canary" Deployment object are useless from a traffic point of view, but I think the right solution from a resource-optimization point of view would be the use of HPAs (rather than making the Flagger controller "violate" GitOps principles).
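
For context, the Canary manifest under test looks roughly like this (a sketch based on the Contour tutorial, with the autoscalerRef removed; names and values are illustrative, not taken from the actual test):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # reference to the "canary" Deployment managed via GitOps
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # note: no autoscalerRef here (the non-HPA scenario discussed in this issue)
  service:
    port: 80
    targetPort: 9898
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
```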

@stefanprodan
Member

You need to use an HPA with min=max replicas when using a GitOps solution.
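
As a sketch, such an HPA with minReplicas equal to maxReplicas could look like this (names and the replica count are illustrative); the desired replica count is then changed in Git by editing both fields together:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # min == max pins the replica count, effectively disabling autoscaling
  minReplicas: 3
  maxReplicas: 3
```

The HPA would then be referenced from the Canary spec via autoscalerRef, so Flagger can create and manage the corresponding primary HPA.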

@antaloala
Author

antaloala commented Mar 23, 2022

Thanks @stefanprodan.
OK, so Flagger + GitOps requires an HPA. That sounds reasonable: when using an HPA, .spec.replicas on the "canary" Deployment object should not be set (as recommended in the Kubernetes HPA documentation), so when the Flagger controller "touches" it there is no collision with the GitOps agent's reconciliation, right?

What about the other missing part reported in the first bullet above (no longer related to GitOps, since it concerns the non-HPA case): i.e., making the .spec.replicas setting of the "canary" Deployment object reach the .spec.replicas of the "primary" Deployment object ... without also having to touch the "canary" Deployment object's podTemplate?

@stefanprodan
Member

I think this PR solved the 1st issue: #1110

It has been merged and it will be included in the next release.

@antaloala
Author

antaloala commented Mar 24, 2022

As far as I understand, PR #1110 handles a different issue: it is about using, in a non-HPA scenario, the "primary" spec.replicas to scale up the canary Deployment (during canary rollout execution) when the "canary" spec.replicas is unset.

Bullet 1 above is about making, in a non-HPA scenario, the "canary" spec.replicas setting reach the "primary" spec.replicas setting without having to run a canary rollout procedure.
As far as I understand the Flagger solution (please correct me if I am wrong), the Flagger controller creates the equivalent "primary" objects for the original (i.e. canary) Deployment and HPA objects referenced from the Canary object. So the person writing the YAML manifests does not need to worry about those "other" Deployment and (optional) HPA objects; they are created and handled by the Flagger controller, "owned" by the Canary object referring to them.
So if that person wants to increase the number of replicas, he/she will do that in the original/canary Deployment object and, once it is POSTed/PATCHed to the Kubernetes API server, the Flagger controller should

  • take that value and update the primary Deployment
  • scale the original/canary Deployment object down to 0 (no GitOps violation, as it is a non-HPA case ... as clarified above)

If it does not work like that, we "force" that person to update the objects created by the Flagger controller ... but wouldn't it be better to avoid that?

@antaloala
Author

antaloala commented Mar 24, 2022

Well, it is related to #1110 in the sense that, today, for non-HPA scenarios, Flagger requires setting the desired .spec.replicas directly in the "primary" Deployment object.
Wouldn't it be better to always use the original/canary one for that, so all objects created by the Flagger controller can be kept "internal" to Flagger, so to speak?

@stefanprodan
Member

stefanprodan commented Mar 24, 2022

@antaloala we have no plans to work on the non-HPA features; an HPA with min=max replicas is easy to create and allows users to change the desired replica count without patching the Flagger-managed objects.

@antaloala
Author

antaloala commented Mar 24, 2022

Thanks @stefanprodan. Yes, I was just wondering whether it was worth working on the non-HPA cases (now that it is clear that GitOps alignment requires an HPA, together with the minReplicas=maxReplicas "trick").

A final question/doubt.
I ran a new Flagger + Contour test with that minReplicas=maxReplicas HPA. All worked as expected. After a canary rollout was promoted (and so reflected in the "primary" Deployment's podTemplateSpec), I updated the original/canary HPA object (increasing the minReplicas=maxReplicas value) ... but no change appeared in the primary HPA. I get the same result when minReplicas != maxReplicas.

So, coming back to GitOps alignment: shouldn't the Flagger controller update the primary HPA with the updates applied to the original/canary HPA?

@stefanprodan
Member

To trigger a canary deployment from Git, modify the pod annotations in the same commit with the HPA, like this: https://docs.flagger.app/faq#how-to-retry-a-failed-release
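
The FAQ linked above suggests bumping a pod template annotation; as a sketch, the relevant Deployment fragment might look like this (the annotation key and value are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
spec:
  template:
    metadata:
      annotations:
        # change this value in the same Git commit as the HPA change;
        # the pod template diff makes Flagger start a new canary analysis
        timestamp: "2022-03-24T12:00:00Z"
```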

@antaloala
Author

antaloala commented Mar 24, 2022

Then a canary rollout procedure will be triggered (pods terminated and re-created, and so on) ... even when the change is "only" to the min and/or max replicas in the HPA object :-(

Wouldn't it be better to make the Flagger controller watch the original/canary HPA objects (i.e. the ones referenced from the watched Canary objects) so it can update the "primary" HPA object and keep them in sync, in the same way the Flagger controller watches the original/canary Deployment objects?

@stefanprodan
Member

An HPA change could break an app if, for example, the max replicas can't handle the traffic. Flagger can't promote a change without an analysis, unless you disable it with skipAnalysis: true.
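
For completeness, disabling the analysis looks like this in the Canary spec (a minimal fragment; use with care, since changes are then promoted without checks):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
spec:
  # promote changes immediately, without running the canary analysis
  skipAnalysis: true
```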

@antaloala
Author

OK. Thanks a lot @stefanprodan for all the clarifications.
We can close this issue then (thanks again for your help and quick answers ;-)
