
Non-HPA Canary rollouts look to violate GitOps in the "canary" Deployment object when .spec.replicas is set #1157

Closed
antaloala opened this issue Mar 23, 2022 · 11 comments


@antaloala

First of all: Flagger is amazing!
Now the issue (bug?):

I have just tested a non-HPA canary rollout using Flagger 1.18 with Contour (a test based on https://docs.flagger.app/tutorials/contour-progressive-delivery after removing the HPA object and the reference to it from the Canary manifest).
Since I used Flagger 1.18, the problem reported in #574 (and solved by PR #1106) does not occur ... but I am still missing two things:

  1. even when the Canary object has previously been posted to the Kubernetes API server, the .spec.replicas setting in the "canary" Deployment object does not reach the "primary" Deployment object; you need some patch in the podTemplate of the "canary" Deployment object for that to happen (I think that should not be needed, so this looks like a missing part of PR #1106, "Set primary deployment replicas when autoscaler isn't used")

  2. once the canary rollout procedure finishes, the Flagger controller resets the .spec.replicas field of the "canary" Deployment object (thereby "undoing" the desired setting). I think a GitOps-based deployment would expect the .spec.replicas setting to remain; otherwise the GitOps agent will detect the diff and set it again in the next reconciliation loop, right?
    Yes, the pods kept in the "canary" Deployment object are useless from a traffic point of view, but I think the right solution from a resource-optimization point of view would be the use of HPAs (rather than making the Flagger controller "violate" GitOps principles).
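
For context, the Canary manifest under test looks roughly like this (a sketch based on the Contour tutorial, with the autoscalerRef removed; names and values are illustrative, not taken from the actual test):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # reference to the "canary" Deployment managed via GitOps
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # note: no autoscalerRef here (the non-HPA scenario discussed in this issue)
  service:
    port: 80
    targetPort: 9898
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
```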

@stefanprodan
Member

You need to use an HPA with min=max replicas when using a GitOps solution.
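
As a sketch, such an HPA with minReplicas equal to maxReplicas could look like this (names and the replica count are illustrative); the desired replica count is then changed in Git by editing both fields together:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
  namespace: test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # min == max pins the replica count, effectively disabling autoscaling
  minReplicas: 3
  maxReplicas: 3
```

The HPA would then be referenced from the Canary spec via autoscalerRef, so Flagger can create and manage the corresponding primary HPA.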

@antaloala
Author

antaloala commented Mar 23, 2022

Thanks @stefanprodan.
OK, so Flagger + GitOps requires an HPA. That sounds reasonable: when using an HPA, .spec.replicas on the "canary" Deployment object should not be set (as recommended in the Kubernetes HPA documentation), so when the Flagger controller "touches" it there is no collision with the GitOps agent's reconciliation, right?

What about the other missing part reported in the first bullet above (no longer related to GitOps, since it concerns the non-HPA case): i.e., making the .spec.replicas setting of the "canary" Deployment object reach the .spec.replicas of the "primary" Deployment object ... without also having to touch the "canary" Deployment object's podTemplate?

@stefanprodan
Member

I think this PR solved the 1st issue: #1110

It has been merged and it will be included in the next release.

@antaloala
Author

antaloala commented Mar 24, 2022

As far as I understand, PR #1110 handles a different issue: it is about using, in a non-HPA scenario, the "primary" spec.replicas to scale up the canary Deployment (during canary rollout execution) when the "canary" spec.replicas is unset.

Bullet 1 above is about making, in a non-HPA scenario, the "canary" spec.replicas setting reach the "primary" spec.replicas setting without having to run a canary rollout procedure.
As far as I understand the Flagger solution (please correct me if I am wrong), the Flagger controller creates the equivalent "primary" objects for the original (i.e. canary) Deployment and HPA objects referenced from the Canary object. So the person writing the YAML manifests does not need to worry about those "other" Deployment and (optional) HPA objects; they are created and handled by the Flagger controller, "owned" by the Canary object referring to them.
So if that person wants to increase the number of replicas, he/she will do that in the original/canary Deployment object and, once it is POSTed/PATCHed to the Kubernetes API server, the Flagger controller should

  • take that value and update the primary Deployment
  • scale the original/canary Deployment object down to 0 (no GitOps violation, as it is a non-HPA case ... as clarified above)

If it does not work like that, we "force" that person to update the objects created by the Flagger controller ... but wouldn't it be better to avoid that?

@antaloala
Author

antaloala commented Mar 24, 2022

Well, it is related to #1110 in the sense that, today, for non-HPA scenarios, Flagger requires setting the desired .spec.replicas directly in the "primary" Deployment object.
Wouldn't it be better to always use the original/canary one for that, so all objects created by the Flagger controller can be kept "internal" to Flagger, so to speak?

@stefanprodan
Member

stefanprodan commented Mar 24, 2022

@antaloala we have no plans to work on the non-HPA features; an HPA with min=max replicas is easy to create and allows users to change the desired replica count without patching the Flagger-managed objects.

@antaloala
Author

antaloala commented Mar 24, 2022

Thanks @stefanprodan. Yes, I was just wondering whether it was worth working on the non-HPA cases (now that it is clear that GitOps alignment requires an HPA, together with the minReplicas=maxReplicas "trick").

A final question/doubt.
I ran a new Flagger + Contour test with that minReplicas=maxReplicas HPA. All worked as expected. After a canary rollout was promoted (and so reflected in the "primary" Deployment's podTemplateSpec), I updated the original/canary HPA object (increasing the minReplicas=maxReplicas value) ... but no change appeared in the primary HPA. I get the same result when minReplicas != maxReplicas.

So, coming back to GitOps alignment: shouldn't the Flagger controller update the primary HPA with the updates applied to the original/canary HPA?

@stefanprodan
Member

To trigger a canary deployment from Git, modify the pod annotations in the same commit with the HPA, like this: https://docs.flagger.app/faq#how-to-retry-a-failed-release
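
The FAQ linked above suggests bumping a pod template annotation; as a sketch, the relevant Deployment fragment might look like this (the annotation key and value are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: test
spec:
  template:
    metadata:
      annotations:
        # change this value in the same Git commit as the HPA change;
        # the pod template diff makes Flagger start a new canary analysis
        timestamp: "2022-03-24T12:00:00Z"
```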

@antaloala
Author

antaloala commented Mar 24, 2022

Then a canary rollout procedure will be triggered (pods terminated and re-created, and so on) ... even when the change is "only" to the min and/or max replicas in the HPA object :-(

Wouldn't it be better to make the Flagger controller watch the original/canary HPA objects (i.e. the ones referenced from the watched Canary objects) so it can update the "primary" HPA object and keep them in sync, in the same way the Flagger controller watches the original/canary Deployment objects?

@stefanprodan
Member

An HPA change could break an app if, for example, the max replicas can't handle the traffic. Flagger can't promote a change without an analysis, unless you disable it with skipAnalysis: true.
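
For completeness, disabling the analysis looks like this in the Canary spec (a minimal fragment; use with care, since changes are then promoted without checks):

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
spec:
  # promote changes immediately, without running the canary analysis
  skipAnalysis: true
```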

@antaloala
Author

OK. Thanks a lot @stefanprodan for all the clarifications.
We can close this issue then (thanks again for your help and quick answers ;-)
