Issue when updating the number of replicas, no effect on the number of primary pods #574

Closed
fabianpiau opened this issue Apr 30, 2020 · 13 comments · Fixed by #1106
Labels
kind/bug Something isn't working

Comments

@fabianpiau

Hi,

I noticed that when I change the number of replicas to scale up the app, it does not have the desired effect, as if Flagger were interfering and hijacking the new setting for the canary pods only. I did not try scaling down, but I assume the behavior is the same.

PS: I do not use an HPA (I plan to try one out at a later stage).

Scenario to reproduce:

Let's say I have replicaCount set to 2 and I did not change it. When I deployed v1.1 of my app, there were 2 canary pods and 2 primary pods during the analysis. At the end of the promotion, there were 0 canary pods and 2 primary pods. That's fine and was expected.

Now, I increase the number of replicas to 4 and canary deploy a new version v1.2. There were 4 canary pods and 2 primary pods during the analysis. I guess that's fine since I don't use an HPA. But at the end of the promotion, I still have 0 canary pods and only 2 primary pods.

To go further, I then deploy v1.3, keeping the number of replicas at 4. There was 1 canary pod (not sure why not 2 here?) and 2 primary pods during the analysis. At the end of the promotion, I have 0 canary pods and 2 primary pods. So the value of 4 is completely ignored, and the behavior is quite different; I can't explain why there was 1 canary pod instead of 2.

As a last test, I disabled Flagger and tried the same scenario again (i.e. replicas set from 2 to 4), and it worked: the new setting was taken into account, ending up with 4 pods of my app.

I was able to reproduce the exact same scenario on my local Kubernetes cluster as well as on an AWS sandbox cluster.

Flagger version used (the latest one): 1.0.0 RC4

Can you help?

@fabianpiau
Author

I am posting a second message with some extra information on this.

I tried to use the canary release with an HPA, but it does not change the behavior; the scenario I described above is still reproducible.

The only impact of the HPA was that it spins up fewer canary pods.

It's strange that this has never been raised before, unless I have a misconfiguration somewhere; apart from the scaling issue, the canary deployment works well.

@stefanprodan
Member

Have you added the HPA reference to the canary spec?

@fabianpiau
Author

fabianpiau commented Apr 30, 2020

Yes, I did (FYI, avd is the app name).

autoscaling.yaml

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: "avd"
  name: "avd"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "avd"
  minReplicas: 1
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 50

canary.yaml

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: avd
  namespace: poc-flagger
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: avd
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 600
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: avd
  service:
    # service port number
    port: 8080
    # container port number or name (optional)
    targetPort: 8080
    # Istio gateways (optional)
    gateways:
      - istio-system/wildcard-istio-gateway
    # Istio virtual service host names (optional)
    hosts:
      - avd.poc-flagger.svc.cluster.local
      - avd.istio-gateway.backend.k8s.us-west-2.hcom-sandbox-aws.aws.hcom
    # Istio retry policy (optional)
    retries:
      attempts: 3
      perTryTimeout: 1s
      retryOn: "gateway-error,connect-failure,refused-stream"
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
      - name: request-success-rate
        # minimum req success rate (non 5xx responses)
        # percentage (0-100)
        thresholdRange:
          min: 99
        interval: 30s
      - name: request-duration
        # maximum req duration P99
        # milliseconds
        thresholdRange:
          max: 500
        interval: 30s

@fabianpiau
Author

Actually, on second thought, I think that when an HPA is enabled, the replicaCount is not taken into account anymore. That may explain why my number of replicas does not scale up to 4 (Kubernetes is not using it anymore and just looks at the actual CPU usage).

But it does not explain why it does not work when I don't use an HPA 🤔

@stefanprodan
Member

You should remove the replicaCount from your deployment when using an HPA. As for the non-HPA setup, it does look like a bug indeed.
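
For example (a sketch, not your actual manifest; the image tag is a placeholder, only the app label and port are taken from the configs above), the Deployment would simply omit the replicas field and let the HPA own the count:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: avd
  namespace: poc-flagger
  labels:
    app: avd
spec:
  # no replicas field here: the HPA referenced by the Canary manages the count
  selector:
    matchLabels:
      app: avd
  template:
    metadata:
      labels:
        app: avd
    spec:
      containers:
        - name: avd
          image: avd:1.2 # placeholder image tag
          ports:
            - containerPort: 8080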

@stefanprodan
Member

For now, if you don't want to use autoscaling, you can use a dummy HPA with minReplicas = maxReplicas.
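
For example, based on the avd HPA above (a minimal sketch; 4 here is just the fixed replica count you want):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  labels:
    app: "avd"
  name: "avd"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "avd"
  # min == max pins the replica count and effectively disables autoscaling
  minReplicas: 4
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        # has no effect on the replica count since min == max
        targetAverageUtilization: 50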

@fabianpiau
Author

I tried to scale up and down using the same value for minReplicas and maxReplicas, and this workaround worked 👍

I will leave the issue open so you can investigate why it does not work without a "dummy HPA".

Thanks for your support!

@stefanprodan added the kind/bug label on Apr 30, 2020
@ridhoq

ridhoq commented Nov 4, 2020

Hello there, we just ran into this issue as well on v1.0.0. Is there any progress on this?

@oavdonin

Same for me; it seems that Flagger doesn't track changes to spec.replicas when an HPA is not used.

@Alpacius

Alpacius commented Dec 28, 2020

> I tried to scale up and down using the same value for minReplicas and maxReplicas, and this workaround worked 👍
>
> I will leave the issue open so you can investigate why it does not work without a "dummy HPA".
>
> Thanks for your support!

This may be caused by how Flagger detects changes in the spec:

// HasTargetChanged returns true if the canary deployment pod spec has changed
func (c *DeploymentController) HasTargetChanged(cd *flaggerv1.Canary) (bool, error) {
	targetName := cd.Spec.TargetRef.Name
	canary, err := c.kubeClient.AppsV1().Deployments(cd.Namespace).Get(context.TODO(), targetName, metav1.GetOptions{})
	if err != nil {
		return false, fmt.Errorf("deployment %s.%s get query error: %w", targetName, cd.Namespace, err)
	}

	return hasSpecChanged(cd, canary.Spec.Template)
}

For Deployments, only changes to the pod template are recognized; a change to spec.replicas is simply ignored.
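
A hypothetical sketch of the kind of check that is missing, in the style of the snippet above and assuming the same imports and receiver (this is not the actual fix that landed in #1106; the -primary naming and the helper name are assumptions for illustration):

// hasReplicasChanged is a hypothetical helper: it reports whether the target
// deployment's replica count has drifted from the primary deployment's.
func (c *DeploymentController) hasReplicasChanged(cd *flaggerv1.Canary) (bool, error) {
	targetName := cd.Spec.TargetRef.Name
	primaryName := fmt.Sprintf("%s-primary", targetName)

	target, err := c.kubeClient.AppsV1().Deployments(cd.Namespace).Get(context.TODO(), targetName, metav1.GetOptions{})
	if err != nil {
		return false, fmt.Errorf("deployment %s.%s get query error: %w", targetName, cd.Namespace, err)
	}
	primary, err := c.kubeClient.AppsV1().Deployments(cd.Namespace).Get(context.TODO(), primaryName, metav1.GetOptions{})
	if err != nil {
		return false, fmt.Errorf("deployment %s.%s get query error: %w", primaryName, cd.Namespace, err)
	}

	// a nil replicas field defaults to 1 in Kubernetes
	replicasOrDefault := func(r *int32) int32 {
		if r == nil {
			return 1
		}
		return *r
	}

	return replicasOrDefault(target.Spec.Replicas) != replicasOrDefault(primary.Spec.Replicas), nil
}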

@segevmatuti1

Hey @stefanprodan,
any updates regarding this issue?
Thank you!

@eloo

eloo commented Feb 10, 2022

Just stumbled over this issue.
It seems that this problem is still present and the number of replicas cannot be adjusted when an HPA is not used.

@eloo

eloo commented Feb 14, 2022

Awesome that this was fixed so fast.

@somtochiama thanks!
