Infinite loop on canary deployment using Gateway API and EKS #1732

mketh-nhs · 2024-11-27T10:39:52Z

Describe the bug

I'm attempting to use Flagger on AWS EKS with the Gateway API, using the AWS Gateway API Controller. I have followed the instructions in the tutorial at https://docs.flagger.app/tutorials/gatewayapi-progressive-delivery, but when I trigger a canary deployment, Flagger gets stuck in a loop: it starts the canary analysis, changes the HTTPRoute weights (in this case 10% to the canary and 90% to the primary), and then starts the analysis again. It never fails the canary, even after the progress deadline has passed.

It doesn't even appear to reach the rollout stage, as the logs never show the load-test-get webhook running. The pre-rollout check does run and pass, but then it runs again on the next pass round the loop. As an experiment I also disabled all metric checks, since I don't think it even gets as far as running them. Looking at the traffic weights in AWS VPC Lattice, I can see them alternating between the 90%/10% split and briefly going back to 100%/0% before the loop starts again.

I have also tried setting skipAnalysis to true, which successfully promoted the canary, so the problem seems to be something to do with the analysis stage itself.

My canary configuration is as follows:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
    targetPort: 9898
    hosts:
      - flaggertest.k8s.testdomain.uk
    gatewayRefs:
      - name: testgateway
        namespace: test
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    webhooks:
      - name: smoke-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 15s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary.test:9898/token | grep token"
      - name: load-test-get
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

Flagger logs (in debug mode):

{"level":"info","ts":"2024-11-27T10:28:51.097Z","caller":"flagger/main.go:149","msg":"Starting flagger version 1.38.0 revision b6ac5e19aa7fa2949bbc8bf37a0f6c1e31b1745d mesh provider gatewayapi:v1beta1"}
{"level":"info","ts":"2024-11-27T10:28:51.097Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T10:28:51.097Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T10:28:51.130Z","caller":"flagger/main.go:441","msg":"Connected to Kubernetes API v1.30.6-eks-7f9249a"}
{"level":"info","ts":"2024-11-27T10:28:51.130Z","caller":"flagger/main.go:294","msg":"Waiting for canary informer cache to sync"}
{"level":"info","ts":"2024-11-27T10:28:51.130Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.230Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.230Z","caller":"flagger/main.go:301","msg":"Waiting for metric template informer cache to sync"}
{"level":"info","ts":"2024-11-27T10:28:51.230Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.331Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.331Z","caller":"flagger/main.go:308","msg":"Waiting for alert provider informer cache to sync"}
{"level":"info","ts":"2024-11-27T10:28:51.331Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.432Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.447Z","caller":"flagger/main.go:206","msg":"Connected to metrics server http://prometheus-server.flagger-system.svc.cluster.local:80"}
{"level":"debug","ts":"2024-11-27T10:28:51.447Z","caller":"controller/controller.go:99","msg":"Creating event broadcaster"}
{"level":"info","ts":"2024-11-27T10:28:51.447Z","caller":"server/server.go:45","msg":"Starting HTTP server on port 8080"}
{"level":"info","ts":"2024-11-27T10:28:51.448Z","caller":"controller/controller.go:191","msg":"Starting operator"}
{"level":"info","ts":"2024-11-27T10:28:51.448Z","caller":"controller/controller.go:200","msg":"Started operator workers"}
{"level":"info","ts":"2024-11-27T10:28:51.454Z","caller":"controller/controller.go:312","msg":"Synced test/podinfo"}
{"level":"info","ts":"2024-11-27T10:29:01.522Z","caller":"router/gateway_api_v1beta1.go:218","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:01.553Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:01.553Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456463\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Starting canary analysis for podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:01.567Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:01.567Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456463\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Pre-rollout check smoke-test passed"}
{"level":"info","ts":"2024-11-27T10:29:01.595Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:01.595Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456463\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Advance podinfo.test canary weight 10"}
{"level":"info","ts":"2024-11-27T10:29:31.525Z","caller":"router/gateway_api_v1beta1.go:218","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:31.562Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:31.563Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456677\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Starting canary analysis for podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:31.601Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:31.603Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456677\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Pre-rollout check smoke-test passed"}
{"level":"info","ts":"2024-11-27T10:29:31.629Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:31.629Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456677\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Advance podinfo.test canary weight 10"}

Any ideas on what might be going wrong?

To Reproduce

Install the AWS Gateway API Controller on the Kubernetes cluster: https://www.gateway-api-controller.eks.aws.dev/latest/
Configure a Flagger canary to use the Gateway API, as in the configuration above
Trigger a canary deployment by changing the image version, as described in the tutorial

Expected behavior

Canary rollout progresses and succeeds

Additional context

Flagger version: 1.38.0
Kubernetes version: 1.30.6-eks-7f9249a
Service Mesh provider: VPC Lattice (via AWS Gateway API controller)
Ingress provider: VPC Lattice (via AWS Gateway API controller)


stefanprodan · 2024-11-27T10:50:48Z

Can you please try Flagger 1.39? We fixed a drift detection problem for the Gateway API.

mketh-nhs · 2024-11-27T12:43:36Z

Thanks @stefanprodan. I have now upgraded but unfortunately still seem to have the same issue:

{"level":"info","ts":"2024-11-27T12:26:46.205Z","caller":"flagger/main.go:149","msg":"Starting flagger version 1.39.0 revision 4d497b2a9d2a0726071dd0b16a92f2e63a9130e2 mesh provider gatewayapi:v1beta1"}
{"level":"info","ts":"2024-11-27T12:26:46.205Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T12:26:46.206Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T12:26:46.225Z","caller":"flagger/main.go:441","msg":"Connected to Kubernetes API v1.30.6-eks-7f9249a"}
{"level":"info","ts":"2024-11-27T12:26:46.225Z","caller":"flagger/main.go:294","msg":"Waiting for canary informer cache to sync"}
{"level":"info","ts":"2024-11-27T12:26:46.225Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.326Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.326Z","caller":"flagger/main.go:301","msg":"Waiting for metric template informer cache to sync"}
{"level":"info","ts":"2024-11-27T12:26:46.327Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.427Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.427Z","caller":"flagger/main.go:308","msg":"Waiting for alert provider informer cache to sync"}
{"level":"info","ts":"2024-11-27T12:26:46.427Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.527Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.537Z","caller":"flagger/main.go:386","msg":"Notifications enabled for https://hooks.slack.com/servic"}
{"level":"info","ts":"2024-11-27T12:26:46.537Z","caller":"server/server.go:45","msg":"Starting HTTP server on port 8080"}
{"level":"info","ts":"2024-11-27T12:26:46.538Z","caller":"controller/controller.go:191","msg":"Starting operator"}
{"level":"info","ts":"2024-11-27T12:26:46.538Z","caller":"controller/controller.go:200","msg":"Started operator workers"}
{"level":"info","ts":"2024-11-27T12:30:22.809Z","caller":"controller/controller.go:312","msg":"Synced test/podinfo"}
{"level":"info","ts":"2024-11-27T12:30:26.566Z","caller":"router/kubernetes_default.go:175","msg":"Service podinfo-canary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.585Z","caller":"router/kubernetes_default.go:175","msg":"Service podinfo-primary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.585Z","caller":"controller/events.go:33","msg":"all the metrics providers are available!","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.608Z","caller":"canary/deployment_controller.go:323","msg":"Deployment podinfo-primary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.612Z","caller":"controller/events.go:45","msg":"podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.554Z","caller":"controller/events.go:33","msg":"all the metrics providers are available!","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.581Z","caller":"canary/hpa_reconciler.go:104","msg":"HorizontalPodAutoscaler v2 podinfo-primary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.598Z","caller":"router/kubernetes_default.go:175","msg":"Service podinfo.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.598Z","caller":"controller/scheduler.go:257","msg":"Scaling down Deployment podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.626Z","caller":"router/gateway_api_v1beta1.go:164","msg":"HTTPRoute podinfo.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.662Z","caller":"controller/events.go:33","msg":"Initialization done! podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:32:26.585Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:33:26.586Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:26.592Z","caller":"controller/events.go:33","msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.587Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.607Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.623Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.647Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.588Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.611Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.630Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.651Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.598Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.619Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.638Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.662Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}

mketh-nhs · 2024-12-05T13:48:44Z

Any updates on this, please, @stefanprodan? The behaviour does look similar to the bug that was fixed in 1.39.

stefanprodan · 2024-12-13T08:13:40Z

We can't replicate this behaviour; I think something else is changing the objects while the analysis runs. Can you post the HTTPRoutes here while this happens? Run kubectl get httproute -o yaml --show-managed-fields after each step.

mketh-nhs · 2024-12-16T11:46:27Z

Hi @stefanprodan

Here are the HTTPRoute object states at various points during the steps:

Before starting the canary:

apiVersion: v1
items:
- apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    annotations:
      helm.toolkit.fluxcd.io/driftDetection: disabled
      kustomize.toolkit.fluxcd.io/reconcile: disabled
    creationTimestamp: "2024-12-16T11:12:34Z"
    finalizers:
    - httproute.k8s.aws/resources
    generation: 1
    managedFields:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.toolkit.fluxcd.io/driftDetection: {}
            f:kustomize.toolkit.fluxcd.io/reconcile: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"705f474d-a479-45b2-812a-bf1e0e06582e"}: {}
        f:spec:
          .: {}
          f:hostnames: {}
          f:parentRefs: {}
          f:rules: {}
      manager: flagger
      operation: Update
      time: "2024-12-16T11:12:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2024-12-16T11:14:40Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"httproute.k8s.aws/resources": {}
      manager: manager
      operation: Update
      time: "2024-12-16T11:14:41Z"
    name: podinfo
    namespace: test
    ownerReferences:
    - apiVersion: flagger.app/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Canary
      name: podinfo
      uid: 705f474d-a479-45b2-812a-bf1e0e06582e
    resourceVersion: "27216"
    uid: 670a2dd5-ee33-41da-a98d-c14d6d4f9a7d
  spec:
    hostnames:
    - flaggertest.k8s.testing.uk
    parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: test-gateway
      namespace: flagger-system
    rules:
    - backendRefs:
      - group: ""
        kind: Service
        name: podinfo-primary
        port: 9898
        weight: 100
      - group: ""
        kind: Service
        name: podinfo-canary
        port: 9898
        weight: 0
      matches:
      - path:
          type: PathPrefix
          value: /
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2024-12-16T11:14:40Z"
        message: ""
        observedGeneration: 1
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2024-12-16T11:14:40Z"
        message: ""
        observedGeneration: 1
        reason: ResolvedRefs
        status: "True"
        type: ResolvedRefs
      controllerName: application-networking.k8s.aws/gateway-api-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: test-gateway
        namespace: flagger-system
kind: List
metadata:
  resourceVersion: ""

Just after starting the canary:

apiVersion: v1
items:
- apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    annotations:
      application-networking.k8s.aws/lattice-assigned-domain-name: podinfo-test-0ae7ff9fdb12b1a95.7d67968.vpc-lattice-svcs.eu-west-2.on.aws
      helm.toolkit.fluxcd.io/driftDetection: disabled
      kustomize.toolkit.fluxcd.io/reconcile: disabled
    creationTimestamp: "2024-12-16T11:12:34Z"
    finalizers:
    - httproute.k8s.aws/resources
    generation: 2
    managedFields:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.toolkit.fluxcd.io/driftDetection: {}
            f:kustomize.toolkit.fluxcd.io/reconcile: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"705f474d-a479-45b2-812a-bf1e0e06582e"}: {}
        f:spec:
          .: {}
          f:hostnames: {}
          f:parentRefs: {}
          f:rules: {}
      manager: flagger
      operation: Update
      time: "2024-12-16T11:20:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2024-12-16T11:20:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:application-networking.k8s.aws/lattice-assigned-domain-name: {}
          f:finalizers:
            .: {}
            v:"httproute.k8s.aws/resources": {}
      manager: manager
      operation: Update
      time: "2024-12-16T11:20:36Z"
    name: podinfo
    namespace: test
    ownerReferences:
    - apiVersion: flagger.app/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Canary
      name: podinfo
      uid: 705f474d-a479-45b2-812a-bf1e0e06582e
    resourceVersion: "29029"
    uid: 670a2dd5-ee33-41da-a98d-c14d6d4f9a7d
  spec:
    hostnames:
    - flaggertest.k8s.testing.uk
    parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: test-gateway
      namespace: flagger-system
    rules:
    - backendRefs:
      - group: ""
        kind: Service
        name: podinfo-primary
        port: 9898
        weight: 90
      - group: ""
        kind: Service
        name: podinfo-canary
        port: 9898
        weight: 10
      matches:
      - path:
          type: PathPrefix
          value: /
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2024-12-16T11:20:34Z"
        message: ""
        observedGeneration: 2
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2024-12-16T11:20:34Z"
        message: ""
        observedGeneration: 2
        reason: ResolvedRefs
        status: "True"
        type: ResolvedRefs
      controllerName: application-networking.k8s.aws/gateway-api-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: test-gateway
        namespace: flagger-system
kind: List
metadata:
  resourceVersion: ""

At the point where it gets reset back to routing 100% of traffic to the primary:

apiVersion: v1
items:
- apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    annotations:
      helm.toolkit.fluxcd.io/driftDetection: disabled
      kustomize.toolkit.fluxcd.io/reconcile: disabled
    creationTimestamp: "2024-12-16T11:12:34Z"
    finalizers:
    - httproute.k8s.aws/resources
    generation: 17
    managedFields:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"httproute.k8s.aws/resources": {}
      manager: manager
      operation: Update
      time: "2024-12-16T11:24:06Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.toolkit.fluxcd.io/driftDetection: {}
            f:kustomize.toolkit.fluxcd.io/reconcile: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"705f474d-a479-45b2-812a-bf1e0e06582e"}: {}
        f:spec:
          .: {}
          f:hostnames: {}
          f:parentRefs: {}
          f:rules: {}
      manager: flagger
      operation: Update
      time: "2024-12-16T11:24:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2024-12-16T11:24:34Z"
    name: podinfo
    namespace: test
    ownerReferences:
    - apiVersion: flagger.app/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Canary
      name: podinfo
      uid: 705f474d-a479-45b2-812a-bf1e0e06582e
    resourceVersion: "30348"
    uid: 670a2dd5-ee33-41da-a98d-c14d6d4f9a7d
  spec:
    hostnames:
    - flaggertest.k8s.testing.uk
    parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: test-gateway
      namespace: flagger-system
    rules:
    - backendRefs:
      - group: ""
        kind: Service
        name: podinfo-primary
        port: 9898
        weight: 100
      - group: ""
        kind: Service
        name: podinfo-canary
        port: 9898
        weight: 0
      matches:
      - path:
          type: PathPrefix
          value: /
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2024-12-16T11:24:34Z"
        message: ""
        observedGeneration: 17
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2024-12-16T11:24:34Z"
        message: ""
        observedGeneration: 17
        reason: ResolvedRefs
        status: "True"
        type: ResolvedRefs
      controllerName: application-networking.k8s.aws/gateway-api-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: test-gateway
        namespace: flagger-system
kind: List
metadata:
  resourceVersion: ""

Other than Flagger, the only thing I can think of that would be changing it is the AWS Gateway API controller itself (https://www.gateway-api-controller.eks.aws.dev/latest/), although as far as I can tell that is only meant to apply changes made to the HTTPRoute object to the equivalent routing configuration in VPC Lattice.

Thanks

stefanprodan · 2024-12-16T14:12:22Z

OK, this is clear now: the AWS controller injects the annotation application-networking.k8s.aws/lattice-assigned-domain-name.

stefanprodan mentioned this issue Dec 16, 2024

Preserve HTTPRoute annotations injected by AWS Gateway API #1746

Merged

stefanprodan closed this as completed in #1746 Dec 16, 2024
