Infinite loop on canary deployment using Gateway API and EKS #1732

Closed
mketh-nhs opened this issue Nov 27, 2024 · 6 comments · Fixed by #1746
Comments

@mketh-nhs

mketh-nhs commented Nov 27, 2024

Describe the bug

I'm attempting to use Flagger on AWS EKS with the Gateway API, via the AWS Gateway API Controller. I have followed the tutorial at https://docs.flagger.app/tutorials/gatewayapi-progressive-delivery, but when a canary deployment is triggered, Flagger appears to get stuck in a loop: it starts the canary analysis, changes the HTTPRoute object weightings (in this case 10% to the canary, 90% to the primary), and then restarts the canary analysis. It never fails the canary after reaching the progress deadline timeout. It doesn't even appear to reach the rollout stage, as the logs never show the load-test webhook running; the pre-rollout check does run and succeed, but then runs again on the next pass through the loop. As an experiment I also disabled all metric checks, as I don't think it even gets as far as running them. Looking at the traffic weightings in AWS VPC Lattice, I can see them alternating between the 90%/10% split and briefly returning to 100%/0% before going back around the loop.

I have also tried setting skipAnalysis to true, which successfully promoted the canary, so the problem seems to be something to do with the analysis stage itself.
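For reference, this is roughly how I set skipAnalysis for that experiment (a minimal sketch; only the spec-level field shown here changes, the rest of the Canary spec below stays the same):

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # Skips the analysis loop entirely and promotes the canary straight away
  skipAnalysis: true
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo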

My canary configuration is as follows:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  progressDeadlineSeconds: 60
  autoscalerRef:
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    port: 9898
    targetPort: 9898
    hosts:
      - flaggertest.k8s.testdomain.uk
    gatewayRefs:
      - name: testgateway
        namespace: test
  analysis:
    interval: 1m
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    webhooks:
      - name: smoke-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 15s
        metadata:
          type: bash
          cmd: "curl -sd 'test' http://podinfo-canary.test:9898/token | grep token"
      - name: load-test-get
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: cmd
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo-canary.test:9898/"

Flagger logs (in debug mode):

{"level":"info","ts":"2024-11-27T10:28:51.097Z","caller":"flagger/main.go:149","msg":"Starting flagger version 1.38.0 revision b6ac5e19aa7fa2949bbc8bf37a0f6c1e31b1745d mesh provider gatewayapi:v1beta1"}
{"level":"info","ts":"2024-11-27T10:28:51.097Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T10:28:51.097Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T10:28:51.130Z","caller":"flagger/main.go:441","msg":"Connected to Kubernetes API v1.30.6-eks-7f9249a"}
{"level":"info","ts":"2024-11-27T10:28:51.130Z","caller":"flagger/main.go:294","msg":"Waiting for canary informer cache to sync"}
{"level":"info","ts":"2024-11-27T10:28:51.130Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.230Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.230Z","caller":"flagger/main.go:301","msg":"Waiting for metric template informer cache to sync"}
{"level":"info","ts":"2024-11-27T10:28:51.230Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.331Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.331Z","caller":"flagger/main.go:308","msg":"Waiting for alert provider informer cache to sync"}
{"level":"info","ts":"2024-11-27T10:28:51.331Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.432Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T10:28:51.447Z","caller":"flagger/main.go:206","msg":"Connected to metrics server http://prometheus-server.flagger-system.svc.cluster.local:80"}
{"level":"debug","ts":"2024-11-27T10:28:51.447Z","caller":"controller/controller.go:99","msg":"Creating event broadcaster"}
{"level":"info","ts":"2024-11-27T10:28:51.447Z","caller":"server/server.go:45","msg":"Starting HTTP server on port 8080"}
{"level":"info","ts":"2024-11-27T10:28:51.448Z","caller":"controller/controller.go:191","msg":"Starting operator"}
{"level":"info","ts":"2024-11-27T10:28:51.448Z","caller":"controller/controller.go:200","msg":"Started operator workers"}
{"level":"info","ts":"2024-11-27T10:28:51.454Z","caller":"controller/controller.go:312","msg":"Synced test/podinfo"}
{"level":"info","ts":"2024-11-27T10:29:01.522Z","caller":"router/gateway_api_v1beta1.go:218","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:01.553Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:01.553Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456463\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Starting canary analysis for podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:01.567Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:01.567Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456463\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Pre-rollout check smoke-test passed"}
{"level":"info","ts":"2024-11-27T10:29:01.595Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:01.595Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456463\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Advance podinfo.test canary weight 10"}
{"level":"info","ts":"2024-11-27T10:29:31.525Z","caller":"router/gateway_api_v1beta1.go:218","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:31.562Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:31.563Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456677\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Starting canary analysis for podinfo.test"}
{"level":"info","ts":"2024-11-27T10:29:31.601Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:31.603Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456677\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Pre-rollout check smoke-test passed"}
{"level":"info","ts":"2024-11-27T10:29:31.629Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"debug","ts":"2024-11-27T10:29:31.629Z","logger":"event-broadcaster","caller":"record/event.go:377","msg":"Event(v1.ObjectReference{Kind:\"Canary\", Namespace:\"test\", Name:\"podinfo\", UID:\"e138fd88-a30e-4e98-ba2d-86b9214f3e5f\", APIVersion:\"flagger.app/v1beta1\", ResourceVersion:\"2456677\", FieldPath:\"\"}): type: 'Normal' reason: 'Synced' Advance podinfo.test canary weight 10"}

Any ideas on what might be going wrong?

To Reproduce

Expected behavior

Canary rollout progresses and succeeds

Additional context

  • Flagger version: 1.38.0
  • Kubernetes version: 1.30.6-eks-7f9249a
  • Service Mesh provider: VPC Lattice (via AWS Gateway API controller)
  • Ingress provider: VPC Lattice (via AWS Gateway API controller)
@stefanprodan
Member

Can you please try Flagger 1.39? We fixed a drift detection problem for Gateway API.

@mketh-nhs
Author

Thanks @stefanprodan. I have now upgraded but unfortunately still seem to have the same issue:

{"level":"info","ts":"2024-11-27T12:26:46.205Z","caller":"flagger/main.go:149","msg":"Starting flagger version 1.39.0 revision 4d497b2a9d2a0726071dd0b16a92f2e63a9130e2 mesh provider gatewayapi:v1beta1"}
{"level":"info","ts":"2024-11-27T12:26:46.205Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T12:26:46.206Z","caller":"clientcmd/client_config.go:659","msg":"Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work."}
{"level":"info","ts":"2024-11-27T12:26:46.225Z","caller":"flagger/main.go:441","msg":"Connected to Kubernetes API v1.30.6-eks-7f9249a"}
{"level":"info","ts":"2024-11-27T12:26:46.225Z","caller":"flagger/main.go:294","msg":"Waiting for canary informer cache to sync"}
{"level":"info","ts":"2024-11-27T12:26:46.225Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.326Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.326Z","caller":"flagger/main.go:301","msg":"Waiting for metric template informer cache to sync"}
{"level":"info","ts":"2024-11-27T12:26:46.327Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.427Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.427Z","caller":"flagger/main.go:308","msg":"Waiting for alert provider informer cache to sync"}
{"level":"info","ts":"2024-11-27T12:26:46.427Z","caller":"cache/shared_informer.go:313","msg":"Waiting for caches to sync for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.527Z","caller":"cache/shared_informer.go:320","msg":"Caches are synced for flagger"}
{"level":"info","ts":"2024-11-27T12:26:46.537Z","caller":"flagger/main.go:386","msg":"Notifications enabled for https://hooks.slack.com/servic"}
{"level":"info","ts":"2024-11-27T12:26:46.537Z","caller":"server/server.go:45","msg":"Starting HTTP server on port 8080"}
{"level":"info","ts":"2024-11-27T12:26:46.538Z","caller":"controller/controller.go:191","msg":"Starting operator"}
{"level":"info","ts":"2024-11-27T12:26:46.538Z","caller":"controller/controller.go:200","msg":"Started operator workers"}
{"level":"info","ts":"2024-11-27T12:30:22.809Z","caller":"controller/controller.go:312","msg":"Synced test/podinfo"}
{"level":"info","ts":"2024-11-27T12:30:26.566Z","caller":"router/kubernetes_default.go:175","msg":"Service podinfo-canary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.585Z","caller":"router/kubernetes_default.go:175","msg":"Service podinfo-primary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.585Z","caller":"controller/events.go:33","msg":"all the metrics providers are available!","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.608Z","caller":"canary/deployment_controller.go:323","msg":"Deployment podinfo-primary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:26.612Z","caller":"controller/events.go:45","msg":"podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less than desired generation","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.554Z","caller":"controller/events.go:33","msg":"all the metrics providers are available!","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.581Z","caller":"canary/hpa_reconciler.go:104","msg":"HorizontalPodAutoscaler v2 podinfo-primary.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.598Z","caller":"router/kubernetes_default.go:175","msg":"Service podinfo.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.598Z","caller":"controller/scheduler.go:257","msg":"Scaling down Deployment podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.626Z","caller":"router/gateway_api_v1beta1.go:164","msg":"HTTPRoute podinfo.test created","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:30:56.662Z","caller":"controller/events.go:33","msg":"Initialization done! podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:32:26.585Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:33:26.586Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:26.592Z","caller":"controller/events.go:33","msg":"New revision detected! Scaling up podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.587Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.607Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.623Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:35:56.647Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.588Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.611Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.630Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:26.651Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.598Z","caller":"router/gateway_api_v1beta1.go:220","msg":"HTTPRoute podinfo.test updated","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.619Z","caller":"controller/events.go:33","msg":"Starting canary analysis for podinfo.test","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.638Z","caller":"controller/events.go:33","msg":"Pre-rollout check smoke-test passed","canary":"podinfo.test"}
{"level":"info","ts":"2024-11-27T12:36:56.662Z","caller":"controller/events.go:33","msg":"Advance podinfo.test canary weight 10","canary":"podinfo.test"}

@mketh-nhs
Author

Any updates on this please, @stefanprodan? The behaviour does appear similar to the bug that was fixed in 1.39.

@stefanprodan
Member

We can't replicate this behaviour; I think something else is changing the objects while the analysis runs. Can you post the HTTPRoutes here while this happens? Run kubectl get httproute -o yaml --show-managed-fields after each step.
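For example (adding the test namespace here, on the assumption that is where the HTTPRoute lives):

kubectl get httproute -n test -o yaml --show-managed-fields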

@mketh-nhs
Author

Hi @stefanprodan

Here are the HTTPRoute object states at various points during the steps:

Before starting canary:

apiVersion: v1
items:
- apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    annotations:
      helm.toolkit.fluxcd.io/driftDetection: disabled
      kustomize.toolkit.fluxcd.io/reconcile: disabled
    creationTimestamp: "2024-12-16T11:12:34Z"
    finalizers:
    - httproute.k8s.aws/resources
    generation: 1
    managedFields:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.toolkit.fluxcd.io/driftDetection: {}
            f:kustomize.toolkit.fluxcd.io/reconcile: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"705f474d-a479-45b2-812a-bf1e0e06582e"}: {}
        f:spec:
          .: {}
          f:hostnames: {}
          f:parentRefs: {}
          f:rules: {}
      manager: flagger
      operation: Update
      time: "2024-12-16T11:12:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2024-12-16T11:14:40Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"httproute.k8s.aws/resources": {}
      manager: manager
      operation: Update
      time: "2024-12-16T11:14:41Z"
    name: podinfo
    namespace: test
    ownerReferences:
    - apiVersion: flagger.app/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Canary
      name: podinfo
      uid: 705f474d-a479-45b2-812a-bf1e0e06582e
    resourceVersion: "27216"
    uid: 670a2dd5-ee33-41da-a98d-c14d6d4f9a7d
  spec:
    hostnames:
    - flaggertest.k8s.testing.uk
    parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: test-gateway
      namespace: flagger-system
    rules:
    - backendRefs:
      - group: ""
        kind: Service
        name: podinfo-primary
        port: 9898
        weight: 100
      - group: ""
        kind: Service
        name: podinfo-canary
        port: 9898
        weight: 0
      matches:
      - path:
          type: PathPrefix
          value: /
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2024-12-16T11:14:40Z"
        message: ""
        observedGeneration: 1
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2024-12-16T11:14:40Z"
        message: ""
        observedGeneration: 1
        reason: ResolvedRefs
        status: "True"
        type: ResolvedRefs
      controllerName: application-networking.k8s.aws/gateway-api-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: test-gateway
        namespace: flagger-system
kind: List
metadata:
  resourceVersion: ""

Just after starting canary:

apiVersion: v1
items:
- apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    annotations:
      application-networking.k8s.aws/lattice-assigned-domain-name: podinfo-test-0ae7ff9fdb12b1a95.7d67968.vpc-lattice-svcs.eu-west-2.on.aws
      helm.toolkit.fluxcd.io/driftDetection: disabled
      kustomize.toolkit.fluxcd.io/reconcile: disabled
    creationTimestamp: "2024-12-16T11:12:34Z"
    finalizers:
    - httproute.k8s.aws/resources
    generation: 2
    managedFields:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.toolkit.fluxcd.io/driftDetection: {}
            f:kustomize.toolkit.fluxcd.io/reconcile: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"705f474d-a479-45b2-812a-bf1e0e06582e"}: {}
        f:spec:
          .: {}
          f:hostnames: {}
          f:parentRefs: {}
          f:rules: {}
      manager: flagger
      operation: Update
      time: "2024-12-16T11:20:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2024-12-16T11:20:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:application-networking.k8s.aws/lattice-assigned-domain-name: {}
          f:finalizers:
            .: {}
            v:"httproute.k8s.aws/resources": {}
      manager: manager
      operation: Update
      time: "2024-12-16T11:20:36Z"
    name: podinfo
    namespace: test
    ownerReferences:
    - apiVersion: flagger.app/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Canary
      name: podinfo
      uid: 705f474d-a479-45b2-812a-bf1e0e06582e
    resourceVersion: "29029"
    uid: 670a2dd5-ee33-41da-a98d-c14d6d4f9a7d
  spec:
    hostnames:
    - flaggertest.k8s.testing.uk
    parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: test-gateway
      namespace: flagger-system
    rules:
    - backendRefs:
      - group: ""
        kind: Service
        name: podinfo-primary
        port: 9898
        weight: 90
      - group: ""
        kind: Service
        name: podinfo-canary
        port: 9898
        weight: 10
      matches:
      - path:
          type: PathPrefix
          value: /
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2024-12-16T11:20:34Z"
        message: ""
        observedGeneration: 2
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2024-12-16T11:20:34Z"
        message: ""
        observedGeneration: 2
        reason: ResolvedRefs
        status: "True"
        type: ResolvedRefs
      controllerName: application-networking.k8s.aws/gateway-api-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: test-gateway
        namespace: flagger-system
kind: List
metadata:
  resourceVersion: ""

Point at which it gets reset back to 100% traffic routing:

apiVersion: v1
items:
- apiVersion: gateway.networking.k8s.io/v1beta1
  kind: HTTPRoute
  metadata:
    annotations:
      helm.toolkit.fluxcd.io/driftDetection: disabled
      kustomize.toolkit.fluxcd.io/reconcile: disabled
    creationTimestamp: "2024-12-16T11:12:34Z"
    finalizers:
    - httproute.k8s.aws/resources
    generation: 17
    managedFields:
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            .: {}
            v:"httproute.k8s.aws/resources": {}
      manager: manager
      operation: Update
      time: "2024-12-16T11:24:06Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:helm.toolkit.fluxcd.io/driftDetection: {}
            f:kustomize.toolkit.fluxcd.io/reconcile: {}
          f:ownerReferences:
            .: {}
            k:{"uid":"705f474d-a479-45b2-812a-bf1e0e06582e"}: {}
        f:spec:
          .: {}
          f:hostnames: {}
          f:parentRefs: {}
          f:rules: {}
      manager: flagger
      operation: Update
      time: "2024-12-16T11:24:34Z"
    - apiVersion: gateway.networking.k8s.io/v1beta1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:parents: {}
      manager: manager
      operation: Update
      subresource: status
      time: "2024-12-16T11:24:34Z"
    name: podinfo
    namespace: test
    ownerReferences:
    - apiVersion: flagger.app/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Canary
      name: podinfo
      uid: 705f474d-a479-45b2-812a-bf1e0e06582e
    resourceVersion: "30348"
    uid: 670a2dd5-ee33-41da-a98d-c14d6d4f9a7d
  spec:
    hostnames:
    - flaggertest.k8s.testing.uk
    parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: test-gateway
      namespace: flagger-system
    rules:
    - backendRefs:
      - group: ""
        kind: Service
        name: podinfo-primary
        port: 9898
        weight: 100
      - group: ""
        kind: Service
        name: podinfo-canary
        port: 9898
        weight: 0
      matches:
      - path:
          type: PathPrefix
          value: /
  status:
    parents:
    - conditions:
      - lastTransitionTime: "2024-12-16T11:24:34Z"
        message: ""
        observedGeneration: 17
        reason: Accepted
        status: "True"
        type: Accepted
      - lastTransitionTime: "2024-12-16T11:24:34Z"
        message: ""
        observedGeneration: 17
        reason: ResolvedRefs
        status: "True"
        type: ResolvedRefs
      controllerName: application-networking.k8s.aws/gateway-api-controller
      parentRef:
        group: gateway.networking.k8s.io
        kind: Gateway
        name: test-gateway
        namespace: flagger-system
kind: List
metadata:
  resourceVersion: ""

The only other thing I can think of that might be changing it, other than Flagger, is the AWS Gateway API controller itself (https://www.gateway-api-controller.eks.aws.dev/latest/), although as far as I can tell that is only meant to apply changes from the HTTPRoute object to the equivalent routing configuration in VPC Lattice.

Thanks

@stefanprodan
Member

OK, this is clear now: the AWS controller injects an annotation, application-networking.k8s.aws/lattice-assigned-domain-name (see the excerpt below).
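For illustration, the injected annotation as it appears in the "just after starting canary" HTTPRoute dump above; presumably it is this metadata change that Flagger keeps detecting and reverting (metadata excerpt only):

metadata:
  annotations:
    # added by the AWS Gateway API controller once the route is reconciled into VPC Lattice
    application-networking.k8s.aws/lattice-assigned-domain-name: podinfo-test-0ae7ff9fdb12b1a95.7d67968.vpc-lattice-svcs.eu-west-2.on.aws
    helm.toolkit.fluxcd.io/driftDetection: disabled
    kustomize.toolkit.fluxcd.io/reconcile: disabled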
