Using References Triggers Drift for GitOps Reconcilers #2361

Open
jaguer0 opened this issue Feb 26, 2025 · 1 comment
Labels
area/resource-references (Issues or PRs related to resource references), kind/bug (Categorizes issue or PR as related to a bug)

Comments

@jaguer0

jaguer0 commented Feb 26, 2025

Describe the bug
When using Flux CD, the first apply succeeds, and the ECS ACK controller correctly updates references (e.g., taskDefinitionRef and targetGroupRef). However, when Flux next performs a server-side apply (SSA) reconcile, it resets these references to match the Git repository state. This creates a difference between the live ECS manifest and the Git state, which triggers an unnecessary ECS deployment.

I have already tried setting Flux’s kustomize.toolkit.fluxcd.io/ssa: merge annotation, but the issue persists.

Current Behavior:
Flux reconciles successfully, but an unnecessary redeployment occurs the next time the AWS ACK controller reconciles, which by default happens every 10 hours. As a result, ECS tasks are redeployed multiple times per day even though nothing has changed.

Steps to Reproduce:

  1. Set up Flux CD with the ECS ACK controller.
  2. Apply the initial configuration (first apply works fine).
  3. Wait for Flux to perform a reconcile.
  4. The ECS task redeploys unnecessarily after the AWS ACK controller reconciles, which happens every 10 hours by default.

Additional context: while debugging, I shortened the Flux reconcile interval to 5 minutes. The ECS controller appears to see the resulting diffs but does not act on them, which I think is expected because the ECS service has already reached a healthy state. The extra deployment therefore seems to occur when the ACK reconciler kicks in.

The same behavior appears with other references, such as targetGroupRef on an elbv2.services.k8s.aws/v1alpha1 Rule, so it seems to affect references in general, because resolving them updates the manifest.
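
The back-and-forth described above can be illustrated with a toy model. This is only a sketch of the behavior the diffs in this issue suggest, not a description of Flux or ACK internals: the functions `flux_apply`, `controller_resolve`, and `simulate` are hypothetical, and the model assumes the controller stores the resolved ARN in place of the reference. It shows why metadata.generation keeps climbing even though nothing changes in Git.

```python
import copy

# Desired state as stored in Git: only the reference, no concrete ARN.
GIT_LB = {
    "containerName": "foo-bar",
    "targetGroupRef": {"from": {"name": "foo-bar-tg-staging"}},
}

# Redacted ARN taken from the logs in this issue.
RESOLVED_ARN = (
    "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:"
    "targetgroup/foo-bar/cbeebab4exxxx"
)


def flux_apply(live):
    """Toy model of a Flux apply: reset the entry to the Git state if it drifted."""
    if live["lb"] != GIT_LB:
        live["lb"] = copy.deepcopy(GIT_LB)
        live["generation"] += 1  # a spec change bumps metadata.generation


def controller_resolve(live):
    """Toy model (assumption): the controller writes the ARN in place of the ref."""
    resolved = {"containerName": "foo-bar", "targetGroupARN": RESOLVED_ARN}
    if live["lb"] != resolved:
        live["lb"] = resolved
        live["generation"] += 1


def simulate(rounds):
    """Run `rounds` controller/Flux cycles and return the final generation."""
    live = {"lb": copy.deepcopy(GIT_LB), "generation": 1}
    for _ in range(rounds):
        controller_resolve(live)
        flux_apply(live)
    return live["generation"]


if __name__ == "__main__":
    print(simulate(3))
```

In this model every cycle costs two spec writes, which is consistent with the large generation numbers visible in the outputs in this issue.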

Steps to reproduce

apiVersion: ecs.services.k8s.aws/v1alpha1
kind: Service
metadata:
  name: foo-bar
spec:
  name: foo-bar
  capacityProviderStrategy:
  - base: 0
    capacityProvider: FARGATE
    weight: 1
  cluster: staging
  deploymentConfiguration:
    alarms:
      alarmNames:
      - none
      enable: false
      rollback: false
    deploymentCircuitBreaker:
      enable: true
      rollback: true
    maximumPercent: 200
    minimumHealthyPercent: 100
  deploymentController:
    type: ECS
  desiredCount: 1
  enableECSManagedTags: true
  enableExecuteCommand: false
  healthCheckGracePeriodSeconds: 0
  loadBalancers:
  - containerName: foo-bar
    containerPort: 8080
    targetGroupRef:
      from:
        name: foo-bar-tg-staging
  networkConfiguration:
    awsVPCConfiguration:
      assignPublicIP: DISABLED
      securityGroups:
      - sg-xxxxx
      subnets:
      - sg-xxxxxx
      - sg-xxxxxx
  platformVersion: 1.4.0
  propagateTags: NONE
  schedulingStrategy: REPLICA
  taskDefinitionRef:
    from:
      name: foo-bar-staging

After Flux reconciles, the ACK controller logs this diff:

{
  "level": "info",
  "ts": "2025-02-26T14:21:05.618Z",
  "logger": "ackrt",
  "msg": "desired resource state has changed",
  "kind": "Service",
  "namespace": "foo-bar",
  "name": "foo-bar-staging",
  "account": "xxxxxx",
  "role": "",
  "region": "us-west-2",
  "is_adopted": false,
  "generation": 4134,
  "diff": [
    {
      "Path": {
        "Parts": [
          "Spec",
          "LoadBalancers"
        ]
      },
      "A": [
        {
          "containerName": "foo-bar",
          "containerPort": 8080,
          "targetGroupARN": "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:targetgroup/foo-bar/cbeebab4exxxx",
          "targetGroupRef": {
            "from": {
              "name": "foo-bar-tg-staging"
            }
          }
        }
      ],
      "B": [
        {
          "containerName": "foo-bar",
          "containerPort": 8080,
          "targetGroupARN": "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:targetgroup/foo-bar/cbeebab4exxxx"
        }
      ]
    },
    {
      "Path": {
        "Parts": [
          "Spec",
          "TaskDefinition"
        ]
      },
      "A": "foo-bar-staging",
      "B": "arn:aws:ecs:us-west-2:xxxxxx:task-definition/foo-bar-staging:32"
    }
  ]
}
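
When many of these log entries pile up, it can help to reduce each one to the spec paths it reports as changed. The following is a minimal sketch that assumes the log lines are JSON shaped like the entry above; `changed_paths` is a hypothetical helper, and the embedded log line is abbreviated to the fields it reads.

```python
import json

# Abbreviated copy of the log entry above; only the fields used here are kept.
LOG_LINE = """
{"msg": "desired resource state has changed",
 "kind": "Service",
 "name": "foo-bar-staging",
 "generation": 4134,
 "diff": [
   {"Path": {"Parts": ["Spec", "LoadBalancers"]}},
   {"Path": {"Parts": ["Spec", "TaskDefinition"]}}
 ]}
"""


def changed_paths(line):
    """Return the dotted paths listed in the 'diff' array of one log entry."""
    entry = json.loads(line)
    return [".".join(d["Path"]["Parts"]) for d in entry.get("diff", [])]


if __name__ == "__main__":
    print(changed_paths(LOG_LINE))
```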

Expected outcome
The controller should either not detect any differences or should properly work with the Flux reconciler to prevent unwanted updates.
The ECS task should not be redeployed unless actual changes have occurred.
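
One way to picture the first option is a comparison that ignores reference fields. The sketch below is not how the ACK runtime computes its delta; `drop_refs` and `equal_ignoring_refs` are hypothetical helpers, and treating every key ending in "Ref" as a reference is an assumption. It uses the A and B values of the Spec.LoadBalancers entry from the controller log above and does not cover the Spec.TaskDefinition entry, where a name is compared against an ARN.

```python
ARN = (
    "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:"
    "targetgroup/foo-bar/cbeebab4exxxx"
)

# "A" side of the logged diff: concrete ARN plus the reference.
DESIRED = [{
    "containerName": "foo-bar",
    "containerPort": 8080,
    "targetGroupARN": ARN,
    "targetGroupRef": {"from": {"name": "foo-bar-tg-staging"}},
}]

# "B" side of the logged diff: concrete ARN only.
LATEST = [{
    "containerName": "foo-bar",
    "containerPort": 8080,
    "targetGroupARN": ARN,
}]


def drop_refs(value):
    """Recursively remove keys ending in 'Ref' from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: drop_refs(v) for k, v in value.items() if not k.endswith("Ref")}
    if isinstance(value, list):
        return [drop_refs(v) for v in value]
    return value


def equal_ignoring_refs(a, b):
    """Compare two values after stripping reference fields from both."""
    return drop_refs(a) == drop_refs(b)


if __name__ == "__main__":
    print(DESIRED == LATEST)                      # plain comparison
    print(equal_ignoring_refs(DESIRED, LATEST))   # reference-agnostic comparison
```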

Environment

  • Using EKS: Yes v1.29.13-eks-8cce635
  • AWS service targeted: ECS
michaelhtm added the kind/bug and area/resource-references labels Feb 26, 2025
@jaguer0

jaguer0 commented Feb 27, 2025

Adding the Flux diff output:

.. Kustomization diffing...: processing inventory
✓  Kustomization diffing...
► Rule/foo-bar/foo-bar-rule-staging drifted

metadata.generation
  ± value change
    - 4610
    + 4611

spec.actions.0
  - two map entries removed:
    forwardConfig:
      targetGroupStickinessConfig:
        enabled: false
      targetGroups:
      - targetGroupARN: "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:targetgroup/foo-bar/cbeebab4exxxx"
        weight: 1
    targetGroupARN: "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:targetgroup/foo-bar/cbeebab4exxxx"
    
  
  + one map entry added:
    targetGroupRef:
      from:
        name: foo-bar-tg-staging
  

spec.conditions.0
  - one map entry removed:
    values:
    - foo.bar.net

► Service/foo-bar/foo-bar-staging drifted

metadata.generation
  ± value change
    - 4607
    + 4608

spec.loadBalancers.0
  - one map entry removed:
    targetGroupARN: "arn:aws:elasticloadbalancing:us-west-2:xxxxxx:targetgroup/foo-bar/cbeebab4exxxx"
    
  
  + one map entry added:
    targetGroupRef:
      from:
        name: foo-bar-tg-staging
  

Here is the managedFields output:

apiVersion: ecs.services.k8s.aws/v1alpha1
kind: Service
metadata:
  annotations:
    kustomize.toolkit.fluxcd.io/ssa: merge
  creationTimestamp: "2025-02-26T22:47:20Z"
  finalizers:
  - finalizers.ecs.services.k8s.aws/Service
  generation: 4610
  labels:
    build_dir: base
    env: staging
    kustomize.toolkit.fluxcd.io/name: foo-bar-staging
    kustomize.toolkit.fluxcd.io/namespace: foo-bar
  managedFields:
  - apiVersion: ecs.services.k8s.aws/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:kustomize.toolkit.fluxcd.io/ssa: {}
        f:labels:
          f:build_dir: {}
          f:env: {}
          f:kustomize.toolkit.fluxcd.io/name: {}
          f:kustomize.toolkit.fluxcd.io/namespace: {}
      f:spec:
        f:capacityProviderStrategy: {}
        f:cluster: {}
        f:deploymentConfiguration:
          f:alarms:
            f:alarmNames: {}
            f:enable: {}
            f:rollback: {}
          f:deploymentCircuitBreaker:
            f:enable: {}
            f:rollback: {}
          f:maximumPercent: {}
          f:minimumHealthyPercent: {}
        f:deploymentController:
          f:type: {}
        f:desiredCount: {}
        f:enableECSManagedTags: {}
        f:enableExecuteCommand: {}
        f:healthCheckGracePeriodSeconds: {}
        f:name: {}
        f:networkConfiguration:
          f:awsVPCConfiguration:
            f:assignPublicIP: {}
            f:securityGroups: {}
            f:subnets: {}
        f:platformVersion: {}
        f:propagateTags: {}
        f:schedulingStrategy: {}
        f:taskDefinitionRef:
          f:from:
            f:name: {}
    manager: kustomize-controller
    operation: Apply
    time: "2025-02-27T10:02:14Z"
  - apiVersion: ecs.services.k8s.aws/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"finalizers.ecs.services.k8s.aws/Service": {}
      f:spec:
        f:loadBalancers: {}
    manager: controller
    operation: Update
    time: "2025-02-27T10:02:15Z"
  - apiVersion: ecs.services.k8s.aws/v1alpha1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:ackResourceMetadata:
          .: {}
          f:arn: {}
          f:ownerAccountID: {}
          f:region: {}
        f:clusterARN: {}
        f:conditions: {}
        f:createdAt: {}
        f:createdBy: {}
        f:deployments: {}
        f:events: {}
        f:pendingCount: {}
        f:platformFamily: {}
        f:roleARN: {}
        f:runningCount: {}
        f:status: {}
    manager: controller
    operation: Update
    subresource: status
    time: "2025-02-27T10:02:15Z"
  name: foo-bar-staging
  namespace: foo-bar
  resourceVersion: "441679050"
  uid: 9dcd800f-0b3f-4b9e-8d05-0368b10d815e
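
To make the ownership split in this output easier to read, the spec entries can be grouped by field manager. This is a small sketch over an abbreviated, hand-copied subset of the managedFields above; `spec_owners` is a hypothetical helper, not part of any Kubernetes client library.

```python
# Abbreviated copy of the managedFields above.
MANAGED_FIELDS = [
    {
        "manager": "kustomize-controller",
        "operation": "Apply",
        "fieldsV1": {
            "f:spec": {
                "f:cluster": {},
                "f:desiredCount": {},
                "f:taskDefinitionRef": {"f:from": {"f:name": {}}},
            }
        },
    },
    {
        "manager": "controller",
        "operation": "Update",
        "fieldsV1": {
            "f:metadata": {"f:finalizers": {}},
            "f:spec": {"f:loadBalancers": {}},
        },
    },
    {
        "manager": "controller",
        "operation": "Update",
        "subresource": "status",
        "fieldsV1": {"f:status": {"f:conditions": {}}},
    },
]


def spec_owners(managed_fields):
    """Map each top-level spec field to the (manager, operation) pairs owning it."""
    owners = {}
    for entry in managed_fields:
        spec = entry.get("fieldsV1", {}).get("f:spec", {})
        for key in spec:
            # Field keys are prefixed with "f:" in the FieldsV1 encoding.
            field = key[2:] if key.startswith("f:") else key
            owners.setdefault(field, []).append(
                (entry["manager"], entry["operation"])
            )
    return owners


if __name__ == "__main__":
    for field, who in sorted(spec_owners(MANAGED_FIELDS).items()):
        print(field, who)
```

In the output above, spec.loadBalancers is recorded under the `controller` manager with an Update operation, while spec.taskDefinitionRef is recorded under `kustomize-controller` with Apply.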

I noticed that the resources in the diff output have one thing in common: both use ACK references in nested objects.
For example:
Service

  loadBalancers:
  - containerName: foo-bar
    containerPort: 8080
    targetGroupRef:
      from:
        name: foo-bar-tg-staging

elbv2 Rule (noted in the diff)

  actions:
  - type: "forward"
    targetGroupRef:
      from:
        name: foo-bar-tg-staging
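
To check which references in a manifest sit inside list elements, the spec can be walked recursively. The sketch below assumes that every key ending in "Ref" is an ACK reference; `find_refs` is a hypothetical helper, and the spec is a trimmed copy of the Service from this issue.

```python
# Trimmed copy of the Service spec from this issue.
SERVICE_SPEC = {
    "loadBalancers": [
        {
            "containerName": "foo-bar",
            "containerPort": 8080,
            "targetGroupRef": {"from": {"name": "foo-bar-tg-staging"}},
        }
    ],
    "taskDefinitionRef": {"from": {"name": "foo-bar-staging"}},
}


def find_refs(spec, root="spec"):
    """Return (path, inside_list) for every key ending in 'Ref'."""
    found = []

    def walk(value, path, in_list):
        if isinstance(value, dict):
            for key, child in value.items():
                child_path = f"{path}.{key}"
                if key.endswith("Ref"):
                    found.append((child_path, in_list))
                else:
                    walk(child, child_path, in_list)
        elif isinstance(value, list):
            for index, item in enumerate(value):
                walk(item, f"{path}[{index}]", True)

    walk(spec, root, False)
    return found


if __name__ == "__main__":
    for path, inside_list in find_refs(SERVICE_SPEC):
        print(path, "(inside a list)" if inside_list else "(top level)")
```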
