Sync not working on every subsequent attempt when using PreSync or Sync Resource Hooks #20119

Open
shivvinaykanswal opened this issue Sep 26, 2024 · 6 comments
Labels
bug Something isn't working component:hooks component:sync sync-waves version:2.11 Latest confirmed affected version is 2.11

Comments

@shivvinaykanswal

ArgoCD Version: v2.9.3+6eba5be (we have also tried this with v2.11.4+e1284e1)
Templating: Helm

This issue occurs when I use PreSync hooks.
I have an internal Helm repository that ArgoCD uses to deploy my applications. We need to run some init steps (updates to the DB, Grafana dashboards, etc.) before the deployment runs,
so ArgoCD PreSync hooks seemed to be the clear choice for us.

Here are the resources we want to create with the PreSync hook:

  1. SecretProviderClass: creates a Secret object holding all the environment variables our application needs.
  2. PreSync Job: runs the init steps (updates to the DB, Grafana dashboards, etc.).

Here are the definitions of both objects:
SecretProviderClass:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: '-2'
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"secrets-store.csi.x-k8s.io/v1","kind":"SecretProviderClass","metadata":{"annotations":{"argocd.argoproj.io/hook":"PreSync","argocd.argoproj.io/sync-wave":"-2"},"labels":{"argocd.argoproj.io/instance":"my-app-name"},"name":"my-app-name-pre-install-secret-class","namespace":"my-namespace"},"spec":{"parameters":{"objects":"-
      objectName: \"/prod/my-app-name/APP_ENV\"\n  objectAlias: APP_ENV\n 
      objectType:
      \"ssmparameter\"\n","region":"ap-south-1"},"provider":"aws","secretObjects":[{"data":[{"key":"APP_ENV","objectName":"APP_ENV"}],"secretName":"my-app-name-pre-install-csi-secret","type":"Opaque"}]}}
  creationTimestamp: '2024-09-26T10:43:17Z'
  generation: 1
  labels:
    argocd.argoproj.io/instance: my-app-name
  name: my-app-name-pre-install-secret-class
  namespace: my-namespace
  resourceVersion: '1832361971'
  uid: f7307c2b-9235-4329-b86b-065cddbf5ac9
spec:
  parameters:
    objects: |
      - objectName: "/prod/my-app-name/APP_ENV"
        objectAlias: APP_ENV
        objectType: "ssmparameter"
    region: ap-south-1
  provider: aws
  secretObjects:
    - data:
        - key: APP_ENV
          objectName: APP_ENV
      secretName: my-app-name-pre-install-csi-secret
      type: Opaque

PreSync Job:

apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: '-1'
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{"argocd.argoproj.io/hook":"PreSync","argocd.argoproj.io/sync-wave":"-1"},"labels":{"app":"my-app-name-pre-sync-job","app.kubernetes.io/instance":"my-app-name","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"my-app-name","app.kubernetes.io/version":"1.0.27","argocd.argoproj.io/instance":"my-app-name"},"name":"my-app-name-pre-sync-job","namespace":"my-app-name"},"spec":{"backoffLimit":5,"completions":1,"parallelism":1,"template":{"metadata":{"labels":{"app":"my-app-name-pre-sync-job","app.kubernetes.io/instance":"my-app-name","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"my-app-name","app.kubernetes.io/version":"1.0.27"},"name":"my-app-name-pre-sync-job"},"spec":{"containers":[{"args":["120"],"command":["sleep"],"envFrom":[{"secretRef":{"name":"my-app-name-pre-install-csi-secret"}}],"image":"583463116790.dkr.ecr.us-west-2.amazonaws.com/my-app-name:b15cd7d","imagePullPolicy":"Always","name":"my-app-name-pre-sync-job","volumeMounts":[{"mountPath":"/mnt/secrets-store","name":"my-app-name-pre-install-secret","readOnly":true}]}],"restartPolicy":"Never","serviceAccountName":"my-app-name","volumes":[{"csi":{"driver":"secrets-store.csi.k8s.io","readOnly":true,"volumeAttributes":{"secretProviderClass":"my-app-name-pre-install-secret-class"}},"name":"my-app-name-pre-install-secret"}]}}}}
  creationTimestamp: '2024-09-26T10:43:19Z'
  generation: 1
  labels:
    app: my-app-name-pre-sync-job
    app.kubernetes.io/instance: my-app-name
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: my-app-name
    app.kubernetes.io/version: 1.0.27
    argocd.argoproj.io/instance: my-app-name
  name: my-app-name-pre-sync-job
  namespace: my-namespace
  resourceVersion: '1832368981'
  uid: 55874630-5863-4b24-8300-1ec927dae323
spec:
  backoffLimit: 5
  completionMode: NonIndexed
  completions: 1
  manualSelector: false
  parallelism: 1
  podReplacementPolicy: TerminatingOrFailed
  selector:
    matchLabels:
      batch.kubernetes.io/controller-uid: 55874630-5863-4b24-8300-1ec927dae323
  suspend: false
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: my-app-name-pre-sync-job
        app.kubernetes.io/instance: my-app-name
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: my-app-name
        app.kubernetes.io/version: 1.0.27
        batch.kubernetes.io/controller-uid: 55874630-5863-4b24-8300-1ec927dae323
        batch.kubernetes.io/job-name: my-app-name-pre-sync-job
        controller-uid: 55874630-5863-4b24-8300-1ec927dae323
        job-name: my-app-name-pre-sync-job
      name: my-app-name-pre-sync-job
    spec:
      containers:
        - args:
            - '120'
          command:
            - sleep
          envFrom:
            - secretRef:
                name: my-app-name-pre-install-csi-secret
          image: '583463116790.dkr.ecr.us-west-2.amazonaws.com/my-app-name:b15cd7d'
          imagePullPolicy: Always
          name: my-app-name-pre-sync-job
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /mnt/secrets-store
              name: my-app-name-pre-install-secret
              readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: my-app-name
      serviceAccountName: my-app-name
      terminationGracePeriodSeconds: 30
      volumes:
        - csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: my-app-name-pre-install-secret-class
          name: my-app-name-pre-install-secret
status:
  completionTime: '2024-09-26T10:45:25Z'
  conditions:
    - lastProbeTime: '2024-09-26T10:45:25Z'
      lastTransitionTime: '2024-09-26T10:45:25Z'
      status: 'True'
      type: Complete
  ready: 0
  startTime: '2024-09-26T10:43:19Z'
  succeeded: 1
  terminating: 0
  uncountedTerminatedPods: {}

The application.yaml looks like this:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-name
  namespace: argocd
spec:
  destination:
    namespace: my-app-name
    server: 'https://kubernetes.default.svc'
  source:
    repoURL: 'https://github.com/my-github-repo/my-app-name.git'
    targetRevision: ${CURRENT_BRANCH}
    path: helm-files/prod
    helm:
      valueFiles:
        - values.yaml
      parameters:
        - name: 'my-app-name.statefulset.image.tag'
          value: '${IMAGE_TAG}'
  project: default
operation:
  initiatedBy:
    username: ${USER}
  sync:
    syncStrategy:
      hook: {}

The first time I run a sync, the application syncs correctly and all the resources are created.
[Screenshot]

[Screenshot: SecretProviderClass object]

We haven't specified a hook deletion policy, so it should default to BeforeHookCreation, as stated in the documentation.
When a sync is run for the second time, the first PreSync object gets deleted and the sync gets stuck.
At this point, the last sync shows as successful, while the sync status shows OutOfSync.
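
For reference, the deletion policy can also be set explicitly rather than relying on the default. The fragment below is only a sketch: it reuses the Job name and annotations from the manifest above and adds the `hook-delete-policy` annotation with the value the documentation describes as the default. It is not confirmed that setting it explicitly changes the behavior reported here.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-name-pre-sync-job
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: '-1'
    # Explicitly state the documented default: delete the previous hook
    # resource just before a new one is created.
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
```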
[Screenshot]

[Screenshot: SecretProviderClass object]

So the first PreSync object is never recreated on the second run, and the entire workflow hangs.

This happens to us on every alternate sync.

Expected behavior
Sync should work every time. All objects annotated as PreSync hooks should be recreated, and the deployment should not hang.

Version

v2.9.3+6eba5be
@reggie-k
Member

Hi, we are talking about full syncs only, right? (Selective syncs do not trigger hooks)

@shivvinaykanswal
Author

shivvinaykanswal commented Sep 30, 2024

@reggie-k yes, this is happening for full syncs

@acelinkio

Copy/Pasting a similar issue referenced in #16835

Also facing a similar issue. The helm chart version is the same between releases, however updating the container images inside is not properly triggering the Helm hooks as expected. In order for the helm hooks to actually run, we have to manually execute the sync despite having autosync enabled.

Greatly appreciate if anyone has additional information on how to overcome this issue. Thanks!

@shivvinaykanswal
Author

In our case, we observe the issue only with PreSync hooks, and only when using "argocd.argoproj.io/hook-delete-policy": "BeforeHookCreation".
At the moment we are working with hook-delete-policy: HookSucceeded,
and we create a Job as the PreSync hook.
This approach has limitations too:

  1. Since PreSync resources are deleted once the PreSync phase completes, we cannot retain the Job's events and logs in ArgoCD.
  2. If the Job fails, every new sync also fails. A new sync tries to patch the existing Job, and a Job's spec.template is immutable. This cannot be avoided, because the older Job still exists: it is deleted neither before the new one is created nor on failure.

So far in our POC, BeforeHookCreation is the most suitable hook-delete-policy for us, but it gives us the issue described above.
Is this a solved problem, or is there a workaround that we might be missing?
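
For readers following along, the interim setup described in this comment would look roughly like the fragment below. It is a sketch that reuses the Job name and sync wave from the original manifest; only the `hook-delete-policy` value differs from the default.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-name-pre-sync-job
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: '-1'
    # Delete the hook resource only after it succeeds. A failed Job is
    # left in place, which is the source of limitation 2 above.
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
```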

@acelinkio

acelinkio commented Oct 17, 2024

Hey @shivvinaykanswal

I am interested in exploring the workaround you mentioned. It sounds like some logic in ArgoCD associates the previous Job with the current sync and skips it.

To address the shortcomings of that workaround:
*1 could be handled by shipping logs with something like fluentbit and relying on an external system.
*2 could potentially be handled by setting the Job TTL to a very short time.

Definitely not an ideal user experience, but something I'll be checking out.
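
A minimal sketch of the short-TTL idea for *2, assuming the same Job as in the issue description. `ttlSecondsAfterFinished` is a standard Kubernetes Job field; the value of 60 seconds is an arbitrary illustration, and whether this fully avoids the failed-sync problem has not been verified in this thread.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-name-pre-sync-job
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  # Kubernetes removes the finished Job (succeeded or failed) after this
  # many seconds, so a failed Job does not linger until the next sync.
  # 60 is an illustrative value, not a recommendation.
  ttlSecondsAfterFinished: 60
  backoffLimit: 5
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: my-app-name-pre-sync-job
          image: '583463116790.dkr.ecr.us-west-2.amazonaws.com/my-app-name:b15cd7d'
          command: ['sleep']
          args: ['120']
```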

@andrii-korotkov-verkada andrii-korotkov-verkada added the version:2.11 Latest confirmed affected version is 2.11 label Nov 11, 2024
@keyolk

keyolk commented Nov 27, 2024

I ran into a similar issue with a PreSync Job and an ExternalSecret. In addition, ArgoCD keeps deleting the Job's pod before it finishes.
I know it is somewhat weird, but the issue was resolved after reducing the application controller replicas from 2 to 1.
How about your case? Are you using multiple application controller replicas?
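
For anyone wanting to try the same thing, a single-replica application controller might be expressed as the patch below. This is a sketch under assumptions: it presumes the default StatefulSet name `argocd-application-controller` in the `argocd` namespace, and that the `ARGOCD_CONTROLLER_REPLICAS` environment variable is kept in step with `replicas`, as the Argo CD high-availability documentation describes. Adjust the names to match your installation.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller   # assumed default name
  namespace: argocd                     # assumed install namespace
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            # Should match spec.replicas so that sharding is consistent.
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: '1'
```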
