k8sClient get resources across all namespaces (#601) #854

jortkoopmans
Contributor

@jortkoopmans jortkoopmans commented Sep 2, 2024

Work in progress. Updating Applications using this branch works for me, when the RBAC permissions are configured. (#601)

However, there are several considerations:

  • The original code clearly intends to call these functions per namespace (this is evident from the tests).
  • The tests have not been fixed (yet).
  • Listing and filtering all Applications across namespaces could cause performance issues (on the other hand, traditionally all Applications lived in the same namespace).
  • Alternatively, an array variable listing the namespaces to monitor could be introduced; these could then be looped through.
  • Alternatively (again), a configuration variable could be introduced to switch between a single namespace and all namespaces (explicitly for the k8sClient).
  • Since the argocd API mechanism does not make this distinction, it might make sense to keep the design similar.
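
To make the options above concrete (single namespace, an explicit list, or all namespaces), here is a minimal sketch that works on an in-memory list instead of a real Kubernetes client. The `App` type, the `listApps` helper, and the convention that an empty list means "all namespaces" are assumptions for illustration only, not the updater's actual API.

```go
package main

import "fmt"

// App is a hypothetical stand-in for an Argo CD Application (name and namespace only).
type App struct {
	Name      string
	Namespace string
}

// listApps returns the apps whose namespace appears in namespaces.
// An empty namespaces slice means "all namespaces" (an assumption of this sketch).
func listApps(all []App, namespaces []string) []App {
	if len(namespaces) == 0 {
		// Return a copy so callers cannot modify the source slice.
		return append([]App(nil), all...)
	}
	wanted := make(map[string]bool, len(namespaces))
	for _, ns := range namespaces {
		wanted[ns] = true
	}
	var out []App
	for _, a := range all {
		if wanted[a.Namespace] {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	apps := []App{{"web", "argocd"}, {"api", "team-a"}, {"web", "team-b"}}
	fmt.Println(len(listApps(apps, nil)))                          // all namespaces
	fmt.Println(len(listApps(apps, []string{"argocd"})))           // single namespace
	fmt.Println(len(listApps(apps, []string{"team-a", "team-b"}))) // explicit list
}
```

A single selector like this would let the "single namespace" and "all namespaces" modes share one code path, at the cost of always filtering on the client side.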

@wd

wd commented Sep 3, 2024

Great! Maybe you can consider reusing the sourceNamespace settings from ArgoCD
and they recently supported regex in the value. https://github.com/argoproj/argo-cd/pull/19017/files

@jortkoopmans
Contributor Author

> Great! Maybe you can consider reusing the sourceNamespace settings from ArgoCD and they recently supported regex in the value. https://github.com/argoproj/argo-cd/pull/19017/files

It does make sense to want to align the functionality with ArgoCD itself. AIU is effectively following that feature set, as we're seeing here with the 'apps in any namespace' feature.
Code-wise, it would be better to share specific modules (e.g. regex.go), instead of duplicating them manually.

If we go for this approach, a fair amount of change is needed:

  • Get namespaces and filter these using a regex (also include RBAC for ns)
  • Get Applications in these namespaces (and keep that relation)
  • Modify existing Application update/patch functions to use these App + namespace combinations
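
The first two steps above can be sketched roughly as follows, using only the Go standard library and in-memory data. The names `App`, `filterNamespaces`, and `appsByNamespace` are hypothetical, and the pattern semantics (one regular expression that must match the whole namespace name) are an assumption of this sketch rather than a description of ArgoCD's implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// App is a hypothetical stand-in for an Argo CD Application.
type App struct{ Name, Namespace string }

// filterNamespaces returns the namespaces that fully match pattern.
func filterNamespaces(namespaces []string, pattern string) ([]string, error) {
	// Anchor the pattern so "team-.*" does not match "my-team-a".
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, fmt.Errorf("invalid namespace pattern %q: %w", pattern, err)
	}
	var out []string
	for _, ns := range namespaces {
		if re.MatchString(ns) {
			out = append(out, ns)
		}
	}
	return out, nil
}

// appsByNamespace keeps the namespace-to-applications relation for the selected namespaces.
func appsByNamespace(apps []App, namespaces []string) map[string][]App {
	result := make(map[string][]App, len(namespaces))
	for _, ns := range namespaces {
		result[ns] = nil
	}
	for _, a := range apps {
		if _, ok := result[a.Namespace]; ok {
			result[a.Namespace] = append(result[a.Namespace], a)
		}
	}
	return result
}

func main() {
	namespaces := []string{"argocd", "team-a", "team-b", "kube-system"}
	selected, err := filterNamespaces(namespaces, "team-.*")
	if err != nil {
		panic(err)
	}
	apps := []App{{"web", "argocd"}, {"api", "team-a"}, {"web", "team-b"}}
	for ns, list := range appsByNamespace(apps, selected) {
		fmt.Println(ns, len(list))
	}
}
```

Note that Go's standard `regexp` package does not support lookahead, so negative patterns of the "everything except X" kind would need a different approach or library.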

Contributor

@ishitasequeira ishitasequeira left a comment

@jortkoopmans @wd I was thinking about keeping this PR focused on fixing the currently broken apps-in-any-namespace feature and adding wildcard support in a separate PR. WDYT?

I reviewed and tested the PR with regard to fixing the currently broken apps-in-any-namespace feature, and the changes look good. However, the unit tests still need to be fixed.

@wd

wd commented Sep 6, 2024

@ishitasequeira I'm good with that. I'm just trying to bring information from Argo here.

@ishitasequeira
Contributor

@wd, It's a good callout for sure and something which can be looked into as a next step forward for the feature.

@jortkoopmans, let me know if you need any help in fixing the unit tests.

- Modify k8sClient functions to always get Application resources across all namespaces
- Add required RBAC permissions

Signed-off-by: Jort Koopmans <jort.koopmans@entrnce.com>
@jortkoopmans jortkoopmans force-pushed the bugfix/601_k8sclient_all_namespaces branch from 6c9e2ee to 4902053 Compare September 16, 2024 07:28
@codecov-commenter

codecov-commenter commented Sep 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.94%. Comparing base (5403b3e) to head (fd619ae).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #854      +/-   ##
==========================================
+ Coverage   75.53%   75.94%   +0.41%     
==========================================
  Files          31       31              
  Lines        3151     3184      +33     
==========================================
+ Hits         2380     2418      +38     
+ Misses        636      633       -3     
+ Partials      135      133       -2     

Fix UpdateSpec to handle partial updates without specified appNamespace
Fix tests to work with Applications across namespaces

Signed-off-by: Jort Koopmans <jort.koopmans@entrnce.com>
@jortkoopmans jortkoopmans force-pushed the bugfix/601_k8sclient_all_namespaces branch from 2748faa to d440337 Compare September 17, 2024 17:08
- Error wrapping for improved reporting in higher-lvl code
- Change UpdateSpec retry to stop trying, with exponential backoff
- Add and enhance tests. Improve code coverage.

Signed-off-by: Jort Koopmans <jort.koopmans@entrnce.com>
@jortkoopmans
Contributor Author

Thank you for the feedback and guidance, @wd @ishitasequeira.
I have reworked the fix with the scope of getting AIU to work across namespaces (according to the guidance).

It turned out that some of the functions strictly require a namespace to be provided, so I have modified several functions to handle this correctly (and introduced some helper functions). Specifically:

  • GetApplication needs a namespace for the Application, I think it's defined here: argo-cd
  • UpdateSpec can only work when it can match the UpdateSpec with a unique Application. Note that the UpdateSpec does not have to include the namespace.

For my use case, I only need AIU to monitor the Applications (ListApplications) across namespaces, since I use it exclusively to overwrite the sha256 image hashes (and not Get or Update the Application spec). This is probably why it worked for me previously (?).

Subsequently, I refactored and extended some of the tests, but feel free to amend or change this. Lastly, while testing I noticed that retrying on conflict was perpetual, so I implemented maxRetries and exponential backoff to resolve that.

@jortkoopmans jortkoopmans marked this pull request as ready for review September 18, 2024 14:32
@jortkoopmans jortkoopmans changed the title k8sClient get resources across all namespaces. WIP (#601) k8sClient get resources across all namespaces (#601) Sep 18, 2024
@jortkoopmans
Contributor Author

I see that this PR and #831 are related efforts based on different solution directions. Just noting it here so it can be taken into account.

Contributor

@ishitasequeira ishitasequeira left a comment

@jortkoopmans left some comments

@jannfis @chengfang @pasha-codefresh any concerns on this approach?

}

// Retrieve the application in the specified namespace
return client.kubeClient.ApplicationsClientset.ArgoprojV1alpha1().Applications(app.Namespace).Get(ctx, app.Name, v1.GetOptions{})

Any reason to retrieve the application again, as it should be already retrieved as part of appList, err := client.ListApplications("") ?

Comment on lines +89 to +94
if overrideRetries, ok := os.LookupEnv("OVERRIDE_MAX_RETRIES"); ok {
	var retries int
	if _, err := fmt.Sscanf(overrideRetries, "%d", &retries); err == nil {
		maxRetries = retries
	}
}

I think it would be helpful to move this logic to env.go and follow the same pattern as this to read numeric values.

Contributor

@ishitasequeira ishitasequeira left a comment

@jortkoopmans did you get a chance to look at the comments?

@chengfang chengfang merged commit ad9648f into argoproj-labs:master Oct 14, 2024
10 checks passed
ishitasequeira added a commit to ishitasequeira/argocd-image-updater that referenced this pull request Oct 14, 2024
Signed-off-by: Ishita Sequeira <ishiseq29@gmail.com>
chengfang pushed a commit that referenced this pull request Oct 14, 2024
Signed-off-by: Ishita Sequeira <ishiseq29@gmail.com>
@jortkoopmans
Contributor Author

Sorry for my late response, but thank you @ishitasequeira and @chengfang for pushing this forward.
While I did read the provided comments, I should probably have just posted a response here, as I hit a snag while testing my own version on this branch, which left me a bit baffled.
My use case is perhaps a bit exotic, but what I'm seeing is that this version appears to set the parameter while in reality the change doesn't get persisted. Strangely enough, the incomplete version at this commit, which fails the tests, does work for me (and the logging is exactly the same).

It could be that I need to modify some of the ignore rules for argo-cd itself to avoid fights between them, but previously it worked.

Outlining my configuration just for clarity:

  • Using app-in-app pattern with the leaf apps having a helm chart
  • Using a tag parameter on the helm chart
  • Wanting to override that tag parameter with a sha256 tag in digest mode
  • My images are on AWS ECR (but it shouldn't matter)

Logs of 2 (redacted) update cycles (this goes on indefinitely):

time="2024-10-14T19:09:36Z" level=info msg="argocd-image-updater v99.9.9+02eee1d starting [loglevel:INFO, interval:2m0s, healthport:8080]"
time="2024-10-14T19:09:36Z" level=warning msg="commit message template at /app/config/commit.template does not exist, using default"
time="2024-10-14T19:09:36Z" level=info msg="Loaded 1 registry configurations from /app/config/registries.conf"
time="2024-10-14T19:09:36Z" level=info msg="ArgoCD configuration: [apiKind=kubernetes, server=argocd-server.argo-cd, auth_token=false, insecure=false, grpc_web=false, plaintext=false]"
time="2024-10-14T19:09:36Z" level=info msg="Starting health probe server TCP port=8080"
time="2024-10-14T19:09:36Z" level=info msg="Starting metrics server on TCP port=8081"
time="2024-10-14T19:09:36Z" level=info msg="Warming up image cache"
time="2024-10-14T19:09:37Z" level=info msg=/scripts/ecr.sh dir= execID=15aa1
time="2024-10-14T19:10:47Z" level=info msg="Finished cache warm-up, pre-loaded 78 meta data entries from 2 registries"
time="2024-10-14T19:10:47Z" level=info msg="Starting image update cycle, considering 92 annotated application(s) for update"
time="2024-10-14T19:10:56Z" level=info msg="Processing results: applications=92 images_considered=92 images_skipped=0 images_updated=0 errors=0"
time="2024-10-14T19:12:57Z" level=info msg="Starting image update cycle, considering 92 annotated application(s) for update"
time="2024-10-14T19:13:00Z" level=info msg="Setting new image to 123456780912.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int@sha256:89c99fabd5b8cd1210c64e67e65bad6c1666b362a409958b199acb43c6db8c62" alias=my-argo-application application=my-argo-application image_name=ecrrepo/my-app-image image_tag=dummy registry=123456780912.dkr.ecr.us-east-1.amazonaws.com
time="2024-10-14T19:13:00Z" level=info msg="Successfully updated image '123456780912.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int@dummy' to '123456780912.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int@sha256:89c99fabd5b8cd1210c64e67e65bad6c1666b362a409958b199acb43c6db8c62', but pending spec update (dry run=false)" alias=my-argo-application application=my-argo-application image_name=ecrrepo/my-app-image image_tag=dummy registry=123456780912.dkr.ecr.us-east-1.amazonaws.com
time="2024-10-14T19:13:00Z" level=info msg="Committing 1 parameter update(s) for application my-argo-application" application=my-argo-application
time="2024-10-14T19:13:00Z" level=info msg="Successfully updated the live application spec" application=my-argo-application
time="2024-10-14T19:13:05Z" level=info msg="Processing results: applications=92 images_considered=92 images_skipped=0 images_updated=1 errors=0"
time="2024-10-14T19:15:06Z" level=info msg="Starting image update cycle, considering 92 annotated application(s) for update"
time="2024-10-14T19:15:07Z" level=info msg="Setting new image to 123456780912.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int@sha256:89c99fabd5b8cd1210c64e67e65bad6c1666b362a409958b199acb43c6db8c62" alias=my-argo-application application=my-argo-application image_name=ecrrepo/my-app-image image_tag=dummy registry=123456780912.dkr.ecr.us-east-1.amazonaws.com
time="2024-10-14T19:15:07Z" level=info msg="Successfully updated image '123456780912.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int@dummy' to '123456780912.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int@sha256:89c99fabd5b8cd1210c64e67e65bad6c1666b362a409958b199acb43c6db8c62', but pending spec update (dry run=false)" alias=my-argo-application application=my-argo-application image_name=ecrrepo/my-app-image image_tag=dummy registry=123456780912.dkr.ecr.us-east-1.amazonaws.com
time="2024-10-14T19:15:07Z" level=info msg="Committing 1 parameter update(s) for application my-argo-application" application=my-argo-application
time="2024-10-14T19:15:08Z" level=info msg="Successfully updated the live application spec" application=my-argo-application
time="2024-10-14T19:15:14Z" level=info msg="Processing results: applications=92 images_considered=92 images_skipped=0 images_updated=1 errors=0"

The tree (parent) App has these settings:

syncPolicy:
  automated:
    prune: true
    selfHeal: true
  syncOptions:
    - RespectIgnoreDifferences=true
ignoreDifferences:
  - group: argoproj.io
    kind: Application
    jqPathExpressions:
      - >-
        .spec.source.helm.parameters[] | select(.name |
        contains(".image.tag",".image.repository"))

The leaf app with the helm chart in it has:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd-image-updater.argoproj.io/image-list: >-
      my-argo-application=123456789012.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image:int
    argocd-image-updater.argoproj.io/my-argo-application.helm.image-name: mychart.image.repository
    argocd-image-updater.argoproj.io/my-argo-application.helm.image-tag: mychart.image.tag
    argocd-image-updater.argoproj.io/my-argo-application.update-strategy: digest
spec:
  destination:
    namespace: mynamespace
    server: 'https://kubernetes.default.svc'
  project: myproject
  source:
    helm:
      parameters:
        - forceString: true
          name: mychart.image.tag
          value: >-
            int
        - forceString: true
          name: mychart.image.repository
          value: >-
            123456789012.dkr.ecr.us-east-1.amazonaws.com/ecrrepo/my-app-image

Unfortunately the same happens when using the now merged code on master (which includes some refactoring, nice work!), for my particular use case.
I will try to add some debugging and investigate what argo-cd is doing to see whether they are indeed getting into a fight (where previously they were not!).
Meanwhile, if you have any ideas or pointers, I'm looking forward to those! 😅
