[helm-oci] ECR auth expires #787

Closed
nalbury opened this issue Jun 19, 2022 · 7 comments · Fixed by #799
Labels
area/oci OCI related issues and pull requests

@nalbury

nalbury commented Jun 19, 2022

While attempting to set up ECR as an OCI chart repository, we followed the recommended pattern here to configure a Kubernetes secret with the required registry credentials for the OCI repository, but noticed that source-controller only seems to read this secret once, at startup. Unfortunately, this means that once the ECR token expires, source-controller has to be restarted before authentication works again and the repository and charts can be reconciled.
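The pattern above only works if something rewrites the secret before the token expires (ECR authorization tokens are valid for 12 hours), typically a scheduled job. A minimal sketch of such a refresh step, assuming the AWS CLI and kubectl are installed and authorized; the function name `refresh_ecr_secret` is hypothetical, and the defaults (`ecr-auth`, `flux-system`, `us-west-2`) are taken from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical token-refresh helper for the "registry credentials in a
# Kubernetes secret" pattern. ECR tokens are valid for 12 hours, so this
# needs to run on a schedule (for example from a CronJob).
# Assumes the AWS CLI and kubectl are on PATH with suitable permissions.

refresh_ecr_secret() {
  local registry="$1"                  # e.g. <account>.dkr.ecr.us-west-2.amazonaws.com
  local region="${2:-us-west-2}"
  local secret="${3:-ecr-auth}"
  local namespace="${4:-flux-system}"
  local token

  # Ask ECR for a fresh authorization token; abort if that fails.
  token="$(aws ecr get-login-password --region "$region")" || return 1

  # Render the docker-registry secret client-side and apply it, so the same
  # command creates the secret or updates an existing one.
  kubectl create secret docker-registry "$secret" \
    --namespace "$namespace" \
    --docker-server "$registry" \
    --docker-username AWS \
    --docker-password "$token" \
    --dry-run=client -o yaml \
    | kubectl apply -f -
}
```

For example: `refresh_ecr_secret <redacted>.dkr.ecr.us-west-2.amazonaws.com`. As this issue shows, before the fix rewriting the secret was not sufficient on its own, because source-controller kept using the credentials it had loaded at startup.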

Example of the state after expiration:

  • I can log in to ECR via the Helm CLI with the data in the Kubernetes secret:
[root@jumbox ~]# kubectl get secret -n flux-system ecr-auth -o json |jq '.data.".dockerconfigjson"' -r |base64 --decode |jq '.auths."<redacted>.dkr.ecr.us-west-2.amazonaws.com/helm".password' -r |helm registry login -u AWS --password-stdin <redacted>.dkr.ecr.us-west-2.amazonaws.com/helm
Login Succeeded
[root@jumpbox ~]# helm show chart oci://<redacted>.dkr.ecr.us-west-2.amazonaws.com/helm/my-chart |grep apiVersion
apiVersion: v2
  • But the status of an ECR-hosted Helm chart shows a chart pull error saying the token has expired:
[root@jumpbox ~]# flux get source chart my-chart
NAME            REVISION	SUSPENDED	READY	MESSAGE
my-chart	0.2.0   	False    	False	chart pull error: chart pull error: failed to get chart version for remote reference: GET "https://<redacted>.dkr.ecr.us-west-2.amazonaws.com/v2/helm/my-chart/tags/list": unexpected status code 403: denied: Your authorization token has expired. Reauthenticate and try again.
  • If I restart source-controller (delete the pod), the secret is seemingly reloaded at startup and the chart reconciles again until the newly loaded token expires:
[root@jumpbox ~]# kubectl delete pod -n flux-system source-controller-644c69fbf7-vpczd
pod "source-controller-644c69fbf7-vpczd" deleted
[root@jumpbox ~]# flux get source chart my-chart
NAME            REVISION	SUSPENDED	READY	MESSAGE
my-chart	0.2.0   	False    	True 	pulled 'my-chart' chart with version '0.2.0'
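The restart workaround can also be done through the Deployment rather than by deleting a specific pod, which avoids having to look up the generated pod name. A minimal sketch, assuming source-controller runs as the Deployment `source-controller` (the pod name above suggests this) and that kubectl has access; the helper name and the 2-minute timeout are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the restart workaround: roll the source-controller
# Deployment so its new pod re-reads the registry secret, then wait for it.
restart_source_controller() {
  local namespace="${1:-flux-system}"
  kubectl -n "$namespace" rollout restart deployment/source-controller &&
    kubectl -n "$namespace" rollout status deployment/source-controller --timeout=2m
}
```

This is only a stopgap: it has the same effect as deleting the pod, so the chart fails again once the newly loaded token expires.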

I know the recommended pattern linked above comes from the documentation for the image automation controllers, so I'm wondering whether source-controller is supposed to work the same way. It was mentioned here that some caching may be at play.

@souleb
Member

souleb commented Jun 21, 2022

Hello @nalbury, can you post kubectl describe helmrepository and kubectl describe helmrelease here please?

Also can you post the source-controller logs as well please?
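The requested information can be collected in one go. A minimal sketch, assuming kubectl access and that source-controller runs as the Deployment `source-controller`; the helper name and argument order are hypothetical, and filtering on `"level":"error"` relies on the JSON log format shown later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe the HelmRepository and HelmRelease, then
# print only the error entries from the source-controller logs.
collect_flux_diagnostics() {
  local repo="$1" release="$2" release_ns="$3" flux_ns="${4:-flux-system}"

  kubectl -n "$flux_ns" describe helmrepository "$repo"
  kubectl -n "$release_ns" describe helmrelease "$release"

  # source-controller logs one JSON object per line; keep the errors.
  kubectl -n "$flux_ns" logs deployment/source-controller \
    | grep '"level":"error"'
}
```

With the names from this issue, that would be `collect_flux_diagnostics ecr my-app my-namespace`.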

@stefanprodan stefanprodan added the area/oci OCI related issues and pull requests label Jun 22, 2022
@stefanprodan stefanprodan changed the title from "ECR auth expires with helm OCI repos" to "[helm-oci] ECR auth expires" Jun 22, 2022
@nalbury
Author

nalbury commented Jun 27, 2022

As requested (had to redact some account IDs and some of the values as they're work specific):

HelmRepository:

Name:         ecr
Namespace:    flux-system
Labels:       kustomize.toolkit.fluxcd.io/name=sources
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  reconcile.fluxcd.io/requestedAt: 2022-06-20T12:51:19.602886928Z
API Version:  source.toolkit.fluxcd.io/v1beta2
Kind:         HelmRepository
Metadata:
  Creation Timestamp:  2022-06-09T14:57:39Z
  Finalizers:
    finalizers.fluxcd.io
  Generation:  5
  Managed Fields:
    API Version:  source.toolkit.fluxcd.io/v1beta2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name:
          f:kustomize.toolkit.fluxcd.io/namespace:
      f:spec:
        f:interval:
        f:secretRef:
          f:name:
        f:type:
        f:url:
    Manager:      kustomize-controller
    Operation:    Apply
    Time:         2022-06-09T17:55:56Z
    API Version:  source.toolkit.fluxcd.io/v1beta2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizers.fluxcd.io":
      f:status:
        f:conditions:
        f:lastHandledReconcileAt:
        f:observedGeneration:
    Manager:      source-controller
    Operation:    Update
    Time:         2022-06-10T11:32:36Z
    API Version:  source.toolkit.fluxcd.io/v1beta2
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:reconcile.fluxcd.io/requestedAt:
    Manager:         flux
    Operation:       Update
    Time:            2022-06-10T11:34:24Z
  Resource Version:  13055348
  UID:               9e31f644-3840-4a7e-a34e-fbbc96e403c0
Spec:
  Interval:  5m0s
  Secret Ref:
    Name:   ecr-auth
  Timeout:  60s
  Type:     oci
  URL:      oci://<redacted>.dkr.ecr.us-west-2.amazonaws.com/helm
Status:
  Conditions:
    Last Transition Time:     2022-06-16T13:52:02Z
    Message:                  Helm repository is ready
    Observed Generation:      5
    Reason:                   Succeeded
    Status:                   True
    Type:                     Ready
  Last Handled Reconcile At:  2022-06-20T12:51:19.602886928Z
  Observed Generation:        5
Events:                       <none>

HelmRelease:

Name:         my-app
Namespace:    my-namespace
Labels:       kustomize.toolkit.fluxcd.io/name=my-namespace
              kustomize.toolkit.fluxcd.io/namespace=flux-system
Annotations:  reconcile.fluxcd.io/requestedAt: 2022-06-20T12:51:56.444579522Z
API Version:  helm.toolkit.fluxcd.io/v2beta1
Kind:         HelmRelease
Metadata:
  Creation Timestamp:  2022-06-09T15:05:42Z
  Finalizers:
    finalizers.fluxcd.io
  Generation:  5
  Managed Fields:
    API Version:  helm.toolkit.fluxcd.io/v2beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          f:kustomize.toolkit.fluxcd.io/name:
          f:kustomize.toolkit.fluxcd.io/namespace:
      f:spec:
        f:chart:
          f:spec:
            f:chart:
            f:sourceRef:
              f:kind:
              f:name:
              f:namespace:
            f:version:
        f:install:
          f:remediation:
            f:retries:
        f:interval:
        f:releaseName:
        f:targetNamespace:
        f:timeout:
        f:values:
    Manager:      kustomize-controller
    Operation:    Apply
    Time:         2022-06-10T13:00:17Z
    API Version:  helm.toolkit.fluxcd.io/v2beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:reconcile.fluxcd.io/requestedAt:
    Manager:      flux
    Operation:    Update
    Time:         2022-06-20T12:52:19Z
    API Version:  helm.toolkit.fluxcd.io/v2beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizers.fluxcd.io":
      f:status:
        f:conditions:
        f:failures:
        f:helmChart:
        f:lastAppliedRevision:
        f:lastAttemptedRevision:
        f:lastAttemptedValuesChecksum:
        f:lastHandledReconcileAt:
        f:lastReleaseRevision:
        f:observedGeneration:
    Manager:         helm-controller
    Operation:       Update
    Time:            2022-06-21T00:01:06Z
  Resource Version:  19441662
  UID:               313cd68e-e8bd-4bab-b98b-352bce3d7a64
Spec:
  Chart:
    Spec:
      Chart:               my-chart
      Reconcile Strategy:  ChartVersion
      Source Ref:
        Kind:       HelmRepository
        Name:       ecr
        Namespace:  flux-system
      Version:      0.2.0
  Install:
    Remediation:
      Retries:       3
  Interval:          1m
  Release Name:      my-app
  Target Namespace:  my-namespace
  Timeout:           10m0s
  Values:
    Image:
      Tag:  my-app-1.0.19
Status:
  Conditions:
    Last Transition Time:          2022-06-21T00:01:06Z
    Message:                       HelmChart 'flux-system/my-namespace-my-app' is not ready
    Reason:                        ArtifactFailed
    Status:                        False
    Type:                          Ready
    Last Transition Time:          2022-06-20T12:52:27Z
    Message:                       Helm upgrade succeeded
    Reason:                        UpgradeSucceeded
    Status:                        True
    Type:                          Released
  Failures:                        9337
  Helm Chart:                      flux-system/my-namespace-my-app
  Last Applied Revision:           0.2.0
  Last Attempted Revision:         0.2.0
  Last Attempted Values Checksum:  6dd0181482c19c5a1d858af61822bc1e954ac809
  Last Handled Reconcile At:       2022-06-20T12:51:56.444579522Z
  Last Release Revision:           4
  Observed Generation:             5
Events:
  Type    Reason  Age                      From             Message
  ----    ------  ----                     ----             -------
  Normal  info    3m30s (x21603 over 17d)  helm-controller  HelmChart 'flux-system/my-namespace-my-app' is not ready

Source Controller logs:

{"level":"info","ts":"2022-06-27T11:55:44.970Z","logger":"controller.gitrepository","msg":"no changes since last reconcilation: observed revision 'master/c2be202685fd4c5218d6da49d9eff23480ce7d2f'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system"}
{"level":"info","ts":"2022-06-27T11:56:11.803Z","logger":"controller.helmchart","msg":"artifact up-to-date with remote revision: '1.4.0'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-aws-load-balancer-controller","namespace":"flux-system"}
{"level":"info","ts":"2022-06-27T11:56:12.217Z","logger":"controller.helmchart","msg":"artifact up-to-date with remote revision: 'v3.22.0'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"calico-system-calico","namespace":"flux-system"}
{"level":"info","ts":"2022-06-27T11:56:13.836Z","logger":"controller.helmchart","msg":"artifact up-to-date with remote revision: 'v2.3.1'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kyverno-kyverno","namespace":"flux-system"}
{"level":"info","ts":"2022-06-27T11:56:45.205Z","logger":"controller.gitrepository","msg":"no changes since last reconcilation: observed revision 'master/c2be202685fd4c5218d6da49d9eff23480ce7d2f'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system"}
{"level":"error","ts":"2022-06-27T11:56:46.008Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"my-namespace-my-app","namespace":"flux-system","error":"chart pull error: chart pull error: failed to get chart version for remote reference: GET \"https://<redacted>.dkr.ecr.us-west-2.amazonaws.com/v2/helm/my-chart/tags/list\": unexpected status code 403: denied: Your authorization token has expired. Reauthenticate and try again."}
{"level":"info","ts":"2022-06-27T11:57:11.826Z","logger":"controller.helmchart","msg":"artifact up-to-date with remote revision: '1.4.0'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-aws-load-balancer-controller","namespace":"flux-system"}
{"level":"info","ts":"2022-06-27T11:57:12.222Z","logger":"controller.helmchart","msg":"artifact up-to-date with remote revision: 'v3.22.0'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"calico-system-calico","namespace":"flux-system"}
{"level":"info","ts":"2022-06-27T11:57:13.854Z","logger":"controller.helmchart","msg":"artifact up-to-date with remote revision: 'v2.3.1'","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kyverno-kyverno","namespace":"flux-system"}

@souleb
Member

souleb commented Jun 27, 2022

Thanks @nalbury, we have identified the issue and are working on a fix.

@nalbury
Author

nalbury commented Jun 27, 2022

Amazing, thank you!

@stefanprodan
Member

We'll probably have to use @souleb's fork of Helm until this gets merged: helm/helm#11086

@souleb
Member

souleb commented Jun 28, 2022

I have tested the fix with the following scenarios:

  • ECR registry -> EKS cluster with a CronJob rotating the ECR token
    1. create the secret
    2. create a HelmRepository and a HelmRelease
    3. confirm successful reconciliation
    4. wait 12 hours for the token to expire
    5. confirm that it still works
  • GitHub registry -> manually rotating the PAT in the Kubernetes secret
    1. create the secret
    2. create a HelmRepository and a HelmRelease
    3. confirm successful reconciliation
    4. delete and recreate the PAT on GitHub
    5. update the secret with the new token
    6. confirm that it still works
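The final "confirm that it still works" step in both scenarios can be checked without restarting anything by polling the HelmChart's Ready condition. A minimal sketch, assuming kubectl access; the helper name, attempt count, and delay are arbitrary choices, and `my-namespace-my-app` is the HelmChart name from this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a HelmChart until its Ready condition is "True".
wait_for_chart_ready() {
  local name="$1" namespace="${2:-flux-system}" attempts="${3:-10}" delay="${4:-30}"
  local status i
  for ((i = 1; i <= attempts; i++)); do
    # Read only the status of the condition whose type is "Ready".
    status="$(kubectl -n "$namespace" get helmchart "$name" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')"
    if [ "$status" = "True" ]; then
      echo "HelmChart $namespace/$name is ready"
      return 0
    fi
    sleep "$delay"
  done
  echo "HelmChart $namespace/$name not ready after $attempts attempts" >&2
  return 1
}
```

Running it after the token has rotated (for example `wait_for_chart_ready my-namespace-my-app`) distinguishes a working fix from the original behaviour, where the chart stayed not ready until source-controller was restarted.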

@nalbury do you have the possibility to test the fix? See #799

@nalbury
Author

nalbury commented Jun 29, 2022

Yup, I deployed an image built from your branch this morning. I should be able to verify this evening once the currently loaded token expires.
