Image pull error: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "mynamespace" #9630

Closed
2 of 3 tasks
vitalyrychkov opened this issue Sep 20, 2022 · 12 comments · Fixed by #11614
Labels
P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@vitalyrychkov
Contributor

vitalyrychkov commented Sep 20, 2022

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

We store our application's container images in a private image registry and deploy Argo using Helm. It seems that in v3.4 the workflow controller tries to read the container image manifest (to look up the cmd/args) using the argo-helm-argo-workflows-workflow-controller service account from the argo namespace.
Reading the manifest from a private registry requires access credentials, which we provide through a secret referenced in our deployments:

      imagePullSecrets:
        - name: registry-credentials
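
For context, a pull secret like the one referenced above is usually a Secret of type kubernetes.io/dockerconfigjson living in the application's namespace. The sketch below is only an illustration: the registry host is taken from the logs further down, and the username and password placeholders are assumptions, not values from our setup.

```yaml
# Sketch of a registry pull secret; placeholder values are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
  namespace: mynamespace
type: kubernetes.io/dockerconfigjson
stringData:
  # Docker config JSON keyed by registry host.
  .dockerconfigjson: |
    {"auths": {"myregistry.cloud": {"username": "<user>", "password": "<password>"}}}
```

Because the secret lives in the application's namespace, anything that wants to read it, including the workflow controller, needs RBAC permission in that namespace.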

When we submit a workflow the workflow controller's service account fails to read the registry access credentials from the secret located in the namespace of the application:

Image pull error: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "mynamespace"

Earlier, we tested one of the latest 3.3.9 builds, and it could pull and read the image successfully; see issue #9139.

We submit workflows using the argo service account in the application's namespace (the --serviceaccount option), and that account can read the secret in the same namespace. Would it be possible to use this service account to pull the image manifest? Otherwise, must the user system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller be able to read secrets in every namespace where an application is deployed?

Please explain how to use images from a private registry with access credentials in v3.4.0.

Version

3.4.0

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

[The issue seems to be specific to accessing credentials for private registries from the secret in the application's namespace.]

Logs from the workflow controller

time="2022-09-20T12:46:03.478Z" level=info msg="Processing workflow" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.649Z" level=info msg="Updated phase  -> Running" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.649Z" level=info msg="DAG node app-adhoc-ac-db-version-1663677914 initialized Running" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.649Z" level=info msg="All of node app-adhoc-ac-db-version-1663677914.db-version dependencies [] completed" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.656Z" level=info msg="DAG node app-adhoc-ac-db-version-1663677914-749901051 initialized Running" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.656Z" level=info msg="All of node app-adhoc-ac-db-version-1663677914.db-version.db-version-task dependencies [] completed" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.666Z" level=info msg="Pod node app-adhoc-ac-db-version-1663677914-3159602526 initialized Pending" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=error msg="Mark error node" error="failed to look-up entrypoint/cmd for image \"myregistry.cloud/releases/myapp:myimage\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"app-registry-creds\" is forbidden: User \"system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller\" cannot get resource \"secrets\" in API group \"\" in the namespace \"san-app-test\"" namespace=san-app-test nodeName=app-adhoc-ac-db-version-1663677914.db-version.db-version-task workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 phase Pending -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 message: failed to look-up entrypoint/cmd for image \"myregistry.cloud/releases/myapp:myimage\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"app-registry-creds\" is forbidden: User \"system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller\" cannot get resource \"secrets\" in API group \"\" in the namespace \"san-app-test\"" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 finished: 2022-09-20 12:46:03.674633014 +0000 UTC" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=error msg="Mark error node" error="task 'app-adhoc-ac-db-version-1663677914.db-version.db-version-task' errored: failed to look-up entrypoint/cmd for image \"myregistry.cloud/releases/myapp:myimage\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"app-registry-creds\" is forbidden: User \"system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller\" cannot get resource \"secrets\" in API group \"\" in the namespace \"san-app-test\"" namespace=san-app-test nodeName=app-adhoc-ac-db-version-1663677914.db-version.db-version-task workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.674Z" level=info msg="node app-adhoc-ac-db-version-1663677914-3159602526 message: task 'app-adhoc-ac-db-version-1663677914.db-version.db-version-task' errored: failed to look-up entrypoint/cmd for image \"myregistry.cloud/releases/myapp:myimage\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"app-registry-creds\" is forbidden: User \"system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller\" cannot get resource \"secrets\" in API group \"\" in the namespace \"san-app-test\"" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="Outbound nodes of app-adhoc-ac-db-version-1663677914-749901051 set to [app-adhoc-ac-db-version-1663677914-3159602526]" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="node app-adhoc-ac-db-version-1663677914-749901051 phase Running -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="node app-adhoc-ac-db-version-1663677914-749901051 finished: 2022-09-20 12:46:03.686553147 +0000 UTC" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.686Z" level=info msg="Checking daemoned children of app-adhoc-ac-db-version-1663677914-749901051" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Outbound nodes of app-adhoc-ac-db-version-1663677914 set to [app-adhoc-ac-db-version-1663677914-3159602526]" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="node app-adhoc-ac-db-version-1663677914 phase Running -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="node app-adhoc-ac-db-version-1663677914 finished: 2022-09-20 12:46:03.69151054 +0000 UTC" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Checking daemoned children of app-adhoc-ac-db-version-1663677914" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="TaskSet Reconciliation" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg=reconcileAgentPod namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Updated phase Running -> Error" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Marking workflow completed" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Marking workflow as pending archiving" namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.691Z" level=info msg="Checking daemoned children of " namespace=san-app-test workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.696Z" level=info msg="cleaning up pod" action=deletePod key=san-app-test/app-adhoc-ac-db-version-1663677914-1340600742-agent/deletePod
time="2022-09-20T12:46:03.704Z" level=info msg="Workflow update successful" namespace=san-app-test phase=Error resourceVersion=100074719 workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.707Z" level=info msg="archiving workflow" namespace=san-app-test uid=e25bc895-59ec-46ae-8e39-4dea893eb0f7 workflow=app-adhoc-ac-db-version-1663677914
time="2022-09-20T12:46:03.727Z" level=info msg="Queueing Error workflow san-app-test/app-adhoc-ac-db-version-1663677914 for delete in 5m0s due to TTL"
time="2022-09-20T12:51:04.000Z" level=info msg="Deleting garbage collected workflow 'san-app-test/app-adhoc-ac-db-version-1663677914'"
time="2022-09-20T12:51:04.014Z" level=info msg="Successfully deleted 'san-app-test/app-adhoc-ac-db-version-1663677914'"
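
The error message in these logs hints at one possible workaround: specify the command explicitly so that the controller does not need to look up the image's entrypoint/cmd, and therefore should not need the registry credentials itself. A minimal sketch, assuming this behaviour holds for your version. The image, namespace, and secret names come from the logs above; the template name, command path, and args are hypothetical.

```yaml
# Sketch only: an explicit command avoids the entrypoint/cmd lookup.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: db-version-
  namespace: san-app-test
spec:
  entrypoint: db-version
  serviceAccountName: argo
  # The kubelet still uses this secret to pull the image for the pod.
  imagePullSecrets:
    - name: app-registry-creds
  templates:
    - name: db-version
      container:
        image: myregistry.cloud/releases/myapp:myimage
        # Hypothetical command and args; replace with the image's real entrypoint.
        command: ["/app/db-version"]
        args: ["--print"]
```

This only sidesteps the lookup; it does not change which service account the controller uses when the command is omitted.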

Logs from your workflow's wait container

[no output, as the workflow could not be submitted due to manifest pull error]

@sarabala1979
Member

@vitalyrychkov can you provide your k8s version?

@sarabala1979
Member

There is a PR to support the v1.24 service account secret change: #9620

@vitalyrychkov
Contributor Author

vitalyrychkov commented Sep 23, 2022

@sarabala1979
K8s cluster version: 1.23.2
kubectl client version: 1.23

@sarabala1979 sarabala1979 added the P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority label Sep 23, 2022
@sarabala1979
Member

@terrytangyuan will work on this.

@vitalyrychkov
Contributor Author

vitalyrychkov commented Oct 5, 2022

Hi @sarabala1979 and @terrytangyuan

We have tried to use a private image registry with anonymous pull enabled.

We use the same image to start a pod (service) and to submit a task in Argo.
The service account of the workflow-controller was given RBAC permissions to read the secret defined in the "imagePullSecrets" parameter of our deployments.
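
As an illustration of what such permissions might look like (not necessarily the exact manifests we applied), the sketch below limits the controller to reading a single named secret through resourceNames rather than every secret in the namespace. The namespace, secret, and service account names are taken from the logs in the issue description; the Role and RoleBinding names are hypothetical.

```yaml
# Sketch: allow the workflow controller to read only the pull secret.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pull-secret-reader          # hypothetical name
  namespace: san-app-test
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["app-registry-creds"]   # restrict to this one secret
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workflow-controller-pull-secret-reader   # hypothetical name
  namespace: san-app-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pull-secret-reader
subjects:
  - kind: ServiceAccount
    name: argo-helm-argo-workflows-workflow-controller
    namespace: argo
```

A Role like this would have to be repeated in every namespace where workflows reference a pull secret.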

We have tested the following scenarios:

  • Password protected access only. The imagePullSecret exists in our namespace.
    Our pod starts fine using the registry credentials from the secret.
    Submitted task starts fine using the registry credentials from the secret.

  • Anonymous access enabled. The imagePullSecret does not exist.
    Our pod starts fine without using registry credentials.
    Submitted task fails to look up entrypoint/cmd with the error message "secrets not found".

  • Anonymous access enabled. The imagePullSecret exists in our namespace.
    Our pod starts fine.
    Submitted task starts fine using the registry credentials from the secret.

It seems that if the imagePullSecret is specified in the deployment, the workflow-controller always tries to authenticate instead of attempting an anonymous pull?
Would it be possible to try an anonymous pull first and then a password-protected pull, or to add a parameter to switch between them?
Shall we discuss this issue here or open a separate issue?

@vitalyrychkov
Contributor Author

vitalyrychkov commented Oct 12, 2022

Shall we discuss this issue here or open a separate issue?

Created a separate issue for this: #9802

@terrytangyuan terrytangyuan removed their assignment Oct 12, 2022
@sarabala1979 sarabala1979 added the type/regression Regression from previous behavior (a specific type of bug) label Oct 20, 2022
@stale stale bot added the problem/stale This has not had a response in some time label Nov 12, 2022
@scravy
Contributor

scravy commented Nov 14, 2022

Bumping this issue, not stale. Also: https://drewdevault.com/2021/10/26/stalebot.html

@stale stale bot removed the problem/stale This has not had a response in some time label Nov 14, 2022
@stale stale bot added the problem/stale This has not had a response in some time label Mar 25, 2023
@tico24 tico24 removed the problem/stale This has not had a response in some time label Mar 27, 2023
@kcirtapfromspace

kcirtapfromspace commented Mar 29, 2023

I feel like I have a similar situation. I am running on a kind cluster and using Tiltdev to build containers into a private, insecure Docker registry (set up with ctlptl). There is no issue pulling from public registries, but Argo Workflows cannot seem to handle the local registry, whereas a k8s Job submitted directly has no issues.

https://docs.docker.com/registry/deploying/

Scenario available here:
https://github.com/kcirtapfromspace/got99prblms

@tribuipi

tribuipi commented Aug 14, 2023

Just adding my case for reference:

Context:

  • Cloud: Google Cloud
  • Argo Workflows 3.4.9 deployed on GKE 1.27 in Project A
  • Private Docker Registry (asia.gcr.io) in Project B

I configured imagePullSecrets for the default service account in the default namespace. (Private images are pulled successfully in the default namespace.)

I submitted my workflow to this namespace and got this error:

User "system:serviceaccount:argo:argo" cannot get resource "secrets" in API group "" in the namespace "default"

I added a new Role and granted the argo service account permission to read secrets:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-secret-reader
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: secret-reader
subjects:
- kind: ServiceAccount
  name: argo
  namespace: argo

The above error was gone, but I got a new one:

..failed to look-up entrypoint/cmd for image "asia.gcr.io/....", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary, DENIED: Permission denied for...

I spent almost 4 days trying to figure out what was wrong and even looked at the source code. Finally, I found this issue: crossplane/crossplane#3023 (comment)

For anyone who has the same setup as me: you must grant the Default Compute Engine service account of the project that runs Argo on GKE (Project A) permission to access the Container Registry/Artifact Registry in the project that hosts the container registry (Project B).

To the team: please update to a newer version of go-containerregistry that respects the chain order (k8s first).

Thanks.

sonbui00 added a commit to sonbui00/argo-workflows that referenced this issue Aug 19, 2023
terrytangyuan pushed a commit that referenced this issue Aug 22, 2023
@agilgur5 agilgur5 added area/upstream This is an issue with an upstream dependency, not Argo itself and removed area/upstream This is an issue with an upstream dependency, not Argo itself labels Feb 21, 2024
@agilgur5 agilgur5 changed the title Image pull error: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "mynamespace" Image pull error: User "system:serviceaccount:argo:argo-helm-argo-workflows-workflow-controller" cannot get resource "secrets" in API group "" in the namespace "mynamespace" Apr 26, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this issue May 6, 2024