Executor tries to use imagePullSecrets to pull a container image even if anonymous pull is enabled #9802

Open
2 of 3 tasks
vitalyrychkov opened this issue Oct 12, 2022 · 8 comments

Comments

@vitalyrychkov
Contributor

vitalyrychkov commented Oct 12, 2022

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

Kubernetes pulls the same image without using imagePullSecrets when anonymous pull is allowed.
The Argo executor should likewise pull a container image (to look up its cmd/args values) without using imagePullSecrets when anonymous pull is enabled.

Version

3.4.1

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

The issue is related to private images.

Logs from the workflow controller

kubectl logs -n argo deploy/argo-helm-argo-workflows-workflow-controller | grep acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.499Z" level=info msg="Processing workflow" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.589Z" level=info msg="Updated phase  -> Running" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.589Z" level=info msg="DAG node acm-adhoc-bps-db-version-1665584107 initialized Running" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.590Z" level=info msg="All of node acm-adhoc-bps-db-version-1665584107.db-version dependencies [] completed" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.591Z" level=info msg="DAG node acm-adhoc-bps-db-version-1665584107-3268391737 initialized Running" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.592Z" level=info msg="All of node acm-adhoc-bps-db-version-1665584107.db-version.db-version-task dependencies [] completed" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.594Z" level=info msg="Pod node acm-adhoc-bps-db-version-1665584107-1331475456 initialized Pending" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.605Z" level=error msg="Mark error node" error="failed to look-up entrypoint/cmd for image \"artifacts-scm.dstcorp.net/algo-docker/acm/bps:acm-5.5.3-00\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"acm-registry-creds\" not found" namespace=acmtmp nodeName=acm-adhoc-bps-db-version-1665584107.db-version.db-version-task workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.605Z" level=info msg="node acm-adhoc-bps-db-version-1665584107-1331475456 phase Pending -> Error" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.605Z" level=info msg="node acm-adhoc-bps-db-version-1665584107-1331475456 message: failed to look-up entrypoint/cmd for image \"artifacts-scm.dstcorp.net/algo-docker/acm/bps:acm-5.5.3-00\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"acm-registry-creds\" not found" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.605Z" level=info msg="node acm-adhoc-bps-db-version-1665584107-1331475456 finished: 2022-10-12 16:17:29.605737508 +0000 UTC" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.605Z" level=error msg="Mark error node" error="task 'acm-adhoc-bps-db-version-1665584107.db-version.db-version-task' errored: failed to look-up entrypoint/cmd for image \"artifacts-scm.dstcorp.net/algo-docker/acm/bps:acm-5.5.3-00\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"acm-registry-creds\" not found" namespace=acmtmp nodeName=acm-adhoc-bps-db-version-1665584107.db-version.db-version-task workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.605Z" level=info msg="node acm-adhoc-bps-db-version-1665584107-1331475456 message: task 'acm-adhoc-bps-db-version-1665584107.db-version.db-version-task' errored: failed to look-up entrypoint/cmd for image \"artifacts-scm.dstcorp.net/algo-docker/acm/bps:acm-5.5.3-00\", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets \"acm-registry-creds\" not found" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.608Z" level=info msg="Outbound nodes of acm-adhoc-bps-db-version-1665584107-3268391737 set to [acm-adhoc-bps-db-version-1665584107-1331475456]" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.608Z" level=info msg="node acm-adhoc-bps-db-version-1665584107-3268391737 phase Running -> Error" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.608Z" level=info msg="node acm-adhoc-bps-db-version-1665584107-3268391737 finished: 2022-10-12 16:17:29.608557778 +0000 UTC" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.608Z" level=info msg="Checking daemoned children of acm-adhoc-bps-db-version-1665584107-3268391737" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="Outbound nodes of acm-adhoc-bps-db-version-1665584107 set to [acm-adhoc-bps-db-version-1665584107-1331475456]" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="node acm-adhoc-bps-db-version-1665584107 phase Running -> Error" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="node acm-adhoc-bps-db-version-1665584107 finished: 2022-10-12 16:17:29.61131521 +0000 UTC" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="Checking daemoned children of acm-adhoc-bps-db-version-1665584107" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="TaskSet Reconciliation" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg=reconcileAgentPod namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="Updated phase Running -> Error" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="Marking workflow completed" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="Marking workflow as pending archiving" namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.611Z" level=info msg="Checking daemoned children of " namespace=acmtmp workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.617Z" level=info msg="cleaning up pod" action=deletePod key=acmtmp/acm-adhoc-bps-db-version-1665584107-1340600742-agent/deletePod
time="2022-10-12T16:17:29.624Z" level=info msg="Workflow update successful" namespace=acmtmp phase=Error resourceVersion=104803467 workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.628Z" level=info msg="archiving workflow" namespace=acmtmp uid=6f8c2dca-17f4-4081-b564-b3f82720a28e workflow=acm-adhoc-bps-db-version-1665584107
time="2022-10-12T16:17:29.659Z" level=info msg="Queueing Error workflow acmtmp/acm-adhoc-bps-db-version-1665584107 for delete in 5m0s due to TTL"

Logs from your workflow's wait container

No wait logs are available; it seems Argo did not get that far.

@sarabala1979
Member

@vitalyrychkov can you provide more details, such as the failed PodSpec and your environment setup? Is there a way to reproduce this locally?

@vitalyrychkov
Contributor Author

vitalyrychkov commented Oct 20, 2022

@sarabala1979
Hi, thank you for your patience; it took some time because I tried to create a meaningful test case for you. I created a primitive workflow and a server deployment based on the same image in my private Artifactory. The server, which has the imagePullSecret definition in its deployment, started fine in a pod. I did not create the secret it references. Then I submitted the workflow based on the same image:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cmdtest-
  labels:
    workflows.argoproj.io/archive-strategy: "false"
  annotations:
    workflows.argoproj.io/description: |
      This is a test for image command and entrypoint
spec:
  entrypoint: cmdtest
  templates:
  - name: cmdtest
    container:
      image: 'artifacts.mycorp.net/docker/docserver:latest'

In the first run I got the same error, although I did not specify any imagePullSecret in the Workflow:

failed to look-up entrypoint/cmd for image "artifacts.mycorp.net/docker/docserver:latest", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: secrets "app-registry-creds" not found

Then I tried with a non-existent version of my image: image: 'artifacts.mycorp.net/docker/hugo:1.2.3'
In this case Argo receives the corresponding message from the registry:

failed to look-up entrypoint/cmd for image "artifacts.mycorp.net/docker/hugo:1.2.3", you must either explicitly specify the command, or list the image's command in the index: https://argoproj.github.io/argo-workflows/workflow-executors/#emissary-emissary: GET https://artifacts.mycorp.net/v2/docker/hugo/manifests/1.2.3: MANIFEST_UNKNOWN: The named manifest is not known to the registry.; map[manifest:hugo/1.2.3/manifest.json]

I then reverted to latest and submitted again: image: 'artifacts.mycorp.net/docker/docserver:latest'
All of a sudden, Argo could download the image and run the container!
My guess is that Argo or the kubelet cached some settings from the first deployment.

I will keep an eye on the issue and try to nail it down as soon as it recurs, or if someone else reports the same problem.
For now, I would appreciate it if someone could check whether the manifest pull function in the code uses anything like a credentials cache.

Thank you

@anhqqt

anhqqt commented Nov 7, 2022

I hit the same issue: Image pull error: User "system:serviceaccount:argo-workflows:argo-workflows-controller" cannot get resource "secrets" in API group "" in the namespace "argo-workflows". Previously, the workflow ran normally on EKS 1.22.

But after I created another cluster on EKS 1.23 and installed the Argo Workflows Helm chart with the same values.yaml file, this issue appeared. Even when the Workflow custom resource is in the same namespace as the controller, the controller cannot read the imagePullSecret.

My workaround is to set controller.rbac.create and controller.serviceAccount.create to false in the Helm values file, manually create the ServiceAccount, a ClusterRole (with get permission on secrets), and a ClusterRoleBinding, and put the ServiceAccount name into controller.serviceAccount.name. The workflow controller pod then uses the ServiceAccount I created and is able to read the imagePullSecret.
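For readers following along, the Helm side of this workaround might look roughly like the fragment below. It is only a sketch built from the three value keys named in the comment above; the ServiceAccount name is a hypothetical placeholder, and the ServiceAccount, ClusterRole, and ClusterRoleBinding still have to be created separately before the controller starts.

```yaml
# values.yaml fragment (sketch) for the Argo Workflows Helm chart.
# If the chart is consumed as a subchart, these keys would sit under
# a top-level "argo-workflows:" key instead.
controller:
  rbac:
    # Do not let the chart create the controller's RBAC objects.
    create: false
  serviceAccount:
    # Do not let the chart create the controller's ServiceAccount.
    create: false
    # Hypothetical name: must match the ServiceAccount created by hand.
    name: my-workflow-controller
```

With rbac.create disabled, the hand-made ClusterRole presumably also needs every other permission the controller normally gets from the chart, not only read access to secrets.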

@vitalyrychkov
Contributor Author

Right, my workaround was the same; however, I did not have to disable the *.create values. I just added a ClusterRole that allows reading the secret with a specific name in all namespaces, and a ClusterRoleBinding to the workflow-controller's ServiceAccount.
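A minimal sketch of such a ClusterRole and ClusterRoleBinding is shown below. The object name workflow-controller-read-pull-secret is made up for illustration. The secret name is the one from the controller logs earlier in this issue. The ServiceAccount name and namespace are taken from the error message quoted in the comment above. All three are assumptions that need to be replaced with the values from the actual cluster.

```yaml
# Sketch: allow the workflow controller to "get" one named secret
# in every namespace. Adjust all names to your own setup.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: workflow-controller-read-pull-secret  # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
    # Restrict access to the pull secret only (name from the logs above).
    resourceNames: ["acm-registry-creds"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: workflow-controller-read-pull-secret  # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: workflow-controller-read-pull-secret
subjects:
  - kind: ServiceAccount
    # Assumed ServiceAccount name and namespace of the controller.
    name: argo-workflows-controller
    namespace: argo-workflows
```

Limiting the rule with resourceNames keeps the controller from reading unrelated secrets, which is why a named secret is used here rather than a blanket grant on all secrets.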

@stale stale bot added the problem/stale This has not had a response in some time label Mar 25, 2023
@terrytangyuan terrytangyuan removed the problem/stale This has not had a response in some time label Sep 20, 2023
@noamokman

Hey,
I just added this to the chart's values.yaml:

argo-workflows:
  controller:
    rbac:
      secretWhitelist:
        - image-pull-secret

Replace the secret name with the name of your own secret.
This works for me when pulling from ECR.

@illomi7

illomi7 commented Nov 6, 2024

Hi,
Has anyone found a way to solve this issue? We have the same problem: after upgrading EKS from 1.28 to 1.31, all workflows started to fail with the error failed to look-up entrypoint/cmd for image ... Our image is in a private ECR repository.
If I set the command field in the workflow, it works, but that means changing all our production workflows. Is there a better way to achieve this without changing every definition?
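For reference, the per-workflow workaround mentioned here (setting the command explicitly, as the error message itself suggests) could look like the sketch below. It reuses the test workflow posted earlier in this thread; the command value is a hypothetical placeholder, since the real entrypoint of that image is not given anywhere in the issue.

```yaml
# Sketch: the test workflow from above with an explicit command,
# so the entrypoint does not have to be looked up from the registry.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: cmdtest-
spec:
  entrypoint: cmdtest
  templates:
  - name: cmdtest
    container:
      image: 'artifacts.mycorp.net/docker/docserver:latest'
      # Hypothetical command: replace with the image's real entrypoint.
      command: ["/docserver"]
```

This avoids the failing lookup but has to be repeated in every template, which is the drawback raised in the comment above.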
Thanks
