cluster-node-image-builder container fails in Azure pipeline #991

Closed
mboersma opened this issue Oct 5, 2022 · 10 comments · Fixed by #1451
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@mboersma
Contributor

mboersma commented Oct 5, 2022

What steps did you take and what happened:

When running Azure image-builder tasks from this repository in pipelines, we want to use the container produced by make -C images/capi docker-build, as recommended in the Image Builder book. But in practice it errors out, as shown in the log below.

This same startup script with useradd runs successfully when backed by the old deis/go-dev container.

What did you expect to happen:

Anything else you would like to add:

Starting: Initialize containers
/usr/bin/docker version --format '{{.Server.APIVersion}}'
'1.41'
Docker daemon API version: '1.41'
/usr/bin/docker version --format '{{.Client.APIVersion}}'
'1.41'
Docker client API version: '1.41'
/usr/bin/docker ps --all --quiet --no-trunc --filter "label=d03b08"
/usr/bin/docker network prune --force --filter "label=d03b08"
/usr/bin/docker pull k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13
v0.1.13: Pulling from scl-image-builder/cluster-node-image-builder-amd64
675920708c8b: Pulling fs layer
...
b955c67dff1f: Pull complete
Digest: sha256:cd095a2718006fffeced8acdf296e65b108999e1c742672d6dea83ee45c6e060
Status: Downloaded newer image for k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13
k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13
/usr/bin/docker info -f "{{range .Plugins.Network}}{{println .}}{{end}}"
bridge
host
ipvlan
macvlan
null
overlay
/usr/bin/docker network create --label d03b08 vsts_network_14e080230a5f4d178a8f7af4a7ce14d9
063dbf403c87b9cf2d4fef33f83d6c772495242ce91f2baf87fe13b254efacb7
/usr/bin/docker inspect --format="{{index .Config.Labels \"com.azure.dev.pipelines.agent.handler.node.path\"}}" k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13
/usr/bin/docker create --name dbeddeee79b04fad83f8ca3caadbcedb_k8sgcriosclimagebuilderclusternodeimagebuilderamd64v0113_ab9354 --label d03b08 --network vsts_network_14e080230a5f4d178a8f7af4a7ce14d9  -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/mnt/vss/_work/1":"/__w/1" -v "/mnt/vss/_work/_temp":"/__w/_temp" -v "/mnt/vss/_work/_tasks":"/__w/_tasks" -v "/mnt/vss/_work/_tool":"/__t" -v "/usr/local/vss-agent/2.211.0/externals":"/__a/externals":ro -v "/mnt/vss/_work/.taskkey":"/__w/.taskkey" k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13 "/__a/externals/node/bin/node" -e "setInterval(function(){}, 24 * 60 * 60 * 1000);"
de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf
/usr/bin/docker start de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf
de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf
/usr/bin/docker ps --all --filter id=de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf --filter status=running --no-trunc --format "{{.ID}} {{.Status}}"
de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf Up Less than a second
/usr/bin/docker exec  de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf sh -c "command -v bash"
whoami 
cloudtest
id -u cloudtest
1001
id -g cloudtest
1001
id -gn cloudtest
cloudtest
Try to create a user with UID '1001' inside the container.
/usr/bin/docker exec  de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf bash -c "getent passwd 1001 | cut -d: -f1 "
/usr/bin/docker exec  de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf useradd -m -u 1001 cloudtest_azpcontainer
##[error]Docker-exec executed: `useradd -m -u 1001 cloudtest_azpcontainer`; container id: `de54b37a05ca92436abb12da1038bfc8efd01a1aed922c70459375e7085cddcf`; exit code: `1`; command output: `useradd: Permission denied.`, `useradd: cannot lock /etc/passwd; try again later.`
Finishing: Initialize containers
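
For reference, the failing step can be reproduced outside Azure DevOps with plain docker commands. A minimal sketch, assuming the same image tag and that sleep is available in the image; the container name and the --entrypoint override (standing in for the agent's node override) are illustrative:

docker run -d --name ib-repro --entrypoint sleep \
  k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13 infinity
docker exec ib-repro bash -c "getent passwd 1001 | cut -d: -f1"
docker exec ib-repro useradd -m -u 1001 cloudtest_azpcontainer
# expected to fail the same way as in the pipeline:
#   useradd: Permission denied.
#   useradd: cannot lock /etc/passwd; try again later.
docker rm -f ib-repro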

Environment:

Project (Image Builder for Cluster API, kube-deploy/imagebuilder, konfigadm):

Additional info for Image Builder for Cluster API related issues:

  • OS (e.g. from /etc/os-release, or cmd /c ver):
  • Packer Version:
  • Packer Provider:
  • Ansible Version:
  • Cluster-api version (if using):
  • Kubernetes version (use kubectl version):

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 5, 2022
@mboersma
Contributor Author

mboersma commented Oct 5, 2022

It's odd that the script checks the host agent account, finds there is already a "cloudtest" user with UID 1001, then goes ahead and creates a "cloudtest_azpcontainer" user with the same UID inside the container.

I would think "cannot lock /etc/passwd" might be an appropriate error in this case, except that the same useradd sequence succeeds when backed by the deis/go-dev container.
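
A quick way to probe the difference between the two images (a sketch; it assumes both images are pullable and that whoami exists in each) is to compare their default users:

docker run --rm --entrypoint whoami \
  k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13
docker run --rm --entrypoint whoami deis/go-dev
# If the first prints a non-root user while the second prints root, that
# would explain why useradd can lock /etc/passwd against deis/go-dev but
# not against the image-builder container.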

@willie-yao
Contributor

/assign

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2023
@mboersma
Contributor Author

mboersma commented Jan 4, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2023
@willie-yao
Contributor

/unassign

Will pick this back up once I have enough cycles.

@mboersma
Contributor Author

/assign

@mboersma
Contributor Author

This failure probably arises either from the fact that the container runs as a non-root user, or from its ENTRYPOINT (although I can see in the DevOps logs that the container is started with an override for the entrypoint, so I don't think that part is actually a problem).

Here are the general requirements for a container to be used in Azure DevOps pipelines: https://learn.microsoft.com/azure/devops/pipelines/process/container-phases?view=azure-devops#requirements
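
Those requirements effectively mean the agent must be able to run useradd and groupadd inside the container, so the image's default user needs write access to /etc/passwd. A minimal sketch of one possible workaround, assuming the non-root default user is the culprit (the Dockerfile and derived tag below are hypothetical, not necessarily what the eventual fix does):

cat > Dockerfile.rootuser <<'EOF'
FROM k8s.gcr.io/scl-image-builder/cluster-node-image-builder-amd64:v0.1.13
USER root
EOF
docker build -t image-builder-rootuser -f Dockerfile.rootuser .
# With a root default user, the agent's user-creation step should now
# succeed (assuming UID 1001 is free inside the image):
docker run --rm --entrypoint useradd image-builder-rootuser \
  -m -u 1001 cloudtest_azpcontainer && echo "useradd succeeded"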

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 29, 2023
@mboersma
Contributor Author

mboersma commented Jun 5, 2023

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 5, 2023
@mboersma mboersma removed their assignment Oct 24, 2023
@willie-yao
Contributor

/priority backlog

@k8s-ci-robot k8s-ci-robot added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jan 25, 2024