Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Debian] could not ensure we are a cloned binary: No such file or directory #2552

Closed
zonk1 opened this issue Aug 11, 2020 · 14 comments
Closed

Comments

@zonk1
Copy link

zonk1 commented Aug 11, 2020

os: debian/sid (w/o systemd)

uname -a: Linux machine 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux

runc:

runc version 1.0.0-rc92+dev
commit: dedadbf9ea76c888dd9067fd6f29f3c441aba421
spec: 1.0.2-dev

(Originally I tried with with 1.0.0-rc10; spec: 1.0.1-dev with same results)

When I upgraded from kernel 5.6.7 to 5.6.14 (currently running 5.7.10 - still broken) docker stopped working:

root@machine(0)root% docker run -d --name cont debian:buster sleep 1000
7886af2b1b752e35db7d612145322b94bd2f230bf9b21f051e5dbfb6b89df2e1
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:315: copying bootstrap data to pipe caused \"write init-p: broken pipe\"": unknown.

It fails when starting container via runc. I've replicated the issue with current runc version 1.0.0-rc92+dev.

# mkdir /tmp/container/rootfs -p
# cd /tmp/container
# docker export $(docker create busybox) | tar -C rootfs -xvf -
# runc spec
# runc --debug run mycontainerid
FATA[0000] nsexec:599 nsenter: could not ensure we are a cloned binary: No such file or directory
DEBU[0000] log pipe has been closed: EOF
WARN[0000] exit status 1
ERRO[0000] container_linux.go:370: starting container process caused: process_linux.go:334: copying bootstrap data to pipe caused: write init-p: broken pipe 

Booting back to 5.6.7 - Linux machine 5.6.0-1-rt-amd64 #1 SMP PREEMPT_RT Debian 5.6.7-1 (2020-04-29) x86_64 GNU/Linux everything works as expected:

root@machine(0)container# /tmp/runc/runc run mycontainerid
/ # uname -a
Linux runc 5.6.0-1-rt-amd64 #1 SMP PREEMPT_RT Debian 5.6.7-1 (2020-04-29) x86_64 GNU/Linux
/ # 
@kolyshkin
Copy link
Contributor

Looks like runc is not able to open "/proc/self/exe" which is weird (and the error is not reported since it's written to stderr rather than via write_log()).

Can you try stracing it to see what is going on?

@AkihiroSuda
Copy link
Member

cc @zhsj PTAL? Regression on Debian kernel?

@AkihiroSuda AkihiroSuda changed the title could not ensure we are a cloned binary: No such file or directory [Debian] could not ensure we are a cloned binary: No such file or directory Aug 12, 2020
@zhsj
Copy link
Contributor

zhsj commented Aug 12, 2020

cc @zhsj PTAL? Regression on Debian kernel?

I can't reproduce it on Debian/testing(both have Linux zhsj-debian 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux), with systemd. Not sure if systemd is relevant here, but @zonk1 says w/o systemd.

docker info
Client:
 Debug Mode: false

Server:
 Containers: 6
  Running: 2
  Paused: 0
  Stopped: 4
 Images: 139
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: kata-runtime runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.7.0-2-amd64
 Operating System: Debian GNU/Linux bullseye/sid
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 23.44GiB
 Name: zhsj-debian
 ID: QJ6V:IQZD:ISBY:GLGJ:QXVM:44GY:MCPZ:RQ4B:DBKB:BMBR:34LJ:ZXPG
 Docker Root Dir: /data/docker/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
 Live Restore Enabled: false

WARNING: No swap limit support

@zonk1
Copy link
Author

zonk1 commented Aug 12, 2020

@kolyshkin wrote:

Looks like runc is not able to open "/proc/self/exe" which is weird (and the error is not reported since it's written to stderr rather than via write_log()).

Can you try stracing it to see what is going on?

Gladly: strace --> https://gist.github.com/zonk1/a187e2668df74d5a904f82df6ef9d763

@zhsj wrote:

I can't reproduce it on Debian/testing(both have Linux zhsj-debian 5.7.0-2-amd64 #1 SMP Debian 5.7.10-1 (2020-07-26) x86_64 GNU/Linux), with systemd. Not sure if systemd is relevant here, but @zonk1 says w/o systemd.

Yesterday I've tried installing clean sid on VM, worked; after disabling systemd (apt-get install sysvinit-core + reboot) it broke. Looks like something changed between (debian) kernels 5.6.7 -> 5.6.14 that systemd "hotfixes"?

@kolyshkin
Copy link
Contributor

Gladly: strace --> https://gist.github.com/zonk1/a187e2668df74d5a904f82df6ef9d763

Alas, the error is different under strace, it is now

[pid 2503] write(2, "container_linux.go:349: starting container process caused "process_linux.go:315: copying bootstrap data to pipe caused \"write init-p: broken pipe\""\n", 150container_linux.go:349: starting container process caused "process_linux.go:315: copying bootstrap data to pipe caused "write init-p: broken pipe""

Will try to repro locally once I have time.

@zonk1
Copy link
Author

zonk1 commented Aug 13, 2020

Gladly: strace --> https://gist.github.com/zonk1/a187e2668df74d5a904f82df6ef9d763

Alas, the error is different under strace, it is now

[pid 2503] write(2, "container_linux.go:349: starting container process caused "process_linux.go:315: copying bootstrap data to pipe caused "write init-p: broken pipe""\n", 150container_linux.go:349: starting container process caused "process_linux.go:315: copying bootstrap data to pipe caused "write init-p: broken pipe""

I did not study the source properly, but it seems to me that the sequence is still the same. I'm guessing there's some piping going on and one process cannot find /proc/self/exe (pid 2500) whereas the other (pid 2003) bails when that one fails:

$ egrep 'level=(fatal|debug|warn|error)' strace
[pid  2500] write(2, "time=\"2020-08-11T22:16:12+02:00\" level=fatal msg=\"nsexec:599 nsenter: could not ensure we are a cloned binary: No such file or directory\"\n", 138 <unfinished ...>
time="2020-08-11T22:16:12+02:00" level=fatal msg="nsexec:599 nsenter: could not ensure we are a cloned binary: No such file or directory"
[pid  2503] write(2, "time=\"2020-08-11T22:16:12+02:00\" level=warning msg=\"exit status 1\"\n", 67time="2020-08-11T22:16:12+02:00" level=warning msg="exit status 1"
[pid  2503] write(2, "time=\"2020-08-11T22:16:12+02:00\" level=error msg=\"container_linux.go:349: starting container process caused \\\"process_linux.go:315: copying bootstrap data to pipe caused \\\\\\\"write init-p: broken pipe\\\\\\\"\\\"\"\n", 207time="2020-08-11T22:16:12+02:00" level=error msg="container_linux.go:349: starting container process caused \"process_linux.go:315: copying bootstrap data to pipe caused \\\"write init-p: broken pipe\\\"\""

Will try to repro locally once I have time.

Thanks!

@zhsj
Copy link
Contributor

zhsj commented Aug 16, 2020

@zonk1 I notice sysvinit was updated recently, https://tracker.debian.org/news/1168996/accepted-sysvinit-296-4-source-into-unstable/

I tried in a VM and it seems docker and runc work without problem after upgrading sysvinit package.

From the changelog of sysvinit, maybe https://bugs.debian.org/968199 is related.

@zhsj
Copy link
Contributor

zhsj commented Aug 16, 2020

Verified that lacking of /dev/fd will cause the problem. But I can't find the source which is expected to use this path...

@cyphar
Copy link
Member

cyphar commented Aug 16, 2020

That's odd, we use /proc/self/fd directly, not /dev/fd. But it's possible (though unlikely) that glibc is trying to use /dev/fd. I'll take a look this week.

@cyphar
Copy link
Member

cyphar commented Aug 16, 2020

Huh, the issue is that execveat is returning ENOENT (when called with AT_EMPTY_PATH). That shouldn't happen (unless you're trying to execute a binary that has an interpreter which cannot be found) but that shouldn't happen here. We are doing a lot of bind-mount trickery here (we're trying to re-exec a temporary ro-bind-mount of /proc/self/exe which has been MNT_DETACH'd) but that definitely works on stock kernels -- I wonder if this is some debian patch causing the issue?

@zonk1
Copy link
Author

zonk1 commented Aug 16, 2020

@zonk1 I notice sysvinit was updated recently, https://tracker.debian.org/news/1168996/accepted-sysvinit-296-4-source-into-unstable/

I tried in a VM and it seems docker and runc work without problem after upgrading sysvinit package.

From the changelog of sysvinit, maybe https://bugs.debian.org/968199 is related.

I can confirm. After upgrade and reboot runc is working once again.

@zonk1
Copy link
Author

zonk1 commented Aug 19, 2020

Should I close the issue?

@XiaodongLoong
Copy link
Contributor

XiaodongLoong commented Aug 31, 2020

ERRO[0000] container_linux.go:370: starting container process caused: process_linux.go:334: copying bootstrap data to pipe caused: write init-p: broken pipe

@zonk1 You can check you syscall number for writing pipe, I ever met an issue that have the same log with your issue.

@zonk1
Copy link
Author

zonk1 commented Sep 8, 2020

ERRO[0000] container_linux.go:370: starting container process caused: process_linux.go:334: copying bootstrap data to pipe caused: write init-p: broken pipe

@zonk1 You can check you syscall number for writing pipe, I ever met an issue that have the same log with your issue.

I'm not sure I understand your question. I believe the issue is described in description (nsexec:599 nsenter: could not ensure we are a cloned binary: No such file or directory). Please reopen if you disagree.

As all is well, many thanks to all who have helped me solve the issue.

@zonk1 zonk1 closed this as completed Sep 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants