
Got starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown. from time to time #1740

Closed
cui-liqiang opened this issue Feb 24, 2018 · 26 comments


@cui-liqiang

OS:
CentOS 7.2

uname -a:
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

docker -v
Docker version 17.12.0-ce, build c97c6d6

docker-runc -v
runc version 1.0.0-rc4+dev
commit: b2567b3
spec: 1.0.0

When I run docker run or docker build, the following error appears from time to time. The probability is around 5%.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown.

Any clues?

@teddyking
Contributor

Have you enabled user namespace support on your machine? User namespaces are required to create unprivileged containers, and I don't think they're enabled by default on that version of centos.

@frezbo

frezbo commented Feb 25, 2018

What does cat /proc/cmdline return?

@cui-liqiang
Author

@frezbo it returns:
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=80b9b662-0a1d-4e84-b07b-c1bf19e62d97 ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8

@cui-liqiang
Author

@teddyking I checked this:

$ uname -a
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ grep CONFIG_USER_NS /boot/config-3.10.0-693.11.6.el7.x86_64
CONFIG_USER_NS=y

BTW, if user namespaces were disabled, should it always fail, or only fail sometimes?

@frezbo

frezbo commented Feb 26, 2018

@cui-liqiang make sure to go through this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_containers/index#user_namespaces_options

The kernel boot parameter and the kernel runtime parameter (the sysctl) both need to be set.
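Both settings from that document can be checked read-only before changing anything. A minimal sketch, assuming a RHEL/CentOS 7 host (the fallback messages are my own, just so the checks never abort):

```shell
# Read-only checks for user-namespace support on a RHEL/CentOS 7 host.

check_userns() {
  # Kernel built with user-namespace support?
  grep CONFIG_USER_NS "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "no kernel config file found"
  # Boot argument present? (absent by default on these kernels)
  grep -o 'user_namespace.enable=1' /proc/cmdline 2>/dev/null \
    || echo "user_namespace.enable=1 not in boot args"
  # Runtime limit nonzero? (0 effectively disables unprivileged user namespaces)
  cat /proc/sys/user/max_user_namespaces 2>/dev/null \
    || echo "user.max_user_namespaces sysctl not available"
}

check_userns
```

Each check prints either the current value or a fallback message, so it is safe to run on a box where some of these files do not exist.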

@cui-liqiang
Author

Hi @frezbo
I checked the link. As I understand it, it talks about enabling user namespace remapping, and I am not actually using that feature. Do I still need to follow the steps in the link?

My docker daemon options:

$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since 五 2018-02-02 18:06:12 CST; 3 weeks 2 days ago
     Docs: https://docs.docker.com
 Main PID: 11877 (dockerd)
   Memory: 17.7G
   CGroup: /system.slice/docker.service
           ├─11877 /usr/bin/dockerd
           ├─11889 docker-containerd --config /var/run/docker/containerd/containerd.toml
           └─19417 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1139c89d09ff03e8f01daf4d37b58c97146dc58e0a81b640438d717d0a2074d2...

@frezbo

frezbo commented Feb 26, 2018

Hmm, I'm not sure; I thought you were using runc natively.

@cyphar
Member

cyphar commented Feb 26, 2018

The error means that unshare failed to unshare the relevant namespaces -- this might not necessarily just be the user namespace. Since you're using CentOS this may be related to SELinux (it can cause permission issues when you don't expect it) -- do you have setenforce 1?

@cui-liqiang
Author

@cyphar

$ getenforce
Disabled

Does this look right?

@cyphar
Member

cyphar commented Mar 5, 2018

Oh, you're using CentOS 7.2! Older RHEL kernel versions deny creation of mount namespaces inside a user namespace because of an out-of-tree patch. See #1513 -- apparently RHEL 7.5 will fix this.

In fact, looking at this again, this looks like a duplicate of #1513 -- while the issue is that you cannot run Docker with --userns-remap the underlying problem is the same AFAICS. Can you check whether this command fails:

% sudo unshare -Um

And whether this command works:

% sudo unshare -U

However, this part of the bug report still doesn't make sense to me (the above explanation would make containers always fail to start, I don't understand how it could be probabilistic):

the following error appears from time to time. The Probability is around 5% .

@cui-liqiang
Author

@cyphar Neither works. (The output below is translated from Chinese: "unshare failed: Invalid argument".)

$ sudo unshare -Um
unshare: unshare failed: Invalid argument
$ sudo unshare -U
unshare: unshare failed: Invalid argument

@cyphar
Member

cyphar commented Mar 5, 2018

Sorry, I forgot to ask for the contents of /etc/docker/daemon.json (if you have one) and the output of docker info.

@cui-liqiang
Author

cui-liqiang commented Mar 5, 2018

@cyphar

$ cat /etc/docker/daemon.json
{
  "registry-mirrors": ["https://srgc54k8.mirror.aliyuncs.com"]
}
$ docker info
Containers: 8
 Running: 0
 Paused: 0
 Stopped: 8
Images: 1203
Server Version: 17.12.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.26GiB
Name: iZ2ze43t8c42mqytqholpuZ
ID: TNVJ:PNUD:XTYK:WZGD:HLST:VZG3:JT5A:UNFM:JVDY:VVDK:Y4HR:S22Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://srgc54k8.mirror.aliyuncs.com/
Live Restore Enabled: false

@vikaschoudhary16
Contributor

Facing the same error on make test. I am using RHEL 7.3 and Docker 1.13.

@cui-liqiang
Author

It turns out this was not a problem with Docker.
The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.

After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.
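For reference, that workaround can be wrapped so the cache is only dropped when available memory actually runs low. This is a sketch, not the poster's exact setup: the function names and the 1024 MB threshold are my own assumptions, and the drop_caches write needs root.

```shell
# Sketch: drop the page cache only when available memory is low.
# Threshold (in MB) is an arbitrary assumption.

available_mb() {
  # MemAvailable already discounts reclaimable page cache,
  # on kernels that provide the field
  awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

maybe_drop_caches() {
  threshold_mb=${1:-1024}
  if [ "$(available_mb)" -lt "$threshold_mb" ]; then
    sync                               # flush dirty pages first
    echo 1 > /proc/sys/vm/drop_caches  # 1 = page cache only (needs root)
  fi
}
```

Note that drop_caches is a one-shot trigger, not a persistent setting, so something like this would have to run from cron or a loop to be "periodic".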

@xyfigo

xyfigo commented Nov 9, 2018

> It turns out this was not a problem with Docker.
> The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
>
> After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.

I use GitLab CI to build my own app image on a GitLab runner, and I got the same problem. Thank you for your answer; I ran echo 1 > /proc/sys/vm/drop_caches on the GitLab runner server, and it works!

@cyphar cyphar closed this as completed Nov 14, 2018
@arukompas

arukompas commented Dec 17, 2018

For me it was a memory allocation error as described here - https://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory

Mine was a 24GB RAM server with over 15GB allocated to page cache and only 600-800MB of free RAM. I noticed my docker failed to start containers if the "free" memory would drop below 1GB, so I set my vm.min_free_kbytes to 1GB:

# change value for this boot
sysctl -w vm.min_free_kbytes=1048576

# change value for subsequent boots
echo "vm.min_free_kbytes=1048576" >> /etc/sysctl.conf

Now it allocates less to the page cache and I won't have to continuously purge it with drop_caches.

Hope it helps anyone.

EDIT:

Forget the above; it's a dirty hack which might lead to other issues on the system. I restarted the server over a month ago. The restart reset the fragmented memory and there haven't been any issues since.

@andreaswolf

We had the same error on a Virtuozzo-based system. In our case, it apparently was related to the number of NETFILTER (iptables) rules – raising the value of numiptent from 2000 to 4000 fixed the issue.

@wadeholler

We have been fighting an issue where this was the main error we observed. Please compare total system memory with cat /proc/meminfo | grep Commit: when swap is disabled, we see CommitLimit being half of the server's memory. We are currently testing the parameters vm.overcommit_memory=2 and vm.overcommit_ratio=200.
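The "half of server memory" observation follows from how the kernel derives the limit: with vm.overcommit_memory=2, CommitLimit is swap plus overcommit_ratio percent of RAM, and the default ratio is 50. A hedged sketch of the headroom check (the helper name is my own):

```shell
# Compute remaining commit headroom from /proc/meminfo (values in kB).
# With vm.overcommit_memory=2:
#   CommitLimit = swap + (overcommit_ratio / 100) * RAM
# so with no swap and the default ratio of 50, that is half of RAM.

commit_headroom_kb() {
  awk '/^CommitLimit:/  { limit = $2 }
       /^Committed_AS:/ { used  = $2 }
       END { print limit - used }' /proc/meminfo
}
```

When the headroom approaches zero, forks and execs (such as runc's setns init process) start failing even though free still shows reclaimable page cache.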

@jeschaF

jeschaF commented Jun 28, 2019

Hey, we have this issue as well, and it's related to kernel memory fragmentation on CentOS/RHEL:

cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.968 0.984 0.992 0.996 
Node 0, zone   Normal -1.000 -1.000 0.747 0.874 0.937 0.969 0.985 0.993 0.997 0.999 0.999 
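In that output, an index approaching 1.000 means higher-order allocations are failing because memory is fragmented rather than exhausted. The parsing helper below is my own sketch; on kernels built with CONFIG_COMPACTION, one remediation short of a reboot is to ask the kernel to compact memory.

```shell
# Sketch: report the worst external-fragmentation index from debugfs.
# Values near 1.000 mean allocations fail due to fragmentation,
# not due to lack of memory.

worst_extfrag() {
  awk '{ for (i = 5; i <= NF; i++) if ($i + 0 > max) max = $i + 0 }
       END { printf "%.3f\n", max }' /sys/kernel/debug/extfrag/extfrag_index
}

# Possible remediation without a reboot (needs root and CONFIG_COMPACTION):
#   echo 1 > /proc/sys/vm/compact_memory
```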

@wadeholler

@jeschaF - do you have a remediation at this time, or do you reboot the host as well?

@canerK

canerK commented Jun 28, 2019

Having the same issue, as shown below:

:/home/ubuntu# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Unable to find image 'nvidia/cuda:9.0-base' locally
9.0-base: Pulling from nvidia/cuda
9ff7e2e5f967: Pull complete
59856638ac9f: Pull complete
6f317d6d954b: Pull complete
a9dde5e2a643: Pull complete
3dab314fc51e: Pull complete
1a4e7e8b3753: Pull complete
388ed6e4a282: Pull complete
Digest: sha256:09ee586c314e599f7b82317fccfbf4717e037e5b83a9c9a9d7a5ccfe810a3071
Status: Downloaded newer image for nvidia/cuda:9.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=13505 /var/lib/docker/overlay2/88efd338b570e5d09dd7f18fd0b508f962b0060bcdbb448a538b43f9b0b50b66/merged]\\nnvidia-container-cli: initialization error: cuda error: invalid device ordinal\\n\""": unknown.

@Narven

Narven commented Aug 20, 2019

Most of these errors that I found are related to badly mounted volumes.

In my case I was mounting a file onto a folder:

- ./kibana/config.yml:/usr/share/kibana/config/
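If that line comes from a docker-compose file, the fix is to mount the file onto a file path inside the container rather than onto the config directory. A hedged sketch (the kibana.yml target filename is an assumption):

```yaml
# Map file to file, not file to directory (target filename assumed):
volumes:
  - ./kibana/config.yml:/usr/share/kibana/config/kibana.yml
```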

@rlam3

rlam3 commented Nov 25, 2019

> It turns out this was not a problem with Docker.
> The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
>
> After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.

I'm using minikube with the same issue. I tried to SSH into minikube and run this command but was denied even with sudo. Any workarounds here that you would recommend?
Thanks!

@ommmid

ommmid commented Mar 28, 2020

> Most of these errors that I found are related to badly mounted volumes.
>
> In my case I was mounting a file onto a folder:
>
> - ./kibana/config.yml:/usr/share/kibana/config/

I was sharing folders and forgot the slash at the end

@Mahanotrahul

> It turns out this was not a problem with Docker.
> The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
>
> After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.

Can anyone please tell me how to stop the cache cleaning after running echo 1 > /proc/sys/vm/drop_caches?
