
Got starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown. from time to time #1740

Closed
cui-liqiang opened this issue Feb 24, 2018 · 26 comments


@cui-liqiang

OS:
CentOS 7.2

uname -a:
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

docker -v
Docker version 17.12.0-ce, build c97c6d6

docker-runc -v
runc version 1.0.0-rc4+dev
commit: b2567b3
spec: 1.0.0

When I run docker run or docker build, the following error appears from time to time. The probability is around 5%.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:296: starting container process caused "process_linux.go:301: running exec setns process for init caused \"exit status 40\"": unknown.

Any clues?

@teddyking
Contributor

Have you enabled user namespace support on your machine? User namespaces are required to create unprivileged containers, and I don't think they're enabled by default on that version of centos.

@frezbo

frezbo commented Feb 25, 2018

What does cat /proc/cmdline return?

@cui-liqiang
Author

@frezbo it returns:
BOOT_IMAGE=/boot/vmlinuz-3.10.0-693.11.6.el7.x86_64 root=UUID=80b9b662-0a1d-4e84-b07b-c1bf19e62d97 ro crashkernel=auto rhgb quiet LANG=en_US.UTF-8

@cui-liqiang
Author

@teddyking I checked this:

$ uname -a
Linux iZ2ze43t8c42mqytqholpuZ 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ grep CONFIG_USER_NS /boot/config-3.10.0-693.11.6.el7.x86_64
CONFIG_USER_NS=y

BTW, if user namespaces were disabled, should it always fail, or only fail sometimes?

@frezbo

frezbo commented Feb 26, 2018

@cui-liqiang make sure to go through this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html-single/getting_started_with_containers/index#user_namespaces_options

The kernel boot parameter and the kernel runtime parameter (the sysctl) both need to be set.
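Both settings from that document can be checked read-only before changing anything. A minimal sketch, assuming a RHEL/CentOS 7 host (the fallback messages are my own, just so the checks never abort):

```shell
# Read-only checks for user-namespace support on a RHEL/CentOS 7 host.

check_userns() {
  # Kernel built with user-namespace support?
  grep CONFIG_USER_NS "/boot/config-$(uname -r)" 2>/dev/null \
    || echo "no kernel config file found"
  # Boot argument present? (absent by default on these kernels)
  grep -o 'user_namespace.enable=1' /proc/cmdline 2>/dev/null \
    || echo "user_namespace.enable=1 not in boot args"
  # Runtime limit nonzero? (0 effectively disables unprivileged user namespaces)
  cat /proc/sys/user/max_user_namespaces 2>/dev/null \
    || echo "user.max_user_namespaces sysctl not available"
}

check_userns
```

Each check prints either the current value or a fallback message, so it is safe to run on a box where some of these files do not exist.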

@cui-liqiang
Author

Hi @frezbo
I checked the link. As I understand it, it talks about enabling user namespace remapping, and I am not actually using that feature. Do I still need to follow the steps in the link?

My docker daemon options:

$ systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since 五 2018-02-02 18:06:12 CST; 3 weeks 2 days ago
     Docs: https://docs.docker.com
 Main PID: 11877 (dockerd)
   Memory: 17.7G
   CGroup: /system.slice/docker.service
           ├─11877 /usr/bin/dockerd
           ├─11889 docker-containerd --config /var/run/docker/containerd/containerd.toml
           └─19417 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/1139c89d09ff03e8f01daf4d37b58c97146dc58e0a81b640438d717d0a2074d2...

@frezbo

frezbo commented Feb 26, 2018

Hmm, I'm not sure; I thought you were using runc natively.

@cyphar
Member

cyphar commented Feb 26, 2018

The error means that unshare failed to unshare the relevant namespaces -- this might not necessarily just be the user namespace. Since you're using CentOS this may be related to SELinux (it can cause permission issues when you don't expect it) -- do you have setenforce 1?

@cui-liqiang
Author

@cyphar

$ getenforce
Disabled

Does this look right?

@cyphar
Member

cyphar commented Mar 5, 2018

Oh, you're using CentOS 7.2! Older RHEL kernel versions deny creation of mount namespaces inside a user namespace because of an out-of-tree patch. See #1513 -- apparently RHEL 7.5 will fix this.

In fact, looking at this again, this looks like a duplicate of #1513 -- while the issue is that you cannot run Docker with --userns-remap the underlying problem is the same AFAICS. Can you check whether this command fails:

% sudo unshare -Um

And whether this command works:

% sudo unshare -U

However, this part of the bug report still doesn't make sense to me (the above explanation would make containers always fail to start, I don't understand how it could be probabilistic):

the following error appears from time to time. The Probability is around 5% .

@cui-liqiang
Author

@cyphar Neither works. (The output below is translated from Chinese: "unshare failed: Invalid argument".)

$ sudo unshare -Um
unshare: unshare failed: Invalid argument
$ sudo unshare -U
unshare: unshare failed: Invalid argument

@cyphar
Member

cyphar commented Mar 5, 2018

Sorry, I forgot to ask for the contents of /etc/docker/daemon.json (if you have one) and the output of docker info.

@cui-liqiang
Author

cui-liqiang commented Mar 5, 2018

@cyphar

$ cat /etc/docker/daemon.json
{
  "registry-mirrors": ["https://srgc54k8.mirror.aliyuncs.com"]
}
$ docker info
Containers: 8
 Running: 0
 Paused: 0
 Stopped: 8
Images: 1203
Server Version: 17.12.0-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.6.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.26GiB
Name: iZ2ze43t8c42mqytqholpuZ
ID: TNVJ:PNUD:XTYK:WZGD:HLST:VZG3:JT5A:UNFM:JVDY:VVDK:Y4HR:S22Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://srgc54k8.mirror.aliyuncs.com/
Live Restore Enabled: false

@vikaschoudhary16
Contributor

Facing the same error on make test. I am using RHEL 7.3 and Docker 1.13.

@cui-liqiang
Author

It turns out this was not a problem with Docker.
The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.

After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.
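For reference, that workaround can be wrapped so the cache is only dropped when available memory actually runs low. This is a sketch, not the poster's exact setup: the function names and the 1024 MB threshold are my own assumptions, and the drop_caches write needs root.

```shell
# Sketch: drop the page cache only when available memory is low.
# Threshold (in MB) is an arbitrary assumption.

available_mb() {
  # MemAvailable already discounts reclaimable page cache,
  # on kernels that provide the field
  awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

maybe_drop_caches() {
  threshold_mb=${1:-1024}
  if [ "$(available_mb)" -lt "$threshold_mb" ]; then
    sync                               # flush dirty pages first
    echo 1 > /proc/sys/vm/drop_caches  # 1 = page cache only (needs root)
  fi
}
```

Note that drop_caches is a one-shot trigger, not a persistent setting, so something like this would have to run from cron or a loop to be "periodic".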

@xyfigo

xyfigo commented Nov 9, 2018

> It turns out this was not a problem with Docker.
> The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
>
> After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.

I use GitLab CI to build my own app image on a GitLab runner, and I got the same problem. Thank you for your answer; I ran echo 1 > /proc/sys/vm/drop_caches on the GitLab runner server, and it works!

@cyphar cyphar closed this as completed Nov 14, 2018
@arukompas

arukompas commented Dec 17, 2018

For me it was a memory allocation error as described here - https://serverfault.com/questions/236170/page-allocation-failure-am-i-running-out-of-memory

Mine was a 24GB RAM server with over 15GB allocated to page cache and only 600-800MB of free RAM. I noticed my docker failed to start containers if the "free" memory would drop below 1GB, so I set my vm.min_free_kbytes to 1GB:

# change value for this boot
sysctl -w vm.min_free_kbytes=1048576

# change value for subsequent boots
echo "vm.min_free_kbytes=1048576" >> /etc/sysctl.conf

Now it allocates less to the page cache and I won't have to continuously purge it with drop_caches.

Hope it helps anyone.

EDIT:

Forget the above; it's a dirty hack which might lead to other issues on the system. I restarted the server over a month ago. The restart reset the fragmented memory and there haven't been any issues since.

@andreaswolf

We had the same error on a Virtuozzo-based system. In our case, it apparently was related to the number of NETFILTER (iptables) rules – raising the value of numiptent from 2000 to 4000 fixed the issue.

@wadeholler

We have been fighting an issue where this was the main error we observed. Please compare total system memory with cat /proc/meminfo | grep Commit: when swap is disabled, we see CommitLimit being half of the server's memory. We are currently testing the parameters vm.overcommit_memory=2 and vm.overcommit_ratio=200.
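The "half of server memory" observation follows from how the kernel derives the limit: with vm.overcommit_memory=2, CommitLimit is swap plus overcommit_ratio percent of RAM, and the default ratio is 50. A hedged sketch of the headroom check (the helper name is my own):

```shell
# Compute remaining commit headroom from /proc/meminfo (values in kB).
# With vm.overcommit_memory=2:
#   CommitLimit = swap + (overcommit_ratio / 100) * RAM
# so with no swap and the default ratio of 50, that is half of RAM.

commit_headroom_kb() {
  awk '/^CommitLimit:/  { limit = $2 }
       /^Committed_AS:/ { used  = $2 }
       END { print limit - used }' /proc/meminfo
}
```

When the headroom approaches zero, forks and execs (such as runc's setns init process) start failing even though free still shows reclaimable page cache.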

@jeschaF

jeschaF commented Jun 28, 2019

Hey, we have this issue as well, and it's related to kernel memory fragmentation on CentOS/RHEL:

cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.968 0.984 0.992 0.996 
Node 0, zone   Normal -1.000 -1.000 0.747 0.874 0.937 0.969 0.985 0.993 0.997 0.999 0.999 
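In that output, an index approaching 1.000 means higher-order allocations are failing because memory is fragmented rather than exhausted. The parsing helper below is my own sketch; on kernels built with CONFIG_COMPACTION, one remediation short of a reboot is to ask the kernel to compact memory.

```shell
# Sketch: report the worst external-fragmentation index from debugfs.
# Values near 1.000 mean allocations fail due to fragmentation,
# not due to lack of memory.

worst_extfrag() {
  awk '{ for (i = 5; i <= NF; i++) if ($i + 0 > max) max = $i + 0 }
       END { printf "%.3f\n", max }' /sys/kernel/debug/extfrag/extfrag_index
}

# Possible remediation without a reboot (needs root and CONFIG_COMPACTION):
#   echo 1 > /proc/sys/vm/compact_memory
```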

@wadeholler

@jeschaF - do you have a remediation at this time, or do you reboot the host as well?

@canerK

canerK commented Jun 28, 2019

Having the same issue, as shown below:

:/home/ubuntu# docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
Unable to find image 'nvidia/cuda:9.0-base' locally
9.0-base: Pulling from nvidia/cuda
9ff7e2e5f967: Pull complete
59856638ac9f: Pull complete
6f317d6d954b: Pull complete
a9dde5e2a643: Pull complete
3dab314fc51e: Pull complete
1a4e7e8b3753: Pull complete
388ed6e4a282: Pull complete
Digest: sha256:09ee586c314e599f7b82317fccfbf4717e037e5b83a9c9a9d7a5ccfe810a3071
Status: Downloaded newer image for nvidia/cuda:9.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused "process_linux.go:413: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=13505 /var/lib/docker/overlay2/88efd338b570e5d09dd7f18fd0b508f962b0060bcdbb448a538b43f9b0b50b66/merged]\\nnvidia-container-cli: initialization error: cuda error: invalid device ordinal\\n\""": unknown.

@Narven

Narven commented Aug 20, 2019

Most of these errors that I found are related to badly mounted volumes.

In my case I was mounting a file onto a folder:

- ./kibana/config.yml:/usr/share/kibana/config/
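If that line comes from a docker-compose file, the fix is to mount the file onto a file path inside the container rather than onto the config directory. A hedged sketch (the kibana.yml target filename is an assumption):

```yaml
# Map file to file, not file to directory (target filename assumed):
volumes:
  - ./kibana/config.yml:/usr/share/kibana/config/kibana.yml
```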

@rlam3

rlam3 commented Nov 25, 2019

> It turns out this was not a problem with Docker.
> The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
>
> After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.

I'm using minikube with the same issue. I tried to SSH into minikube and run this command but was denied even with sudo. Any workarounds here that you would recommend?
Thanks!

@ommmid

ommmid commented Mar 28, 2020

> Most of these errors that I found are related to badly mounted volumes.
>
> In my case I was mounting a file onto a folder:
>
> - ./kibana/config.yml:/usr/share/kibana/config/

I was sharing folders and forgot the slash at the end

@Mahanotrahul

> It turns out this was not a problem with Docker.
> The machine was doing a lot of file reading, so most of the memory was consumed by the page cache, which I could tell from the free -m command.
>
> After I periodically cleaned the page cache by running echo 1 > /proc/sys/vm/drop_caches, the problem disappeared.

Can anyone please tell me how to stop the cache cleaning after running echo 1 > /proc/sys/vm/drop_caches?
