docker daemon fails to start with Nvidia runtime container #761
Comments
Check docker status
On Sat, Jun 9, 2018 at 10:38, ctxrag <notifications@github.com> wrote:
1. Issue or feature description
My setup was working fine, and suddenly docker stopped working.
2. Steps to reproduce the issue
just did "sudo apt-get update"
My files look like this now:
***@***.***:$ sudo cat /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
***@***.***:$ sudo cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
{
“dns”: [“172.20.130.181”]
}
3. Information to attach
<https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/>
(optional if deemed irrelevant)
- Kernel version from uname -a
***@***.***:$ uname -a
Linux dtlu16 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
***@***.***:~$
***@***.***:$ systemctl daemon-reload
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: dtlu,,, (dtlu)
Password:
==== AUTHENTICATION COMPLETE ===
***@***.***:$ sudo service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─override.conf
Active: inactive (dead) (Result: exit-code) since Sat 2018-06-09 01:16:10 PDT; 21min ago
Docs: https://docs.docker.com
Main PID: 2299 (code=exited, status=1/FAILURE)
Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Unit entered failed state.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 09 01:16:10 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Start request repeated too quickly.
Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
***@***.***:~$
sudo docker version
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:17:20 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
|
#749 (comment) |
Make sure to |
I had a similar issue today after having to reboot my computer.
uname -a
systemctl status nvidia-docker
Jun 12 10:04:57 pcvp19 systemd[1]: Failed to start Docker Application Container Engine.
systemctl daemon-reload and systemctl reload docker left the following message. |
Did you edit the |
No. I was not even aware of override.conf until I looked at the error message. I assume the file was created as part of the nvidia-docker install? Here are the contents of override.conf: |
@artificialbrains this is weird, did you set up the machine yourself? |
I was given the machine a while ago, but have been maintaining it since. Here is what I got from running the dpkg command: sudo dpkg -S /etc/systemd/system/docker.service.d/override.conf However, the dpkg result may have been affected by the fact that I had just tried to remove nvidia-docker2 using sudo apt-get update and sudo apt-get remove nvidia-docker2. I want to uninstall and reinstall nvidia-docker2 to see if the problem gets fixed. |
The following actions fixed the issue for me. I uninstalled nvidia-docker using: Rebooted the computer. Installed nvidia-docker2 |
Sadly, I still have the same issue with override.conf. I pushed my luck and rebooted my machine, only to find that docker fails to start because of override.conf. If I remove override.conf, the docker and nvidia-docker services start correctly; however, nvidia-docker then fails to open previously created docker containers. As a temporary solution, I removed nvidia-docker2, rebooted my machine, and reinstalled nvidia-docker2. This will work until I have to reboot my machine again. |
The |
Thanks! That fixed the problem. I am now able to reboot the machine and nvidia-docker/docker both function properly. |
I had the same problem and the fix proposed by @flx42 worked for me! |
Closing this issue for now, but I'm surprised so many people are facing this issue. I'm wondering if another package conflicts with our settings. |
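One conflict that the posted files suggest, though the thread does not confirm it as the root cause, is that the nvidia runtime is registered twice: once with --add-runtime in override.conf and once under "runtimes" in daemon.json. dockerd is documented to refuse to start when an option is set both as a flag and in the configuration file. The following is a minimal Python sketch for spotting such duplicates. The FLAG_TO_KEY table and the function names are illustrative assumptions, not part of Docker or nvidia-docker, and the table only covers the two flags that appear in this thread.

```python
import json
import shlex

# Assumed mapping from dockerd command-line flags to daemon.json keys.
# Only the flags that appear in this thread are listed; extend as needed.
FLAG_TO_KEY = {"--add-runtime": "runtimes", "--host": "hosts"}


def flags_in_override(override_text):
    """Collect flag names from the last non-empty ExecStart= line."""
    command = ""
    for line in override_text.splitlines():
        line = line.strip()
        if line.startswith("ExecStart="):
            value = line[len("ExecStart="):]
            if value:  # an empty ExecStart= only resets earlier values
                command = value
    flags = set()
    for token in shlex.split(command):
        if token.startswith("--"):
            flags.add(token.split("=", 1)[0])
    return flags


def duplicated_settings(override_text, daemon_json_text):
    """Return (flag, key) pairs that are set in both places."""
    keys = set(json.loads(daemon_json_text))
    flags = flags_in_override(override_text)
    return sorted(
        (flag, key)
        for flag, key in FLAG_TO_KEY.items()
        if flag in flags and key in keys
    )


override_conf = """[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
"""

daemon_json = """{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}"""

for flag, key in duplicated_settings(override_conf, daemon_json):
    print(f"{flag} is also configured as '{key}' in daemon.json")
```

If a pair is reported, the setting would need to live in only one of the two places; which one to keep depends on how the packages on the machine manage these files.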
Hello, I am hitting this problem now. However, my override.conf does not contain the --add-runtime part, and the error occurs when I run "sudo service restart docker". My docker version is 24.0.6 and I installed nvidia-docker2. I need your help, thank you. |
1. Issue or feature description
My setup was working fine, and suddenly docker stopped working.
2. Steps to reproduce the issue
just did "sudo apt-get update"
My files look like this now:
dtlu@dtlu16:~$ sudo cat /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
dtlu@dtlu16:~$ sudo cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
{
“dns”: [“172.20.130.181”]
}
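As pasted, the daemon.json content above is not a single valid JSON document: it contains two top-level objects, and the "dns" entry uses typographic quotes. Whether the file on disk really looks like this or the formatting was altered when pasting is not known. A minimal Python sketch for checking a candidate file before restarting docker follows; check_daemon_json and the merged example are illustrative assumptions, not part of Docker.

```python
import json


def check_daemon_json(text):
    """Return (ok, message) for the given daemon.json text."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(config, dict):
        return False, "top-level value must be a JSON object"
    return True, "ok"


# The file as pasted in the issue: two top-level objects, typographic quotes.
posted = """{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
{
“dns”: [“172.20.130.181”]
}
"""

# One possible merged form (an assumption): a single object, straight quotes.
merged = """{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "dns": ["172.20.130.181"]
}"""

print(check_daemon_json(posted))
print(check_daemon_json(merged))
```

The check only covers JSON syntax; it does not tell whether dockerd accepts the individual keys.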
3. Information to attach (optional if deemed irrelevant)
uname -a
dtlu@dtlu16:~$ uname -a
Linux dtlu16 4.13.0-43-generic #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
dtlu@dtlu16:~$
dtlu@dtlu16:~$ systemctl daemon-reload
==== AUTHENTICATING FOR org.freedesktop.systemd1.reload-daemon ===
Authentication is required to reload the systemd state.
Authenticating as: dtlu,,, (dtlu)
Password:
==== AUTHENTICATION COMPLETE ===
dtlu@dtlu16:~$ sudo service docker status
● docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/docker.service.d
└─override.conf
Active: inactive (dead) (Result: exit-code) since Sat 2018-06-09 01:16:10 PDT; 21min ago
Docs: https://docs.docker.com
Main PID: 2299 (code=exited, status=1/FAILURE)
Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Unit entered failed state.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Failed with result 'exit-code'.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Service hold-off time over, scheduling restart.
Jun 09 01:16:10 dtlu16 systemd[1]: Stopped Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Start request repeated too quickly.
Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
dtlu@dtlu16:~$
sudo docker version
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:17:20 2018
OS/Arch: linux/amd64
Experimental: false
Orchestrator: swarm
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
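The status output above only contains lines written by systemd[1]; the message in which dockerd itself explains why it exited is not included. That message is usually found with journalctl -u docker.service. As a small illustration, this Python sketch counts which processes wrote the log lines in a pasted excerpt, so it is easy to see whether any dockerd output is present. The regular expression and the senders helper are assumptions made for this example.

```python
import re
from collections import Counter

# Matches lines such as:
#   Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Unit entered failed state.
LINE = re.compile(r"^(\w{3} \d{2} [\d:]{8}) (\S+) ([^\[\s]+)\[(\d+)\]: (.*)$")


def senders(log_text):
    """Count log lines per process name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


excerpt = """Jun 09 01:16:10 dtlu16 systemd[1]: Failed to start Docker Application Container Engine.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Unit entered failed state.
Jun 09 01:16:10 dtlu16 systemd[1]: docker.service: Start request repeated too quickly.
"""

counts = senders(excerpt)
print(dict(counts))
if "dockerd" not in counts:
    print("no dockerd lines in this excerpt; check journalctl -u docker.service")
```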