Restarting containerd leaves dockerd stopped #1155

Closed
2 of 3 tasks
sudo-bmitch opened this issue Dec 1, 2020 · 22 comments

Comments

@sudo-bmitch

sudo-bmitch commented Dec 1, 2020

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

After a containerd upgrade, the docker daemon should be running.

Actual behavior

If you restart containerd, dockerd remains down.

Steps to reproduce the behavior

root@vm-11:/home/bmitch# systemctl stop containerd

root@vm-11:/home/bmitch# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─override.conf
   Active: inactive (dead) since Tue 2020-12-01 14:08:55 EST; 11s ago
     Docs: https://docs.docker.com
  Process: 467 ExecStart=/usr/bin/dockerd (code=exited, status=0/SUCCESS)
 Main PID: 467 (code=exited, status=0/SUCCESS)

Dec 01 14:07:30 vm-11 dockerd[467]: time="2020-12-01T14:07:30.747469349-05:00" level=info msg="Loading containers: done."
Dec 01 14:07:31 vm-11 dockerd[467]: time="2020-12-01T14:07:31.281078096-05:00" level=info msg="Docker daemon" commit=633a0ea838 graphdriver(s)=overlay
Dec 01 14:07:31 vm-11 dockerd[467]: time="2020-12-01T14:07:31.281966418-05:00" level=info msg="Daemon has completed initialization"
Dec 01 14:07:31 vm-11 systemd[1]: Started Docker Application Container Engine.
Dec 01 14:07:31 vm-11 dockerd[467]: time="2020-12-01T14:07:31.325684247-05:00" level=info msg="API listen on /var/run/docker.sock"
Dec 01 14:08:55 vm-11 systemd[1]: Stopping Docker Application Container Engine...
Dec 01 14:08:55 vm-11 dockerd[467]: time="2020-12-01T14:08:55.080820926-05:00" level=info msg="Processing signal 'terminated'"
Dec 01 14:08:55 vm-11 dockerd[467]: time="2020-12-01T14:08:55.081301227-05:00" level=info msg="Daemon shutdown complete"
Dec 01 14:08:55 vm-11 systemd[1]: docker.service: Succeeded.
Dec 01 14:08:55 vm-11 systemd[1]: Stopped Docker Application Container Engine.

root@vm-11:/home/bmitch# systemctl start containerd

root@vm-11:/home/bmitch# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─override.conf
   Active: inactive (dead) since Tue 2020-12-01 14:08:55 EST; 22s ago
     Docs: https://docs.docker.com
  Process: 467 ExecStart=/usr/bin/dockerd (code=exited, status=0/SUCCESS)
 Main PID: 467 (code=exited, status=0/SUCCESS)

Dec 01 14:07:30 vm-11 dockerd[467]: time="2020-12-01T14:07:30.747469349-05:00" level=info msg="Loading containers: done."
Dec 01 14:07:31 vm-11 dockerd[467]: time="2020-12-01T14:07:31.281078096-05:00" level=info msg="Docker daemon" commit=633a0ea838 graphdriver(s)=overlay
Dec 01 14:07:31 vm-11 dockerd[467]: time="2020-12-01T14:07:31.281966418-05:00" level=info msg="Daemon has completed initialization"
Dec 01 14:07:31 vm-11 systemd[1]: Started Docker Application Container Engine.
Dec 01 14:07:31 vm-11 dockerd[467]: time="2020-12-01T14:07:31.325684247-05:00" level=info msg="API listen on /var/run/docker.sock"
Dec 01 14:08:55 vm-11 systemd[1]: Stopping Docker Application Container Engine...
Dec 01 14:08:55 vm-11 dockerd[467]: time="2020-12-01T14:08:55.080820926-05:00" level=info msg="Processing signal 'terminated'"
Dec 01 14:08:55 vm-11 dockerd[467]: time="2020-12-01T14:08:55.081301227-05:00" level=info msg="Daemon shutdown complete"
Dec 01 14:08:55 vm-11 systemd[1]: docker.service: Succeeded.
Dec 01 14:08:55 vm-11 systemd[1]: Stopped Docker Application Container Engine.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.13
 API version:       1.40
 Go version:        go1.13.15
 Git commit:        4484c46d9d
 Built:             Wed Sep 16 17:02:55 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.13
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       4484c46d9d
  Built:            Wed Sep 16 17:01:25 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.9
  GitCommit:        ea765aba0d05254012b0b9e595e995c09186427f
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 4
  Running: 0
  Paused: 0
  Stopped: 4
 Images: 2
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ea765aba0d05254012b0b9e595e995c09186427f
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
  userns
 Kernel Version: 4.19.0-5-amd64
 Operating System: Debian GNU/Linux 10 (buster)
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 987.4MiB
 Name: vm-11
 ID: LMZU:DXKC:DD7P:4K45:XPTD:4BKB:DKYU:N2O3:WS2Q:RMPR:6FF2:ZMBW
 Docker Root Dir: /var/lib/docker/231072.231072
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  from_ansible=true
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.)

Above is from a small test VM. I'm posting this after seeing:

I suspect the cause of the above is an update to containerd without a corresponding update to the docker daemon, which would have restarted both services.

@Sloth-on-meth

30+ reports in my reddit thread and friends are talking about it too...

@brandond

brandond commented Dec 1, 2020

Isn’t this how systemd unit dependencies work? A depends on B. You stop B, A is also stopped. If you start B, A does not start again automatically.
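For anyone who wants to check this on their own machine: the hard dependencies a unit declares are visible in the `Requires=` and `BindsTo=` properties printed by `systemctl show`. A minimal sketch, assuming a systemd host; `hard_deps` is a hypothetical helper name, not part of systemd:

```shell
#!/bin/sh
# hard_deps: hypothetical helper. Reads `systemctl show` output on stdin and
# prints the units listed under Requires= or BindsTo=, one per line.
hard_deps() {
  grep -E '^(Requires|BindsTo)=' | cut -d= -f2- | tr ' ' '\n' | sed '/^$/d' | sort -u
}

# Usage on a systemd host (not executed here):
#   systemctl show -p Requires -p BindsTo docker.service | hard_deps
```

If containerd.service appears in that list, stopping containerd also stops docker, and, as described above, starting containerd again later does not bring docker back by itself.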

@Sloth-on-meth

> Isn’t this how systemd unit dependencies work? A depends on B. You stop B, A is also stopped. If you start B, A does not start again automatically.

I don't know, but it shouldn't nuke docker in production, wtf. This is an LTS image.

@thaJeztah
Member

Were these all unattended updates, or also manual updates?

@sudo-bmitch
Author

My suspicion is that unattended updates are where the issue gets noticed: they run often enough to pick up an update to containerd on its own, and when it fails it happens overnight, leading to a long outage or a page.

@thaJeztah
Member

I tried updating containerd manually on a machine, and docker was restarted successfully. I recall we received reports in the past where unattended upgrades didn't always behave the same way, so I was curious whether that was the difference here.

@chdsbd

chdsbd commented Dec 1, 2020

This also happened to me via unattended upgrades on Ubuntu.

These two lines make me think the upgrade somehow killed Docker.

systemd[1]: Starting Daily apt upgrade and clean activities...
systemd[1]: Stopping Docker Application Container Engine...

@sudo-bmitch
Author

I'm also failing to recreate this in the lab with manual upgrades and downgrades of containerd.

@Sloth-on-meth @chdsbd can you provide your own output of docker version, docker info, dpkg -l 'docker*' and dpkg -l 'containerd*'? I want to be sure we're debugging the same packages.
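To make that easier to attach, the four commands can be collected into one report. A rough sketch, assuming a Debian or Ubuntu host; `run_and_label`, `collect_debug_info` and the output file name are hypothetical names picked for illustration:

```shell
#!/bin/sh
# run_and_label: print a header with the command line, then its output.
# A missing or failing command is noted instead of aborting the report.
run_and_label() {
  printf '### %s\n' "$*"
  "$@" 2>&1 || printf '(exit status %s)\n' "$?"
  printf '\n'
}

# collect_debug_info: hypothetical helper gathering the four outputs requested above.
collect_debug_info() {
  run_and_label docker version
  run_and_label docker info
  run_and_label dpkg -l 'docker*'
  run_and_label dpkg -l 'containerd*'
}

# Usage (not executed here):
#   collect_debug_info > docker-debug.txt
```

Each section still gets its header when a command is missing, so a report from a machine where docker is currently down remains readable.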

@Sloth-on-meth

@sudo-bmitch sorry, I switched all my docker installs to the docker repo and uninstalled the Ubuntu repo versions after this happened. Someone on reddit said that might prevent this in the future.

@vincent-heatseekr

Faced the same problem today and back in mid-October. Caused a massive outage back in October but less so today (only because of separate HA mitigations I put in place). Had this issue on AWS EC2 instances running Ubuntu 20.04.

@sudo-bmitch
Author

@Sloth-on-meth that confirms my suspicion that this could be an Ubuntu issue rather than a Docker one. I'll leave this issue open for a bit in case others have logs showing it happened with the Docker packages. Otherwise, anyone with the issue should follow up with Canonical. Searching their bug tracker, I'm seeing the following: https://bugs.launchpad.net/ubuntu/+source/containerd/+bug/1870514

@sudo-bmitch
Author

@vincent-heatseekr see the debugging details requested above, and verify whether you are installing the docker packages from the ubuntu or the docker repositories.
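One quick way to tell the two apart is by package name. A sketch only, and the names are an assumption on my part: `docker.io` and `containerd` for the distro archive, `docker-ce` and `containerd.io` for Docker's own repository. `docker_pkg_origin` is a hypothetical helper:

```shell
#!/bin/sh
# docker_pkg_origin: hypothetical helper. Reads `dpkg -l` output on stdin and
# reports which packaging the installed (state "ii") packages point to.
# Package names are assumptions: docker.io/containerd for the distro archive,
# docker-ce/containerd.io for the Docker repository.
docker_pkg_origin() {
  awk '$1 == "ii" {
         name = $2
         sub(/:.*/, "", name)            # drop an architecture suffix such as :amd64
         if (name == "docker-ce" || name == "containerd.io") upstream = 1
         if (name == "docker.io" || name == "containerd")    distro = 1
       }
       END {
         if (upstream) print "upstream"
         if (distro)   print "distro"
       }'
}

# Usage (not executed here):
#   dpkg -l 'docker*' 'containerd*' | docker_pkg_origin
```

For a definitive answer, `apt-cache policy` on the installed package shows the repository it was installed from.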

@adagari

adagari commented Dec 2, 2020

@sudo-bmitch I woke up this morning, on my Ubuntu machine, docker daemon was down. It was installed from Ubuntu repositories. On my Amazon Linux EC2, no issues. Think you may be on to something.

@glader

glader commented Dec 2, 2020

@sudo-bmitch the same problem. Output attached.
docker.txt

@devtekve

devtekve commented Dec 2, 2020

We experienced the same issue in a corporate environment with production services. None of our docker machines that got updated unattended survived the restart.

@sudo-bmitch
Author

@glader thanks. That shows docker installed from the Ubuntu repositories, rather than the upstream Docker ones. Seems consistent with other reports so I'm going to close the issue here since there's nothing for Docker to fix, and they've already provided a solution for users experiencing this issue.

@Sloth-on-meth

For anyone interested, the issue is now on the launchpad tracker at CRITICAL priority.

https://bugs.launchpad.net/ubuntu/+source/containerd/+bug/1870514

@thaJeztah
Member

/cc @chris-crone @glours @RomainBelorgey FYI: this was the issue we were discussing yesterday

@thaJeztah
Member

I see https://bugs.launchpad.net/ubuntu/+source/containerd/+bug/1870514 also mentions docker/docker-ce-packaging#508 (I was eyeing that fix as well, as possibly related); however, a comment there says:

> yes we're tracking the upstream changes but are not finding them sufficient to address the issue. We're finding we need more than just that change to BindsTo

I opened a backport for the 19.03 branch to get the BindsTo change in (docker/docker-ce-packaging#511). Might have to check with Ubuntu what other changes they were considering necessary (and if those are in packaging, or in Ubuntu itself).

@thaJeztah
Member

Let me ask here in case someone knows: is there a way to manually trigger unattended updates? (Also, are they only performed for distro packages, or for third-party package repositories as well?) Of course it would be great if we could verify the behaviour of unattended upgrades, but, well, them being "unattended" makes it slightly difficult 😅

@sudo-bmitch
Author

@thaJeztah I ran both unattended-upgrade and unattended-upgrades in my lab; at least one of them wrote a log entry to /var/log/unattended-upgrades/unattended-upgrades.log showing:

2020-12-02 00:06:27,414 INFO Allowed origins are: o=Ubuntu,a=bionic, o=Ubuntu,a=bionic-security, o=UbuntuESMApps,a=bionic-apps-security, o=UbuntuESM,a=bionic-infra-security                                                                                                   
2020-12-02 00:06:28,609 INFO No packages found that can be upgraded unattended and no pending auto-removals

So I believe the update must be a security update from the Ubuntu repos to trigger the unattended update. This was with containerd downgraded to an older version but installed from the Docker repos, so an update was available, just not from Ubuntu:

root@vm-12:~# apt-get upgrade                  
Reading package lists... Done
Building dependency tree                                                              
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
  grub-pc-bin libdumbnet1
Use 'apt autoremove' to remove them.
The following packages will be upgraded:                        
  containerd.io                                                
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.          
Need to get 0 B/24.4 MB of archives.                                                
After this operation, 32.8 kB of additional disk space will be used.         
Do you want to continue? [Y/n] n
Abort.                        
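For anyone repeating this check, the allowed origins can be pulled out of the log with a small filter. A minimal sketch, assuming the log line format shown above; `allowed_origins` is a hypothetical helper name:

```shell
#!/bin/sh
# allowed_origins: hypothetical helper. Reads unattended-upgrades.log on stdin
# and prints the origins from the most recent "Allowed origins are:" line,
# one per line. Entries are separated by ", "; the comma inside an entry
# (o=...,a=...) has no space after it and is left alone.
allowed_origins() {
  sed -n 's/^.*Allowed origins are: //p' | tail -n 1 |
    awk -F', ' '{ sub(/[ \t]+$/, ""); for (i = 1; i <= NF; i++) print $i }'
}

# Usage (not executed here):
#   allowed_origins < /var/log/unattended-upgrades/unattended-upgrades.log
```

Running `unattended-upgrade --dry-run --debug` should walk through the same package selection without installing anything, which makes it easier to see whether a pending update from a given repository would be picked up.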

@thaJeztah
Member

> 2020-12-02 00:06:27,414 INFO Allowed origins are: o=Ubuntu,a=bionic, o=Ubuntu,a=bionic-security, o=UbuntuESMApps,a=bionic-apps-security, o=UbuntuESM,a=bionic-infra-security

Thank you, @sudo-bmitch! That's useful information; it does look like these are only triggered for packages from those repositories.

That makes docker/docker-ce-packaging#511 "good to have", but in practice, users wouldn't run into the same problem if they installed the official docker packages from download.docker.com (not the distro-maintained packages of containerd/docker)
