[20.10.2] start docker failed #1185

Closed
1 of 3 tasks
herugen opened this issue Jan 14, 2021 · 7 comments · Fixed by moby/moby#41854

Comments

@herugen

herugen commented Jan 14, 2021

  • This is a bug report
  • This is a feature request
  • I searched existing issues before opening this one

Expected behavior

dockerd starts successfully.

Actual behavior

dockerd fails to start with a segmentation fault.

Jan 14 21:49:17 VM-0-15-centos systemd[1]: Starting Docker Application Container Engine...
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.899611518+08:00" level=info msg="Starting up"
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.900790943+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.900813286+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.900835249+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.900853888+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.902203356+08:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.902222268+08:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.902235744+08:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.902243447+08:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.952199129+08:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Jan 14 21:49:17 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:17.980230802+08:00" level=info msg="Loading containers: start."
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:18.261333845+08:00" level=info msg="Removing stale sandbox 05f9293186e85bbbdf35ba7d5da1b7021b8f04137ff1815ce01a7b4ab7f5d9d0 (970f41c848cb7c56ac1b534efbe55e36e6f23e2c1143d135cd0cd28fb1f5ce7f)"
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:18.423608080+08:00" level=info msg="Removing stale sandbox a41c9987b57e6d2f821258a39c3817567a6234a940b09a4158db588209b4e164 (da51e38b3ac35e1679eaecf8c8595e66d1a08843fc4a253cc0f15985bef37214)"
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:18.430171958+08:00" level=info msg="Removing stale endpoint harbor-core (7d967f0b1913f226596b758cf7869e5a8091fa7b1be81f5ebd146523fb50ef36)"
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:18.452293440+08:00" level=info msg="Removing stale endpoint registryctl (c6bb66d9e506156d1e429661ea6f3ced6bf5a1a2951c0be1d3983c20282197ef)"
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: time="2021-01-14T21:49:18.513936138+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.18.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: panic: runtime error: invalid memory address or nil pointer dereference
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x55cdb6d2ec8c]
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: goroutine 275 [running]:
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: github.com/docker/docker/daemon.(*Daemon).getLibcontainerdCreateOptions(0xc00000c1e0, 0xc000742280, 0x0, 0x0, 0x0, 0xc0009b2600, 0xc000bf0870, 0x55cdb54a468d)
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/start_unix.go:22 +0x8c
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: github.com/docker/docker/daemon.(*Daemon).containerStart(0xc00000c1e0, 0xc000742280, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0)
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/start.go:174 +0x485
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: github.com/docker/docker/daemon.(*Daemon).restore.func5(0xc000284140, 0xc00000c1e0, 0xc0008a9050, 0xc0003e4680, 0xc000742280, 0xc000c67b60)
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/daemon.go:537 +0x33d
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: created by github.com/docker/docker/daemon.(*Daemon).restore
Jan 14 21:49:18 VM-0-15-centos dockerd[32123]: /root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/daemon.go:513 +0x72f
Jan 14 21:49:18 VM-0-15-centos systemd[1]: docker.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jan 14 21:49:18 VM-0-15-centos systemd[1]: Failed to start Docker Application Container Engine.
Jan 14 21:49:18 VM-0-15-centos systemd[1]: Unit docker.service entered failed state.
Jan 14 21:49:18 VM-0-15-centos systemd[1]: docker.service failed.

Steps to reproduce the behavior

yum remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-engine
yum install -y yum-utils
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
yum install -y docker-ce docker-ce-cli containerd.io
systemctl start docker

Output of docker version:

docker -v

Docker version 20.10.2, build 2291f61

Output of docker info:

# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info

Additional environment details (AWS, VirtualBox, physical, etc.)

# cat /proc/version
Linux version 3.10.0-1127.19.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Aug 25 17:23:54 UTC 2020
@thaJeztah
Member

Thanks for reporting; was this a fresh install, or did you have an older version of docker installed before this?

Do you have a daemon configuration file with custom options (/etc/docker/daemon.json)? If so, could you post the contents of that file (removing sensitive data if set)?

@thaJeztah
Member

I think this is a duplicate of #1169, and will be fixed by moby/moby#41854

@herugen
Author

herugen commented Jan 14, 2021

> Thanks for reporting; was this a fresh install, or did you have an older version of docker installed before this?
>
> Do you have a daemon configuration file with custom options (/etc/docker/daemon.json)? If so, could you post the contents of that file (removing sensitive data if set)?

There was an older version (1.13.1) installed, and I removed it with yum remove:

yum remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-engine

I also cleared /etc/docker/daemon.json and tried again, but it still failed:

# cat /etc/docker/daemon.conf
{ }

@herugen
Author

herugen commented Jan 14, 2021

> I think this is a duplicate of #1169, and will be fixed by moby/moby#41854

Please note that the coredump stack is different between these two issues.

@thaJeztah
Member

> Please note that the coredump stack is different between these two issues.

Yes, correct, the stack is different; the panic occurs on the second of these two lines: https://github.com/moby/moby/blob/v20.10.2/daemon/start_unix.go#L21-L22

rt := daemon.configStore.GetRuntime(container.HostConfig.Runtime)
if rt.Shim == nil {

But at least the panic should be fixed by the getRuntime function that is added in that pull request:

func (daemon *Daemon) getRuntime(name string) (*types.Runtime, error) {
 	rt := daemon.configStore.GetRuntime(name)
 	if rt == nil {
 		return nil, errdefs.InvalidParameter(errors.Errorf("runtime not found in config: %s", name))
 	}
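
To illustrate the difference between the two snippets above, here is a minimal, self-contained sketch. The `Runtime` and `Config` types are hypothetical stand-ins and not the actual moby types; only the shape of the nil check follows the snippet from the pull request. An unguarded map lookup returns a nil pointer for an unknown runtime name, and dereferencing it (as in `rt.Shim`) is what produces the SIGSEGV; the guarded wrapper turns that into an ordinary error.

```go
package main

import "fmt"

// Runtime is a hypothetical stand-in for the daemon's runtime type.
type Runtime struct {
	Path string
	Shim *string
}

// Config is a hypothetical stand-in for the daemon's config store.
type Config struct {
	Runtimes map[string]*Runtime
}

// GetRuntime mirrors an unguarded lookup: for an unknown name the map
// yields a nil *Runtime, so a caller touching rt.Shim would panic.
func (c *Config) GetRuntime(name string) *Runtime {
	return c.Runtimes[name]
}

// getRuntime wraps the lookup with a nil check, in the spirit of the
// snippet quoted from the pull request, and returns an error instead.
func getRuntime(c *Config, name string) (*Runtime, error) {
	rt := c.GetRuntime(name)
	if rt == nil {
		return nil, fmt.Errorf("runtime not found in config: %s", name)
	}
	return rt, nil
}

func main() {
	cfg := &Config{Runtimes: map[string]*Runtime{
		"runc": &Runtime{Path: "runc"},
	}}

	// "some-unknown-runtime" is an illustrative name, not one taken from this issue.
	for _, name := range []string{"runc", "some-unknown-runtime"} {
		rt, err := getRuntime(cfg, name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("found runtime with path:", rt.Path)
	}
}
```

With the guard in place, a container that references an unknown runtime would fail to start with an error message, rather than taking the whole daemon down during restore.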

The question is why it doesn't find a runtime. What happens if you remove the /etc/docker/daemon.json file? Is it possible that there are modifications in the systemd unit files? What does systemctl cat docker.service show?

Note that the previously installed docker 1.13.1 daemon was the Red Hat fork of docker. That fork is known to have patches that are incompatible with the official Docker version, so there is a chance that state files in /var/lib/docker are causing the issue. If you don't have important data (volumes, containers, images) in the existing install, you could try a "factory reset" and remove /var/lib/docker (but beware that this removes all docker containers, volumes, images, etc.).

@herugen
Author

herugen commented Jan 14, 2021

It started successfully after running mv /var/lib/docker /var/lib/docker.bk. Thanks a lot!

@herugen herugen closed this as completed Jan 14, 2021
@thaJeztah
Member

You're welcome! If you manage to figure out what exactly caused the issue, let us know as well, because then we could possibly include it in a unit/integration-test.
