
Update Docker to 19.03.8 #262

Merged 3 commits on May 28, 2020


@tuxity tuxity commented Feb 3, 2020

As noted in #244, a few changes have been made to Docker dind.

By default, the Docker daemon now listens on both a UNIX socket and a TLS-enabled TCP socket, and the client tries to connect to the open TCP socket.

Since we don't need to expose the Docker API over a TCP socket, I start the daemon with only the UNIX socket and make the client connect over the UNIX socket, as before.

I double-checked the Docker entrypoint flow for this PR (https://github.com/docker-library/docker/blob/master/19.03/dind/dockerd-entrypoint.sh#L129#L132), and since we start the daemon through the drone-docker binary, almost all of this entrypoint code is unused.

The only doubt I have is that I needed to run my test image tuxity/drone-docker with privileged: true in order to start the Docker daemon correctly. I guess that is because it's not an official image.

-FROM docker:18.09.0-dind
+FROM docker:19.03.5-dind

+ENV DOCKER_HOST=unix:///var/run/docker.sock

Are you sure this is required? The command execution is not loading env variables, so I think this change can simply get reverted.

Author

No, it won't work if you revert this. Otherwise the Docker client will try to connect to the default value of DOCKER_HOST, tcp://docker:2375.

This is from the entrypoint calling dockerd-entrypoint.sh which calls docker-entrypoint.sh which sets DOCKER_HOST when unset. I don't think there's anything needed from those entrypoint scripts for drone so we could change the entrypoint to:

ENTRYPOINT ["/bin/drone-docker"]

and then remove the ENV DOCKER_HOST for the same result. There's also no need for --host when starting dockerd.

@tuxity
Author

tuxity commented Feb 3, 2020

This Docker update will also fix a Docker daemon crash I see in production when Drone is heavily loaded.

time="2020-02-03T14:06:24.709320063Z" level=error msg="failed connecting to containerd" error="failed to dial \"/var/run/docker/containerd/containerd.sock\": context deadline exceeded" module=libcontainerd
time="2020-02-03T14:06:24.809534997Z" level=info msg="killing and restarting containerd" module=libcontainerd pid=32
time="2020-02-03T14:06:24Z" level=info msg="=== BEGIN goroutine stack dump ===
goroutine 91 [running]:
github.com/containerd/containerd/cmd/containerd/command.dumpStacks()
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/cmd/containerd/command/main_unix.go:78 +0x8c
github.com/containerd/containerd/cmd/containerd/command.handleSignals.func1(0xc42052c4e0, 0xc42052c480, 0x1818c40, 0xc420044068, 0xc4200489c0)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/cmd/containerd/command/main_unix.go:53 +0x274
created by github.com/containerd/containerd/cmd/containerd/command.handleSignals
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/cmd/containerd/command/main_unix.go:43 +0x8b

goroutine 1 [syscall]:
syscall.Syscall(0x4b, 0x3, 0x0, 0x0, 0x643c44, 0xc420372080, 0xc4202d6000)
	/usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.Fdatasync(0x3, 0x3, 0x0)
	/usr/local/go/src/syscall/zsyscall_linux_amd64.go:446 +0x42
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.fdatasync(0xc4202681e0, 0x4000, 0x4000)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/bolt_linux.go:9 +0x3f
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.(*DB).init(0xc4202681e0, 0x0, 0xc4205516c0)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/db.go:382 +0x1bd
github.com/containerd/containerd/vendor/github.com/boltdb/bolt.Open(0xc420533810, 0x48, 0x1a4, 0x1f55220, 0xc42054db88, 0x1, 0x0)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/boltdb/bolt/db.go:199 +0x20b
github.com/containerd/containerd/server.LoadPlugins.func2(0xc42052b260, 0xc42024fdd0, 0x21, 0xc420040d20, 0x1e)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/server/server.go:250 +0x49d
github.com/containerd/containerd/plugin.(*Registration).Init(0xc420370190, 0xc42052b260, 0xc420370190)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/plugin/plugin.go:98 +0x3a
github.com/containerd/containerd/server.New(0x7f3765e1a178, 0xc420044068, 0xc420473440, 0x1, 0xc420527c80, 0x18)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/server/server.go:106 +0x600
github.com/containerd/containerd/cmd/containerd/command.App.func1(0xc420486160, 0xc420486160, 0xc420527d07)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/cmd/containerd/command/main.go:132 +0x5fb
github.com/containerd/containerd/vendor/github.com/urfave/cli.HandleAction(0x161c0a0, 0x17f7398, 0xc420486160, 0xc42052c420, 0x0)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:502 +0xca
github.com/containerd/containerd/vendor/github.com/urfave/cli.(*App).Run(0xc42004d340, 0xc42003a140, 0x5, 0x5, 0x0, 0x0)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/urfave/cli/app.go:268 +0x60e
main.main()
	github.com/containerd/containerd/cmd/containerd/main.go:28 +0x51

goroutine 5 [chan receive]:
github.com/containerd/containerd/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0x1f30620)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/golang/glog/glog.go:879 +0x8d
created by github.com/containerd/containerd/vendor/github.com/golang/glog.init.0
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/golang/glog/glog.go:410 +0x205

goroutine 6 [syscall]:
os/signal.signal_recv(0x1808600)
	/usr/local/go/src/runtime/sigqueue.go:139 +0xa8
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:22 +0x24
created by os/signal.init.0
	/usr/local/go/src/os/signal/signal_unix.go:28 +0x43

goroutine 92 [select, locked to thread]:
runtime.gopark(0x17f9c30, 0x0, 0x10b0d56, 0x6, 0x18, 0x1)
	/usr/local/go/src/runtime/proc.go:291 +0x120
runtime.selectgo(0xc4204c3750, 0xc420048a80)
	/usr/local/go/src/runtime/select.go:392 +0xe56
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal_unix.go:549 +0x1f6
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1

goroutine 98 [select]:
github.com/containerd/containerd/vendor/github.com/docker/go-events.(*Broadcaster).run(0xc4203701e0)
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/docker/go-events/broadcast.go:117 +0x3c4
created by github.com/containerd/containerd/vendor/github.com/docker/go-events.NewBroadcaster
	/tmp/tmp.TCVt41ZT6h/src/github.com/containerd/containerd/vendor/github.com/docker/go-events/broadcast.go:39 +0x1b1

=== END goroutine stack dump ===" 
time="2020-02-03T14:06:28.791093430Z" level=error msg="containerd did not exit successfully" error="signal: killed" module=libcontainerd
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1522e20]

goroutine 96 [running]:
github.com/docker/docker/vendor/github.com/containerd/containerd.(*Client).Close(0x0, 0x0, 0x0)
	/go/src/github.com/docker/docker/vendor/github.com/containerd/containerd/client.go:536 +0x30
github.com/docker/docker/libcontainerd/supervisor.(*remote).monitorDaemon(0xc4208b8680, 0x26a15e0, 0xc4206ef1c0)
	/go/src/github.com/docker/docker/libcontainerd/supervisor/remote_daemon.go:321 +0x262
created by github.com/docker/docker/libcontainerd/supervisor.Start
	/go/src/github.com/docker/docker/libcontainerd/supervisor/remote_daemon.go:90 +0x3fa
+ /usr/local/bin/docker version

The crash was fixed in moby/moby#38653 and backported in docker-archive/engine#162, which was released with version 18.09.3; we currently use 18.09.0.
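
The panic at the end of the log is a method call on a nil pointer: the trace shows (*Client).Close(0x0, ...) being invoked from monitorDaemon after the connection to containerd had failed. The minimal Go sketch below reproduces that class of failure and shows one common guard. The client type, its conn field, and closeIfConnected are hypothetical names for illustration; this is not the moby or containerd code, and not necessarily how the upstream fix was written.

```go
package main

import (
	"errors"
	"fmt"
)

// client is a hypothetical stand-in for a connection handle.
type client struct {
	conn string
}

// Close reads a field of the receiver, so calling it on a nil *client
// triggers a nil pointer dereference panic.
func (c *client) Close() error {
	if c.conn == "" {
		return errors.New("already closed")
	}
	c.conn = ""
	return nil
}

// closeIfConnected skips the call when the handle was never created,
// for example because dialing the daemon timed out.
func closeIfConnected(c *client) error {
	if c == nil {
		return nil
	}
	return c.Close()
}

// closePanics reports whether calling Close directly on c panics.
func closePanics(c *client) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = c.Close()
	return false
}

func main() {
	var c *client // the dial failed, so the handle was never assigned
	fmt.Println("unguarded Close panics:", closePanics(c))
	fmt.Println("guarded close error:", closeIfConnected(c))
}
```

The guard lives in the caller here; putting the nil check inside Close itself would work equally well for this toy type.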

@tuxity
Author

tuxity commented Feb 5, 2020

I have been running my custom image of this PR on a production server for 3 days, with ~20-30 builds/day, and so far so good!

@sudo-bmitch

Here's an alternative that doesn't require setting DOCKER_HOST. I didn't see anything in the docker*-entrypoint.sh scripts that appeared to be needed, so I removed that.

sudo-bmitch@1c70a16

Let me know if I should send that over as a PR.

@ashwilliams1

ashwilliams1 commented Feb 9, 2020

@sudo-bmitch the dockerd-entrypoint.sh file is important because it executes a bunch of commands [1] that are a prerequisite for running docker-in-docker. At least this used to be the case; I'm not sure about newer versions of Docker. So unless something changed, I would expect the docker-in-docker daemon to fail, which would break the plugin.

[1] https://github.com/docker-library/docker/blob/master/19.03/dind/dockerd-entrypoint.sh

@bradrydzewski
Member

This change looks good to me. Before merging we should tag 18.09 so that we can rollback if needed.

@ashwilliams1 @sudo-bmitch I am not sure if the docker-entrypoint file is still required. It used to run a bunch of commands required to bootstrap the container so that it could run docker-in-docker (mount various filesystems, configure cgroups, etc). This may no longer be the case. Before changing the entrypoint we would need to see published test results that would need to be reproduced and verified by others. I am not opposed and it would be great if we no longer needed this shell script, but at the same time I prefer to take a very conservative approach with this plugin given its widespread use.

@sudo-bmitch

I don't see anything in the entrypoint scripts even being run with our default arguments, most every line is bypassed since our entrypoint arg is not a flag or docker command. Here's a run with the files modified with set -x:

$ docker run -it --rm --entrypoint /bin/sh --privileged docker:19.03.5-dind

/ # set -x

/ # vi /usr/local/bin/docker*.sh
+ vi /usr/local/bin/docker-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh

/ # /usr/local/bin/dockerd-entrypoint.sh /bin/true
+ /usr/local/bin/dockerd-entrypoint.sh /bin/true
+ '[' 1 -eq 0 ]
+ '[' /bin/true '!=' /bin/true ]
+ '[' /bin/true '=' dockerd ]
+ set -- docker-entrypoint.sh /bin/true
+ exec docker-entrypoint.sh /bin/true
+ '[' /bin/true '!=' /bin/true ]
+ docker help /bin/true
+ '[' -z  ]
+ '[' -S /var/run/docker.sock ]
+ '[' -z  ]
+ id -u
+ XDG_RUNTIME_DIR=/run/user/0
+ '[' -S /run/user/0/docker.sock ]
+ '[' -z  ]
+ _should_tls
+ '[' -n /certs ]
+ '[' -s /certs/client/ca.pem ]
+ '[' -n  ]
+ export 'DOCKER_HOST=tcp://docker:2375'
+ '[' //docker:2375 '!=' tcp://docker:2375 ]
+ '[' -z  ]
+ '[' -z  ]
+ _should_tls
+ '[' -n /certs ]
+ '[' -s /certs/client/ca.pem ]
+ '[' /bin/true '=' dockerd ]
+ exec /bin/true

The XDG_RUNTIME_DIR is a new feature for rootless support. And the DOCKER_HOST we have to unset because it breaks things. I'm running my variant without issue, so count me as a single use case.
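
Read top to bottom, the trace shows the order in which docker-entrypoint.sh picks a DOCKER_HOST when none is set: it looks for a socket at /var/run/docker.sock, then at $XDG_RUNTIME_DIR/docker.sock, and finally falls back to tcp://docker:2375 because the TLS check on /certs/client/ca.pem did not pass. A rough Go sketch of that order, inferred from the trace only, follows. The function name, the injected isSocket callback, and the tcp://docker:2376 value for the TLS branch are assumptions (that branch does not appear in the trace); none of this is code from the script or the plugin.

```go
package main

import "fmt"

// resolveDockerHost mirrors the order of checks visible in the trace.
// current is the existing DOCKER_HOST value, runtimeDir stands for
// XDG_RUNTIME_DIR, tls says whether client certificates were found, and
// isSocket reports whether a path is a UNIX socket (injected so the
// sketch can run without touching the filesystem).
func resolveDockerHost(current, runtimeDir string, tls bool, isSocket func(string) bool) string {
	if current != "" {
		return current // an explicit setting always wins
	}
	if isSocket("/var/run/docker.sock") {
		return "unix:///var/run/docker.sock"
	}
	if p := runtimeDir + "/docker.sock"; isSocket(p) {
		return "unix://" + p // rootless socket location
	}
	if tls {
		return "tcp://docker:2376" // assumption: TLS port, not shown in the trace
	}
	return "tcp://docker:2375"
}

func main() {
	// Same situation as the trace: no sockets exist yet and no certs are present.
	noSockets := func(string) bool { return false }
	fmt.Println(resolveDockerHost("", "/run/user/0", false, noSockets))
}
```

With no daemon socket present at entrypoint time, the fallback is the TCP address, which is why the plugin then fails to reach a daemon that only listens on the UNIX socket unless DOCKER_HOST is set or the entrypoint is bypassed.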

@tuxity FYI, I've also set DRONE_RUNNER_PRIVILEGED_IMAGES on my drone server to avoid needing to define privileged on my builds.

@bradrydzewski
Member

@sudo-bmitch I think you are right but we still need people to test the change and verify the results. There are thousands of active installations using the latest version of this plugin, so we are extra careful with changes, even those that seem obvious.

@tuxity
Author

tuxity commented Feb 9, 2020

There are multiple ways to do it; I chose the one closest to the current flow to avoid any issues.
Later we can make some changes to improve it.

I have been running this in production for 1 week now with no problems detected.

@ashwilliams1

ashwilliams1 commented Feb 9, 2020

In this case I recommend we merge this pull request once we have a tag in place for 18.09. Separately I recommend @sudo-bmitch send a pull request that changes the entrypoint. We can ask the community to test the pull request and merge once these changes are tested and verified.

@tuxity
Author

tuxity commented Feb 10, 2020

I agree with @ashwilliams1: first get an updated Docker image, then improve the flow. Doing both at the same time is too risky.

Should I submit another PR using Docker 18.09.8 so that @bradrydzewski can tag it? After that I can rebase this PR.

@tuxity
Author

tuxity commented Feb 29, 2020

Some news: everything works fine after 21 days of using my custom image with the changes at work. Sometimes containerd fails to start, but I think that's a separate issue since I already had it before.

@kradalby

kradalby commented Mar 22, 2020

Any update on this?

Edit:
I tried @tuxity's image from Docker Hub, and when running drone exec on my Mac I get:

> drone exec
[build:0] + /usr/local/bin/dockerd --data-root /var/lib/docker --host=unix:///var/run/docker.sock --dns 1.1.1.1
[build:1] Registry credentials not provided. Guest mode enabled.
[build:2] + /usr/local/bin/docker version
[build:3] Client: Docker Engine - Community
[build:4]  Version:           19.03.5
[build:5]  API version:       1.40
[build:6]  Go version:        go1.12.12
[build:7]  Git commit:        633a0ea838
[build:8]  Built:             Wed Nov 13 07:22:05 2019
[build:9]  OS/Arch:           linux/amd64
[build:10]  Experimental:      false
[build:11] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[build:12] exit status 1
2020/03/22 09:31:32 build : exit code 1

The same happens on my Drone setup with kubernetes runners.

@sudo-bmitch

@kradalby I've been running my own version for months without issue (not counting an unrelated self inflicted TLS certificate expiration). Make sure you run the image as privileged since those we are testing are not in the whitelist by default.

@tuxity
Author

tuxity commented Mar 24, 2020

Still no issues on my side since I made the PR.

@kradalby you need to add the privileged param like this:

image: tuxity/drone-docker
privileged: true

-	args := []string{"--data-root", daemon.StoragePath}
+	args := []string{
+		"--data-root", daemon.StoragePath,
+		"--host=unix:///var/run/docker.sock",
Member

@bradrydzewski bradrydzewski Mar 24, 2020

Hard-coding the UNIX socket would break the plugin on Windows. Are we sure this change is necessary, since DOCKER_HOST is being set globally as an image environment variable?

Author

The other solution would be to listen on a TCP socket without TLS, like this: tcp://0.0.0.0:2375.
Should I make the change?

@tuxity changed the title from "Update Docker to 19.05.03" to "Update Docker to 19.03.8" on May 2, 2020
@tuxity
Author

tuxity commented May 2, 2020

Switched from a UNIX socket without TLS to a TCP socket without TLS to avoid interfering with the Windows version of the plugin.

Also rebased onto the latest master branch and updated Docker dind from 19.03.5 to 19.03.8.

Should I squash commits?

@bradrydzewski
Member

Using TCP is not an option for this plugin. However, we should be able to solve this by delegating the command to the OS-specific files:

https://github.com/drone-plugins/drone-docker/blob/master/daemon_win.go
https://github.com/drone-plugins/drone-docker/blob/master/daemon.go
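
One way to read this suggestion is to keep the --host flag out of shared code and let each platform build its own daemon arguments. In the repository that split would presumably live in the two build-specific files linked above; the single-file Go sketch below switches on a GOOS string instead, purely so that it stays self-contained. daemonArgs and the choice to pass no --host flag on Windows are hypothetical and not taken from the plugin.

```go
package main

import (
	"fmt"
	"runtime"
)

// daemonArgs builds the dockerd argument list for the given OS name.
// Hypothetical helper: only non-Windows targets get the UNIX socket flag,
// so the Windows code path is left untouched.
func daemonArgs(goos, storagePath string) []string {
	args := []string{"--data-root", storagePath}
	if goos != "windows" {
		args = append(args, "--host=unix:///var/run/docker.sock")
	}
	return args
}

func main() {
	fmt.Println(daemonArgs(runtime.GOOS, "/var/lib/docker"))
}
```

Passing the OS name as a parameter rather than reading runtime.GOOS inside the function keeps both branches testable from any platform.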

@tuxity
Author

tuxity commented May 5, 2020

OK, I will revert back to the socket.

But I have no idea how the plugin handles this on Windows. It doesn't seem to have a dockerd, so I don't know which path to pass as the argument.

@tuxity
Author

tuxity commented May 12, 2020

After reviewing the code, I don't understand how this would break the Windows version. I only changed the Linux Dockerfiles, and the Windows Dockerfiles already have an up-to-date version.

Only daemon.go calls the function commandDaemon, where I set the socket; daemon_win.go doesn't call anything.

@SPFZ

SPFZ commented May 25, 2020

We would really like to see this fix land because we hit this issue quite a lot.

@sudo-bmitch

@SPFZ does either this PR, or the one I linked, work for you? I believe the drone maintainers are looking for feedback from the community before approving.

@bradrydzewski bradrydzewski merged commit 675553c into drone-plugins:master May 28, 2020