live-restore does not work on latest version 18.09.1 #556
Comments
ping @crosbymichael @seemethere - I think this might be related to socket activation. (as I recall it, live-restore was the original reason for removing it in the RPM packages) /cc @andrewhsu |
Actually .. wondering if it's a race condition between the containerd service starting, and the dockerd daemon starting it as a child process; I had another issue mentioning that 🤔 |
@thaJeztah Thanks, I just test manually start |
I think this happens now because the docker service depends on the containerd service or has an After= on the containerd service. |
I think that if we add an |
This PR should fix the issue docker/docker-ce-packaging#290 It's going into the |
This doesn't seem to reliably fix the issue for me. I added containerd.service to the After= line in docker.service, but I'm still intermittently seeing the behavior reported above. Docker reports that the container has exited with code 255, but (some) processes formerly associated with the container are still running. When this happens to me, the entrypoint (a bash wrapper to an executable that sets paths and such) actually does die in the service restart, but the executable that it calls is still running. I'm using the google cloud ubuntu xenial image, with docker version 18.09.1 build 4c52b90 |
It's not yet in the 18.09.1 packages (opened a backport in docker/docker-ce-packaging#294) However, you can use an override file to add the change (without modifying the original The easiest way to create an override file is using
|
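A minimal sketch of the override approach mentioned above, assuming the only change needed is the After=containerd.service ordering discussed in this thread. The function name write_docker_dropin and the scratch directory are hypothetical and used only for illustration; on a real host the drop-in would conventionally live under /etc/systemd/system/docker.service.d/ and writing it requires root.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that orders docker.service after
# containerd.service, without touching the packaged unit file.
# write_docker_dropin is a hypothetical helper, not part of any package.
write_docker_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  # After= in a drop-in is additive: it is appended to the ordering
  # dependencies already declared in the original unit.
  cat > "$dropin_dir/override.conf" <<'EOF'
[Unit]
After=containerd.service
EOF
}

# Run against a scratch directory so nothing on the host is modified.
scratch="$(mktemp -d)"
write_docker_dropin "$scratch/docker.service.d"
cat "$scratch/docker.service.d/override.conf"
```

On a real system, systemd only picks up a new drop-in after systemctl daemon-reload, followed by a restart of the docker service. systemctl edit docker.service is a standard interactive way to create the same kind of file.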
Ok, I did modify the docker.service file directly to match the referenced PR above, and held updates to the docker-ce package in dpkg to prevent any conflicts while I waited for the backport. To be clear, it does seem to be generally better, but once (out of 10s of service restarts across multiple hosts) I saw it lose track of all containers again when the host system was under high load. It seemed possible to me that some less-likely race condition might still exist, but I will change my approach and keep an eye on it. |
Just confirming that I do still see this issue happen sporadically after restoring the original docker.service and creating the override as you described. |
Ok, so there's one more option to prevent any possible race condition. The daemon has a So, there's two approaches to configure this. Generally, the You cannot use both methods - you need to pick one. Setting the option both as a daemon flag ( Option 1 - using the
|
I pushed these changes to 20+ VMs through the systemd override file, and so far (~48h), I've had no cases of containers getting lost by docker during service restarts. The |
Thanks @buck2202 @seemethere ^^ looks like we may want to consider using the |
Just checking back in--there were zero issues in ~1.5 weeks of continuous use across ~30 VMs with both Daemon restarts would have been occurring 1-2 times/hour on each VM, and I had previously setup scripts to log all instances where docker reported a container had exited but the container's executable was still active. |
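The kind of logging described above could look roughly like the following sketch. The actual scripts are not shown in this thread, so the function name check_lost_containers and its input format (one "container-name process-pattern" pair per line on stdin) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch (hypothetical helper): log every container that docker reports
# as exited while a process matching its expected executable is still alive.
# stdin: lines of "<container-name> <process-pattern>"
check_lost_containers() {
  while read -r name pattern; do
    # Ask the daemon for the container state; skip names it does not know.
    status="$(docker inspect --format '{{.State.Status}}' "$name" 2>/dev/null)" || continue
    # pgrep -f matches the pattern against full command lines.
    if [ "$status" = "exited" ] && pgrep -f "$pattern" >/dev/null; then
      echo "$(date -u +%FT%TZ) $name exited per docker, but '$pattern' is still running"
    fi
  done
}
```

It could be fed from a small file of name and pattern pairs and run after each daemon restart, appending its output to a log. Matching on a command-line pattern is a rough heuristic and can give false positives if an unrelated process matches the same pattern.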
Thanks for testing! I opened docker/docker-ce-packaging#297 and docker/docker-ce-packaging#298 for consideration |
Sorry if this has been covered by @buck2202 already, but I am testing docker v. 18.09.2 with an interactive bash container, and when I run systemctl restart docker, bash exits with an error message*, even though docker ps reports that the container never exited. I have applied the patch per @thaJeztah's instructions**. We have a rare and random issue in our production line where the docker unix socket hangs and our orchestrator receives UnixSocket timeouts, so a docker daemon restart is required. However, we need all of our containers to keep running during this procedure. Thanks *ERRO[0023] error waiting for container: unexpected EOF |
I'm actually not sure whether interactive containers are supported during live-restore; I seem to recall that they weren't, but I don't see a mention in the docs (https://docs.docker.com/config/containers/live-restore/). Did that work with older versions? @crosbymichael might know off the top of his head; that looks to be a separate issue though |
@thaJeztah, I realised that, upon docker restart, only an interactive bash/sh session started with docker exec -it exits. All main processes are still running after the restart, so there is no issue related to this ticket (my bad for this). However, our main problem is still there: even a docker daemon restart doesn't help the hanging API client caused by a hanging container, which even after a daemon restart fails to respond to anything (stop, kill, etc.). A very buggy situation. I'll try to find a more suitable ticket or create a new one. |
The issue still exists. I am testing with docker 18.09.1 on CentOS 7.7. Even after setting containerd in docker.service and putting After=containerd.service in the systemd file, live-restore is not working as expected. Are there any known caveats? Also, in which point release is this issue fixed? |
This issue is still present in the latest CE version:
@seemethere should I open a new bug report for this, or can you reopen this one? |
Expected behavior
Docker does not stop containers when the docker daemon is restarted with "live-restore" enabled.
Actual behavior
Docker stops the containers when the docker daemon is restarted.
Steps to reproduce the behavior
systemctl restart docker
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.)
AlibabaCloud
Actually, the container's process is not stopped, but dockerd/containerd cannot take it over.
I then created a container after the restart. Its containerd-shim arguments are not the same as those of the first container. The workdir differs between the two containerd-shim processes:
:/var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
/var/lib/containerd/io.containerd.runtime.v1.linux/...
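The two working directories above can be told apart mechanically. The sketch below classifies a shim workdir by its prefix. The mapping of /var/lib/docker/containerd/daemon/ to a containerd started by dockerd as a child process, and of /var/lib/containerd/ to the standalone containerd service, is an assumption based on the default locations and on the race condition discussed in this thread. The function name classify_shim_workdir and the trailing example-id path component are hypothetical.

```shell
#!/bin/sh
# Sketch: guess which containerd instance a containerd-shim belongs to,
# using only the two workdir prefixes quoted in this report.
# classify_shim_workdir is a hypothetical helper for illustration.
classify_shim_workdir() {
  case "$1" in
    /var/lib/docker/containerd/daemon/*) echo "containerd started by dockerd" ;;
    /var/lib/containerd/*)               echo "standalone containerd service" ;;
    *)                                   echo "unknown" ;;
  esac
}

# "example-id" is a placeholder; the real paths are truncated in the report.
classify_shim_workdir "/var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/example-id"
classify_shim_workdir "/var/lib/containerd/io.containerd.runtime.v1.linux/example-id"
```

If shims from both groups show up on one host after a restart, that would be consistent with the containers having been started under one containerd instance and the daemon later talking to the other, which would explain why it cannot take them over.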