Docker fails to start after reboot when a system container with a restart policy exists #184
Thanks @jfmontanaro for giving Sysbox a shot and filing this issue. I haven't come across this before, but I will reproduce it and get back to you tomorrow.
Hi @jfmontanaro, I can see there is a problem, though I am not able to reproduce the exact behavior you describe. On my Ubuntu Focal machine, I tried the steps to reproduce that you listed:
After the reboot, I can see that Docker fails to restart the container automatically:
The reason is that the Sysbox service has not started yet. This is a bug in our installer: we need to fix the Sysbox systemd unit file to ensure Docker starts after Sysbox is ready. But Docker itself starts fine (it's not hanging in the
I tried the reboot sequence several times, and the result was always the same. I was also unable to reproduce the shutdown problem you reported: in my case, shutdown completed without any problem every time I tried. Let me work on fixing the Sysbox systemd unit file. I am still curious, though, why you are seeing the shutdown problem and Docker stuck activating after reboot, while I can't reproduce either.
@ctalledo Hmm, that's very interesting. I have a VM that consistently exhibits this behavior; would you like access to it for testing? Or are you satisfied that this is probably due to the order in which services start, and will go from there?
Let me first come up with a fix for the Sysbox systemd service so you can apply it and see if the problem goes away. If it does not, I would certainly appreciate access to the machine to investigate. Thanks!
@jfmontanaro: FYI, I was able to reproduce the Docker hang too, by adjusting when the Sysbox service starts relative to the Docker service during boot. I will take a deeper look at it tomorrow. Thanks again for reporting the issue!
@jfmontanaro: found the root cause of the hang: it's a cyclic dependency between Docker and Sysbox. When the machine is rebooted, systemd starts Sysbox and then Docker. When Docker starts, it automatically proceeds to launch containers previously configured with When the hang occurs, one can see that
If I manually kill the sysbox-runc process, Docker resumes normal operation (though the containers with a restart policy still fail to come up). I will dig deeper to find a solution.
So, in summary, we have two bugs here:
I'll work on fixing both of these.
@ctalledo Fascinating stuff, thanks so much for looking into it! I really appreciate your quick and thorough response. Really enjoying Sysbox so far, BTW. Aside from this, it's been very smooth. It's very nice to have a solution for docker-in-docker that a) doesn't require privileged mode and b) avoids some of the performance pitfalls of previous docker-in-docker solutions. Before I discovered Sysbox, I was thinking I'd have to run a fat VM for my CI builds, which would have been far more resource-intensive and much more annoying to manage. So thanks! :)
@jfmontanaro, really good to hear, thanks for the feedback. CI is currently one of the top uses of Sysbox, for exactly the reason you mentioned: avoiding the VM. FYI, our blog site has a couple of articles on CI with Sysbox (with Jenkins and GitLab). Glad Sysbox is working smoothly; we write a lot of tests for it. Docker I'll likely have a fix by tomorrow, but you'll probably need to build from source until we do our next packaged release (towards the end of January). Building from source is very easy (clone the repo and do
The fix for (2) is out for code review. For (1), the fix is still pending. As a temporary workaround, one can edit the systemd service file for Sysbox to look as follows. Notice the "Before=docker.service" and "ExecStart=/usr/bin/sleep 5" lines.
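The workaround above can also be applied as a systemd drop-in, which leaves the packaged unit file untouched. This is a minimal sketch under stated assumptions, not the Sysbox maintainers' file: it assumes the unit is named sysbox.service, and it uses ExecStartPost=/usr/bin/sleep 5 rather than the ExecStart= line mentioned above, because systemd refuses to load a unit whose drop-in adds a second ExecStart= to a service that is not Type=oneshot. The write_dropin helper and the install argument are hypothetical.

```shell
#!/bin/sh
# Sketch: apply the temporary workaround as a systemd drop-in instead of
# editing the packaged unit file.
# ASSUMPTION: the Sysbox unit is named sysbox.service; adjust if it differs.
set -eu

UNIT="sysbox.service"
DROPIN_DIR="/etc/systemd/system/${UNIT}.d"

# write_dropin DIR: write override.conf into DIR (hypothetical helper name).
write_dropin() {
    mkdir -p "$1"
    cat > "$1/override.conf" <<'EOF'
[Unit]
# Start docker.service only after sysbox.service has finished starting.
Before=docker.service

[Service]
# Extra settling time before systemd considers Sysbox started.
# ExecStartPost= is used because a drop-in may not add a second ExecStart=
# to a service that is not Type=oneshot.
ExecStartPost=/usr/bin/sleep 5
EOF
}

# Only modify the real system when explicitly asked to (requires root).
if [ "${1:-}" = "install" ]; then
    write_dropin "$DROPIN_DIR"
    systemctl daemon-reload
    systemctl cat "$UNIT"   # print the merged unit to confirm the drop-in
fi
```

Usage: sudo sh sysbox-dropin.sh install (the script name is arbitrary). Once a Sysbox release with the fixed unit file is installed, the drop-in can be removed with sudo rm -r /etc/systemd/system/sysbox.service.d followed by sudo systemctl daemon-reload.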
Hi @jfmontanaro: the fix for problem (2) has been submitted to the Sysbox repo. This means that if you build Sysbox from source you'll now be able to use To get the fix you must:
At this point you'll have the latest version of Sysbox running, and you can use Docker + Sysbox as usual. The Docker The next Sysbox release will come with the fix included (both in the Sysbox runtime as well as its systemd service unit file), so you'll just install Sysbox and you are done. Hope this helps. I'll keep the issue open until we have a proper fix for the Sysbox systemd service unit. |
FYI: the task remaining is to fix the Sysbox systemd service unit to ensure two things:
@ctalledo Perfect, after building from source and adjusting the service dependencies as you mentioned I am able to run Sysbox containers with One note: The Thanks again for the rapid turnaround on this, I really appreciate it. Very impressed with Sysbox overall and happy to have it in my toolbox. Good luck with the business!
Thanks @jfmontanaro, we will take a close look at the Sysbox systemd service files before the upcoming release to make sure all is good. Thank you very much for catching the problem and filing the issue, and we're very happy to hear that Sysbox is working well for you. If you have any other feedback on use cases or features you need, please let us know. Thanks again!
This has been fixed in the sysbox repo, and the fix will be present in the upcoming Sysbox release (v0.3.0). Closing. |
If you create a system container with --restart always or --restart unless-stopped and then reboot the system, Docker fails to come back up properly.
Restarting just the daemon also exhibits some issues, but it doesn't hang entirely in that case. The system container won't come back up, but the daemon eventually (after ~1 min) sorts itself out and becomes available for other containers.
Ubuntu 20.04, Linux 5.4.0-54-generic
Sysbox 0.2.1-0.ubuntu-focal
Steps to reproduce:
docker run -d --runtime sysbox-runc --restart always alpine:latest sh -c 'sleep infinity'
docker.service will hang in the activating state indefinitely.
What's especially fun is that this apparently also means the system is unable to shut down properly. It can, however, get far enough into the shutdown process to terminate other daemons (such as sshd). So if you run into this while working remotely and try to reboot without manually killing the Docker process first, your machine will be inaccessible until you can trigger a hard reboot.
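To confirm this hang after a reboot, a small check can help. This is a sketch, not part of Sysbox or Docker: it polls docker.service's ActiveState with systemctl show, and if Docker is still not active after a timeout, it lists running sysbox-runc processes, since killing the stuck sysbox-runc process is what unblocked Docker in the maintainer's testing. The 60-second timeout, the check argument, and the function names are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: detect docker.service stuck in "activating" after a reboot and
# list any sysbox-runc processes that may be blocking it.
set -u

# docker_state: print docker.service's ActiveState (active, activating, ...).
docker_state() {
    systemctl show -p ActiveState --value docker.service
}

# wait_for_docker SECONDS: check once per second for up to SECONDS seconds;
# return 0 as soon as Docker is active, 1 if it never becomes active.
wait_for_docker() {
    i=0
    while :; do
        [ "$(docker_state)" = "active" ] && return 0
        [ "$i" -ge "$1" ] && return 1
        i=$((i + 1))
        sleep 1
    done
}

if [ "${1:-}" = "check" ]; then
    if wait_for_docker 60; then
        echo "docker.service is active"
    else
        echo "docker.service is still $(docker_state); sysbox-runc processes:"
        pgrep -a sysbox-runc || echo "(none found)"
        exit 1
    fi
fi
```

Run it as sh check-docker.sh check; a non-zero exit status means Docker never became active within the timeout.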
Logs:
I first encountered this with a docker:dind image that I wanted to come back up automatically. dind is Alpine-based, but I've also tested with Debian and CentOS, and it doesn't seem to make a difference.
The daemon can be freed from its stuck state with kill -9 <daemon PID>, but the container in question remains stopped until manually started.
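The manual recovery described above can be scripted roughly as follows. This is a sketch assuming Docker runs as the systemd unit docker.service: it reads the daemon's PID with systemctl show -p MainPID, sends it SIGKILL, asks systemd to start Docker again, and then starts the named container with docker start, since its restart policy does not bring it back. The explicit systemctl start is an added safety step in case the unit is not configured to restart automatically; the recover function and the command-line interface are hypothetical.

```shell
#!/bin/sh
# Sketch of the manual recovery: force-kill a hung dockerd, bring Docker
# back up, then start the restart-policy container by hand. Run as root.
set -eu

# dockerd_pid: print the MainPID systemd records for docker.service
# (systemd reports 0 when there is no main process).
dockerd_pid() {
    systemctl show -p MainPID --value docker.service
}

# recover CONTAINER: kill the stuck daemon, make sure docker.service is
# started again, then start CONTAINER explicitly.
recover() {
    pid=$(dockerd_pid)
    if [ "$pid" -gt 0 ]; then
        kill -9 "$pid"
    fi
    systemctl start docker.service
    docker start "$1"
}

if [ "${1:-}" = "recover" ]; then
    recover "${2:?container name required}"
fi
```

Usage: sudo sh docker-recover.sh recover <container>. SIGKILL skips Docker's normal shutdown, so this is a last resort for the hang described here rather than a routine restart.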