Docker key.json has invalid contents and the system refuses to boot #1706
Comments
Hm, we don't really touch the bind-mounted overlay partition on update, so I can't really see how that is related to the update process. Also, it seems that Docker uses AtomicWriteFile to make sure the file gets written completely to disk before replacing it: liusdu/moby@dcc1d2e#diff-2c9b2092e4c0945dfd6f1a67b07a560c472bb9d66a02f9d2333fb7f9d4b46eafR147 I'd call it a hardware issue, but it's a bit strange that the same file was affected twice (and no other, at least not that we know of...).
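For readers unfamiliar with the pattern, here is a minimal Python sketch of a write-then-rename "atomic write". It is not Docker's Go implementation; the function name `atomic_write_file` and its parameters are made up for illustration, and the directory fsync assumes a POSIX system.

```python
import os
import tempfile


def atomic_write_file(path: str, data: bytes, mode: int = 0o600) -> None:
    """Write data to path so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename would not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Persist the rename itself by syncing the containing directory (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If either the file fsync or the directory fsync is skipped, or the storage device acknowledges writes it has not actually persisted, a power loss can still leave a file of the right size with no valid data in it.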
Yeah, it's definitely a strange one. I've had the box for a good number of years but the SSD was new at the time I installed Home Assistant OS on it (~4 months ago). I've had no other issues (no freezing, reboots, or any other file corruption) that I've noticed. It's only during an OS upgrade that I have an issue.
Just hit this issue too, updating to 5.7; appears this person also had the issue: https://community.home-assistant.io/t/hassio-wont-start-failed-to-start-docker-application-container-engine/95407/31 The file seems to be pretty empty:
5.7 is rather old at this point 😅 Did the update and the reboot thereafter go through without problems?
Sorry, I meant 7.6 (I don't know where I got that number from). Post update the reboot failed; I booted off an Ubuntu stick and deleted the key.json, and then the reboot worked.
A clean reboot should always cause that file to be properly synced to disk; it's really not clear to me why this can happen. What kind of system are you using?
FWIW I woke up to my Home Assistant box not responding and after a reboot hit this docker key.json issue again. After fixing that I upgraded to
I also had this happen to me 2 days ago on a new install (a couple of weeks old) running the x86 version of the OS on a NUC. After the upgrade the device boot-looped forever, restarting after the docker service.
The file, which isn't in the homeassistant_data partition but in one called "homeassistant_overlay" (I don't know what it's used for), was full of `\00\00\00\00\00\00....00`, one very long line. When asking for help on Discord I found that another user had the same issue. mgolisch also found the cause of the reboot: https://github.com/home-assistant/operating-system/blob/dev/buildroot-external/rootfs-overlay/etc/systemd/system/docker.service.d/failure.conf
Lastly, I don't believe the update process is the cause here, as I didn't find anything about it, but something is breaking the key.json file on many devices, and that breaks rebooting, which happens during an upgrade. I was moving from Core to OS for better uptime and a quicker upgrade cycle as I am slowly leaving the house, but this incident and its apparently somewhat regular occurrence made me go back on that for now.
As I am not comfortable releasing the full journalctl output from installation to crash in the wild (I don't know what kind of sensitive information is in it), I won't upload it, but I will keep it available for any maintainer who may ask for it. Sincerely
As mentioned above, I'm having the same issue, which forces me to never upgrade the OS or reboot the host, for fear of a broken, inaccessible install. Having at least the possibility to work with the CLI while a fix is found would be great.
The boot loop is implemented so that the bootloader can switch to the other (presumably good) installation (HAOS has two OS installations, A and B. Each upgrade updates the other, not currently running, system). However, if the old, presumably good installation is not booting either, simply rebooting is indeed not helpful. Currently we don't detect that situation or behave accordingly. This case is somewhat special, as data corruption in the shared overlay partition causes both installations (A and B) to fail.
We use the overlay partition to make certain parts of
Now your case looks like corruption happened to that partition. However, as written in #1706 (comment), I don't really understand why that can happen in the first place:
If that is indeed a more common problem, maybe we should sanity check that file or something.
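To make the "sanity check" idea concrete, here is a rough Python sketch of how such a check could classify the file. It is only an illustration and not what HAOS does; the function name `check_key_file` and the status strings are invented here, and the only assumption about the file is that it should contain a JSON object.

```python
import json
from pathlib import Path


def check_key_file(path: str) -> str:
    """Classify a key file as 'missing', 'empty-or-nul', 'invalid-json' or 'ok'."""
    p = Path(path)
    if not p.exists():
        return "missing"
    raw = p.read_bytes()
    # A file that is empty or consists only of NUL bytes and whitespace
    # matches the symptom reported in this thread.
    if not raw.strip(b"\x00 \t\r\n"):
        return "empty-or-nul"
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers both UnicodeDecodeError and JSONDecodeError
        return "invalid-json"
    # Assumption: a valid key file is a JSON object, not a bare scalar or list.
    return "ok" if isinstance(parsed, dict) else "invalid-json"
```

Distinguishing "empty-or-nul" from other invalid content would make it easier to tell the lost-write pattern reported here apart from other kinds of damage.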
In the last month, I've had a similar boot loop happen 3 times. I've always had to reflash my SSD, as I could not do much with the install, and the error itself was hard to find. As you pointed out, this might not be 100% related to the OS upgrade, as I've had it happen once outside the update process, but I can't give more specific details. For now I'll try to avoid rebooting the host, and if a reboot happens, I will try to gather more details.
OK, I understand the goal there, but in that case you need an extra GRUB entry for an emergency or "safe mode" shell. Bringing over a monitor and keyboard to edit the boot command was way too involved to guide someone through on Discord, and so is booting a live CD, in my mind. That aside, I was going to suggest an ExecStartPre for docker that checks the file and wipes it if needed, but I was bothered by the idea of hiding file corruption ...
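One way to address the concern about hiding corruption would be to move the bad file aside instead of deleting it, so the evidence stays on disk. The Python sketch below shows that idea under stated assumptions: `quarantine_if_corrupt` and the `.corrupt-<timestamp>` suffix are hypothetical, and this is not the check that was later added to HAOS.

```python
import json
import os
import time
from typing import Optional


def quarantine_if_corrupt(path: str) -> Optional[str]:
    """Rename path aside if it does not parse as JSON.

    Returns the backup path when the file was moved, otherwise None
    (file missing or valid).
    """
    try:
        with open(path, "rb") as f:
            json.loads(f.read().decode("utf-8"))
        return None  # parses fine, nothing to do
    except FileNotFoundError:
        return None  # nothing to check; the service can create the file
    except ValueError:
        pass  # undecodable bytes or invalid JSON: fall through and move it
    backup = f"{path}.corrupt-{int(time.time())}"
    os.replace(path, backup)  # keep the evidence instead of deleting it
    return backup
```

A pre-start hook could call this on the key file and log the returned backup path, so the service starts with a freshly generated file while the corrupt copy remains available for debugging.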
One way to remove the corrupted |
I just hit this after an unclean shutdown. The debugging experience was really bad because of the boot loop. On a hunch, based on the boot loop and not having found this issue yet, I grepped this repo for There is a You could consider using I don't know much about docker, but does |
With HAOS 9.0 an invalid key file will be detected, see #1988.
Thanks for your work
I believe I just fell victim to this as well, and with HAOS 9.3! How do I correct the corrupted key.json? /etc/docker/daemon.json is the only file?
and using this still boot loops
You can just delete the file at
Hm, that then sounds like a different problem.
Describe the issue you are experiencing
I'm 2 for 2 now on this. Each reboot after an upgrade to the OS somehow leads to a corrupt key.json file on my Home Assistant OS install. Each time I have to boot via a recovery USB drive, mount the right partition (in this case `/dev/sda7`) and delete the `/etc/docker/key.json` file to fix the system. I don't intentionally manage this file in any way, nor do I care about the contents, so the Docker-generated version is just fine with me.
As a test, I fixed the file, booted into Home Assistant and then restarted back into my recovery USB to inspect the contents. They were valid JSON instead of the invalid content.
What operating system image do you use?
generic-x86-64 (Generic UEFI capable x86-64 systems)
What version of Home Assistant Operating System is installed?
6.6
Did you upgrade the Operating System.
Yes
Steps to reproduce the issue
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System Health information
System Health
Home Assistant Community Store
Home Assistant Cloud
Home Assistant Supervisor
keymaster
Lovelace
Additional information
This is the contents of the file:
And this is the log entries from journalctl: