Boot Loop Issue #2092

Closed
igloo15 opened this issue Aug 30, 2022 · 10 comments
Labels
board/generic-x86-64 Generic x86-64 Boards (like Intel NUC) bug

Comments

@igloo15

igloo15 commented Aug 30, 2022

Describe the issue you are experiencing

I came home from work and noticed my Home Assistant was not working. I checked on the machine and rebooted it. Upon rebooting, the system began to boot loop.

I hooked up a monitor to the Intel NUC and viewed the boot logs. It seems that errors start to appear as the Network Manager Scripts Dispatcher starts.

Unable to read configuration index 0 descriptor/all

Then, after a few of those messages, it says Finished Network Manager Service. After that comes a message saying Network Target is Online, then Waiting Until Kernel Time Synchronized with a timer counting up to 1 minute 30 seconds. When it hits 1 minute 30 seconds, a ton of text flashes by and the machine reboots.

There were two boot slots, A and B. I tried both of them, and both did the same thing.

What operating system image do you use?

generic-x86-64 (Generic UEFI capable x86-64 systems)

What version of Home Assistant Operating System is installed?

latest

Did you upgrade the Operating System?

No

Steps to reproduce the issue

  1. Unsure; it just randomly started rebooting

...

Anything in the Supervisor logs that might be useful for us?

Can't see supervisor logs stuck at boot

Anything in the Host logs that might be useful for us?

usb 2-1: unable to read configuration index 0 descriptor/all

waiting until Kernel Time Synchronization

System Health information

No response

Additional information

Intel NUC machine
Home Assistant 2022.8.6
Two Coral USB devices
One USB Zigbee Controller
One USB Zwave Controller

@igloo15 igloo15 added the bug label Aug 30, 2022
@agners agners added the board/generic-x86-64 Generic x86-64 Boards (like Intel NUC) label Aug 30, 2022
@agners
Member

agners commented Aug 30, 2022

usb 2-1: unable to read configuration index 0 descriptor/all

Those messages are likely caused by a USB device. They should not be fatal. Can you try to disconnect the USB devices one by one to see which one is causing that?

Waiting Until Kernel Time Synchronized counting down to 0 means that the system isn't able to communicate with the NTP server. Are you sure your internet connectivity is fine at that point?

However, even if the NTP is not accessible, this should not lead to a system reboot afterwards.

The system reboots automatically if Docker cannot start properly. This can happen if there is data corruption on the data partition. This partition is shared between the two instances (A/B), so that would explain why both options fail.

One way to debug from this point is to press e at the GRUB menu, append the string systemd.unit=rescue.target to the line starting with linux .., then press F10 to boot into an emergency console. From there you can check what Docker reported during the previous boot by using journalctl -b -1 -u docker.service.
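
As a rough sketch, the log check described above could be wrapped like this in the emergency console. The journalctl invocation is the one quoted above; the --no-pager flag and the grep_no_space helper are hypothetical additions for illustration, and the search string covers only one possible failure message (a full disk), so this is an assumption about what to look for, not a complete diagnosis.

```shell
#!/bin/sh
# Print what docker.service logged during the previous boot (-b -1).
# Assumes the journal from the previous boot is still available.
previous_docker_log() {
    journalctl -b -1 -u docker.service --no-pager
}

# Hypothetical helper: keep only lines that hint at a full disk.
grep_no_space() {
    grep -i 'no space left on device'
}

# Possible usage in the emergency console:
#   previous_docker_log | grep_no_space
```

If the filter prints nothing, the full log from previous_docker_log is still the place to look for other errors.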

Since data corruption is likely, your best bet is probably to replace the SSD and start with a new installation.

@igloo15
Author

igloo15 commented Aug 30, 2022

@agners OK, the last couple of messages I got from the Docker service said "error replicating health state no space left on device", followed by some errors from InfluxDB saying it could not write the last few data entries.

I am assuming that my drive is completely full and that is why it's boot looping? Is that possible?

@igloo15
Author

igloo15 commented Aug 30, 2022

I removed an old backup from March and now it is booting again. I am surprised that just running out of space causes the system to boot loop constantly.

@agners
Member

agners commented Aug 30, 2022

Having no space can be very problematic for some services, and it seems that Docker is one of them. The system reboots if Docker cannot start, as the container engine is crucial for the whole operation.

You should have received a warning that there is not much space left well before that from the Supervisor. The new Repairs should show it if I am not mistaken.

That said, I agree that ideally the system should act more gracefully. Maybe it should stop installing add-ons or making backups when the disk is around 90% full.
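
To illustrate the threshold idea above, here is a minimal shell sketch of such a check. It is not how the Supervisor does or would implement it; the function names are hypothetical, and the 90% default simply mirrors the figure mentioned above.

```shell
#!/bin/sh
# usage_percent PATH: print the used-space percentage (without the % sign)
# of the filesystem that holds PATH, based on POSIX-format df output.
usage_percent() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# is_nearly_full PERCENT [THRESHOLD]: succeed if PERCENT >= THRESHOLD.
# THRESHOLD defaults to 90.
is_nearly_full() {
    [ "$1" -ge "${2:-90}" ]
}

# Possible usage (the path is a placeholder, pick the partition you care about):
#   if is_nearly_full "$(usage_percent /)"; then
#       echo "Disk nearly full: skipping backup"
#   fi
```

Keeping the comparison separate from the df parsing makes the threshold easy to adjust.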

@ronjtaylor

I found this discussion interesting ...

My SSD is 256 GB, and I monitor sensor.disk_use_percent_home.
After I rebuilt my HA with 8.4 about a week ago, my disk usage now sits at 3.2%.
Prior to trying the 8.5 update it was approximately 6.5%.
I pulled this out of my stats:
[image]

I find it difficult to believe that the SSD was anywhere near full, as it had more than 90% free.

@leneaspilimbergo

Same issue here. I updated the OS from 8.4 to 8.5 on an HP t620 Thin Client. The SSD is 256 GB and had plenty of free space (it was freshly installed 1 month ago with few sensors).
I was able to record a video and check the frames to get the error messages (the log printout is too fast).
These are the errors I saw:

[FAILED] Failed to start Docker Application Container Engine
See 'systemctl status docker.service' for details
[DEPEND] Dependency failed for HassOS supervisor
[DEPEND] Dependency failed for Dropbear SSH daemon

Then it started stopping all the services and rebooted.

It keeps rebooting forever.

It was my first attempt with HassOS, but the lack of control over the base OS compared to the Debian installation I had before is something I don't like.

@agners
Member

agners commented Sep 12, 2022

[FAILED] Failed to start Docker Application Container Engine

Hm, so it seems that systemd could not start up Docker successfully. Most likely this is due to a corrupted key.json, see also #1706.

With 9.0, these types of issues will get detected automatically and the Docker service should be able to recover from them.

It was my first attempt with HassOS, but the lack of control over the base OS compared to the Debian installation I had before is something I don't like.

Boot looping is definitely not ideal, and 9.0 will boot into a rescue mode if three boot attempts, as well as the fallback OS partition, fail to boot (see #2096 and #2112).

That said, HAOS is designed to be largely stateless and without configuration. If you prefer full control over the OS, Debian is likely the better choice for you.

@ronjtaylor

I am still running 8.4 and am hesitant to update to 8.5, as it may mean another rebuild. At the moment I have a sick wife and not a lot of time, so I am sticking with the status quo.

Is there a solution to this issue?
If so, can it be detailed so that we can follow it?
Would a solution be to make a full backup of HA, load the 8.5 OS, then restore the backup?
Some guidance would be great for the not-so-skilled users; after all, this OS was supposed to be for the not-so-skilled to use.

Thank you

@agners
Member

agners commented Sep 13, 2022

These are exceptional cases. In @igloo15's case the culprit was likely a lack of space; in your case I am not entirely sure. According to analytics we have 52310 installations on 8.5, and almost all of them seem not to run into issues. I am sorry that it happened to you.

Despite not having all the information, I think 9.0 will be a lot more resilient thanks to the automatic fixes in place for several of these issues. However, making a full backup (and downloading it) before an update is always the safest step (as with any computer system).

@agners
Member

agners commented Sep 13, 2022

With #2097 HAOS will boot even when absolutely no space is available. This will be part of HAOS 9.0. Since that seems to be the culprit of the original report, I am closing this issue.

@agners agners closed this as completed Sep 13, 2022