13.1 - System crashing probably due to USB driver #3575
Comments
I don't think "the entire USB stack" needs to fail; it's enough for the USB drive to fail, and the behavior would be similar to what you describe. Also, if there were a systematic problem with the USB stack, I'm pretty sure there would be more reports of this happening. Relocating the logs to a different medium isn't an easy task. Generally, on RPi it's better to use the data disk setup, which would prevent this from happening: even if the data partition became inaccessible, the system would still be running and you could SSH into it (at least using the developer SSH on port 22222).
I know that having the system running on the internal MicroSD would be better, but I have had too many dead cards in my life, so I decided to move everything to the external drive. What seems strange to me is that all USB devices seem to stop working at the same time (I stop getting data from the UPS and the Zigbee devices go offline). HomeKit is probably the service that lasts the longest (and the reason why nobody notices anything is wrong at first). The web UI is one of the first things that stops working, followed by the debug SSH, but without access to the disk there is plenty of room for bizarre behavior. Still, I'd like to save the logs to another drive in order to see what's going on.
I have the same problem on an RPi 3B+. I disconnect the power and start it again, and the boot sequence seems normal, without errors. When the system finishes loading, the web interface loses the connection; the system restarts again but then fails to boot.
Did you have a look at #3362? There's a workaround for USB-related issues on Raspberry Pi boards, the
Describe the issue you are experiencing
Over the past few months, my HA instance running on a RPi 4 Model B (4GB RAM, using a USB SSD) has been crashing at random times and showing some strange behavior. Initially I blamed the PSU, but after I replaced it, the instance crashed again just a few hours later.
Every time the server crashes, there is a small chance that the watchdog forces a reboot, but most of the time I come home to a dead RPi with the activity LEDs (both the one on the board and the one on the SSD) flashing at a constant rate.
It was only after a few months that I noticed that the instance would start to slowly break way before it crashed completely: the homepage was fine, but maybe some pages were broken, and trying to do something different would often cause a reboot.
I finally had a revelation a week ago, when I configured an automation for when my Zigbee devices go offline. I got notified that all my devices were offline, as if the coordinator were dead. After checking, I noticed that the coordinator had become unavailable, and so had the UPS connected via USB. When I checked the RPi, the SSD activity light was flashing constantly. A few minutes later, the server managed to force a reboot.
From what I saw, I believe that the entire USB stack is crashing somehow, causing ZHA and NUT to fail, while removing access to the main disk, probably causing all the unexpected behavior.
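One way to check the "everything on USB disappears at once" hypothesis would be to watch the kernel's list of USB devices and record when entries vanish. The following is only a minimal sketch: it assumes a Linux environment with read access to the standard sysfs directory `/sys/bus/usb/devices`, and the function names are hypothetical. It is not an existing HAOS tool, and it would have to run somewhere that still works when the USB disk is gone. If the controller hangs without the devices being formally disconnected, the entries may stay in place and nothing would be reported.

```python
import os
import time
from datetime import datetime, timezone

# Standard Linux sysfs location for USB devices (assumed to be readable).
SYSFS_USB = "/sys/bus/usb/devices"


def list_usb_devices(root=SYSFS_USB):
    """Return the set of entry names currently present under root."""
    try:
        return set(os.listdir(root))
    except OSError:
        # Directory missing or unreadable: treat as "no devices visible".
        return set()


def diff_snapshots(before, after):
    """Return (removed, added) entry names between two snapshots, sorted."""
    return sorted(before - after), sorted(after - before)


def watch(root=SYSFS_USB, interval=5.0, rounds=None):
    """Print a timestamped line whenever the set of entries changes.

    rounds=None polls forever; an integer limits the number of polls.
    """
    previous = list_usb_devices(root)
    polls = 0
    while rounds is None or polls < rounds:
        time.sleep(interval)
        current = list_usb_devices(root)
        removed, added = diff_snapshots(previous, current)
        if removed or added:
            stamp = datetime.now(timezone.utc).isoformat()
            print(f"{stamp} removed={removed} added={added}", flush=True)
        previous = current
        polls += 1
```

Calling `watch()` would poll every five seconds. If the UPS, the Zigbee coordinator and the SSD all show up in the same `removed` list, that would point at the bus or controller rather than at a single device.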
At this point, I don't know how to analyze the issue further, because the system is far too broken in that state: SSH connections are refused, nothing happens when I plug in a monitor to see what's going on, and, of course, the logs don't provide any useful information, since they can't be written to disk after the issue starts.
The only thing I could think of to help you diagnose the problem is to write the logs to the unused MicroSD so that we can retrieve them afterwards, but I don't know how to do that.
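To illustrate what "writing the logs to another medium so they survive the crash" would involve, here is a minimal sketch. It assumes the MicroSD is already mounted somewhere writable and that a line-oriented log stream is available to read from; neither is something HAOS is known to provide out of the box, and the function names are hypothetical. The key point is forcing each line to the storage device right away, so that it is not lost in a buffer when the system hangs.

```python
import os


def append_durably(path, line):
    """Append one line to path and force it out to the storage device."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to the device


def mirror_stream(stream, path):
    """Copy every line from stream to path, syncing after each line.

    Returns the number of lines written.
    """
    count = 0
    for line in stream:
        append_durably(path, line)
        count += 1
    return count
```

For example, `mirror_stream(sys.stdin, "/mnt/sdcard/host.log")` would mirror whatever is piped into the script; the path here is purely illustrative. Syncing after every line is slow and adds wear to the card, but for a short debugging session that trade-off seems acceptable.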
What operating system image do you use?
rpi4-64 (Raspberry Pi 4/400 64-bit OS)
What version of Home Assistant Operating System is installed?
13.1
Did the problem occur after upgrading the Operating System?
No
Hardware details
Main board: Raspberry Pi 4 Model B, 4GB RAM
USB devices:
Device info:
Steps to reproduce the issue
Nothing specific, the error occurs randomly. It could happen after a few hours or even weeks.
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
System Information
Home Assistant Community Store
Home Assistant Cloud
Home Assistant Supervisor
Dashboards
Recorder
Additional information
No response