HA OS 13.0 task kthreadd blocked for more than 120 seconds #3534
Comments
I just hit this. Pi 4 with SSD. I updated from 12.4 to 13.0 and HA never came up. I had to retrieve the Pi from where it lives on top of a cupboard and plug it into a screen and keyboard to see what it was doing. It was booting to the CLI, then after a couple of minutes I was getting the same messages as shown above, and then it rebooted. Left alone, it just kept doing this and HA never came up. I got into the CLI and then discovered via the command "ha os info" that there are 2 boot slots that it flip-flops between when an update is applied. If it fails to boot 3 times it reverts to the other slot, but this problem is obviously letting the OS come up far enough to prevent that from triggering. I then used the command "ha os boot-slot other" to force it to the other slot, which had the previous 12.4 in it. The system then came up OK.
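A toy model of the fallback behaviour described in this comment may help show why the automatic revert never fired. This is only a sketch under stated assumptions: the `BootSlots` class, the `reached_userspace` flag and the slot names "A"/"B" are hypothetical, the threshold of 3 is the number quoted by the commenter, and none of it is taken from the actual HAOS bootloader code.

```python
from dataclasses import dataclass

# Threshold quoted in the comment above; assumed, not read from HAOS sources.
MAX_FAILED_BOOTS = 3


@dataclass
class BootSlots:
    """Hypothetical A/B boot slot model, for illustration only."""
    active: str = "A"
    failed_boots: int = 0

    def other(self) -> str:
        """Return the name of the slot that is not currently active."""
        return "B" if self.active == "A" else "A"

    def switch(self) -> None:
        """Manual switch, analogous in spirit to `ha os boot-slot other`."""
        self.active = self.other()
        self.failed_boots = 0

    def boot(self, reached_userspace: bool) -> str:
        """Simulate one boot attempt; return the slot used for the next boot."""
        if reached_userspace:
            # The boot counts as good once the OS comes up, even if the
            # system hangs and reboots a few minutes later.
            self.failed_boots = 0
        else:
            self.failed_boots += 1
            if self.failed_boots >= MAX_FAILED_BOOTS:
                self.switch()
        return self.active
```

In this model, ten boots in a row with `reached_userspace=True` leave the active slot unchanged, which matches the reboot loop reported here; only three consecutive early failures, or a manual `switch()`, move to the other slot.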
I'm dealing with this too! Crashing every few minutes. Occasionally I'll also see it throw up a stack trace, but I don't know of any way to have those traces saved. I'll try the boot slot change now... Before:
After:
Looks like it's back on 12.4 even though the version is null under the boot slot version. Hopefully this fixes it for now... Edit: Seems to have been reliable back on 12.4 for me. I was experiencing some crashes but I think that's because of another bug in HA Core to do with ESPHome running out of memory and crashing the host. I increased swap and it seems OK.
@dpgh947 Thanks for looking into this, I have just rolled back to 12.4 using Using your proposed
Thanks for the tip about reverting to 12.4!
The error messages basically mean the system is too busy to handle kernel tasks. Can you watch memory and CPU usage on the Hardware page to check whether it's hitting limits before it becomes unresponsive? The common denominator for this report and @dipseth's in the other issue is an RPi with 1 GB RAM, which is sufficient only for a very simple HA setup.
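For anyone who would rather log memory pressure from a shell than watch the Hardware page, here is a minimal sketch that parses `/proc/meminfo`-style text. It assumes the standard Linux fields `MemTotal`, `MemAvailable`, `SwapTotal` and `SwapFree`; the function names `parse_meminfo` and `memory_pressure` are made up for this example and are not part of any HA tooling.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value} dict (values in KiB)."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_pressure(info: dict[str, int]) -> dict[str, float]:
    """Return RAM and swap usage as percentages of their totals."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    swap_used_pct = 100.0 * (swap_total - swap_free) / swap_total if swap_total else 0.0
    return {
        "ram_used_pct": 100.0 * (total - available) / total,
        "swap_used_pct": swap_used_pct,
    }
```

On the device itself one would pass in the contents of `/proc/meminfo`, for example `memory_pressure(parse_meminfo(open("/proc/meminfo").read()))`, and call it in a loop to see whether RAM or swap climbs just before a freeze.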
Mine is a pi4 with 2gb, never had a problem before.
I have pi4 with 4GB and was experiencing this. Although, when recording both |
I understand that the RPi 3B+ is quite old and that after 5 years of loyal duty I should consider replacing it, but still, I'm trying to understand why this happened after the 13.0 update and doesn't occur when I run 12.4. The RAM shortage was largely mitigated by my bigger swap config. In order to downgrade to 12.4, I had to stop every add-on and avoid connecting to the UI (otherwise the crash would occur). After I downgraded to 12.4, I re-enabled the add-ons and have had 0 issues since then.
This reminds me that in between when it was crashing and now, where it is no longer crashing, I increased my swap to prevent ESPHome crashes.
I had done something similar, adding a 600 MB swap file. This is still required in 12.4 but runs without any issues now.
@sairon Are you saying this as a HA OS developer, or just making a statement? What are your sources that 1 GB is not enough? My current memory usage on HA OS 12.4 is 676.3 MiB/75 % with 53 integrations, 155 devices and 3 add-ons active.
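For scale, the usage figure quoted above can be turned into an approximate headroom number. This is a back-of-the-envelope sketch only: it assumes the reported 75 % is relative to the RAM visible to the OS and that the percentage is rounded, so the results are rough estimates.

```python
used_mib = 676.3       # usage reported in the comment above
used_fraction = 0.75   # reported as 75 %, presumably rounded

# Total RAM visible to the OS, implied by the two reported figures.
total_mib = used_mib / used_fraction

# What is left for add-ons, page cache and everything else.
free_mib = total_mib - used_mib

print(f"implied total: {total_mib:.1f} MiB, headroom: {free_mib:.1f} MiB")
```

Under these assumptions the implied total is roughly 900 MiB, leaving a little over 200 MiB of headroom before the system has to lean on swap.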
@EastArctica you can run ESPHome on your laptop or local PC, this will definitely take some load off your HA device.
I don't know what you mean by either of those, so I will elaborate a bit. On one hand, 1 GB RAM can be fine: of those who opted in for the analytics, 7 % are running on RPi 3, which only has 1 GB of RAM. However, your mileage may vary. 3 add-ons and 53 integrations is still something I'd call conservative usage, as the number of add-ons especially makes a bigger difference. Compared to you, dipseth has 18 add-ons, and many people are using HA as a self-hosting platform, where RAM starts to be scarce very soon. For other platforms a minimum of 2 GB RAM is generally recommended in the docs, and for CM4 in Yellow it's recommended as well. From my purely personal experience, only one real HA deployment out of the three I manage sits just slightly below 1 GB after a while of usage. In your case, the remaining 25 % of RAM can become insufficient quite quickly. It also needs to be considered that the system usually performs well if it also has some RAM available for page caches; if not, it can lead to higher I/O, and combined with swapping on (rather) slow media, it can be very detrimental to performance. Which leads me to another thought: the SD card you are using, while a good choice in terms of endurance, might not be the best for this usage overall. Per the description it's optimized for use in security cameras, and has no Application Performance Class, so there's no IOPS guarantee. For HAOS it's recommended to use cards of A2 class, which perform better in scenarios like this. In summary, the issue definitely looks performance-related. To see what's going on, having an HDMI display connected, typing
Thanks for the insight.
@DeXter3306 It's hard to tell without any details about your deployment. Doing what I suggested in the last paragraph of the previous post could help. Also, the kernel doesn't log all information by default, there might be more in |
I made a backup of my current system and reverted back to one from when I was experiencing issues before and alas! I am experiencing them once again (didn't think I'd be happy to say that). So far I've been doing most of my monitoring and diag through SSH to the OS, but when the host seems to "die" it tends to kill the network too. Through the terminal on the OS itself (via keyboard + HDMI), how can I run shell commands? I'm sorta just stuck in the Home Assistant CLI and don't know how to break out of it... I think I can confirm that it is an issue with something HA Core related, as the backup I had previously was ONLY HA Core and not a full backup (remember how I said that I couldn't get it to crash before and now I can, although I did have the swap increase, but that shouldn't have persisted). Edit: Just saw memory spike to 3.6 GB right before the entire system froze (unable to type in console, SSH won't connect, webserver won't respond). The stack trace seems to be the system failing to exit correctly because it's out of memory, which it's trying to exit because it ran out of memory... After an hour of testing, it's the whisper model. I don't even know why I had it installed, but the whisper add-on is 100% what was causing my crashes at least. Currently restoring to my pre-testing backup and I'll reinstall the whisper add-on to see if it causes crashes there too, then do the same test with a swap increase.
I tried 13.1 this morning, still broken. HA was just starting to come up, then it rebooted, over and over. This time I managed to get into SSH and issue the boot-slot other command to get back to 12.4 rather than getting the Pi out again and plugging in a screen, so I can't confirm the same messages that I saw before, but whatever it is, I still cannot upgrade my 2 GB Pi 4.
Well, on a hunch, I have been playing around turning off "start at boot" for some add-ons. I turned off esphome, plex server, wireguard, samba and chose to install 13.1 again. It booted OK. I started esphome and samba manually, all OK. I turned on start at boot for esphome and samba, and rebooted, again came up OK. I started wireguard manually, it started OK. I started plex manually, system immediately stopped responding and then rebooted. Came up OK as I hadn't set those to start. I started plex manually first this time, started OK. Started wireguard manually, came up OK. Go figure... I don't use plex or wireguard at the moment, so leaving start at boot for those turned off seems to have alleviated the problem. Whether this is due to an actual problem in one of those, or it's just some sort of resource issue during a full boot (maybe the order things happen has changed?) - I have no idea. EDIT - it rebooted after about half an hour, back to 12.4 again.
Anyone else using the onboard serial port of the Raspberry Pi? In my case the Phoscon RaspBee II. I'm starting to think this could be one of the causes.
Hey, I'm also using the serial port for my RaZberry.
Having the same problem after updating to 2024.10 on a 1GB Pi 4 - reformatting and restoring a backup from 2024.9.3 makes it work. Upgrading to 2024.10 breaks it again.
I don't know if it'll help, but after a lot of googling I came across this issue. I decided to roll back the move to a 64-bit kernel, which promised a boost (arm_64bit=1 in the boot config, which is not a problem since the distribution is 32-bit), and force supervisor to use 32-bit. So it looks like there's a bug in Docker on 64-bit Raspberry Pi kernels, if it's the same issue. EDIT: Rolled back to the 64-bit kernel but stayed with the 32-bit supervisor, no problem.
Seems I have the same problem. RPI4 with 8GB.
Describe the issue you are experiencing
Since I upgraded my Raspberry Pi 4 to the latest HA OS 13.0 the system freezes regularly with the message
Once frozen I can still ping my HA server, but can't SSH into it anymore
What operating system image do you use?
rpi4-64 (Raspberry Pi 4/400 64-bit OS)
What version of Home Assistant Operating System is installed?
13.0
Did the problem occur after upgrading the Operating System?
Yes
Hardware details
Raspberry Pi 4 with 1 GB memory and WD Purple SC QD101 microSDXC 64 GB storage, original power supply and powered USB hub, Phoscon RaspBee II Zigbee module
Steps to reproduce the issue
About 1 in 4 times the system boots up correctly and I can access HA, but then after some hours it crashes again.
Anything in the Supervisor logs that might be useful for us?
Anything in the Host logs that might be useful for us?
System information
No response
Additional information
No response