Workqueue: events_freezable mmc_rescan crash with Raspbian kernel 4.14.79-v7+ #2810
Same problem on a 3B after upgrading to 4.14.79-v7+ |
Hmm, what are the Pi's doing during that time? We haven't seen anything like this in the office, so would be interested to know what they are doing that might cause this. |
Actually, this is a duplicate of an issue on our Linux tracker which is the correct place for it to be. Please continue conversation there...#2810 |
Bum, closed wrong one. Reopened. Will now attempt to close the one in firmware... |
So the machines in question are running LXDE, a Java application, and the onboard keyboard. |
That doesn't tell me anything about their level of activity - CPU, file system, network etc. |
Same issue here. Attached is my kernel log. |
Happened again mid movie. Attached is another kernel log |
I've so far found some advice from https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ |
So far so good. |
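The advice at that link boils down to lowering the kernel's dirty-page writeback thresholds, so the page cache flushes to the slow SD card in smaller bursts instead of accumulating a large backlog and then stalling every writer for minutes. A sketch of the settings quoted later in this thread; the file name is an assumption, and the right values depend on RAM size and workload:

```shell
# Hypothetical /etc/sysctl.d/99-sdcard-writeback.conf
# Start background writeback once 5% of RAM is dirty, and block writers
# at 10% (typical kernel defaults are 10/20). Load with: sudo sysctl --system
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
```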
Just going to say this thread saved my Pi 3B+. It was hanging terribly on any intense IO. The solution by Alexurb fixed the problem for me. |
Just ran into this with kernel 5.4.51-v8 aarch64 on a Pi 4. I'll try the fix recommended above in this thread:
Here are the dmesg errors: |
May fix a lockup observed against Pi 4 64-bit:

INFO: task kworker/1:0:1663 blocked for more than 120 seconds.
Workqueue: events_freezable mmc_rescan
Call trace:
 __switch_to+0x110/0x180
 __schedule+0x2f4/0x750
 schedule+0x44/0xe0
 __mmc_claim_host+0xb8/0x210
 mmc_get_card+0x38/0x50
 mmc_sd_detect+0x24/0x90
 mmc_rescan+0xc8/0x390
 process_one_work+0x1c0/0x470
 worker_thread+0x50/0x430
 kthread+0x100/0x130

Sets sysctl values for Pi boards:
+vm.dirty_background_ratio = 5
+vm.dirty_ratio = 10

Reference: raspberrypi/linux#2810
Signed-off-by: Christian Stewart <christian@paral.in>
My card hasn't failed since I made the change on April 4 of this year. I've been using the same SanDisk microSD card I bought new since mid-2018. I think you only need to worry about microSD card life if you're using super cheap or no name cards. If you use a high tier SanDisk you should be just fine. |
Seeing the same; it seems to be a memory leak in lxpanel, causing it to run out of memory and start swapping until it dies.
Straight after reboot:
See also https://www.raspberrypi.org/forums/viewtopic.php?t=267015 |
I haven't noticed this problem on my 3B+ since July 21 of last year, which is just before The Foundation engineer announced a fix had been committed. Are you fully patched and updated on Buster and still seeing the issue? One thing I'm thinking based on both my repo issue thread and the Raspberry Pi Forum one you linked to (thanks!) is this could be triggered by having the CPU % lxpanel plugin AND a process that causes CPU % to spike for a significant length of time, e.g. |
This is on a 1GB 3B+ with a freshly installed and updated Raspbian image, so it should be Buster: It was crashing every ~6 hours from high lxpanel memory usage until I set up a script to restart lxpanel once an hour. I also added: Besides LXDE there's InfluxDB and an autostarted Chrome with Grafana running, plus I'm using the official 7" DSI touch display. |
Actually it's still crashing from lxpanel it seems, so the hourly restart isn't good enough to prevent whatever lxpanel is doing. htop excerpt:
Kernel messages excerpt: |
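One low-tech way to confirm which process is leaking is to sample its resident set size from /proc over time. A sketch; the lxpanel target and the one-minute interval are assumptions, and a value that grows steadily without ever plateauing suggests a leak rather than normal caching:

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) of a process, read from /proc.
sample_rss() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Watch lxpanel (assumed target) once a minute until it exits.
pid="$(pidof lxpanel 2>/dev/null)"
while [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; do
    printf '%s %s kB\n' "$(date +%T)" "$(sample_rss "$pid")"
    sleep 60
done
```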
Hmmm ... I would suggest disabling |
So I've enabled memory cgroups + swap on zram and put lxpanel into a limited cgroup. Time for some investigation:
Let's try looking at that core file
So I think it's likely that it's the bluetooth lxpanel plugin that is leaking memory in my case, I'll try disabling the bluetooth plugin and see how it behaves after that. |
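The cgroup confinement described above can be sketched roughly as follows. This is a sketch only: it assumes a cgroup-v2 hierarchy and root, the 100M/200M limits and the group name are arbitrary, and on Raspberry Pi kernels the memory controller may first need `cgroup_memory=1 cgroup_enable=memory` on the kernel command line:

```shell
# Create a memory-limited cgroup and move lxpanel into it, so a leak gets
# lxpanel OOM-killed alone instead of dragging the whole system into swap.
mkdir -p /sys/fs/cgroup/lxpanel
echo 100M > /sys/fs/cgroup/lxpanel/memory.max       # hard RAM cap
echo 200M > /sys/fs/cgroup/lxpanel/memory.swap.max  # bounded swap use
echo "$(pidof lxpanel)" > /sys/fs/cgroup/lxpanel/cgroup.procs
```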
Technically it looks more like a lockup from out-of-memory, not a crash. There are actually two separate issues here: |
I'm running Home Assistant on a Pi 4; after 2 SD cards were destroyed in a relatively short time I changed to a USB SSD, and have had zero issues for months. If the distro does a lot of writing (log files, DB updates, etc.), expect the SD cards not to last; that's been my experience. |
A few observations from my side on the mmc_rescan event. Here is my setup:
I started with a traditional SD card, but realized the speed would be too slow to run iobroker, so I decided to run the entire OS and everything else from an external USB SSD. I created a 1:1 copy of the SD card on the USB drive.
Consequently, /boot was mounted via fstab to the boot partition on the SD card.
Given this setup, there should be no, or very little, file activity on the SD card. When I initially created the USB-SSD setup for my iobroker instance, I used the remaining space on the SD card as swap space. I assumed swap would be slow, but that keeping an eye on it would be sufficient.
I just started to use watchdog in a test mode and as you can see, watchdog did not run the test script after 17:59. The entry at 20:27 was created after the reboot of the Pi. But that might (or might not) be another issue. |
FWIW I haven't seen this problem on my 3 B+ since I updated to Raspberry Pi OS 11.x. Currently:
I would encourage everyone to update and see what happens. And yes, I updated in place with no issues at all using this method. |
I still have this bug with 5.15.30-v8+ 64-bit on a Raspberry Pi 3B+.... the system froze and resumed after 3 hours. |
…mpound

Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page [sic]" (remap_vmalloc_range). However a similar problem exists for other struct page fields callers use, for example fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but uses ->lru, ->mapping, ->index. This is not compatible with compound sub-pages, and can cause bad page state issues like:

BUG: Bad page state in process swapper/0 pfn:00743
page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743
flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: corrupted mapping in tail page
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
Call Trace:
 dump_stack_lvl+0x74/0xa8 (unreliable)
 bad_page+0x12c/0x170
 free_tail_pages_check+0xe8/0x190
 free_pcp_prepare+0x31c/0x4e0
 free_unref_page+0x40/0x1b0
 __vunmap+0x1d8/0x420
 ...

The correct approach is to use split high-order pages for the huge vmalloc backing. These allow callers to treat them in exactly the same way as individually-allocated order-0 pages.

Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... hardware issue, removed comment since it's not relevant. |
Welp, within the past 2 days I had my 1st recurrence of this issue. |
If this happens your SD card is probably bad. Best to replace it. Even relatively new SD cards can go bad, because they are not designed for repeated continuous writes to the same location over and over, nor for continuous 24/7 operation. I have had the best luck with
The kernel should nevertheless be updated with better error handling, in my opinion. |
Seems you are right. Can't even clear the partition table on the darn thing now. Will order some new cards. Thanks for the advice on what works well for you. Will also update my comment. |
I did change the sysctl settings as described here: https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ I noticed this when I used VS Code's SSH Remote extension, which installs a bunch of stuff on the remote server. This caused HUGE CPU use by the "node" program, and then everything froze; I couldn't even connect via SSH because the server was frozen. My remote server is a simple Raspberry Pi 3 B+, so not very powerful. Something curious: when all this happens, I notice on both my host machine and the remote server a process "kswapd0" that also consumes a lot of CPU; could this be related? Also, if the scheduler constantly sends interrupts to context-switch to another process, why can a process that consumes too much CPU freeze the Pi? Shouldn't the scheduler give other processes equal time? Why can one process monopolize the CPU and freeze the Pi? |
@All3xJ I'm regularly running entire Gentoo builds on the Pi itself that completely saturate all CPU cores, and yet I never run into this error after replacing the faulty microSD cards. Try a brand-new card. |
I believe you are running out of memory, which is why kswapd is going mad, rather than running out of CPU power. |
@JamesH65 is probably right, you're out of memory. My bad, I assumed that the comment was related to this issue and not some other unrelated thing :) |
All storage devices fail eventually ;) But yes, I have had the best luck with SanDisk, and their warranty (both length and replacement policy) is the best in the business. Last Friday I retired the 32 GB card I'd been using since I got the Pi in 2018 and replaced it with a 200 GB card. We'll see what happens from there. For those wondering about the process for doing so: |
Use less memory? Use Pi4 with more memory? Use a USB SSD and put a swap file on it, that might be faster than swap on the SD card? What is the Pi doing that is using all the RAM? |
I noticed this when I used VS Code's SSH Remote extension, which installs and runs a bunch of stuff on the remote server. This caused HUGE memory use, and then everything froze; I couldn't even connect via SSH because the server was frozen. EDIT: I fixed it by increasing the swap partition size! |
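For anyone wanting to do the same on Raspberry Pi OS: the swap file there is managed by dphys-swapfile, so increasing swap is typically a one-line config change. The 1024 MiB value below is only an example:

```shell
# /etc/dphys-swapfile (Raspberry Pi OS); CONF_SWAPSIZE is in MiB
CONF_SWAPSIZE=1024

# Then regenerate and re-enable the swap file:
#   sudo dphys-swapfile setup && sudo dphys-swapfile swapon
```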
I would like to thank everyone for all the comments I've found on this page. My system is a plain Debian 11, with /boot/firmware and / mounted on the SD card; /home, /var and /tmp are mounted on an SSD. |
FWIW replacing the microSD card has fixed the problem. I think I've experienced only 1 crash since. |
Hello, just joining the thread on this issue, which also happens on my Raspberry Pi 3B+ rev 1.2. It had been running fine since ~2018 on Raspbian armv7, but the issue has started ever since I migrated to aarch64 Raspberry Pi OS in March. I'm currently running kernel 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux. It is running Docker with Home Assistant, Node-RED, signal-cli and MQTT. The Pi locks up every 6 to 8 days.
I've had to replace the SD card several times before this migration, but the symptoms of a dying SD card were different: random crashes of containers, not a full system lockup.
If it doesn't work, I'll move the swap file to a USB stick and likely the whole system to a USB SSD afterwards. edit: edit2 (+1month):
It's been a few days, some more time is needed to see if it helps mitigating the issue. edit3 (+1.5months): edit4 (+3 months) edit5 (+10 months) |
I am able to reproduce this on a CM4 (Lite, 1GB RAM, no WiFi, no heatsink) using the official CM4 IO board and a SanDisk EVO Select 128GB microSD card (which has not seen much abuse). I am using a 64-bit kernel, v5.15.34-v8, built using Yocto and meta-raspberrypi. I can trigger it by compiling a large C++ project (https://github.com/OpenLightingProject/ola) using
Sample of a kernel message: |
I also found a similar problem in a completely different setup. |
I'm seeing the same issue on a newer kernel |
Observed this again after updating my kernel to 5.15.92 (still on a CM4 with the same uSD card, this time without running anything stressing the CPU/RAM/SD card IO): |
Isn't Raspberry Pi OS on v6+ for all devices? I'd think that's where any fixes are most likely to be. |
I'm not using Raspberry Pi OS, I'm using an image built with Yocto kirkstone and meta-raspberrypi. This kernel update was a result of the update in meta-raspberrypi. |
I have a few Raspberry Pi 3 B+ exhibiting the same problem. They crash after 2-3 days of uptime with the following error:
[169451.220021] INFO: task kworker/0:3:10949 blocked for more than 120 seconds.
[169451.220036] Tainted: G C 4.14.79-v7+ #1159
[169451.220041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[169451.220048] kworker/0:3 D 0 10949 2 0x00000000
[169451.220077] Workqueue: events_freezable mmc_rescan
[169451.220110] [<8079ef70>] (__schedule) from [<8079f5d8>] (schedule+0x50/0xa8)
[169451.220130] [<8079f5d8>] (schedule) from [<8061a2d0>] (__mmc_claim_host+0xb8/0x1cc)
[169451.220147] [<8061a2d0>] (__mmc_claim_host) from [<8061a414>] (mmc_get_card+0x30/0x34)
[169451.220163] [<8061a414>] (mmc_get_card) from [<80623010>] (mmc_sd_detect+0x20/0x74)
[169451.220179] [<80623010>] (mmc_sd_detect) from [<8061ccdc>] (mmc_rescan+0x1c8/0x394)
[169451.220197] [<8061ccdc>] (mmc_rescan) from [<801379b4>] (process_one_work+0x158/0x454)
[169451.220212] [<801379b4>] (process_one_work) from [<80137d14>] (worker_thread+0x64/0x5b8)
[169451.220227] [<80137d14>] (worker_thread) from [<8013dd98>] (kthread+0x13c/0x16c)
[169451.220246] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)
The machines are running Raspbian Stretch.