Workqueue: events_freezable mmc_rescan crash with Raspbian kernel 4.14.79-v7+ #2810
Same problem on a 3B after upgrading to 4.14.79-v7+ |
Hmm, what are the Pi's doing during that time? We haven't seen anything like this in the office, so would be interested to know what they are doing that might cause this. |
Actually, this is a duplicate of an issue on our Linux tracker which is the correct place for it to be. Please continue conversation there...#2810 |
Bum, closed wrong one. Reopened. Will now attempt to close the one in firmware... |
So the machines in question are running LXDE, a Java application, and the onboard keyboard. |
That doesn't tell me anything about their level of activity - CPU, file system, network etc. |
Same issue here. Attached is my kernel log. |
Happened again mid movie. Attached is another kernel log |
I've so far found some advice from https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ |
So far so good. |
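The advice at that link boils down to lowering the kernel's dirty-page writeback thresholds, so the page cache flushes to the slow SD card in smaller bursts instead of accumulating a large backlog and then stalling every writer for minutes. A sketch of the settings quoted later in this thread; the file name is an assumption, and the right values depend on RAM size and workload:

```shell
# Hypothetical /etc/sysctl.d/99-sdcard-writeback.conf
# Start background writeback once 5% of RAM is dirty, and block writers
# at 10% (typical kernel defaults are 10/20). Load with: sudo sysctl --system
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
```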
Just going to say this thread saved my Pi 3B+. It was hanging terribly on any intense IO. The solution by Alexurb fixed the problem for me. |
Just ran into this with kernel 5.4.51-v8 aarch64 on a Pi 4. I'll try the fix recommended above in this thread:
Here are the dmesg errors: |
May fix a lockup observed against Pi 4 64-bit:

INFO: task kworker/1:0:1663 blocked for more than 120 seconds.
Workqueue: events_freezable mmc_rescan
Call trace:
 __switch_to+0x110/0x180
 __schedule+0x2f4/0x750
 schedule+0x44/0xe0
 __mmc_claim_host+0xb8/0x210
 mmc_get_card+0x38/0x50
 mmc_sd_detect+0x24/0x90
 mmc_rescan+0xc8/0x390
 process_one_work+0x1c0/0x470
 worker_thread+0x50/0x430
 kthread+0x100/0x130

Sets sysctl values for Pi boards:
+vm.dirty_background_ratio = 5
+vm.dirty_ratio = 10

Reference: raspberrypi/linux#2810
Signed-off-by: Christian Stewart <christian@paral.in>
My card hasn't failed since I made the change on April 4 of this year. I've been using the same SanDisk microSD card I bought new since mid-2018. I think you only need to worry about microSD card life if you're using super cheap or no name cards. If you use a high tier SanDisk you should be just fine. |
Seeing the same; it seems to be a memory leak in lxpanel, causing it to run out of memory and start swapping until it dies.
Straight after reboot:
See also https://www.raspberrypi.org/forums/viewtopic.php?t=267015 |
I haven't noticed this problem on my 3B+ since July 21 of last year, which is just before The Foundation engineer announced a fix had been committed. Are you fully patched and updated on Buster and still seeing the issue? One thing I'm thinking based on both my repo issue thread and the Raspberry Pi Forum one you linked to (thanks!) is this could be triggered by having the CPU % lxpanel plugin AND a process that causes CPU % to spike for a significant length of time, e.g. |
This is on a 1GB 3B+ with a freshly installed and updated Raspbian image, so it should be Buster: It was crashing every ~6 hours from high lxpanel memory usage until I set up a script to restart lxpanel once an hour. I also added: Besides LXDE there's InfluxDB and an autostarted Chrome with Grafana running, plus I'm using the official 7" DSI touch display. |
Actually it's still crashing from lxpanel it seems, so the hourly restart isn't good enough to prevent whatever lxpanel is doing. htop excerpt:
Kernel messages excerpt: |
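One low-tech way to confirm which process is leaking is to sample its resident set size from /proc over time. A sketch; the lxpanel target and the one-minute interval are assumptions, and a value that grows steadily without ever plateauing suggests a leak rather than normal caching:

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) of a process, read from /proc.
sample_rss() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Watch lxpanel (assumed target) once a minute until it exits.
pid="$(pidof lxpanel 2>/dev/null)"
while [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; do
    printf '%s %s kB\n' "$(date +%T)" "$(sample_rss "$pid")"
    sleep 60
done
```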
Hmmm ... I would suggest disabling |
So I've enabled memory cgroups + swap on zram and put lxpanel into a limited cgroup. Time for some investigation:
Let's try looking at that core file
So I think it's likely that it's the bluetooth lxpanel plugin that is leaking memory in my case, I'll try disabling the bluetooth plugin and see how it behaves after that. |
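The cgroup confinement described above can be sketched roughly as follows. This is a sketch only: it assumes a cgroup-v2 hierarchy and root, the 100M/200M limits and the group name are arbitrary, and on Raspberry Pi kernels the memory controller may first need `cgroup_memory=1 cgroup_enable=memory` on the kernel command line:

```shell
# Create a memory-limited cgroup and move lxpanel into it, so a leak gets
# lxpanel OOM-killed alone instead of dragging the whole system into swap.
mkdir -p /sys/fs/cgroup/lxpanel
echo 100M > /sys/fs/cgroup/lxpanel/memory.max       # hard RAM cap
echo 200M > /sys/fs/cgroup/lxpanel/memory.swap.max  # bounded swap use
echo "$(pidof lxpanel)" > /sys/fs/cgroup/lxpanel/cgroup.procs
```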
Technically it looks more like a lockup from out-of-memory, not a crash. There are actually two separate issues here: |
I'm running Home Assistant on a Pi 4; after 2 SD cards were destroyed in a relatively short time I changed to a USB SSD, and have had zero issues for months. If the distro does a lot of writing (log files, DB updates, etc.), expect the SD cards not to last; that's been my experience. |
A few observations from my side on the mmc_rescan event. Here is my setup:
I started with a traditional SD card, but realized the speed would be too slow to run iobroker, so I decided to run the entire OS and everything else from an external USB SSD. I created a 1:1 copy of the SD card on the USB drive.
Consequently, /boot was mounted via fstab to the boot partition on the SD card.
Given this setup, there should be no, or very little, file activity on the SD card. When I initially created the USB-SSD setup for my iobroker instance, I used the remaining space on the SD card as swap space. I assumed swap would be slow, but that keeping an eye on it would be sufficient.
I just started to use watchdog in a test mode and as you can see, watchdog did not run the test script after 17:59. The entry at 20:27 was created after the reboot of the Pi. But that might (or might not) be another issue. |
FWIW I haven't seen this problem on my 3 B+ since I updated to Raspberry Pi OS 11.x. Currently:
I would encourage everyone to update and see what happens. And yes, I updated in place with no issues at all using this method. |
I still have this bug with 5.15.30-v8+ 64-bit on a Raspberry Pi 3B+.... the system froze and resumed after 3 hours. |
…mpound

Huge vmalloc higher-order backing pages were allocated with __GFP_COMP in order to allow the sub-pages to be refcounted by callers such as "remap_vmalloc_page [sic]" (remap_vmalloc_range). However a similar problem exists for other struct page fields callers use, for example fb_deferred_io_fault() takes a vmalloc'ed page and not only refcounts it but uses ->lru, ->mapping, ->index. This is not compatible with compound sub-pages, and can cause bad page state issues like:

BUG: Bad page state in process swapper/0 pfn:00743
page:(____ptrval____) refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x743
flags: 0x7ffff000000000(node=0|zone=0|lastcpupid=0x7ffff)
raw: 007ffff000000000 c00c00000001d0c8 c00c00000001d0c8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: corrupted mapping in tail page
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc3-00082-gfc6fff4a7ce1-dirty #2810
Call Trace:
 dump_stack_lvl+0x74/0xa8 (unreliable)
 bad_page+0x12c/0x170
 free_tail_pages_check+0xe8/0x190
 free_pcp_prepare+0x31c/0x4e0
 free_unref_page+0x40/0x1b0
 __vunmap+0x1d8/0x420
 ...

The correct approach is to use split high-order pages for the huge vmalloc backing. These allow callers to treat them in exactly the same way as individually-allocated order-0 pages.

Link: https://lore.kernel.org/all/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... hardware issue, removed comment since it's not relevant. |
Welp, within the past 2 days I had my 1st recurrence of this issue. |
If this happens your SD card is probably bad. Best to replace it. Even relatively new SD cards can go bad, because they are not designed for repeated continuous writes to the same location over and over, nor for continuous 24/7 operation. I have had the best luck with
The kernel should nevertheless be updated with better error handling, in my opinion. |
Seems you are right. Can't even clear the partition table on the darn thing now. Will order some new cards. Thanks for the advice on what works well for you. Will also update my comment. |
I did change the sysctl settings as described here: https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/ I noticed this when I used VS Code's SSH Remote extension, which installs a bunch of stuff on the remote server. This caused HUGE CPU use by the "node" program, and then everything froze; I couldn't even connect via SSH because the server was frozen. My remote server is a simple Raspberry Pi 3 B+, so not very powerful. Something curious: when all this happens, I notice on both my host machine and the remote server a process "kswapd0" that also consumes a lot of CPU; could this be related? Also, if the scheduler constantly sends interrupts to context-switch to another process, why can a process that consumes too much CPU freeze the Pi? Shouldn't the scheduler give other processes equal time? Why can one process monopolize the CPU and freeze the Pi? |
@All3xJ I'm regularly running entire Gentoo builds on the Pi itself that completely saturate all CPU cores, and yet I never run into this error after replacing the faulty microSD cards. Try a brand-new card. |
I believe you are running out of memory, which is why kswapd is going mad, rather than running out of CPU power. |
@JamesH65 is probably right, you're out of memory. My bad, I assumed that the comment was related to this issue and not some other unrelated thing :) |
All storage devices fail eventually ;) But yes, I have had the best luck with SanDisk, and their warranty (both length and replacement policy) is the best in the business. Last Friday I retired the 32 GB card I'd been using since I got the Pi in 2018 and replaced it with a 200 GB card. We'll see what happens from there. For those wondering about the process for doing so: |
Use less memory? Use Pi4 with more memory? Use a USB SSD and put a swap file on it, that might be faster than swap on the SD card? What is the Pi doing that is using all the RAM? |
I noticed this when I used VS Code's SSH Remote extension, which installs and runs a bunch of stuff on the remote server. This caused HUGE memory use, and then everything froze; I couldn't even connect via SSH because the server was frozen. EDIT: I fixed it by increasing the swap partition size! |
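For anyone wanting to do the same on Raspberry Pi OS: the swap file there is managed by dphys-swapfile, so increasing swap is typically a one-line config change. The 1024 MiB value below is only an example:

```shell
# /etc/dphys-swapfile (Raspberry Pi OS); CONF_SWAPSIZE is in MiB
CONF_SWAPSIZE=1024

# Then regenerate and re-enable the swap file:
#   sudo dphys-swapfile setup && sudo dphys-swapfile swapon
```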
I would like to thank everyone for all the comments I've found on this page. My system is a plain Debian 11, with /boot/firmware and / mounted on the SD card; /home, /var and /tmp are mounted on an SSD. |
FWIW replacing the microSD card has fixed the problem. I think I've experienced only 1 crash since. |
Hello, just joining the thread on this issue, which also happens on my Raspberry Pi 3B+ rev 1.2. It had been running fine since ~2018 on Raspbian armv7, but the issue has started ever since I migrated to aarch64 Raspberry Pi OS in March. I'm currently running kernel 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux. It is running Docker with Home Assistant, Node-RED, signal-cli and MQTT. The Pi locks up every 6 to 8 days.
I've had to replace the SD card several times before this migration, but the symptoms of a dying SD card were different: random crashes of containers, not a full system lockup.
If it doesn't work, I'll move the swap file to a USB stick and likely the whole system to a USB SSD afterwards. edit: edit2 (+1month):
It's been a few days, some more time is needed to see if it helps mitigating the issue. edit3 (+1.5months): edit4 (+3 months) edit5 (+10 months) |
I am able to reproduce this on a CM4 (Lite, 1GB RAM, no WiFi, no heatsink) using the official CM4 IO board and a SanDisk EVO Select 128GB microSD card (which has not seen much abuse). I am using a 64-bit kernel, v5.15.34-v8, built using Yocto and meta-raspberrypi. I can trigger it by compiling a large C++ project (https://github.com/OpenLightingProject/ola) using
Sample of a kernel message: |
I also found a similar problem in a completely different setup. |
I'm seeing the same issue on a newer kernel |
Observed this again after updating my kernel to 5.15.92 (still on a CM4 with the same uSD card, this time without running anything stressing the CPU/RAM/SD card IO): |
Isn't Raspberry Pi OS on v6+ for all devices? I'd think that's where any fixes are most likely to be. |
I'm not using Raspberry Pi OS, I'm using an image built with Yocto kirkstone and meta-raspberrypi. This kernel update was a result of the update in meta-raspberrypi. |
I have a few Raspberry Pi 3 B+ exhibiting the same problem. They crash after 2-3 days of uptime with the following error:
[169451.220021] INFO: task kworker/0:3:10949 blocked for more than 120 seconds.
[169451.220036] Tainted: G C 4.14.79-v7+ #1159
[169451.220041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[169451.220048] kworker/0:3 D 0 10949 2 0x00000000
[169451.220077] Workqueue: events_freezable mmc_rescan
[169451.220110] [<8079ef70>] (__schedule) from [<8079f5d8>] (schedule+0x50/0xa8)
[169451.220130] [<8079f5d8>] (schedule) from [<8061a2d0>] (__mmc_claim_host+0xb8/0x1cc)
[169451.220147] [<8061a2d0>] (__mmc_claim_host) from [<8061a414>] (mmc_get_card+0x30/0x34)
[169451.220163] [<8061a414>] (mmc_get_card) from [<80623010>] (mmc_sd_detect+0x20/0x74)
[169451.220179] [<80623010>] (mmc_sd_detect) from [<8061ccdc>] (mmc_rescan+0x1c8/0x394)
[169451.220197] [<8061ccdc>] (mmc_rescan) from [<801379b4>] (process_one_work+0x158/0x454)
[169451.220212] [<801379b4>] (process_one_work) from [<80137d14>] (worker_thread+0x64/0x5b8)
[169451.220227] [<80137d14>] (worker_thread) from [<8013dd98>] (kthread+0x13c/0x16c)
[169451.220246] [<8013dd98>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)
The machines are running Raspbian Stretch.