Unable to handle kernel paging request #2107
The immediate cause of the crash is memory corruption. The value in r3 is being used as the basis for a memory access - it's probably a structure pointer and we're trying to read the value at offset 12 from it. Unfortunately the value in r3 is the ASCII string "List", which is likely to be either a FOURCC or part of a string rather than a valid address. The usual cause of random memory corruption like this is a power supply problem. What does vcgencmd get_throttled report?
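The arithmetic behind that diagnosis can be checked directly from the register dump: 0x7473694c is the little-endian encoding of the ASCII bytes "List", and the faulting address 0x74736958 is exactly 12 bytes above it, which is consistent with the faulting instruction word e593200c (an ldr r2, [r3, #12]). A minimal C sketch of that check, using only the values from the oops below:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Values taken from the oops in this issue. */
    uint32_t r3    = 0x7473694c;  /* register used as a (corrupted) pointer   */
    uint32_t fault = 0x74736958;  /* "Unable to handle kernel paging request" */

    /* The register bytes as laid out in little-endian memory: prints "List". */
    for (int i = 0; i < 4; i++)
        putchar((int)((r3 >> (8 * i)) & 0xff));
    putchar('\n');

    /* Offset of the attempted read relative to the corrupted pointer: 12. */
    printf("offset = %u\n", fault - r3);
    return 0;
}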
Well, it's rebooted since, but now I get:
/opt/vc/bin/vcgencmd get_throttled
throttled=0x0
I'll keep this in mind if I see this again.
For whatever reason it just happened again, see the attached log: Unable_to_handle_kernel_paging_request_25982.txt. The events up to 1795 are me testing power-cycling the powered USB hub which connects my two DVB-T2 dongles; I've rigged a home-automation thing to allow me to do that programmatically. Even now I get throttled=0x0. Any ideas? The machine has a good power supply, and all USB stuff is on powered hubs.
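For what it's worth, the value printed by get_throttled is a bitmask. A minimal sketch of decoding it, assuming the commonly documented Raspberry Pi firmware bit positions (treat those positions as an assumption, not an authoritative reference):

/* decode_throttled.c - pass the value from `vcgencmd get_throttled`,
 * e.g. ./decode_throttled 0x0
 * Assumed bit positions (per common Raspberry Pi firmware documentation):
 *   bit 0  under-voltage now          bit 16  under-voltage has occurred
 *   bit 2  currently throttled        bit 18  throttling has occurred
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    unsigned long mask = (argc > 1) ? strtoul(argv[1], NULL, 16) : 0;

    printf("raw value: 0x%lx\n", mask);
    printf("under-voltage now:        %s\n", (mask & (1UL << 0))  ? "yes" : "no");
    printf("currently throttled:      %s\n", (mask & (1UL << 2))  ? "yes" : "no");
    printf("under-voltage since boot: %s\n", (mask & (1UL << 16)) ? "yes" : "no");
    printf("throttling since boot:    %s\n", (mask & (1UL << 18)) ? "yes" : "no");
    return 0;
}

A reading of throttled=0x0, as reported here, means none of those bits have been set since boot.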
@pelwell Any suggestions/ideas?
The second log includes lots of WARNINGs from the kernel in module_put, presumably because a module was being unloaded as the result of the hub being power cycled and something wasn't expecting it. My guess would be that the DVB-T adapter driver is responsible for the crashes, possibly as a delayed result of the fallout from the untimely unplugging. It's a feature of memory corruption problems that the symptoms can occur long after the original infection, and that the most common symptom is death.
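WARNINGs in module_put usually mean the module's reference count was dropped more often than it was taken, which can happen when a device disappears in the middle of an operation. A toy userspace sketch of that failure mode (a stand-in refcount, not the kernel's actual module.c code):

#include <stdio.h>

/* A toy reference count standing in for a module's usage count. */
static int refcount;

static void toy_module_get(void) { refcount++; }

static void toy_module_put(void)
{
    /* The kernel warns rather than crashing on an unbalanced put;
     * model that with a simple check here. */
    if (refcount <= 0) {
        fprintf(stderr, "WARNING: put without matching get (refcount=%d)\n", refcount);
        return;
    }
    refcount--;
}

int main(void)
{
    toy_module_get();
    toy_module_put();   /* balanced: fine */
    toy_module_put();   /* unbalanced: prints the warning */
    return 0;
}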
Closing due to lack of activity. Reopen if you feel this issue is still relevant.
commit 0b83c86 upstream.

The function blk_revalidate_disk_zones() calls the function disk_update_zone_resources() after freezing the device queue. In turn, disk_update_zone_resources() calls queue_limits_start_update() which takes a queue limits mutex lock, resulting in the ordering: q->q_usage_counter check -> q->limits_lock. However, the usual ordering is to always take a queue limit lock before freezing the queue to commit the limits updates, e.g., the code pattern:

lim = queue_limits_start_update(q);
...
blk_mq_freeze_queue(q);
ret = queue_limits_commit_update(q, &lim);
blk_mq_unfreeze_queue(q);

Thus, blk_revalidate_disk_zones() introduces a potential circular locking dependency deadlock that lockdep sometimes catches with the splat:

[ 51.934109] ======================================================
[ 51.935916] WARNING: possible circular locking dependency detected
[ 51.937561] 6.12.0+ #2107 Not tainted
[ 51.938648] ------------------------------------------------------
[ 51.940351] kworker/u16:4/157 is trying to acquire lock:
[ 51.941805] ffff9fff0aa0bea8 (&q->limits_lock){+.+.}-{4:4}, at: disk_update_zone_resources+0x86/0x170
[ 51.944314] but task is already holding lock:
[ 51.945688] ffff9fff0aa0b890 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_revalidate_disk_zones+0x15f/0x340
[ 51.948527] which lock already depends on the new lock.
[ 51.951296] the existing dependency chain (in reverse order) is:
[ 51.953708] -> #1 (&q->q_usage_counter(queue)#3){++++}-{0:0}:
[ 51.956131]        blk_queue_enter+0x1c9/0x1e0
[ 51.957290]        blk_mq_alloc_request+0x187/0x2a0
[ 51.958365]        scsi_execute_cmd+0x78/0x490 [scsi_mod]
[ 51.959514]        read_capacity_16+0x111/0x410 [sd_mod]
[ 51.960693]        sd_revalidate_disk.isra.0+0x872/0x3240 [sd_mod]
[ 51.962004]        sd_probe+0x2d7/0x520 [sd_mod]
[ 51.962993]        really_probe+0xd5/0x330
[ 51.963898]        __driver_probe_device+0x78/0x110
[ 51.964925]        driver_probe_device+0x1f/0xa0
[ 51.965916]        __driver_attach_async_helper+0x60/0xe0
[ 51.967017]        async_run_entry_fn+0x2e/0x140
[ 51.968004]        process_one_work+0x21f/0x5a0
[ 51.968987]        worker_thread+0x1dc/0x3c0
[ 51.969868]        kthread+0xe0/0x110
[ 51.970377]        ret_from_fork+0x31/0x50
[ 51.970983]        ret_from_fork_asm+0x11/0x20
[ 51.971587] -> #0 (&q->limits_lock){+.+.}-{4:4}:
[ 51.972479]        __lock_acquire+0x1337/0x2130
[ 51.973133]        lock_acquire+0xc5/0x2d0
[ 51.973691]        __mutex_lock+0xda/0xcf0
[ 51.974300]        disk_update_zone_resources+0x86/0x170
[ 51.975032]        blk_revalidate_disk_zones+0x16c/0x340
[ 51.975740]        sd_zbc_revalidate_zones+0x73/0x160 [sd_mod]
[ 51.976524]        sd_revalidate_disk.isra.0+0x465/0x3240 [sd_mod]
[ 51.977824]        sd_probe+0x2d7/0x520 [sd_mod]
[ 51.978917]        really_probe+0xd5/0x330
[ 51.979915]        __driver_probe_device+0x78/0x110
[ 51.981047]        driver_probe_device+0x1f/0xa0
[ 51.982143]        __driver_attach_async_helper+0x60/0xe0
[ 51.983282]        async_run_entry_fn+0x2e/0x140
[ 51.984319]        process_one_work+0x21f/0x5a0
[ 51.985873]        worker_thread+0x1dc/0x3c0
[ 51.987289]        kthread+0xe0/0x110
[ 51.988546]        ret_from_fork+0x31/0x50
[ 51.989926]        ret_from_fork_asm+0x11/0x20
[ 51.991376] other info that might help us debug this:
[ 51.994127]  Possible unsafe locking scenario:
[ 51.995651]        CPU0                    CPU1
[ 51.996694]        ----                    ----
[ 51.997716]   lock(&q->q_usage_counter(queue)#3);
[ 51.998817]                                lock(&q->limits_lock);
[ 52.000043]                                lock(&q->q_usage_counter(queue)#3);
[ 52.001638]   lock(&q->limits_lock);
[ 52.002485]  *** DEADLOCK ***

Prevent this issue by moving the calls to blk_mq_freeze_queue() and blk_mq_unfreeze_queue() around the call to queue_limits_commit_update() in disk_update_zone_resources(). In case of revalidation failure, the call to disk_free_zone_resources() in blk_revalidate_disk_zones() is still done with the queue frozen as before.

Fixes: 843283e ("block: Fake max open zones limit when there is no limit")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241126104705.183996-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
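To make the ordering rule concrete, here is a minimal sketch using pthread mutexes as stand-ins for q->limits_lock and the queue freeze; the block-layer function names in the comments are those cited in the commit message, and this userspace code only illustrates the ordering, it is not the actual patch:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t limits_lock  = PTHREAD_MUTEX_INITIALIZER; /* stand-in for q->limits_lock */
static pthread_mutex_t queue_frozen = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the queue freeze */

/* Correct order: take the limits lock first (queue_limits_start_update()),
 * then freeze the queue only around the commit step (blk_mq_freeze_queue(),
 * queue_limits_commit_update(), blk_mq_unfreeze_queue()). */
static void update_limits(void)
{
    pthread_mutex_lock(&limits_lock);
    /* ... prepare the new limits ... */
    pthread_mutex_lock(&queue_frozen);
    /* ... commit the limits ... */
    pthread_mutex_unlock(&queue_frozen);
    pthread_mutex_unlock(&limits_lock);
}

/* The buggy path did the equivalent of freezing first and then taking the
 * limits lock; combined with update_limits() running elsewhere, the two
 * opposite acquisition orders form the circular dependency lockdep reports. */
int main(void)
{
    update_limits();
    puts("consistent lock ordering: no circular dependency");
    return 0;
}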
Hi,
On a relatively freshly booted machine I got the attached error:
[ 1401.162539] Unable to handle kernel paging request at virtual address 74736958
[ 1401.175193] pgd = a19fc000
[ 1401.183318] [74736958] *pgd=00000000
[ 1401.191742] Internal error: Oops: 5 [#1] SMP ARM
[ 1401.200983] Modules linked in: dm_mod rc_pinnacle_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux tda18271 cxd2820r em28xx_dvb bridge stp llc veth dvb_core em28xx tveeprom v4l2_common video6
[ 1401.247309] CPU: 3 PID: 4840 Comm: uptime Not tainted 4.9.33-v7+ #35
[ 1401.258335] Hardware name: BCM2835
[ 1401.266271] task: b0f3d880 task.stack: a1a60000
[ 1401.275256] PC is at vma_interval_tree_insert+0x3c/0x94
[ 1401.284880] LR is at 0x16
[ 1401.291682] pc : [<8023aa5c>] lr : [<00000016>] psr: 20070093
[ 1401.291682] sp : a1a61e40 ip : a02e2933 fp : a1a61e5c
[ 1401.311536] r10: b0e0ddc0 r9 : 00000000 r8 : b16a94dc
[ 1401.320865] r7 : af65ce18 r6 : ae5be160 r5 : 00000006 r4 : a02e292b
[ 1401.331476] r3 : 7473694c r2 : 00080000 r1 : b16a94f0 r0 : ae5be160
[ 1401.342181] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[ 1401.353503] Control: 10c5387d Table: 219fc06a DAC: 00000055
[ 1401.363222] Process uptime (pid: 4840, stack limit = 0xa1a60210)
[ 1401.373196] Stack: (0xa1a61e40 to 0xa1a62000)
[ 1401.381409] 1e40: b16a94f0 ae5be19c b16a94f0 ae5be160 a1a61ec4 a1a61e60 80245238 8023aa2c
[ 1401.397655] 1e60: a1a61e84 00000000 80269498 00000000 00000000 00000001 80c7b4d8 00000006
[ 1401.414219] 1e80: 76d7d000 76d8e000 b130c3c0 b16a94f0 00000000 00000000 00000000 ae5be108
[ 1401.431189] 1ea0: ae5be160 00000000 80c7b4d8 00006000 00000001 76d8c000 a1a61ef4 a1a61ec8
[ 1401.448911] 1ec0: 8024583c 8024501c ae5be108 00000000 ae5be160 00000000 00000070 76d7d000
[ 1401.467369] 1ee0: b0e0ddc0 00000075 a1a61f04 a1a61ef8 8024621c 802456c4 a1a61f5c a1a61f08
[ 1401.486187] 1f00: 80248ce8 802461f8 00000070 00000000 b130c3c0 00000006 00000000 b130c3c0
[ 1401.505568] 1f20: a1a60000 00000000 0000000f 00000000 a1a61f5c 76d8c000 00000000 00000000
[ 1401.525714] 1f40: 00000000 00000004 76d8c000 00000000 a1a61f94 a1a61f60 80248ee0 80248af4
[ 1401.546444] 1f60: 00000070 76d77000 a1a61fa4 ae5be160 76fce588 00000000 00000005 0000007d
[ 1401.567772] 1f80: 80108244 a1a60000 a1a61fa4 a1a61f98 80248fb4 80248d90 00000000 a1a61fa8
[ 1401.589474] 1fa0: 801080c0 80248fa4 76fce588 00000000 76d7d000 0000f000 00000000 00006000
[ 1401.611737] 1fc0: 76fce588 00000000 00000005 0000007d 00016208 76fce6e0 7ed3f6ec 7ed3f66c
[ 1401.634629] 1fe0: 00000000 7ed3f48c 76fb45ec 76fc79ac 20070010 76d7d000 6e697475 64707075
[ 1401.657619] [<8023aa5c>] (vma_interval_tree_insert) from [<80245238>] (__vma_adjust+0x228/0x6a8)
[ 1401.680909] [<80245238>] (__vma_adjust) from [<8024583c>] (__split_vma+0x184/0x194)
[ 1401.702701] [<8024583c>] (__split_vma) from [<8024621c>] (split_vma+0x30/0x3c)
[ 1401.716945] [<8024621c>] (split_vma) from [<80248ce8>] (mprotect_fixup+0x200/0x29c)
[ 1401.737919] [<80248ce8>] (mprotect_fixup) from [<80248ee0>] (do_mprotect_pkey+0x15c/0x214)
[ 1401.759128] [<80248ee0>] (do_mprotect_pkey) from [<80248fb4>] (SyS_mprotect+0x1c/0x20)
[ 1401.779497] [<80248fb4>] (SyS_mprotect) from [<801080c0>] (ret_fast_syscall+0x0/0x1c)
[ 1401.799238] Code: e3a04000 e2453001 e083e62e ea000007 (e593200c)
[ 1401.832400] ---[ end trace 0d6f3de60187ba86 ]---
It seems to have coincided with cron, as journalctl shows:
Jul 09 00:00:01 rpi2.lan cron[4831]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons)
Jul 09 00:00:01 rpi2.lan cron[4832]: (root) CMD (/usr/local/bin/check_dvb_adapters)
Jul 09 00:00:01 rpi2.lan cron[4834]: (root) CMD (/usr/local/bin/check_uptime)
Jul 09 00:00:01 rpi2.lan cron[4833]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons)
Jul 09 00:00:02 rpi2.lan kernel: Unable to handle kernel paging request at virtual address 74736958
Jul 09 00:00:02 rpi2.lan kernel: pgd = a19fc000
Jul 09 00:00:03 rpi2.lan kernel: [74736958] *pgd=00000000
Jul 09 00:00:03 rpi2.lan kernel: Internal error: Oops: 5 [#1] SMP ARM
Jul 09 00:00:03 rpi2.lan kernel: Modules linked in: dm_mod rc_pinnacle_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux tda18271 cxd2820r em28xx_dvb bridge stp llc veth dvb_core em28xx tveeprom v4l2_common videodev media evdev ftdi_sio usbserial bcm2835_gpiomem uio_pdrv_genirq uio fixed sch_fq_codel nfsd ip_tables x_tables ipv6
Jul 09 00:00:03 rpi2.lan kernel: CPU: 3 PID: 4840 Comm: uptime Not tainted 4.9.33-v7+ #35
Jul 09 00:00:03 rpi2.lan kernel: Hardware name: BCM2835
Jul 09 00:00:03 rpi2.lan kernel: task: b0f3d880 task.stack: a1a60000
Jul 09 00:00:03 rpi2.lan kernel: PC is at vma_interval_tree_insert+0x3c/0x94
Jul 09 00:00:03 rpi2.lan kernel: LR is at 0x16
Jul 09 00:00:03 rpi2.lan kernel: pc : [<8023aa5c>] lr : [<00000016>] psr: 20070093
sp : a1a61e40 ip : a02e2933 fp : a1a61e5c
Jul 09 00:00:03 rpi2.lan kernel: r10: b0e0ddc0 r9 : 00000000 r8 : b16a94dc
Jul 09 00:00:03 rpi2.lan kernel: r7 : af65ce18 r6 : ae5be160 r5 : 00000006 r4 : a02e292b
Jul 09 00:00:03 rpi2.lan kernel: r3 : 7473694c r2 : 00080000 r1 : b16a94f0 r0 : ae5be160
Jul 09 00:00:03 rpi2.lan kernel: Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Jul 09 00:00:03 rpi2.lan kernel: Control: 10c5387d Table: 219fc06a DAC: 00000055
... and the rest of the dmesg dump. Any idea what might have caused it? free(1) shows plenty of memory:
free
              total        used        free      shared  buff/cache   available
Mem:         815580      240976       40692        1252      533912      469512
Swap:             0           0           0
and if paging were failing due to OOM, the OOM killer would be making noise, wouldn't it? No I/O errors, and dd shows working disks.
internal_paging_error.txt