Merge pull request #1 from torvalds/master #55

coder280 · 2013-11-04T07:58:14Z

1103

1102

Add pipe_lock/unlock for splice_write to avoid oops by following competition: (1) An application gets fds of a trace buffer, virtio-serial, pipe. (2) The application does fork() (3) The processes execute splice_read(trace buffer) and splice_write(virtio-serial) via same pipe. <parent> <child> get fds of a trace buffer, virtio-serial, pipe | fork()----------create--------+ | | splice(read) | ---+ splice(write) | +-- no competition | splice(read) | | splice(write) ---+ | | splice(read) | splice(write) splice(read) ------ competition | splice(write) Two processes share a pipe_inode_info structure. If the child execute splice(read) when the parent tries to execute splice(write), the structure can be broken. Existing virtio-serial driver does not get lock for the structure in splice_write, so this competition will induce oops. <oops messages> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130 PGD 7223e067 PUD 72391067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: lockd bnep bluetooth rfkill sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore pcspkr virtio_net virtio_balloon i2c_piix4 i2c_core microcode uinput floppy CPU: 0 PID: 1072 Comm: compete-test Not tainted 3.10.0ws+ torvalds#55 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880071b98000 ti: ffff88007b55e000 task.ti: ffff88007b55e000 RIP: 0010:[<ffffffff811a6b5f>] [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130 RSP: 0018:ffff88007b55fd78 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff88007b55fe20 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffff88007a95ba30 RDI: ffff880036f9e6c0 RBP: ffff88007b55fda8 R08: 00000000000006ec R09: ffff880077626708 R10: 0000000000000003 R11: ffffffff8139ca59 R12: ffff88007a95ba30 R13: 0000000000000000 R14: ffffffff8139dd00 R15: ffff880036f9e6c0 FS: 00007f2e2e3a0740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000071bd1000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffff8139ca59 ffff88007b55fe20 ffff880036f9e6c0 ffffffff8139dd00 ffff8800776266c0 ffff880077626708 ffff88007b55fde8 ffffffff811a6e8e ffff88007b55fde8 ffffffff8139ca59 ffff880036f9e6c0 ffff88007b55fe20 Call Trace: [<ffffffff8139ca59>] ? alloc_buf.isra.13+0x39/0xb0 [<ffffffff8139dd00>] ? virtcons_restore+0x100/0x100 [<ffffffff811a6e8e>] __splice_from_pipe+0x7e/0x90 [<ffffffff8139ca59>] ? alloc_buf.isra.13+0x39/0xb0 [<ffffffff8139d739>] port_fops_splice_write+0xe9/0x140 [<ffffffff8127a3f4>] ? selinux_file_permission+0xc4/0x120 [<ffffffff8139d650>] ? wait_port_writable+0x1b0/0x1b0 [<ffffffff811a6fe0>] do_splice_from+0xa0/0x110 [<ffffffff811a951f>] SyS_splice+0x5ff/0x6b0 [<ffffffff8161facf>] tracesys+0xdd/0xe2 Code: 49 8b 87 80 00 00 00 4c 8d 24 d0 8b 53 04 41 8b 44 24 0c 4d 8b 6c 24 10 39 d0 89 03 76 02 89 13 49 8b 44 24 10 4c 89 e6 4c 89 ff <ff> 50 18 85 c0 0f 85 aa 00 00 00 48 89 da 4c 89 e6 4c 89 ff 41 RIP [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130 RSP <ffff88007b55fd78> CR2: 0000000000000018 ---[ end trace 24572beb7764de59 ]--- V2: Fix a locking problem for error V3: Add Reviewed-by lines and stable@ line in sign-off area Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Libin <huawei.libin@huawei.com> Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: huawei.libin@huawei.com Cc: wangyijing@huawei.com Cc: fenghua.yu@intel.com Cc: guohanjun@huawei.com Cc: paul.gortmaker@windriver.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>

commit 2b4fbf0 upstream. Add pipe_lock/unlock for splice_write to avoid oops by following competition: (1) An application gets fds of a trace buffer, virtio-serial, pipe. (2) The application does fork() (3) The processes execute splice_read(trace buffer) and splice_write(virtio-serial) via same pipe. <parent> <child> get fds of a trace buffer, virtio-serial, pipe | fork()----------create--------+ | | splice(read) | ---+ splice(write) | +-- no competition | splice(read) | | splice(write) ---+ | | splice(read) | splice(write) splice(read) ------ competition | splice(write) Two processes share a pipe_inode_info structure. If the child execute splice(read) when the parent tries to execute splice(write), the structure can be broken. Existing virtio-serial driver does not get lock for the structure in splice_write, so this competition will induce oops. <oops messages> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130 PGD 7223e067 PUD 72391067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: lockd bnep bluetooth rfkill sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd soundcore pcspkr virtio_net virtio_balloon i2c_piix4 i2c_core microcode uinput floppy CPU: 0 PID: 1072 Comm: compete-test Not tainted 3.10.0ws+ #55 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880071b98000 ti: ffff88007b55e000 task.ti: ffff88007b55e000 RIP: 0010:[<ffffffff811a6b5f>] [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130 RSP: 0018:ffff88007b55fd78 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff88007b55fe20 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffff88007a95ba30 RDI: ffff880036f9e6c0 RBP: ffff88007b55fda8 R08: 00000000000006ec R09: ffff880077626708 R10: 0000000000000003 R11: ffffffff8139ca59 R12: ffff88007a95ba30 R13: 0000000000000000 R14: ffffffff8139dd00 R15: ffff880036f9e6c0 FS: 00007f2e2e3a0740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000071bd1000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffff8139ca59 ffff88007b55fe20 ffff880036f9e6c0 ffffffff8139dd00 ffff8800776266c0 ffff880077626708 ffff88007b55fde8 ffffffff811a6e8e ffff88007b55fde8 ffffffff8139ca59 ffff880036f9e6c0 ffff88007b55fe20 Call Trace: [<ffffffff8139ca59>] ? alloc_buf.isra.13+0x39/0xb0 [<ffffffff8139dd00>] ? virtcons_restore+0x100/0x100 [<ffffffff811a6e8e>] __splice_from_pipe+0x7e/0x90 [<ffffffff8139ca59>] ? alloc_buf.isra.13+0x39/0xb0 [<ffffffff8139d739>] port_fops_splice_write+0xe9/0x140 [<ffffffff8127a3f4>] ? selinux_file_permission+0xc4/0x120 [<ffffffff8139d650>] ? wait_port_writable+0x1b0/0x1b0 [<ffffffff811a6fe0>] do_splice_from+0xa0/0x110 [<ffffffff811a951f>] SyS_splice+0x5ff/0x6b0 [<ffffffff8161facf>] tracesys+0xdd/0xe2 Code: 49 8b 87 80 00 00 00 4c 8d 24 d0 8b 53 04 41 8b 44 24 0c 4d 8b 6c 24 10 39 d0 89 03 76 02 89 13 49 8b 44 24 10 4c 89 e6 4c 89 ff <ff> 50 18 85 c0 0f 85 aa 00 00 00 48 89 da 4c 89 e6 4c 89 ff 41 RIP [<ffffffff811a6b5f>] splice_from_pipe_feed+0x6f/0x130 RSP <ffff88007b55fd78> CR2: 0000000000000018 ---[ end trace 24572beb7764de59 ]--- V2: Fix a locking problem for error V3: Add Reviewed-by lines and stable@ line in sign-off area Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Kamal Mostafa <kamal@canonical.com>

commit bec4596 upstream. drop_monitor calls several sleeping functions while in atomic context. BUG: sleeping function called from invalid context at mm/slub.c:943 in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2 Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ torvalds#55 Call Trace: [<ffffffff810697ca>] __might_sleep+0xca/0xf0 [<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0 [<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130 [<ffffffff815343fb>] __alloc_skb+0x4b/0x230 [<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor] [<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor] [<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor] [<ffffffff810568e0>] process_one_work+0x130/0x4c0 [<ffffffff81058249>] worker_thread+0x159/0x360 [<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240 [<ffffffff8105d403>] kthread+0x93/0xa0 [<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80 [<ffffffff816be6d0>] ? gs_change+0xb/0xb Rework the logic to call the sleeping functions in right context. Use standard timer/workqueue api to let system chose any cpu to perform the allocation and netlink send. Also avoid a loop if reset_per_cpu_data() cannot allocate memory : use mod_timer() to wait 1/10 second before next try. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> [PG: diffstat here is less by one line due to blank line removal] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

WARNING: space prohibited between function name and open parenthesis '(' torvalds#55: FILE: kernel/posix-timers.c:345: + sizeof (struct k_itimer), 0, ERROR: do not use assignment in if condition #70: FILE: kernel/posix-timers.c:504: + if ((event->sigev_notify & SIGEV_THREAD_ID) && total: 1 errors, 1 warnings, 192 lines checked ./patches/kernel-posix-timersc-code-clean-up.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Fabian Frederick <fabf@skynet.be> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

…-checkpatch-fixes WARNING: line over 80 characters torvalds#39: FILE: mm/internal.h:207: + * pte lock is held(spinlock), which implies preemption disabled. WARNING: line over 80 characters torvalds#55: FILE: mm/rmap.c:988: + * pte lock(a spinlock) is held, which implies preemption disabled. total: 0 errors, 2 warnings, 44 lines checked ./patches/mm-use-the-light-version-__mod_zone_page_state-in-mlocked_vma_newpage.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

WARNING: space prohibited between function name and open parenthesis '(' torvalds#55: FILE: kernel/posix-timers.c:345: + sizeof (struct k_itimer), 0, ERROR: do not use assignment in if condition #70: FILE: kernel/posix-timers.c:504: + if ((event->sigev_notify & SIGEV_THREAD_ID) && total: 1 errors, 1 warnings, 192 lines checked ./patches/kernel-posix-timersc-code-clean-up.patch has style problems, please review. If any of these errors are false positives, please report them to the maintainer, see CHECKPATCH in MAINTAINERS. Please run checkpatch prior to sending patches Cc: Fabian Frederick <fabf@skynet.be> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

[ Upstream commit dbc153f ] A crash was found when dumping SMC-D connections. It can be reproduced by following steps: - run nginx/wrk test: smc_run nginx smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL> - continuously dump SMC-D connections in parallel: watch -n 1 'smcss -D' BUG: kernel NULL pointer dereference, address: 0000000000000030 CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E 6.7.0+ torvalds#55 RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x66/0x150 ? exc_page_fault+0x69/0x140 ? asm_exc_page_fault+0x26/0x30 ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag] ? __kmalloc_node_track_caller+0x35d/0x430 ? __alloc_skb+0x77/0x170 smc_diag_dump_proto+0xd0/0xf0 [smc_diag] smc_diag_dump+0x26/0x60 [smc_diag] netlink_dump+0x19f/0x320 __netlink_dump_start+0x1dc/0x300 smc_diag_handler_dump+0x6a/0x80 [smc_diag] ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag] sock_diag_rcv_msg+0x121/0x140 ? __pfx_sock_diag_rcv_msg+0x10/0x10 netlink_rcv_skb+0x5a/0x110 sock_diag_rcv+0x28/0x40 netlink_unicast+0x22a/0x330 netlink_sendmsg+0x1f8/0x420 __sock_sendmsg+0xb0/0xc0 ____sys_sendmsg+0x24e/0x300 ? copy_msghdr_from_user+0x62/0x80 ___sys_sendmsg+0x7c/0xd0 ? __do_fault+0x34/0x160 ? do_read_fault+0x5f/0x100 ? do_fault+0xb0/0x110 ? __handle_mm_fault+0x2b0/0x6c0 __sys_sendmsg+0x4d/0x80 do_syscall_64+0x69/0x180 entry_SYSCALL_64_after_hwframe+0x6e/0x76 It is possible that the connection is in process of being established when we dump it. Assumed that the connection has been registered in a link group by smc_conn_create() but the rmb_desc has not yet been initialized by smc_buf_create(), thus causing the illegal access to conn->rmb_desc. So fix it by checking before dump. Fixes: 4b1b7d3 ("net/smc: add SMC-D diag support") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>

…_bind returns err The pointer need to be set to NULL, otherwise KASAN complains about use-after-free. Because in mtk_drm_bind, all private's drm are set as follows. private->all_drm_private[i]->drm = drm; And drm will be released by drm_dev_put in case mtk_drm_kms_init returns failure. However, the shutdown path still accesses the previous allocated memory in drm_atomic_helper_shutdown. [ 84.874820] watchdog: watchdog0: watchdog did not stop! [ 86.512054] ================================================================== [ 86.513162] BUG: KASAN: use-after-free in drm_atomic_helper_shutdown+0x33c/0x378 [ 86.514258] Read of size 8 at addr ffff0000d46fc068 by task shutdown/1 [ 86.515213] [ 86.515455] CPU: 1 UID: 0 PID: 1 Comm: shutdown Not tainted 6.13.0-rc1-mtk+gfa1a78e5d24b-dirty torvalds#55 [ 86.516752] Hardware name: Unknown Unknown Product/Unknown Product, BIOS 2022.10 10/01/2022 [ 86.517960] Call trace: [ 86.518333] show_stack+0x20/0x38 (C) [ 86.518891] dump_stack_lvl+0x90/0xd0 [ 86.519443] print_report+0xf8/0x5b0 [ 86.519985] kasan_report+0xb4/0x100 [ 86.520526] __asan_report_load8_noabort+0x20/0x30 [ 86.521240] drm_atomic_helper_shutdown+0x33c/0x378 [ 86.521966] mtk_drm_shutdown+0x54/0x80 [ 86.522546] platform_shutdown+0x64/0x90 [ 86.523137] device_shutdown+0x260/0x5b8 [ 86.523728] kernel_restart+0x78/0xf0 [ 86.524282] __do_sys_reboot+0x258/0x2f0 [ 86.524871] __arm64_sys_reboot+0x90/0xd8 [ 86.525473] invoke_syscall+0x74/0x268 [ 86.526041] el0_svc_common.constprop.0+0xb0/0x240 [ 86.526751] do_el0_svc+0x4c/0x70 [ 86.527251] el0_svc+0x4c/0xc0 [ 86.527719] el0t_64_sync_handler+0x144/0x168 [ 86.528367] el0t_64_sync+0x198/0x1a0 [ 86.528920] [ 86.529157] The buggy address belongs to the physical page: [ 86.529972] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0000d46fd4d0 pfn:0x1146fc [ 86.531319] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff) [ 86.532267] raw: 0bfffc0000000000 0000000000000000 dead000000000122 0000000000000000 [ 86.533390] raw: ffff0000d46fd4d0 0000000000000000 00000000ffffffff 0000000000000000 [ 86.534511] page dumped because: kasan: bad access detected [ 86.535323] [ 86.535559] Memory state around the buggy address: [ 86.536265] ffff0000d46fbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.537314] ffff0000d46fbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.538363] >ffff0000d46fc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.544733] ^ [ 86.551057] ffff0000d46fc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.557510] ffff0000d46fc100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.563928] ================================================================== [ 86.571093] Disabling lock debugging due to kernel taint [ 86.577642] Unable to handle kernel paging request at virtual address e0e9c0920000000b [ 86.581834] KASAN: maybe wild-memory-access in range [0x0752049000000058-0x075204900000005f] ... Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com>

…_bind returns err The pointer need to be set to NULL, otherwise KASAN complains about use-after-free. Because in mtk_drm_bind, all private's drm are set as follows. private->all_drm_private[i]->drm = drm; And drm will be released by drm_dev_put in case mtk_drm_kms_init returns failure. However, the shutdown path still accesses the previous allocated memory in drm_atomic_helper_shutdown. [ 84.874820] watchdog: watchdog0: watchdog did not stop! [ 86.512054] ================================================================== [ 86.513162] BUG: KASAN: use-after-free in drm_atomic_helper_shutdown+0x33c/0x378 [ 86.514258] Read of size 8 at addr ffff0000d46fc068 by task shutdown/1 [ 86.515213] [ 86.515455] CPU: 1 UID: 0 PID: 1 Comm: shutdown Not tainted 6.13.0-rc1-mtk+gfa1a78e5d24b-dirty #55 [ 86.516752] Hardware name: Unknown Product/Unknown Product, BIOS 2022.10 10/01/2022 [ 86.517960] Call trace: [ 86.518333] show_stack+0x20/0x38 (C) [ 86.518891] dump_stack_lvl+0x90/0xd0 [ 86.519443] print_report+0xf8/0x5b0 [ 86.519985] kasan_report+0xb4/0x100 [ 86.520526] __asan_report_load8_noabort+0x20/0x30 [ 86.521240] drm_atomic_helper_shutdown+0x33c/0x378 [ 86.521966] mtk_drm_shutdown+0x54/0x80 [ 86.522546] platform_shutdown+0x64/0x90 [ 86.523137] device_shutdown+0x260/0x5b8 [ 86.523728] kernel_restart+0x78/0xf0 [ 86.524282] __do_sys_reboot+0x258/0x2f0 [ 86.524871] __arm64_sys_reboot+0x90/0xd8 [ 86.525473] invoke_syscall+0x74/0x268 [ 86.526041] el0_svc_common.constprop.0+0xb0/0x240 [ 86.526751] do_el0_svc+0x4c/0x70 [ 86.527251] el0_svc+0x4c/0xc0 [ 86.527719] el0t_64_sync_handler+0x144/0x168 [ 86.528367] el0t_64_sync+0x198/0x1a0 [ 86.528920] [ 86.529157] The buggy address belongs to the physical page: [ 86.529972] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0000d46fd4d0 pfn:0x1146fc [ 86.531319] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff) [ 86.532267] raw: 0bfffc0000000000 0000000000000000 dead000000000122 0000000000000000 [ 86.533390] raw: ffff0000d46fd4d0 0000000000000000 00000000ffffffff 0000000000000000 [ 86.534511] page dumped because: kasan: bad access detected [ 86.535323] [ 86.535559] Memory state around the buggy address: [ 86.536265] ffff0000d46fbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.537314] ffff0000d46fbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.538363] >ffff0000d46fc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.544733] ^ [ 86.551057] ffff0000d46fc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.557510] ffff0000d46fc100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.563928] ================================================================== [ 86.571093] Disabling lock debugging due to kernel taint [ 86.577642] Unable to handle kernel paging request at virtual address e0e9c0920000000b [ 86.581834] KASAN: maybe wild-memory-access in range [0x0752049000000058-0x075204900000005f] ... Fixes: 1ef7ed4 ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241223023227.1258112-1-guoqing.jiang@canonical.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>

…_bind returns err [ Upstream commit 36684e9 ] The pointer need to be set to NULL, otherwise KASAN complains about use-after-free. Because in mtk_drm_bind, all private's drm are set as follows. private->all_drm_private[i]->drm = drm; And drm will be released by drm_dev_put in case mtk_drm_kms_init returns failure. However, the shutdown path still accesses the previous allocated memory in drm_atomic_helper_shutdown. [ 84.874820] watchdog: watchdog0: watchdog did not stop! [ 86.512054] ================================================================== [ 86.513162] BUG: KASAN: use-after-free in drm_atomic_helper_shutdown+0x33c/0x378 [ 86.514258] Read of size 8 at addr ffff0000d46fc068 by task shutdown/1 [ 86.515213] [ 86.515455] CPU: 1 UID: 0 PID: 1 Comm: shutdown Not tainted 6.13.0-rc1-mtk+gfa1a78e5d24b-dirty torvalds#55 [ 86.516752] Hardware name: Unknown Product/Unknown Product, BIOS 2022.10 10/01/2022 [ 86.517960] Call trace: [ 86.518333] show_stack+0x20/0x38 (C) [ 86.518891] dump_stack_lvl+0x90/0xd0 [ 86.519443] print_report+0xf8/0x5b0 [ 86.519985] kasan_report+0xb4/0x100 [ 86.520526] __asan_report_load8_noabort+0x20/0x30 [ 86.521240] drm_atomic_helper_shutdown+0x33c/0x378 [ 86.521966] mtk_drm_shutdown+0x54/0x80 [ 86.522546] platform_shutdown+0x64/0x90 [ 86.523137] device_shutdown+0x260/0x5b8 [ 86.523728] kernel_restart+0x78/0xf0 [ 86.524282] __do_sys_reboot+0x258/0x2f0 [ 86.524871] __arm64_sys_reboot+0x90/0xd8 [ 86.525473] invoke_syscall+0x74/0x268 [ 86.526041] el0_svc_common.constprop.0+0xb0/0x240 [ 86.526751] do_el0_svc+0x4c/0x70 [ 86.527251] el0_svc+0x4c/0xc0 [ 86.527719] el0t_64_sync_handler+0x144/0x168 [ 86.528367] el0t_64_sync+0x198/0x1a0 [ 86.528920] [ 86.529157] The buggy address belongs to the physical page: [ 86.529972] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0000d46fd4d0 pfn:0x1146fc [ 86.531319] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff) [ 86.532267] raw: 0bfffc0000000000 0000000000000000 dead000000000122 0000000000000000 [ 86.533390] raw: ffff0000d46fd4d0 0000000000000000 00000000ffffffff 0000000000000000 [ 86.534511] page dumped because: kasan: bad access detected [ 86.535323] [ 86.535559] Memory state around the buggy address: [ 86.536265] ffff0000d46fbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.537314] ffff0000d46fbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.538363] >ffff0000d46fc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.544733] ^ [ 86.551057] ffff0000d46fc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.557510] ffff0000d46fc100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.563928] ================================================================== [ 86.571093] Disabling lock debugging due to kernel taint [ 86.577642] Unable to handle kernel paging request at virtual address e0e9c0920000000b [ 86.581834] KASAN: maybe wild-memory-access in range [0x0752049000000058-0x075204900000005f] ... Fixes: 1ef7ed4 ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241223023227.1258112-1-guoqing.jiang@canonical.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

…_bind returns err The pointer need to be set to NULL, otherwise KASAN complains about use-after-free. Because in mtk_drm_bind, all private's drm are set as follows. private->all_drm_private[i]->drm = drm; And drm will be released by drm_dev_put in case mtk_drm_kms_init returns failure. However, the shutdown path still accesses the previous allocated memory in drm_atomic_helper_shutdown. [ 84.874820] watchdog: watchdog0: watchdog did not stop! [ 86.512054] ================================================================== [ 86.513162] BUG: KASAN: use-after-free in drm_atomic_helper_shutdown+0x33c/0x378 [ 86.514258] Read of size 8 at addr ffff0000d46fc068 by task shutdown/1 [ 86.515213] [ 86.515455] CPU: 1 UID: 0 PID: 1 Comm: shutdown Not tainted 6.13.0-rc1-mtk+gfa1a78e5d24b-dirty torvalds#55 [ 86.516752] Hardware name: Unknown Product/Unknown Product, BIOS 2022.10 10/01/2022 [ 86.517960] Call trace: [ 86.518333] show_stack+0x20/0x38 (C) [ 86.518891] dump_stack_lvl+0x90/0xd0 [ 86.519443] print_report+0xf8/0x5b0 [ 86.519985] kasan_report+0xb4/0x100 [ 86.520526] __asan_report_load8_noabort+0x20/0x30 [ 86.521240] drm_atomic_helper_shutdown+0x33c/0x378 [ 86.521966] mtk_drm_shutdown+0x54/0x80 [ 86.522546] platform_shutdown+0x64/0x90 [ 86.523137] device_shutdown+0x260/0x5b8 [ 86.523728] kernel_restart+0x78/0xf0 [ 86.524282] __do_sys_reboot+0x258/0x2f0 [ 86.524871] __arm64_sys_reboot+0x90/0xd8 [ 86.525473] invoke_syscall+0x74/0x268 [ 86.526041] el0_svc_common.constprop.0+0xb0/0x240 [ 86.526751] do_el0_svc+0x4c/0x70 [ 86.527251] el0_svc+0x4c/0xc0 [ 86.527719] el0t_64_sync_handler+0x144/0x168 [ 86.528367] el0t_64_sync+0x198/0x1a0 [ 86.528920] [ 86.529157] The buggy address belongs to the physical page: [ 86.529972] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0000d46fd4d0 pfn:0x1146fc [ 86.531319] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff) [ 86.532267] raw: 0bfffc0000000000 0000000000000000 dead000000000122 0000000000000000 [ 86.533390] raw: ffff0000d46fd4d0 0000000000000000 00000000ffffffff 0000000000000000 [ 86.534511] page dumped because: kasan: bad access detected [ 86.535323] [ 86.535559] Memory state around the buggy address: [ 86.536265] ffff0000d46fbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.537314] ffff0000d46fbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.538363] >ffff0000d46fc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.544733] ^ [ 86.551057] ffff0000d46fc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.557510] ffff0000d46fc100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.563928] ================================================================== [ 86.571093] Disabling lock debugging due to kernel taint [ 86.577642] Unable to handle kernel paging request at virtual address e0e9c0920000000b [ 86.581834] KASAN: maybe wild-memory-access in range [0x0752049000000058-0x075204900000005f] ... Fixes: 1ef7ed4 ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241223023227.1258112-1-guoqing.jiang@canonical.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>

Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock lock is held when traversing the drhd list. Without this fix, the following warning is triggered: ============================= WARNING: suspicious RCU usage 6.14.0-rc3 #55 Not tainted ----------------------------- drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by cpuhp/1/23: #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 stack backtrace: CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55 Call Trace: <TASK> dump_stack_lvl+0xb7/0xd0 lockdep_rcu_suspicious+0x159/0x1f0 ? __pfx_enable_drhd_fault_handling+0x10/0x10 enable_drhd_fault_handling+0x151/0x180 cpuhp_invoke_callback+0x1df/0x990 cpuhp_thread_fun+0x1ea/0x2c0 smpboot_thread_fn+0x1f5/0x2e0 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x12a/0x2d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4a/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat about a possible deadlock between dmar_global_lock and cpu_hotplug_lock. This is avoided by not holding dmar_global_lock when calling iommu_device_register(), which initiates the device probe process. Fixes: d74169c ("iommu/vt-d: Allocate DMAR fault interrupts locally") Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/ Tested-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>

commit b150654 upstream. Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts locally") moved the call to enable_drhd_fault_handling() to a code path that does not hold any lock while traversing the drhd list. Fix it by ensuring the dmar_global_lock lock is held when traversing the drhd list. Without this fix, the following warning is triggered: ============================= WARNING: suspicious RCU usage 6.14.0-rc3 torvalds#55 Not tainted ----------------------------- drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by cpuhp/1/23: #0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 #1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0 stack backtrace: CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 torvalds#55 Call Trace: <TASK> dump_stack_lvl+0xb7/0xd0 lockdep_rcu_suspicious+0x159/0x1f0 ? __pfx_enable_drhd_fault_handling+0x10/0x10 enable_drhd_fault_handling+0x151/0x180 cpuhp_invoke_callback+0x1df/0x990 cpuhp_thread_fun+0x1ea/0x2c0 smpboot_thread_fn+0x1f5/0x2e0 ? __pfx_smpboot_thread_fn+0x10/0x10 kthread+0x12a/0x2d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x4a/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Holding the lock in enable_drhd_fault_handling() triggers a lockdep splat about a possible deadlock between dmar_global_lock and cpu_hotplug_lock. This is avoided by not holding dmar_global_lock when calling iommu_device_register(), which initiates the device probe process. Fixes: d74169c ("iommu/vt-d: Allocate DMAR fault interrupts locally") Reported-and-tested-by: Ido Schimmel <idosch@nvidia.com> Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/ Tested-by: Breno Leitao <leitao@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20250218022422.2315082-1-baolu.lu@linux.intel.com Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…_bind returns err BugLink: https://bugs.launchpad.net/bugs/2106770 [ Upstream commit 36684e9 ] The pointer need to be set to NULL, otherwise KASAN complains about use-after-free. Because in mtk_drm_bind, all private's drm are set as follows. private->all_drm_private[i]->drm = drm; And drm will be released by drm_dev_put in case mtk_drm_kms_init returns failure. However, the shutdown path still accesses the previous allocated memory in drm_atomic_helper_shutdown. [ 84.874820] watchdog: watchdog0: watchdog did not stop! [ 86.512054] ================================================================== [ 86.513162] BUG: KASAN: use-after-free in drm_atomic_helper_shutdown+0x33c/0x378 [ 86.514258] Read of size 8 at addr ffff0000d46fc068 by task shutdown/1 [ 86.515213] [ 86.515455] CPU: 1 UID: 0 PID: 1 Comm: shutdown Not tainted 6.13.0-rc1-mtk+gfa1a78e5d24b-dirty torvalds#55 [ 86.516752] Hardware name: Unknown Product/Unknown Product, BIOS 2022.10 10/01/2022 [ 86.517960] Call trace: [ 86.518333] show_stack+0x20/0x38 (C) [ 86.518891] dump_stack_lvl+0x90/0xd0 [ 86.519443] print_report+0xf8/0x5b0 [ 86.519985] kasan_report+0xb4/0x100 [ 86.520526] __asan_report_load8_noabort+0x20/0x30 [ 86.521240] drm_atomic_helper_shutdown+0x33c/0x378 [ 86.521966] mtk_drm_shutdown+0x54/0x80 [ 86.522546] platform_shutdown+0x64/0x90 [ 86.523137] device_shutdown+0x260/0x5b8 [ 86.523728] kernel_restart+0x78/0xf0 [ 86.524282] __do_sys_reboot+0x258/0x2f0 [ 86.524871] __arm64_sys_reboot+0x90/0xd8 [ 86.525473] invoke_syscall+0x74/0x268 [ 86.526041] el0_svc_common.constprop.0+0xb0/0x240 [ 86.526751] do_el0_svc+0x4c/0x70 [ 86.527251] el0_svc+0x4c/0xc0 [ 86.527719] el0t_64_sync_handler+0x144/0x168 [ 86.528367] el0t_64_sync+0x198/0x1a0 [ 86.528920] [ 86.529157] The buggy address belongs to the physical page: [ 86.529972] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff0000d46fd4d0 pfn:0x1146fc [ 86.531319] flags: 0xbfffc0000000000(node=0|zone=2|lastcpupid=0xffff) [ 86.532267] raw: 0bfffc0000000000 0000000000000000 dead000000000122 0000000000000000 [ 86.533390] raw: ffff0000d46fd4d0 0000000000000000 00000000ffffffff 0000000000000000 [ 86.534511] page dumped because: kasan: bad access detected [ 86.535323] [ 86.535559] Memory state around the buggy address: [ 86.536265] ffff0000d46fbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.537314] ffff0000d46fbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.538363] >ffff0000d46fc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.544733] ^ [ 86.551057] ffff0000d46fc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.557510] ffff0000d46fc100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 86.563928] ================================================================== [ 86.571093] Disabling lock debugging due to kernel taint [ 86.577642] Unable to handle kernel paging request at virtual address e0e9c0920000000b [ 86.581834] KASAN: maybe wild-memory-access in range [0x0752049000000058-0x075204900000005f] ... Fixes: 1ef7ed4 ("drm/mediatek: Modify mediatek-drm for mt8195 multi mmsys support") Signed-off-by: Guoqing Jiang <guoqing.jiang@canonical.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patchwork.kernel.org/project/dri-devel/patch/20241223023227.1258112-1-guoqing.jiang@canonical.com/ Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> CVE-2024-57926 Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Mehmet Basaran <mehmet.basaran@canonical.com>

…l_fop_write A potential deadlock due to A-B/B-A deadlock exists between the NFC core and the RFKill subsystem, involving the NFC device lock and the rfkill_global_mutex. This issue is particularly visible on PREEMPT_RT kernels, which can report the following warning: | rtmutex deadlock detected | WARNING: CPU: 0 PID: 22729 at kernel/locking/rtmutex.c:1674 rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 | Modules linked in: | CPU: 0 UID: 0 PID: 22729 Comm: syz.7.2187 Kdump: loaded Not tainted 6.17.0-rc1-00001-g1149a5db27c8-dirty torvalds#55 PREEMPT_RT | Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025 | pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 | lr : rt_mutex_handle_deadlock+0x40/0xec kernel/locking/rtmutex.c:1674 | sp : ffff8000967c7720 | x29: ffff8000967c7720 x28: 1fffe0001946d182 x27: dfff800000000000 | x26: 0000000000000001 x25: 0000000000000003 x24: 1fffe0001946d00b | x23: 1fffe0001946d182 x22: ffff80008aec8940 x21: dfff800000000000 | x20: ffff0000ca368058 x19: ffff0000ca368c10 x18: ffff80008af6b6e0 | x17: 1fffe000590b8088 x16: ffff80008046cc08 x15: 0000000000000001 | x14: 1fffe000590ba990 x13: 0000000000000000 x12: 0000000000000000 | x11: ffff6000590ba991 x10: 0000000000000002 x9 : 0fe446e029bcfe00 | x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f | x5 : 0000000000000001 x4 : 0000000000001000 x3 : ffff800080503efc | x2 : 0000000000000001 x1 : 0000000000000001 x0 : 0000000000000001 | Call trace: | rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 (P) | __rt_mutex_slowlock+0x1cc/0x480 kernel/locking/rtmutex.c:1734 | __rt_mutex_slowlock_locked kernel/locking/rtmutex.c:1760 [inline] | rt_mutex_slowlock+0x140/0x21c kernel/locking/rtmutex.c:1800 | __rt_mutex_lock kernel/locking/rtmutex.c:1815 [inline] | __mutex_lock_common kernel/locking/rtmutex_api.c:536 [inline] | mutex_lock+0xf0/0x10c kernel/locking/rtmutex_api.c:603 | device_lock include/linux/device.h:911 [inline] | nfc_dev_down net/nfc/core.c:143 [inline] | nfc_rfkill_set_block+0x48/0x2a4 net/nfc/core.c:179 | rfkill_set_block+0x184/0x364 net/rfkill/core.c:346 | rfkill_fop_write+0x4dc/0x624 net/rfkill/core.c:1301 | vfs_write+0x2b8/0xa30 fs/read_write.c:684 | ksys_write+0x120/0x210 fs/read_write.c:738 | __do_sys_write fs/read_write.c:749 [inline] | __se_sys_write fs/read_write.c:746 [inline] | __arm64_sys_write+0x7c/0x90 fs/read_write.c:746 | __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] | invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 | el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 | do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 | el0_svc+0x40/0x140 arch/arm64/kernel/entry-common.c:879 | el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 | el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:596 The scenario is as follows: Task A (rfkill_fop_write): 1. Acquires rfkill_global_mutex. 2. Iterates devices and calls rfkill_set_block() -> nfc_rfkill_set_block() -> nfc_dev_down(). 3. Tries to acquire NFC device_lock. Task B (nfc_unregister_device): 1. Acquires NFC device_lock. 2. Calls rfkill_unregister(). 3. Tries to acquire rfkill_global_mutex. Task A waits for the device_lock held by Task B, while Task B waits for the rfkill_global_mutex held by Task A. To fix this, move the calls to rfkill_unregister() and rfkill_destroy() outside the device_lock critical section in nfc_unregister_device(). We ensure this is safe by first acquiring the device_lock, setting the shutting_down flag (which prevents races with nfc_dev_down()), stashing the rfkill pointer in a local variable, nullifying the pointer in the nfc_dev structure, and then releasing the device_lock before calling the rfkill unregister functions. This breaks the lock inversion. Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: NipaLocal <nipa@local>

…l_fop_write A potential deadlock due to A-B/B-A deadlock exists between the NFC core and the RFKill subsystem, involving the NFC device lock and the rfkill_global_mutex. This issue is particularly visible on PREEMPT_RT kernels, which can report the following warning: | rtmutex deadlock detected | WARNING: CPU: 0 PID: 22729 at kernel/locking/rtmutex.c:1674 rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 | Modules linked in: | CPU: 0 UID: 0 PID: 22729 Comm: syz.7.2187 Kdump: loaded Not tainted 6.17.0-rc1-00001-g1149a5db27c8-dirty torvalds#55 PREEMPT_RT | Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025 | pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 | lr : rt_mutex_handle_deadlock+0x40/0xec kernel/locking/rtmutex.c:1674 | sp : ffff8000967c7720 | x29: ffff8000967c7720 x28: 1fffe0001946d182 x27: dfff800000000000 | x26: 0000000000000001 x25: 0000000000000003 x24: 1fffe0001946d00b | x23: 1fffe0001946d182 x22: ffff80008aec8940 x21: dfff800000000000 | x20: ffff0000ca368058 x19: ffff0000ca368c10 x18: ffff80008af6b6e0 | x17: 1fffe000590b8088 x16: ffff80008046cc08 x15: 0000000000000001 | x14: 1fffe000590ba990 x13: 0000000000000000 x12: 0000000000000000 | x11: ffff6000590ba991 x10: 0000000000000002 x9 : 0fe446e029bcfe00 | x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f | x5 : 0000000000000001 x4 : 0000000000001000 x3 : ffff800080503efc | x2 : 0000000000000001 x1 : 0000000000000001 x0 : 0000000000000001 | Call trace: | rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 (P) | __rt_mutex_slowlock+0x1cc/0x480 kernel/locking/rtmutex.c:1734 | __rt_mutex_slowlock_locked kernel/locking/rtmutex.c:1760 [inline] | rt_mutex_slowlock+0x140/0x21c kernel/locking/rtmutex.c:1800 | __rt_mutex_lock kernel/locking/rtmutex.c:1815 [inline] | __mutex_lock_common kernel/locking/rtmutex_api.c:536 [inline] | mutex_lock+0xf0/0x10c kernel/locking/rtmutex_api.c:603 | device_lock include/linux/device.h:911 [inline] | nfc_dev_down net/nfc/core.c:143 [inline] | nfc_rfkill_set_block+0x48/0x2a4 net/nfc/core.c:179 | rfkill_set_block+0x184/0x364 net/rfkill/core.c:346 | rfkill_fop_write+0x4dc/0x624 net/rfkill/core.c:1301 | vfs_write+0x2b8/0xa30 fs/read_write.c:684 | ksys_write+0x120/0x210 fs/read_write.c:738 | __do_sys_write fs/read_write.c:749 [inline] | __se_sys_write fs/read_write.c:746 [inline] | __arm64_sys_write+0x7c/0x90 fs/read_write.c:746 | __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] | invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 | el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 | do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 | el0_svc+0x40/0x140 arch/arm64/kernel/entry-common.c:879 | el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 | el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:596 The scenario is as follows: Task A (rfkill_fop_write): 1. Acquires rfkill_global_mutex. 2. Iterates devices and calls rfkill_set_block() -> nfc_rfkill_set_block() -> nfc_dev_down(). 3. Tries to acquire NFC device_lock. Task B (nfc_unregister_device): 1. Acquires NFC device_lock. 2. Calls rfkill_unregister(). 3. Tries to acquire rfkill_global_mutex. Task A waits for the device_lock held by Task B, while Task B waits for the rfkill_global_mutex held by Task A. To fix this, move the calls to rfkill_unregister() and rfkill_destroy() outside the device_lock critical section in nfc_unregister_device(). We ensure this is safe by first acquiring the device_lock, setting the shutting_down flag (which prevents races with nfc_dev_down()), stashing the rfkill pointer in a local variable, nullifying the pointer in the nfc_dev structure, and then releasing the device_lock before calling the rfkill unregister functions. This breaks the lock inversion. Signed-off-by: Yunseong Kim <ysk@kzalloc.com>

…l_fop_write A potential deadlock due to A-B/B-A deadlock exists between the NFC core and the RFKill subsystem, involving the NFC device lock and the rfkill_global_mutex. This issue is particularly visible on PREEMPT_RT kernels, which can report the following warning: | rtmutex deadlock detected | WARNING: CPU: 0 PID: 22729 at kernel/locking/rtmutex.c:1674 rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 | Modules linked in: | CPU: 0 UID: 0 PID: 22729 Comm: syz.7.2187 Kdump: loaded Not tainted 6.17.0-rc1-00001-g1149a5db27c8-dirty torvalds#55 PREEMPT_RT | Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8ubuntu1 06/11/2025 | pstate: 63400005 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 | lr : rt_mutex_handle_deadlock+0x40/0xec kernel/locking/rtmutex.c:1674 | sp : ffff8000967c7720 | x29: ffff8000967c7720 x28: 1fffe0001946d182 x27: dfff800000000000 | x26: 0000000000000001 x25: 0000000000000003 x24: 1fffe0001946d00b | x23: 1fffe0001946d182 x22: ffff80008aec8940 x21: dfff800000000000 | x20: ffff0000ca368058 x19: ffff0000ca368c10 x18: ffff80008af6b6e0 | x17: 1fffe000590b8088 x16: ffff80008046cc08 x15: 0000000000000001 | x14: 1fffe000590ba990 x13: 0000000000000000 x12: 0000000000000000 | x11: ffff6000590ba991 x10: 0000000000000002 x9 : 0fe446e029bcfe00 | x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000000000003f | x5 : 0000000000000001 x4 : 0000000000001000 x3 : ffff800080503efc | x2 : 0000000000000001 x1 : 0000000000000001 x0 : 0000000000000001 | Call trace: | rt_mutex_handle_deadlock+0x68/0xec kernel/locking/rtmutex.c:-1 (P) | __rt_mutex_slowlock+0x1cc/0x480 kernel/locking/rtmutex.c:1734 | __rt_mutex_slowlock_locked kernel/locking/rtmutex.c:1760 [inline] | rt_mutex_slowlock+0x140/0x21c kernel/locking/rtmutex.c:1800 | __rt_mutex_lock kernel/locking/rtmutex.c:1815 [inline] | __mutex_lock_common kernel/locking/rtmutex_api.c:536 [inline] | mutex_lock+0xf0/0x10c kernel/locking/rtmutex_api.c:603 | device_lock include/linux/device.h:911 [inline] | nfc_dev_down net/nfc/core.c:143 [inline] | nfc_rfkill_set_block+0x48/0x2a4 net/nfc/core.c:179 | rfkill_set_block+0x184/0x364 net/rfkill/core.c:346 | rfkill_fop_write+0x4dc/0x624 net/rfkill/core.c:1301 | vfs_write+0x2b8/0xa30 fs/read_write.c:684 | ksys_write+0x120/0x210 fs/read_write.c:738 | __do_sys_write fs/read_write.c:749 [inline] | __se_sys_write fs/read_write.c:746 [inline] | __arm64_sys_write+0x7c/0x90 fs/read_write.c:746 | __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline] | invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49 | el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132 | do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151 | el0_svc+0x40/0x140 arch/arm64/kernel/entry-common.c:879 | el0t_64_sync_handler+0x84/0x12c arch/arm64/kernel/entry-common.c:898 | el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:596 The scenario is as follows: Task A (rfkill_fop_write): 1. Acquires rfkill_global_mutex. 2. Iterates devices and calls rfkill_set_block() -> nfc_rfkill_set_block() -> nfc_dev_down(). 3. Tries to acquire NFC device_lock. Task B (nfc_unregister_device): 1. Acquires NFC device_lock. 2. Calls rfkill_unregister(). 3. Tries to acquire rfkill_global_mutex. Task A waits for the device_lock held by Task B, while Task B waits for the rfkill_global_mutex held by Task A. To fix this, move the calls to rfkill_unregister() and rfkill_destroy() outside the device_lock critical section in nfc_unregister_device(). We ensure this is safe by first acquiring the device_lock, setting the shutting_down flag (which prevents races with nfc_dev_down()), stashing the rfkill pointer in a local variable, nullifying the pointer in the nfc_dev structure, and then releasing the device_lock before calling the rfkill unregister functions. This breaks the lock inversion. Signed-off-by: Yunseong Kim <ysk@kzalloc.com> Signed-off-by: NipaLocal <nipa@local>

Merge pull request #1 from torvalds/master

bfd4116

1102

coder280 closed this Nov 4, 2013

coder280 reopened this Nov 4, 2013

coder280 closed this Nov 4, 2013

This was referenced Mar 11, 2024

PVM guest kernel start failed with virtio-pmem rootfs virt-pvm/linux#1

Open

PVM guest kernel panic after restore from snapshot virt-pvm/linux#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Merge pull request #1 from torvalds/master #55

Merge pull request #1 from torvalds/master #55

Uh oh!

coder280 commented Nov 4, 2013

Uh oh!

Uh oh!

Merge pull request #1 from torvalds/master #55

Merge pull request #1 from torvalds/master #55

Uh oh!

Conversation

coder280 commented Nov 4, 2013

Uh oh!

Uh oh!