Integrating kernel-topics to qcom-next by sgaud-quic · Pull Request #1 · sgaud-quic/kernel

sgaud-quic · 2025-05-19T12:00:29Z

Integrating kernel-topics to qcom-next

The QCS8300 Camera clock controller is a derivative of SA8775P, but has few additional clocks and offset differences. Hence, add support for QCS8300 Camera clock controller by extending the SA8775P CamCC. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Imran Shaik <quic_imrashai@quicinc.com> Change-Id: I18a9a199a7dfb822279c9dd7aab2ffe69a89e889 Link: https://lore.kernel.org/all/20250109-qcs8300-mm-patches-new-v4-0-63e8ac268b02@quicinc.com/ Patch-mainline: linux-clk @ 01/09/25, 14:27 Signed-off-by: Shiraz Hashim <quic_shashim@quicinc.com>

Add the global clock controller support for QCS615 SoC. Change-Id: I211757d296f64c52264f679aeff02dd6cd872e90 Signed-off-by: Taniya Das <quic_tdas@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Imran Shaik <quic_imrashai@quicinc.com>

Correct the ngpios entry to account for the UFS_RESET pin being exported as a GPIO in addition to the real GPIOs, allowing the UFS driver to toggle it. Fixes: b698f36 ("pinctrl: qcom: add the tlmm driver for QCS615 platform") Change-Id: If4c29ae49c23997129673136091f361df697fa42 Signed-off-by: Lijuan Gao <quic_lijuang@quicinc.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/all/20241219-correct_gpio_ranges-v2-3-19af8588dbd0@quicinc.com/ Signed-off-by: Yuanjie Yang <quic_yuanjiey@quicinc.com>

Currently iwdev->rf is allocated in irdma_probe(), but free in irdma_ib_dealloc_device(). It can be misleading. Move the free to irdma_remove() to be more obvious. Freeing in irdma_ib_dealloc_device() leads to KASAN use-after-free issue. Which can also lead to NULL pointer dereference. Fix this. irdma_deinit_interrupts() can't be moved before freeing iwdef->rf, because in this case deinit interrupts will be done before freeing irqs. The simplest solution is to move kfree(iwdev->rf) to irdma_remove(). Reproducer: sudo rmmod irdma Minified splat(s): BUG: KASAN: use-after-free in irdma_remove+0x257/0x2d0 [irdma] Call Trace: <TASK> ? __pfx__raw_spin_lock_irqsave+0x10/0x10 ? kfree+0x253/0x450 ? irdma_remove+0x257/0x2d0 [irdma] kasan_report+0xed/0x120 ? irdma_remove+0x257/0x2d0 [irdma] irdma_remove+0x257/0x2d0 [irdma] auxiliary_bus_remove+0x56/0x80 device_release_driver_internal+0x371/0x530 ? kernfs_put.part.0+0x147/0x310 driver_detach+0xbf/0x180 bus_remove_driver+0x11b/0x2a0 auxiliary_driver_unregister+0x1a/0x50 irdma_exit_module+0x40/0x4c [irdma] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:ice_free_rdma_qvector+0x2a/0xa0 [ice] Call Trace: ? ice_free_rdma_qvector+0x2a/0xa0 [ice] irdma_remove+0x179/0x2d0 [irdma] auxiliary_bus_remove+0x56/0x80 device_release_driver_internal+0x371/0x530 ? kobject_put+0x61/0x4b0 driver_detach+0xbf/0x180 bus_remove_driver+0x11b/0x2a0 auxiliary_driver_unregister+0x1a/0x50 irdma_exit_module+0x40/0x4c [irdma] Reported-by: Marcin Szycik <marcin.szycik@linux.intel.com> Closes: https://lore.kernel.org/netdev/8e533834-4564-472f-b29b-4f1cb7730053@linux.intel.com/ Fixes: 3e0d3cb ("ice, irdma: move interrupts code to irdma") Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20250414234231.523-1-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>

When memory is insufficient, the allocation of nfs_lock_context in nfs_get_lock_context() fails and returns -ENOMEM. If we mistakenly treat an nfs4_unlockdata structure (whose l_ctx member has been set to -ENOMEM) as valid and proceed to execute rpc_run_task(), this will trigger a NULL pointer dereference in nfs4_locku_prepare. For example: BUG: kernel NULL pointer dereference, address: 000000000000000c PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 15 UID: 0 PID: 12 Comm: kworker/u64:0 Not tainted 6.15.0-rc2-dirty qualcomm-linux#60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 Workqueue: rpciod rpc_async_schedule RIP: 0010:nfs4_locku_prepare+0x35/0xc2 Code: 89 f2 48 89 fd 48 c7 c7 68 69 ef b5 53 48 8b 8e 90 00 00 00 48 89 f3 RSP: 0018:ffffbbafc006bdb8 EFLAGS: 00010246 RAX: 000000000000004b RBX: ffff9b964fc1fa00 RCX: 0000000000000000 RDX: 0000000000000000 RSI: fffffffffffffff4 RDI: ffff9ba53fddbf40 RBP: ffff9ba539934000 R08: 0000000000000000 R09: ffffbbafc006bc38 R10: ffffffffb6b689c8 R11: 0000000000000003 R12: ffff9ba539934030 R13: 0000000000000001 R14: 0000000004248060 R15: ffffffffb56d1c30 FS: 0000000000000000(0000) GS:ffff9ba5881f0000(0000) knlGS:00000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000c CR3: 000000093f244000 CR4: 00000000000006f0 Call Trace: <TASK> __rpc_execute+0xbc/0x480 rpc_async_schedule+0x2f/0x40 process_one_work+0x232/0x5d0 worker_thread+0x1da/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x10d/0x240 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: CR2: 000000000000000c ---[ end trace 0000000000000000 ]--- Free the allocated nfs4_unlockdata when nfs_get_lock_context() fails and return NULL to terminate subsequent rpc_run_task, preventing NULL pointer dereference. Fixes: f30cb75 ("NFS: Always wait for I/O completion before unlock") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20250417072508.3850532-1-lilingfeng3@huawei.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

When CONFIG_PROVE_RCU_LIST is enabled, fprobe triggers the following warning: WARNING: suspicious RCU usage kernel/trace/fprobe.c:457 RCU-list traversed in non-reader section!! other info that might help us debug this: #1: ffffffff863c4e08 (fprobe_mutex){+.+.}-{4:4}, at: fprobe_module_callback+0x7b/0x8c0 Call Trace: fprobe_module_callback notifier_call_chain blocking_notifier_call_chain This warning occurs because fprobe_remove_node_in_module() traverses an RCU list using RCU primitives without holding an RCU read lock. However, the function is only called from fprobe_module_callback(), which holds the fprobe_mutex lock that provides sufficient protection for safely traversing the list. Fix the warning by specifying the locking design to the CONFIG_PROVE_RCU_LIST mechanism. Add the lockdep_is_held() argument to hlist_for_each_entry_rcu() to inform the RCU checker that fprobe_mutex provides the required protection. Link: https://lore.kernel.org/all/20250410-fprobe-v1-1-068ef5f41436@debian.org/ Fixes: a3dc298 ("tracing: fprobe: Cleanup fprobe hash when module unloading") Signed-off-by: Breno Leitao <leitao@debian.org> Tested-by: Antonio Quartulli <antonio@mandelbit.com> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>

If the discard worker is running and there's currently only one block group, that block group is a data block group, it's in the unused block groups discard list and is being used (it got an extent allocated from it after becoming unused), the worker can end up in an infinite loop if a transaction abort happens or the async discard is disabled (during remount or unmount for example). This happens like this: 1) Task A, the discard worker, is at peek_discard_list() and find_next_block_group() returns block group X; 2) Block group X is in the unused block groups discard list (its discard index is BTRFS_DISCARD_INDEX_UNUSED) since at some point in the past it become an unused block group and was added to that list, but then later it got an extent allocated from it, so its ->used counter is not zero anymore; 3) The current transaction is aborted by task B and we end up at __btrfs_handle_fs_error() in the transaction abort path, where we call btrfs_discard_stop(), which clears BTRFS_FS_DISCARD_RUNNING from fs_info, and then at __btrfs_handle_fs_error() we set the fs to RO mode (setting SB_RDONLY in the super block's s_flags field); 4) Task A calls __add_to_discard_list() with the goal of moving the block group from the unused block groups discard list into another discard list, but at __add_to_discard_list() we end up doing nothing because btrfs_run_discard_work() returns false, since the super block has SB_RDONLY set in its flags and BTRFS_FS_DISCARD_RUNNING is not set anymore in fs_info->flags. So block group X remains in the unused block groups discard list; 5) Task A then does a goto into the 'again' label, calls find_next_block_group() again we gets block group X again. Then it repeats the previous steps over and over since there are not other block groups in the discard lists and block group X is never moved out of the unused block groups discard list since btrfs_run_discard_work() keeps returning false and therefore __add_to_discard_list() doesn't move block group X out of that discard list. When this happens we can get a soft lockup report like this: [71.957] watchdog: BUG: soft lockup - CPU#0 stuck for 27s! [kworker/u4:3:97] [71.957] Modules linked in: xfs af_packet rfkill (...) [71.957] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.957] Tainted: [W]=WARN [71.957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.957] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.957] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [71.957] Code: c1 01 48 83 (...) [71.957] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [71.957] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [71.957] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [71.957] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [71.957] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [71.957] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [71.957] FS: 0000000000000000(0000) GS:ffff89707bc00000(0000) knlGS:0000000000000000 [71.957] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [71.957] CR2: 00005605bcc8d2f0 CR3: 000000010376a001 CR4: 0000000000770ef0 [71.957] PKRU: 55555554 [71.957] Call Trace: [71.957] <TASK> [71.957] process_one_work+0x17e/0x330 [71.957] worker_thread+0x2ce/0x3f0 [71.957] ? __pfx_worker_thread+0x10/0x10 [71.957] kthread+0xef/0x220 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork+0x34/0x50 [71.957] ? __pfx_kthread+0x10/0x10 [71.957] ret_from_fork_asm+0x1a/0x30 [71.957] </TASK> [71.957] Kernel panic - not syncing: softlockup: hung tasks [71.987] CPU: 0 UID: 0 PID: 97 Comm: kworker/u4:3 Tainted: G W L 6.14.2-1-default #1 openSUSE Tumbleweed 968795ef2b1407352128b466fe887416c33af6fa [71.989] Tainted: [W]=WARN, [L]=SOFTLOCKUP [71.989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [71.991] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] [71.992] Call Trace: [71.993] <IRQ> [71.994] dump_stack_lvl+0x5a/0x80 [71.994] panic+0x10b/0x2da [71.995] watchdog_timer_fn.cold+0x9a/0xa1 [71.996] ? __pfx_watchdog_timer_fn+0x10/0x10 [71.997] __hrtimer_run_queues+0x132/0x2a0 [71.997] hrtimer_interrupt+0xff/0x230 [71.998] __sysvec_apic_timer_interrupt+0x55/0x100 [71.999] sysvec_apic_timer_interrupt+0x6c/0x90 [72.000] </IRQ> [72.000] <TASK> [72.001] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [72.002] RIP: 0010:btrfs_discard_workfn+0xc4/0x400 [btrfs] [72.002] Code: c1 01 48 83 (...) [72.005] RSP: 0018:ffffafaec03efe08 EFLAGS: 00000246 [72.006] RAX: ffff897045500000 RBX: ffff8970413ed8d0 RCX: 0000000000000000 [72.006] RDX: 0000000000000001 RSI: ffff8970413ed8d0 RDI: 0000000a8f1272ad [72.007] RBP: 0000000a9d61c60e R08: ffff897045500140 R09: 8080808080808080 [72.008] R10: ffff897040276800 R11: fefefefefefefeff R12: ffff8970413ed860 [72.009] R13: ffff897045500000 R14: ffff8970413ed868 R15: 0000000000000000 [72.010] ? btrfs_discard_workfn+0x51/0x400 [btrfs 23b01089228eb964071fb7ca156eee8cd3bf996f] [72.011] process_one_work+0x17e/0x330 [72.012] worker_thread+0x2ce/0x3f0 [72.013] ? __pfx_worker_thread+0x10/0x10 [72.014] kthread+0xef/0x220 [72.014] ? __pfx_kthread+0x10/0x10 [72.015] ret_from_fork+0x34/0x50 [72.015] ? __pfx_kthread+0x10/0x10 [72.016] ret_from_fork_asm+0x1a/0x30 [72.017] </TASK> [72.017] Kernel Offset: 0x15000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [72.019] Rebooting in 90 seconds.. So fix this by making sure we move a block group out of the unused block groups discard list when calling __add_to_discard_list(). Fixes: 2bee7eb ("btrfs: discard one region at a time in async discard") Link: https://bugzilla.suse.com/show_bug.cgi?id=1242012 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Daniel Vacek <neelx@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>

MACsec offload is not supported in switchdev mode for uplink representors. When switching to the uplink representor profile, the MACsec offload feature must be cleared from the netdevice's features. If left enabled, attempts to add offloads result in a null pointer dereference, as the uplink representor does not support MACsec offload even though the feature bit remains set. Clear NETIF_F_HW_MACSEC in mlx5e_fix_uplink_rep_features(). Kernel log: Oops: general protection fault, probably for non-canonical address 0xdffffc000000000f: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000078-0x000000000000007f] CPU: 29 UID: 0 PID: 4714 Comm: ip Not tainted 6.14.0-rc4_for_upstream_debug_2025_03_02_17_35 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x128/0x1dd0 Code: d0 7c 08 84 d2 0f 85 ad 15 00 00 8b 35 91 5c fe 03 85 f6 75 29 49 8d 7e 60 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 a6 15 00 00 4d 3b 76 60 0f 85 fd 0b 00 00 65 ff RSP: 0018:ffff888147a4f160 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001 RDX: 000000000000000f RSI: 0000000000000000 RDI: 0000000000000078 RBP: ffff888147a4f2e0 R08: ffffffffa05d2c19 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000018 R15: ffff888152de0000 FS: 00007f855e27d800(0000) GS:ffff88881ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004e5768 CR3: 000000013ae7c005 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 Call Trace: <TASK> ? die_addr+0x3d/0xa0 ? exc_general_protection+0x144/0x220 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? __mutex_lock+0x128/0x1dd0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mutex_lock_io_nested+0x1ae0/0x1ae0 ? lock_acquire+0x1c2/0x530 ? macsec_upd_offload+0x145/0x380 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? kasan_save_stack+0x30/0x40 ? kasan_save_stack+0x20/0x40 ? kasan_save_track+0x10/0x30 ? __kasan_kmalloc+0x77/0x90 ? __kmalloc_noprof+0x249/0x6b0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0xb5/0x240 ? mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] mlx5e_macsec_add_secy+0xf9/0x700 [mlx5_core] ? mlx5e_macsec_add_rxsa+0x11a0/0x11a0 [mlx5_core] macsec_update_offload+0x26c/0x820 ? macsec_set_mac_address+0x4b0/0x4b0 ? lockdep_hardirqs_on_prepare+0x284/0x400 ? _raw_spin_unlock_irqrestore+0x47/0x50 macsec_upd_offload+0x2c8/0x380 ? macsec_update_offload+0x820/0x820 ? __nla_parse+0x22/0x30 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x15e/0x240 genl_family_rcv_msg_doit+0x1cc/0x2a0 ? genl_family_rcv_msg_attrs_parse.constprop.0+0x240/0x240 ? cap_capable+0xd4/0x330 genl_rcv_msg+0x3ea/0x670 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? lockdep_set_lock_cmp_fn+0x190/0x190 ? macsec_update_offload+0x820/0x820 netlink_rcv_skb+0x12b/0x390 ? genl_family_rcv_msg_dumpit+0x2a0/0x2a0 ? netlink_ack+0xd80/0xd80 ? rwsem_down_read_slowpath+0xf90/0xf90 ? netlink_deliver_tap+0xcd/0xac0 ? netlink_deliver_tap+0x155/0xac0 ? _copy_from_iter+0x1bb/0x12c0 genl_rcv+0x24/0x40 netlink_unicast+0x440/0x700 ? netlink_attachskb+0x760/0x760 ? lock_acquire+0x1c2/0x530 ? __might_fault+0xbb/0x170 netlink_sendmsg+0x749/0xc10 ? netlink_unicast+0x700/0x700 ? __might_fault+0xbb/0x170 ? netlink_unicast+0x700/0x700 __sock_sendmsg+0xc5/0x190 ____sys_sendmsg+0x53f/0x760 ? import_iovec+0x7/0x10 ? kernel_sendmsg+0x30/0x30 ? __copy_msghdr+0x3c0/0x3c0 ? filter_irq_stacks+0x90/0x90 ? stack_depot_save_flags+0x28/0xa30 ___sys_sendmsg+0xeb/0x170 ? kasan_save_stack+0x30/0x40 ? copy_msghdr_from_user+0x110/0x110 ? do_syscall_64+0x6d/0x140 ? lock_acquire+0x1c2/0x530 ? __virt_addr_valid+0x116/0x3b0 ? __virt_addr_valid+0x1da/0x3b0 ? lock_downgrade+0x680/0x680 ? __delete_object+0x21/0x50 __sys_sendmsg+0xf7/0x180 ? __sys_sendmsg_sock+0x20/0x20 ? kmem_cache_free+0x14c/0x4e0 ? __x64_sys_close+0x78/0xd0 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f855e113367 Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 RSP: 002b:00007ffd15e90c88 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f855e113367 RDX: 0000000000000000 RSI: 00007ffd15e90cf0 RDI: 0000000000000004 RBP: 00007ffd15e90dbc R08: 0000000000000028 R09: 000000000045d100 R10: 00007f855e011dd8 R11: 0000000000000246 R12: 0000000000000019 R13: 0000000067c6b785 R14: 00000000004a1e80 R15: 0000000000000000 </TASK> Modules linked in: 8021q garp mrp sch_ingress openvswitch nsh mlx5_ib mlx5_fwctl mlx5_dpll mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- Fixes: 8ff0ac5 ("net/mlx5: Add MACsec offload Tx command support") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1746958552-561295-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

…unload Kernel panic occurs when a devmem TCP socket is closed after NIC module is unloaded. This is Devmem TCP unregistration scenarios. number is an order. (a)netlink socket close (b)pp destroy (c)uninstall result 1 2 3 OK 1 3 2 (d)Impossible 2 1 3 OK 3 1 2 (e)Kernel panic 2 3 1 (d)Impossible 3 2 1 (d)Impossible (a) netdev_nl_sock_priv_destroy() is called when devmem TCP socket is closed. (b) page_pool_destroy() is called when the interface is down. (c) mp_ops->uninstall() is called when an interface is unregistered. (d) There is no scenario in mp_ops->uninstall() is called before page_pool_destroy(). Because unregister_netdevice_many_notify() closes interfaces first and then calls mp_ops->uninstall(). (e) netdev_nl_sock_priv_destroy() accesses struct net_device to acquire netdev_lock(). But if the interface module has already been removed, net_device pointer is invalid, so it causes kernel panic. In summary, there are only 3 possible scenarios. A. sk close -> pp destroy -> uninstall. B. pp destroy -> sk close -> uninstall. C. pp destroy -> uninstall -> sk close. Case C is a kernel panic scenario. In order to fix this problem, It makes mp_dmabuf_devmem_uninstall() set binding->dev to NULL. It indicates an bound net_device was unregistered. It makes netdev_nl_sock_priv_destroy() do not acquire netdev_lock() if binding->dev is NULL. A new binding->lock is added to protect a dev of a binding. So, lock ordering is like below. priv->lock netdev_lock(dev) binding->lock Tests: Scenario A: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 kill $pid ip link set $interface down modprobe -rv $module Scenario B: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 ip link set $interface down kill $pid modprobe -rv $module Scenario C: ./ncdevmem -s 192.168.1.4 -c 192.168.1.2 -f $interface -l -p 8000 \ -v 7 -t 1 -q 1 & pid=$! sleep 10 modprobe -rv $module sleep 5 kill $pid Splat looks like: Oops: general protection fault, probably for non-canonical address 0xdffffc001fffa9f7: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI KASAN: probably user-memory-access in range [0x00000000fffd4fb8-0x00000000fffd4fbf] CPU: 0 UID: 0 PID: 2041 Comm: ncdevmem Tainted: G B W 6.15.0-rc1+ #2 PREEMPT(undef) 0947ec89efa0fd68838b78e36aa1617e97ff5d7f Tainted: [B]=BAD_PAGE, [W]=WARN RIP: 0010:__mutex_lock (./include/linux/sched.h:2244 kernel/locking/mutex.c:400 kernel/locking/mutex.c:443 kernel/locking/mutex.c:605 kernel/locking/mutex.c:746) Code: ea 03 80 3c 02 00 0f 85 4f 13 00 00 49 8b 1e 48 83 e3 f8 74 6a 48 b8 00 00 00 00 00 fc ff df 48 8d 7b 34 48 89 fa 48 c1 ea 03 <0f> b6 f RSP: 0018:ffff88826f7ef730 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 00000000fffd4f88 RCX: ffffffffaa9bc811 RDX: 000000001fffa9f7 RSI: 0000000000000008 RDI: 00000000fffd4fbc RBP: ffff88826f7ef8b0 R08: 0000000000000000 R09: ffffed103e6aa1a4 R10: 0000000000000007 R11: ffff88826f7ef442 R12: fffffbfff669f65e R13: ffff88812a830040 R14: ffff8881f3550d20 R15: 00000000fffd4f88 FS: 0000000000000000(0000) GS:ffff888866c05000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563bed0cb288 CR3: 00000001a7c98000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ... netdev_nl_sock_priv_destroy (net/core/netdev-genl.c:953 (discriminator 3)) genl_release (net/netlink/genetlink.c:653 net/netlink/genetlink.c:694 net/netlink/genetlink.c:705) ... netlink_release (net/netlink/af_netlink.c:737) ... __sock_release (net/socket.c:647) sock_close (net/socket.c:1393) Fixes: 1d22d30 ("net: drop rtnl_lock for queue_mgmt operations") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250514154028.1062909-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

[ Upstream commit 123eda2 ] mlx5e_priv is an unstable structure that can be memset(0) if profile attaching fails, mlx5e_priv in mlx5e_dev devlink private is used to reference the netdev and mdev associated with that struct. Instead, store netdev directly into mlx5e_dev and get mdev from the containing mlx5_adev aux device structure. This fixes a kernel oops in mlx5e_remove when switchdev mode fails due to change profile failure. $ devlink dev eswitch set pci/0000:00:03.0 mode switchdev Error: mlx5_core: Failed setting eswitch to offloads. dmesg: workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12 workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12 $ devlink dev reload pci/0000:00:03.0 ==> oops BUG: kernel NULL pointer dereference, address: 0000000000000520 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 3 UID: 0 PID: 521 Comm: devlink Not tainted 6.18.0-rc5+ qualcomm-linux#117 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:mlx5e_remove+0x68/0x130 RSP: 0018:ffffc900034838f0 EFLAGS: 00010246 RAX: ffff88810283c380 RBX: ffff888101874400 RCX: ffffffff826ffc45 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffff888102d789c0 R08: ffff8881007137f0 R09: ffff888100264e10 R10: ffffc90003483898 R11: ffffc900034838a0 R12: ffff888100d261a0 R13: ffff888100d261a0 R14: ffff8881018749a0 R15: ffff888101874400 FS: 00007f8565fea740(0000) GS:ffff88856a759000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000520 CR3: 000000010b11a004 CR4: 0000000000370ef0 Call Trace: <TASK> device_release_driver_internal+0x19c/0x200 bus_remove_device+0xc6/0x130 device_del+0x160/0x3d0 ? devl_param_driverinit_value_get+0x2d/0x90 mlx5_detach_device+0x89/0xe0 mlx5_unload_one_devl_locked+0x3a/0x70 mlx5_devlink_reload_down+0xc8/0x220 devlink_reload+0x7d/0x260 devlink_nl_reload_doit+0x45b/0x5a0 genl_family_rcv_msg_doit+0xe8/0x140 Fixes: ee75f1f ("net/mlx5e: Create separate devlink instance for ethernet auxiliary device") Fixes: c4d7eb5 ("net/mxl5e: Add change profile method") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://patch.msgid.link/20260108212657.25090-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 4ef8512 ] mlx5e_priv is an unstable structure that can be memset(0) if profile attaching fails. Pass netdev to mlx5e_destroy_netdev() to guarantee it will work on a valid netdev. On mlx5e_remove: Check validity of priv->profile, before attempting to cleanup any resources that might be not there. This fixes a kernel oops in mlx5e_remove when switchdev mode fails due to change profile failure. $ devlink dev eswitch set pci/0000:00:03.0 mode switchdev Error: mlx5_core: Failed setting eswitch to offloads. dmesg: workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: new profile init failed, -12 workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR mlx5_core 0012:03:00.1: mlx5e_netdev_init_profile:6214:(pid 37199): mlx5e_priv_init failed, err=-12 mlx5_core 0012:03:00.1 gpu3rdma1: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12 $ devlink dev reload pci/0000:00:03.0 ==> oops BUG: kernel NULL pointer dereference, address: 0000000000000370 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 15 UID: 0 PID: 520 Comm: devlink Not tainted 6.18.0-rc5+ qualcomm-linux#115 PREEMPT(voluntary) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:mlx5e_dcbnl_dscp_app+0x23/0x100 RSP: 0018:ffffc9000083f8b8 EFLAGS: 00010286 RAX: ffff8881126fc380 RBX: ffff8881015ac400 RCX: ffffffff826ffc45 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8881035109c0 RBP: ffff8881035109c0 R08: ffff888101e3e838 R09: ffff888100264e10 R10: ffffc9000083f898 R11: ffffc9000083f8a0 R12: ffff888101b921a0 R13: ffff888101b921a0 R14: ffff8881015ac9a0 R15: ffff8881015ac400 FS: 00007f789a3c8740(0000) GS:ffff88856aa59000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000370 CR3: 000000010b6c0001 CR4: 0000000000370ef0 Call Trace: <TASK> mlx5e_remove+0x57/0x110 device_release_driver_internal+0x19c/0x200 bus_remove_device+0xc6/0x130 device_del+0x160/0x3d0 ? devl_param_driverinit_value_get+0x2d/0x90 mlx5_detach_device+0x89/0xe0 mlx5_unload_one_devl_locked+0x3a/0x70 mlx5_devlink_reload_down+0xc8/0x220 devlink_reload+0x7d/0x260 devlink_nl_reload_doit+0x45b/0x5a0 genl_family_rcv_msg_doit+0xe8/0x140 Fixes: c4d7eb5 ("net/mxl5e: Add change profile method") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260108212657.25090-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

commit e707c59 upstream. The Secondary Sample Point Source field has been set to an incorrect value by some mistake in the past 0b01 - SSP_SRC_NO_SSP - SSP is not used. for data bitrates above 1 MBit/s. The correct/default value already used for lower bitrates is 0b00 - SSP_SRC_MEAS_N_OFFSET - SSP position = TRV_DELAY (Measured Transmitter delay) + SSP_OFFSET. The related configuration register structure is described in section 3.1.46 SSP_CFG of the CTU CAN FD IP CORE Datasheet. The analysis leading to the proper configuration is described in section 2.8.3 Secondary sampling point of the datasheet. The change has been tested on AMD/Xilinx Zynq with the next CTU CN FD IP core versions: - 2.6 aka master in the "integration with Zynq-7000 system" test 6.12.43-rt12+ #1 SMP PREEMPT_RT kernel with CTU CAN FD git driver (change already included in the driver repo) - older 2.5 snapshot with mainline kernels with this patch applied locally in the multiple CAN latency tester nightly runs 6.18.0-rc4-rt3-dut #1 SMP PREEMPT_RT 6.19.0-rc3-dut The logs, the datasheet and sources are available at https://canbus.pages.fel.cvut.cz/ Signed-off-by: Ondrej Ille <ondrej.ille@gmail.com> Signed-off-by: Pavel Pisa <pisa@fel.cvut.cz> Link: https://patch.msgid.link/20260105111620.16580-1-pisa@fel.cvut.cz Fixes: 2dcb8e8 ("can: ctucanfd: add support for CTU CAN FD open-source IP core - bus independent part.") Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 038a102 upstream. The kernel test robot has reported: BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28 lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0 CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT 8cc09ef94dcec767faa911515ce9e609c45db470 Call Trace: <IRQ> __dump_stack (lib/dump_stack.c:95) dump_stack_lvl (lib/dump_stack.c:123) dump_stack (lib/dump_stack.c:130) spin_dump (kernel/locking/spinlock_debug.c:71) do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?) _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138) __free_frozen_pages (mm/page_alloc.c:2973) ___free_pages (mm/page_alloc.c:5295) __free_pages (mm/page_alloc.c:5334) tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290) ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289) ? rcu_core (kernel/rcu/tree.c:?) rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861) rcu_core_si (kernel/rcu/tree.c:2879) handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623) __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725) irq_exit_rcu (kernel/softirq.c:741) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052) </IRQ> <TASK> RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194) free_pcppages_bulk (mm/page_alloc.c:1494) drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632) __drain_all_pages (mm/page_alloc.c:2731) drain_all_pages (mm/page_alloc.c:2747) kcompactd (mm/compaction.c:3115) kthread (kernel/kthread.c:465) ? __cfi_kcompactd (mm/compaction.c:3166) ? __cfi_kthread (kernel/kthread.c:412) ret_from_fork (arch/x86/kernel/process.c:164) ? __cfi_kthread (kernel/kthread.c:412) ret_from_fork_asm (arch/x86/entry/entry_64.S:255) </TASK> Matthew has analyzed the report and identified that in drain_page_zone() we are in a section protected by spin_lock(&pcp->lock) and then get an interrupt that attempts spin_trylock() on the same lock. The code is designed to work this way without disabling IRQs and occasionally fail the trylock with a fallback. However, the SMP=n spinlock implementation assumes spin_trylock() will always succeed, and thus it's normally a no-op. Here the enabled lock debugging catches the problem, but otherwise it could cause a corruption of the pcp structure. The problem has been introduced by commit 5749077 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations"). The pcp locking scheme recognizes the need for disabling IRQs to prevent nesting spin_trylock() sections on SMP=n, but the need to prevent the nesting in spin_lock() has not been recognized. Fix it by introducing local wrappers that change the spin_lock() to spin_lock_iqsave() with SMP=n and use them in all places that do spin_lock(&pcp->lock). [vbabka@suse.cz: add pcp_ prefix to the spin_lock_irqsave wrappers, per Steven] Link: https://lkml.kernel.org/r/20260105-fix-pcp-up-v1-1-5579662d2071@suse.cz Fixes: 5749077 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com Analyzed-by: Matthew Wilcox <willy@infradead.org> Link: https://lore.kernel.org/all/aUW05pyc9nZkvY-1@casper.infradead.org/ Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Patch series "mm/hugetlb: fixes for PMD table sharing (incl. using mmu_gather)", v3. One functional fix, one performance regression fix, and two related comment fixes. I cleaned up my prototype I recently shared [1] for the performance fix, deferring most of the cleanups I had in the prototype to a later point. While doing that I identified the other things. The goal of this patch set is to be backported to stable trees "fairly" easily. At least patch #1 and #4. Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing Patch #2 + #3 are simple comment fixes that patch #4 interacts with. Patch #4 is a fix for the reported performance regression due to excessive IPI broadcasts during fork()+exit(). The last patch is all about TLB flushes, IPIs and mmu_gather. Read: complicated There are plenty of cleanups in the future to be had + one reasonable optimization on x86. But that's all out of scope for this series. Runtime tested, with a focus on fixing the performance regression using the original reproducer [2] on x86. This patch (of 4): We switched from (wrongly) using the page count to an independent shared count. Now, shared page tables have a refcount of 1 (excluding speculative references) and instead use ptdesc->pt_share_count to identify sharing. We didn't convert hugetlb_pmd_shared(), so right now, we would never detect a shared PMD table as such, because sharing/unsharing no longer touches the refcount of a PMD table. Page migration, like mbind() or migrate_pages() would allow for migrating folios mapped into such shared PMD tables, even though the folios are not exclusive. In smaps we would account them as "private" although they are "shared", and we would be wrongly setting the PM_MMAP_EXCLUSIVE in the pagemap interface. Fix it by properly using ptdesc_pmd_is_shared() in hugetlb_pmd_shared(). Link: https://lkml.kernel.org/r/20251223214037.580860-1-david@kernel.org Link: https://lkml.kernel.org/r/20251223214037.580860-2-david@kernel.org Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [1] Link: https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/ [2] Fixes: 59d9094 ("mm: hugetlb: independent PMD page table shared count") Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Rik van Riel <riel@surriel.com> Reviewed-by: Lance Yang <lance.yang@linux.dev> Tested-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Uschakow, Stanislav" <suschako@amazon.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Jamal Hadi Salim says: ==================== net/sched: teql: Enforce hierarchy placement GangMin Kim <km.kim1503@gmail.com> managed to create a UAF on qfq by inserting teql as a child qdisc and exploiting a qlen sync issue. teql is not intended to be used as a child qdisc. Lets enforce that rule in patch #1. Although patch #1 fixes the issue, we prevent another potential qlen exploit in qfq in patch #2 by enforcing the child's active status is not determined by inspecting the qlen. In patch #3 we add a tdc test case. ==================== Link: https://patch.msgid.link/20260114160243.913069-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

…gs list The netdevsim driver lacks a protection mechanism for operations on the bpf_bound_progs list. When the nsim_bpf_create_prog() performs list_add_tail, it is possible that nsim_bpf_destroy_prog() is simultaneously performs list_del. Concurrent operations on the list may lead to list corruption and trigger a kernel crash as follows: [ 417.290971] kernel BUG at lib/list_debug.c:62! [ 417.290983] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 417.290992] CPU: 10 PID: 168 Comm: kworker/10:1 Kdump: loaded Not tainted 6.19.0-rc5 #1 [ 417.291003] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 417.291007] Workqueue: events bpf_prog_free_deferred [ 417.291021] RIP: 0010:__list_del_entry_valid_or_report+0xa7/0xc0 [ 417.291034] Code: a8 ff 0f 0b 48 89 fe 48 89 ca 48 c7 c7 48 a1 eb ae e8 ed fb a8 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 80 a1 eb ae e8 d9 fb a8 ff <0f> 0b 48 89 d1 48 c7 c7 d0 a1 eb ae 48 89 f2 48 89 c6 e8 c2 fb a8 [ 417.291040] RSP: 0018:ffffb16a40807df8 EFLAGS: 00010246 [ 417.291046] RAX: 000000000000006d RBX: ffff8e589866f500 RCX: 0000000000000000 [ 417.291051] RDX: 0000000000000000 RSI: ffff8e59f7b23180 RDI: ffff8e59f7b23180 [ 417.291055] RBP: ffffb16a412c9000 R08: 0000000000000000 R09: 0000000000000003 [ 417.291059] R10: ffffb16a40807c80 R11: ffffffffaf9edce8 R12: ffff8e594427ac20 [ 417.291063] R13: ffff8e59f7b44780 R14: ffff8e58800b7a05 R15: 0000000000000000 [ 417.291074] FS: 0000000000000000(0000) GS:ffff8e59f7b00000(0000) knlGS:0000000000000000 [ 417.291079] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 417.291083] CR2: 00007fc4083efe08 CR3: 00000001c3626006 CR4: 0000000000770ee0 [ 417.291088] PKRU: 55555554 [ 417.291091] Call Trace: [ 417.291096] <TASK> [ 417.291103] nsim_bpf_destroy_prog+0x31/0x80 [netdevsim] [ 417.291154] __bpf_prog_offload_destroy+0x2a/0x80 [ 417.291163] bpf_prog_dev_bound_destroy+0x6f/0xb0 [ 417.291171] bpf_prog_free_deferred+0x18e/0x1a0 [ 417.291178] process_one_work+0x18a/0x3a0 [ 417.291188] worker_thread+0x27b/0x3a0 [ 417.291197] ? __pfx_worker_thread+0x10/0x10 [ 417.291207] kthread+0xe5/0x120 [ 417.291214] ? __pfx_kthread+0x10/0x10 [ 417.291221] ret_from_fork+0x31/0x50 [ 417.291230] ? __pfx_kthread+0x10/0x10 [ 417.291236] ret_from_fork_asm+0x1a/0x30 [ 417.291246] </TASK> Add a mutex lock, to prevent simultaneous addition and deletion operations on the list. Fixes: 31d3ad8 ("netdevsim: add bpf offload support") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Signed-off-by: Yun Lu <luyun@kylinos.cn> Link: https://patch.msgid.link/20260116095308.11441-1-luyun_611@163.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>

tcf_ife_encode() must make sure ife_encode() does not return NULL. syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:ife_tlv_meta_encode+0x41/0xa0 net/ife/ife.c:166 CPU: 3 UID: 0 PID: 8990 Comm: syz.0.696 Not tainted syzkaller #0 PREEMPT(full) Call Trace: <TASK> ife_encode_meta_u32+0x153/0x180 net/sched/act_ife.c:101 tcf_ife_encode net/sched/act_ife.c:841 [inline] tcf_ife_act+0x1022/0x1de0 net/sched/act_ife.c:877 tc_act include/net/tc_wrapper.h:130 [inline] tcf_action_exec+0x1c0/0xa20 net/sched/act_api.c:1152 tcf_exts_exec include/net/pkt_cls.h:349 [inline] mall_classify+0x1a0/0x2a0 net/sched/cls_matchall.c:42 tc_classify include/net/tc_wrapper.h:197 [inline] __tcf_classify net/sched/cls_api.c:1764 [inline] tcf_classify+0x7f2/0x1380 net/sched/cls_api.c:1860 multiq_classify net/sched/sch_multiq.c:39 [inline] multiq_enqueue+0xe0/0x510 net/sched/sch_multiq.c:66 dev_qdisc_enqueue+0x45/0x250 net/core/dev.c:4147 __dev_xmit_skb net/core/dev.c:4262 [inline] __dev_queue_xmit+0x2998/0x46c0 net/core/dev.c:4798 Fixes: 295a6e0 ("net/sched: act_ife: Change to use ife module") Reported-by: syzbot+5cf914f193dffde3bd3c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6970d61d.050a0220.706b.0010.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yotam Gigi <yotam.gi@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260121133724.3400020-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

firmware populates MAC address, link modes (supported, advertised) and EEPROM data in shared firmware structure which kernel access via MAC block(CGX/RPM). Accessing fwdata, on boards booted with out MAC block leading to kernel panics. Internal error: Oops: 0000000096000005 [#1] SMP [ 10.460721] Modules linked in: [ 10.463779] CPU: 0 UID: 0 PID: 174 Comm: kworker/0:3 Not tainted 6.19.0-rc5-00154-g76ec646abdf7-dirty #3 PREEMPT [ 10.474045] Hardware name: Marvell OcteonTX CN98XX board (DT) [ 10.479793] Workqueue: events work_for_cpu_fn [ 10.484159] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 10.491124] pc : rvu_sdp_init+0x18/0x114 [ 10.495051] lr : rvu_probe+0xe58/0x1d18 Fixes: 9978144 ("Octeontx2-af: Fetch MAC channel info from firmware") Fixes: 5f21226 ("Octeontx2-pf: ethtool: support multi advertise mode") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20260121094819.2566786-1-hkelam@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

The GET_INSTANCE_ID macro that caused a kernel panic when accessing sysfs attributes: 1. Off-by-one error: The loop condition used '<=' instead of '<', causing access beyond array bounds. Since array indices are 0-based and go from 0 to instances_count-1, the loop should use '<'. 2. Missing NULL check: The code dereferenced attr_name_kobj->name without checking if attr_name_kobj was NULL, causing a null pointer dereference in min_length_show() and other attribute show functions. The panic occurred when fwupd tried to read BIOS configuration attributes: Oops: general protection fault [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:min_length_show+0xcf/0x1d0 [hp_bioscfg] Add a NULL check for attr_name_kobj before dereferencing and corrects the loop boundary to match the pattern used elsewhere in the driver. Cc: stable@vger.kernel.org Fixes: 5f94f18 ("platform/x86: hp-bioscfg: bioscfg-h") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20260115203725.828434-3-mario.limonciello@amd.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

The code to restore a ZA context doesn't attempt to allocate the task's sve_state before setting TIF_SME. Consequently, restoring a ZA context can place a task into an invalid state where TIF_SME is set but the task's sve_state is NULL. In legitimate but uncommon cases where the ZA signal context was NOT created by the kernel in the context of the same task (e.g. if the task is saved/restored with something like CRIU), we have no guarantee that sve_state had been allocated previously. In these cases, userspace can enter streaming mode without trapping while sve_state is NULL, causing a later NULL pointer dereference when the kernel attempts to store the register state: | # ./sigreturn-za | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 | Mem abort info: | ESR = 0x0000000096000046 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x06: level 2 translation fault | Data abort info: | ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 | CM = 0, WnR = 1, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | user pgtable: 4k pages, 52-bit VAs, pgdp=0000000101f47c00 | [0000000000000000] pgd=08000001021d8403, p4d=0800000102274403, pud=0800000102275403, pmd=0000000000000000 | Internal error: Oops: 0000000096000046 [#1] SMP | Modules linked in: | CPU: 0 UID: 0 PID: 153 Comm: sigreturn-za Not tainted 6.19.0-rc1 #1 PREEMPT | Hardware name: linux,dummy-virt (DT) | pstate: 214000c9 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) | pc : sve_save_state+0x4/0xf0 | lr : fpsimd_save_user_state+0xb0/0x1c0 | sp : ffff80008070bcc0 | x29: ffff80008070bcc0 x28: fff00000c1ca4c40 x27: 63cfa172fb5cf658 | x26: fff00000c1ca5228 x25: 0000000000000000 x24: 0000000000000000 | x23: 0000000000000000 x22: fff00000c1ca4c40 x21: fff00000c1ca4c40 | x20: 0000000000000020 x19: fff00000ff6900f0 x18: 0000000000000000 | x17: fff05e8e0311f000 x16: 0000000000000000 x15: 028fca8f3bdaf21c | x14: 0000000000000212 x13: fff00000c0209f10 x12: 0000000000000020 | x11: 0000000000200b20 x10: 0000000000000000 x9 : fff00000ff69dcc0 | x8 : 00000000000003f2 x7 : 0000000000000001 x6 : fff00000c1ca5b48 | x5 : fff05e8e0311f000 x4 : 0000000008000000 x3 : 0000000000000000 | x2 : 0000000000000001 x1 : fff00000c1ca5970 x0 : 0000000000000440 | Call trace: | sve_save_state+0x4/0xf0 (P) | fpsimd_thread_switch+0x48/0x198 | __switch_to+0x20/0x1c0 | __schedule+0x36c/0xce0 | schedule+0x34/0x11c | exit_to_user_mode_loop+0x124/0x188 | el0_interrupt+0xc8/0xd8 | __el0_irq_handler_common+0x18/0x24 | el0t_64_irq_handler+0x10/0x1c | el0t_64_irq+0x198/0x19c | Code: 54000040 d51b4408 d65f03c0 d503245f (e5bb5800) | ---[ end trace 0000000000000000 ]--- Fix this by having restore_za_context() ensure that the task's sve_state is allocated, matching what we do when taking an SME trap. Any live SVE/SSVE state (which is restored earlier from a separate signal context) must be preserved, and hence this is not zeroed. Fixes: 3978221 ("arm64/sme: Implement ZA signal handling") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

A DABT is reported[1] on an android based system when resume from hiberate. This happens because swsusp_arch_suspend_exit() is marked with SYM_CODE_*() and does not have a CFI hash, but swsusp_arch_resume() will attempt to verify the CFI hash when calling a copy of swsusp_arch_suspend_exit(). Given that there's an existing requirement that the entrypoint to swsusp_arch_suspend_exit() is the first byte of the .hibernate_exit.text section, we cannot fix this by marking swsusp_arch_suspend_exit() with SYM_FUNC_*(). The simplest fix for now is to disable the CFI check in swsusp_arch_resume(). Mark swsusp_arch_resume() as __nocfi to disable the CFI check. [1] [ 22.991934][ T1] Unable to handle kernel paging request at virtual address 0000000109170ffc [ 22.991934][ T1] Mem abort info: [ 22.991934][ T1] ESR = 0x0000000096000007 [ 22.991934][ T1] EC = 0x25: DABT (current EL), IL = 32 bits [ 22.991934][ T1] SET = 0, FnV = 0 [ 22.991934][ T1] EA = 0, S1PTW = 0 [ 22.991934][ T1] FSC = 0x07: level 3 translation fault [ 22.991934][ T1] Data abort info: [ 22.991934][ T1] ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000 [ 22.991934][ T1] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 22.991934][ T1] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 22.991934][ T1] [0000000109170ffc] user address but active_mm is swapper [ 22.991934][ T1] Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP [ 22.991934][ T1] Dumping ftrace buffer: [ 22.991934][ T1] (ftrace buffer empty) [ 22.991934][ T1] Modules linked in: [ 22.991934][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.98-android15-8-g0b1d2aee7fc3-dirty-4k #1 688c7060a825a3ac418fe53881730b355915a419 [ 22.991934][ T1] Hardware name: Unisoc UMS9360-base Board (DT) [ 22.991934][ T1] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 22.991934][ T1] pc : swsusp_arch_resume+0x2ac/0x344 [ 22.991934][ T1] lr : swsusp_arch_resume+0x294/0x344 [ 22.991934][ T1] sp : ffffffc08006b960 [ 22.991934][ T1] x29: ffffffc08006b9c0 x28: 0000000000000000 x27: 0000000000000000 [ 22.991934][ T1] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000820 [ 22.991934][ T1] x23: ffffffd0817e3000 x22: ffffffd0817e3000 x21: 0000000000000000 [ 22.991934][ T1] x20: ffffff8089171000 x19: ffffffd08252c8c8 x18: ffffffc080061058 [ 22.991934][ T1] x17: 00000000529c6ef0 x16: 00000000529c6ef0 x15: 0000000000000004 [ 22.991934][ T1] x14: ffffff8178c88000 x13: 0000000000000006 x12: 0000000000000000 [ 22.991934][ T1] x11: 0000000000000015 x10: 0000000000000001 x9 : ffffffd082533000 [ 22.991934][ T1] x8 : 0000000109171000 x7 : 205b5d3433393139 x6 : 392e32322020205b [ 22.991934][ T1] x5 : 000000010916f000 x4 : 000000008164b000 x3 : ffffff808a4e0530 [ 22.991934][ T1] x2 : ffffffd08058e784 x1 : 0000000082326000 x0 : 000000010a283000 [ 22.991934][ T1] Call trace: [ 22.991934][ T1] swsusp_arch_resume+0x2ac/0x344 [ 22.991934][ T1] hibernation_restore+0x158/0x18c [ 22.991934][ T1] load_image_and_restore+0xb0/0xec [ 22.991934][ T1] software_resume+0xf4/0x19c [ 22.991934][ T1] software_resume_initcall+0x34/0x78 [ 22.991934][ T1] do_one_initcall+0xe8/0x370 [ 22.991934][ T1] do_initcall_level+0xc8/0x19c [ 22.991934][ T1] do_initcalls+0x70/0xc0 [ 22.991934][ T1] do_basic_setup+0x1c/0x28 [ 22.991934][ T1] kernel_init_freeable+0xe0/0x148 [ 22.991934][ T1] kernel_init+0x20/0x1a8 [ 22.991934][ T1] ret_from_fork+0x10/0x20 [ 22.991934][ T1] Code: a9400a61 f94013e0 f9438923 f9400a64 (b85fc110) Co-developed-by: Jeson Gao <jeson.gao@unisoc.com> Signed-off-by: Jeson Gao <jeson.gao@unisoc.com> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> [catalin.marinas@arm.com: commit log updated by Mark Rutland] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

…itives The "valid" readout delay between the two reads of the watchdog is larger than the valid delta between the resulting watchdog and clocksource intervals, which results in false positive watchdog results. Assume TSC is the clocksource and HPET is the watchdog and both have a uncertainty margin of 250us (default). The watchdog readout does: 1) wdnow = read(HPET); 2) csnow = read(TSC); 3) wdend = read(HPET); The valid window for the delta between #1 and #3 is calculated by the uncertainty margins of the watchdog and the clocksource: m = 2 * watchdog.uncertainty_margin + cs.uncertainty margin; which results in 750us for the TSC/HPET case. The actual interval comparison uses a smaller margin: m = watchdog.uncertainty_margin + cs.uncertainty margin; which results in 500us for the TSC/HPET case. That means the following scenario will trigger the watchdog: Watchdog cycle N: 1) wdnow[N] = read(HPET); 2) csnow[N] = read(TSC); 3) wdend[N] = read(HPET); Assume the delay between #1 and #2 is 100us and the delay between #1 and Watchdog cycle N + 1: 4) wdnow[N + 1] = read(HPET); 5) csnow[N + 1] = read(TSC); 6) wdend[N + 1] = read(HPET); If the delay between #4 and #6 is within the 750us margin then any delay between #4 and #5 which is larger than 600us will fail the interval check and mark the TSC unstable because the intervals are calculated against the previous value: wd_int = wdnow[N + 1] - wdnow[N]; cs_int = csnow[N + 1] - csnow[N]; Putting the above delays in place this results in: cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us); -> cs_int = wd_int + 510us; which is obviously larger than the allowed 500us margin and results in marking TSC unstable. Fix this by using the same margin as the interval comparison. If the delay between two watchdog reads is larger than that, then the readout was either disturbed by interconnect congestion, NMIs or SMIs. Fixes: 4ac1dd3 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin") Reported-by: Daniel J Blueman <daniel@quora.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/ Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx

When creating a synthetic event based on an existing synthetic event that had a stacktrace field and the new synthetic event used that field a kernel crash occurred: ~# cd /sys/kernel/tracing ~# echo 's:stack unsigned long stack[];' > dynamic_events ~# echo 'hist:keys=prev_pid:s0=common_stacktrace if prev_state & 3' >> events/sched/sched_switch/trigger ~# echo 'hist:keys=next_pid:s1=$s0:onmatch(sched.sched_switch).trace(stack,$s1)' >> events/sched/sched_switch/trigger The above creates a synthetic event that takes a stacktrace when a task schedules out in a non-running state and passes that stacktrace to the sched_switch event when that task schedules back in. It triggers the "stack" synthetic event that has a stacktrace as its field (called "stack"). ~# echo 's:syscall_stack s64 id; unsigned long stack[];' >> dynamic_events ~# echo 'hist:keys=common_pid:s2=stack' >> events/synthetic/stack/trigger ~# echo 'hist:keys=common_pid:s3=$s2,i0=id:onmatch(synthetic.stack).trace(syscall_stack,$i0,$s3)' >> events/raw_syscalls/sys_exit/trigger The above makes another synthetic event called "syscall_stack" that attaches the first synthetic event (stack) to the sys_exit trace event and records the stacktrace from the stack event with the id of the system call that is exiting. When enabling this event (or using it in a historgram): ~# echo 1 > events/synthetic/syscall_stack/enable Produces a kernel crash! BUG: unable to handle page fault for address: 0000000000400010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 6 UID: 0 PID: 1257 Comm: bash Not tainted 6.16.3+deb14-amd64 #1 PREEMPT(lazy) Debian 6.16.3-1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:trace_event_raw_event_synth+0x90/0x380 Code: c5 00 00 00 00 85 d2 0f 84 e1 00 00 00 31 db eb 34 0f 1f 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 <49> 8b 04 24 48 83 c3 01 8d 0c c5 08 00 00 00 01 cd 41 3b 5d 40 0f RSP: 0018:ffffd2670388f958 EFLAGS: 00010202 RAX: ffff8ba1065cc100 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: fffff266ffda7b90 RDI: ffffd2670388f9b0 RBP: 0000000000000010 R08: ffff8ba104e76000 R09: ffffd2670388fa50 R10: ffff8ba102dd42e0 R11: ffffffff9a908970 R12: 0000000000400010 R13: ffff8ba10a246400 R14: ffff8ba10a710220 R15: fffff266ffda7b90 FS: 00007fa3bc63f740(0000) GS:ffff8ba2e0f48000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000400010 CR3: 0000000107f9e003 CR4: 0000000000172ef0 Call Trace: <TASK> ? __tracing_map_insert+0x208/0x3a0 action_trace+0x67/0x70 event_hist_trigger+0x633/0x6d0 event_triggers_call+0x82/0x130 trace_event_buffer_commit+0x19d/0x250 trace_event_raw_event_sys_exit+0x62/0xb0 syscall_exit_work+0x9d/0x140 do_syscall_64+0x20a/0x2f0 ? trace_event_raw_event_sched_switch+0x12b/0x170 ? save_fpregs_to_fpstate+0x3e/0x90 ? _raw_spin_unlock+0xe/0x30 ? finish_task_switch.isra.0+0x97/0x2c0 ? __rseq_handle_notify_resume+0xad/0x4c0 ? __schedule+0x4b8/0xd00 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x1ef/0x2f0 ? do_fault+0x2e9/0x540 ? __handle_mm_fault+0x7d1/0xf70 ? count_memcg_events+0x167/0x1d0 ? handle_mm_fault+0x1d7/0x2e0 ? do_user_addr_fault+0x2c3/0x7f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The reason is that the stacktrace field is not labeled as such, and is treated as a normal field and not as a dynamic event that it is. In trace_event_raw_event_synth() the event is field is still treated as a dynamic array, but the retrieval of the data is considered a normal field, and the reference is just the meta data: // Meta data is retrieved instead of a dynamic array str_val = (char *)(long)var_ref_vals[val_idx]; // Then when it tries to process it: len = *((unsigned long *)str_val) + 1; It triggers a kernel page fault. To fix this, first when defining the fields of the first synthetic event, set the filter type to FILTER_STACKTRACE. This is used later by the second synthetic event to know that this field is a stacktrace. When creating the field of the new synthetic event, have it use this FILTER_STACKTRACE to know to create a stacktrace field to copy the stacktrace into. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260122194824.6905a38e@gandalf.local.home Fixes: 00cf3d6 ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

When one iio device is a consumer of another, it is possible that the ->info_exist_lock of both ends up being taken when reading the value of the consumer device. Since they currently belong to the same lockdep class (being initialized in a single location with mutex_init()), that results in a lockdep warning CPU0 ---- lock(&iio_dev_opaque->info_exist_lock); lock(&iio_dev_opaque->info_exist_lock); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by sensors/414: #0: c31fd6dc (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0x44/0x4e4 #1: c4f5a1c4 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x1c/0xac #2: c2827548 (kn->active#34){.+.+}-{0:0}, at: kernfs_seq_start+0x30/0xac #3: c1dd2b6 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_read_channel_processed_scale+0x24/0xd8 stack backtrace: CPU: 0 UID: 0 PID: 414 Comm: sensors Not tainted 6.17.11 #5 NONE Hardware name: Generic AM33XX (Flattened Device Tree) Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x44/0x60 dump_stack_lvl from print_deadlock_bug+0x2b8/0x334 print_deadlock_bug from __lock_acquire+0x13a4/0x2ab0 __lock_acquire from lock_acquire+0xd0/0x2c0 lock_acquire from __mutex_lock+0xa0/0xe8c __mutex_lock from mutex_lock_nested+0x1c/0x24 mutex_lock_nested from iio_read_channel_raw+0x20/0x6c iio_read_channel_raw from rescale_read_raw+0x128/0x1c4 rescale_read_raw from iio_channel_read+0xe4/0xf4 iio_channel_read from iio_read_channel_processed_scale+0x6c/0xd8 iio_read_channel_processed_scale from iio_hwmon_read_val+0x68/0xbc iio_hwmon_read_val from dev_attr_show+0x18/0x48 dev_attr_show from sysfs_kf_seq_show+0x80/0x110 sysfs_kf_seq_show from seq_read_iter+0xdc/0x4e4 seq_read_iter from vfs_read+0x238/0x2e4 vfs_read from ksys_read+0x6c/0xec ksys_read from ret_fast_syscall+0x0/0x1c Just as the mlock_key already has its own lockdep class, add a lock_class_key for the info_exist mutex. Note that this has in theory been a problem since before IIO first left staging, but it only occurs when a chain of consumers is in use and that is not often done. Fixes: ac917a8 ("staging:iio:core set the iio_dev.info pointer to null on unregister under lock.") Signed-off-by: Rasmus Villemoes <ravi@prevas.dk> Reviewed-by: Peter Rosin <peda@axentia.se> Cc: <stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Fix a use-after-free which happens due to enslave failure after the new slave has been added to the array. Since the new slave can be used for Tx immediately, we can use it after it has been freed by the enslave error cleanup path which frees the allocated slave memory. Slave update array is supposed to be called last when further enslave failures are not expected. Move it after xdp setup to avoid any problems. It is very easy to reproduce the problem with a simple xdp_pass prog: ip l add bond1 type bond mode balance-xor ip l set bond1 up ip l set dev bond1 xdp object xdp_pass.o sec xdp_pass ip l add dumdum type dummy Then run in parallel: while :; do ip l set dumdum master bond1 1>/dev/null 2>&1; done; mausezahn bond1 -a own -b rand -A rand -B 1.1.1.1 -c 0 -t tcp "dp=1-1023, flags=syn" The crash happens almost immediately: [ 605.602850] Oops: general protection fault, probably for non-canonical address 0xe0e6fc2460000137: 0000 [#1] SMP KASAN NOPTI [ 605.602916] KASAN: maybe wild-memory-access in range [0x07380123000009b8-0x07380123000009bf] [ 605.602946] CPU: 0 UID: 0 PID: 2445 Comm: mausezahn Kdump: loaded Tainted: G B 6.19.0-rc6+ #21 PREEMPT(voluntary) [ 605.602979] Tainted: [B]=BAD_PAGE [ 605.602998] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 605.603032] RIP: 0010:netdev_core_pick_tx+0xcd/0x210 [ 605.603063] Code: 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 3e 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 6b 08 49 8d 7d 30 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 25 01 00 00 49 8b 45 30 4c 89 e2 48 89 ee 48 89 [ 605.603111] RSP: 0018:ffff88817b9af348 EFLAGS: 00010213 [ 605.603145] RAX: dffffc0000000000 RBX: ffff88817d28b420 RCX: 0000000000000000 [ 605.603172] RDX: 00e7002460000137 RSI: 0000000000000008 RDI: 07380123000009be [ 605.603199] RBP: ffff88817b541a00 R08: 0000000000000001 R09: fffffbfff3ed8c0c [ 605.603226] R10: ffffffff9f6c6067 R11: 0000000000000001 R12: 0000000000000000 [ 605.603253] R13: 073801230000098e R14: ffff88817d28b448 R15: ffff88817b541a84 [ 605.603286] FS: 00007f6570ef67c0(0000) GS:ffff888221dfa000(0000) knlGS:0000000000000000 [ 605.603319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 605.603343] CR2: 00007f65712fae40 CR3: 000000011371b000 CR4: 0000000000350ef0 [ 605.603373] Call Trace: [ 605.603392] <TASK> [ 605.603410] __dev_queue_xmit+0x448/0x32a0 [ 605.603434] ? __pfx_vprintk_emit+0x10/0x10 [ 605.603461] ? __pfx_vprintk_emit+0x10/0x10 [ 605.603484] ? __pfx___dev_queue_xmit+0x10/0x10 [ 605.603507] ? bond_start_xmit+0xbfb/0xc20 [bonding] [ 605.603546] ? _printk+0xcb/0x100 [ 605.603566] ? __pfx__printk+0x10/0x10 [ 605.603589] ? bond_start_xmit+0xbfb/0xc20 [bonding] [ 605.603627] ? add_taint+0x5e/0x70 [ 605.603648] ? add_taint+0x2a/0x70 [ 605.603670] ? end_report.cold+0x51/0x75 [ 605.603693] ? bond_start_xmit+0xbfb/0xc20 [bonding] [ 605.603731] bond_start_xmit+0x623/0xc20 [bonding] Fixes: 9e2ee5c ("net, bonding: Add XDP support to the bonding driver") Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reported-by: Chen Zhen <chenzhen126@huawei.com> Closes: https://lore.kernel.org/netdev/fae17c21-4940-5605-85b2-1d5e17342358@huawei.com/ CC: Jussi Maki <joamaki@gmail.com> CC: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260123120659.571187-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>

When deleting TC steering flows, iterate only over actual devcom peers instead of assuming all possible ports exist. This avoids touching non-existent peers and ensures cleanup is limited to devices the driver is currently connected to. BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 133c8a067 P4D 0 Oops: Oops: 0002 [#1] SMP CPU: 19 UID: 0 PID: 2169 Comm: tc Not tainted 6.18.0+ qualcomm-linux#156 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_tc_del_fdb_peers_flow+0xbe/0x200 [mlx5_core] Code: 00 00 a8 08 74 a8 49 8b 46 18 f6 c4 02 74 9f 4c 8d bf a0 12 00 00 4c 89 ff e8 0e e7 96 e1 49 8b 44 24 08 49 8b 0c 24 4c 89 ff <48> 89 41 08 48 89 08 49 89 2c 24 49 89 5c 24 08 e8 7d ce 96 e1 49 RSP: 0018:ff11000143867528 EFLAGS: 00010246 RAX: 0000000000000000 RBX: dead000000000122 RCX: 0000000000000000 RDX: ff11000143691580 RSI: ff110001026e5000 RDI: ff11000106f3d2a0 RBP: dead000000000100 R08: 00000000000003fd R09: 0000000000000002 R10: ff11000101c75690 R11: ff1100085faea178 R12: ff11000115f0ae78 R13: 0000000000000000 R14: ff11000115f0a800 R15: ff11000106f3d2a0 FS: 00007f35236bf740(0000) GS:ff110008dc809000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000157a01001 CR4: 0000000000373eb0 Call Trace: <TASK> mlx5e_tc_del_flow+0x46/0x270 [mlx5_core] mlx5e_flow_put+0x25/0x50 [mlx5_core] mlx5e_delete_flower+0x2a6/0x3e0 [mlx5_core] tc_setup_cb_reoffload+0x20/0x80 fl_reoffload+0x26f/0x2f0 [cls_flower] ? mlx5e_tc_reoffload_flows_work+0xc0/0xc0 [mlx5_core] ? mlx5e_tc_reoffload_flows_work+0xc0/0xc0 [mlx5_core] tcf_block_playback_offloads+0x9e/0x1c0 tcf_block_unbind+0x7b/0xd0 tcf_block_setup+0x186/0x1d0 tcf_block_offload_cmd.isra.0+0xef/0x130 tcf_block_offload_unbind+0x43/0x70 __tcf_block_put+0x85/0x160 ingress_destroy+0x32/0x110 [sch_ingress] __qdisc_destroy+0x44/0x100 qdisc_graft+0x22b/0x610 tc_get_qdisc+0x183/0x4d0 rtnetlink_rcv_msg+0x2d7/0x3d0 ? rtnl_calcit.isra.0+0x100/0x100 netlink_rcv_skb+0x53/0x100 netlink_unicast+0x249/0x320 ? __alloc_skb+0x102/0x1f0 netlink_sendmsg+0x1e3/0x420 __sock_sendmsg+0x38/0x60 ____sys_sendmsg+0x1ef/0x230 ? copy_msghdr_from_user+0x6c/0xa0 ___sys_sendmsg+0x7f/0xc0 ? ___sys_recvmsg+0x8a/0xc0 ? __sys_sendto+0x119/0x180 __sys_sendmsg+0x61/0xb0 do_syscall_64+0x55/0x640 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f35238bb764 Code: 15 b9 86 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bf 0f 1f 44 00 00 f3 0f 1e fa 80 3d e5 08 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 4c c3 0f 1f 00 55 48 89 e5 48 83 ec 20 89 55 RSP: 002b:00007ffed4c35638 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 000055a2efcc75e0 RCX: 00007f35238bb764 RDX: 0000000000000000 RSI: 00007ffed4c356a0 RDI: 0000000000000003 RBP: 00007ffed4c35710 R08: 0000000000000010 R09: 00007f3523984b20 R10: 0000000000000004 R11: 0000000000000202 R12: 00007ffed4c35790 R13: 000000006947df8f R14: 000055a2efcc75e0 R15: 00007ffed4c35780 Fixes: 9be6c21 ("net/mlx5e: Handle offloads flows per peer") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/1769411695-18820-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Add NULL pointer checks in ice_vsi_set_napi_queues() to prevent crashes during resume from suspend when rings[q_idx]->q_vector is NULL. Tested adaptor: 60:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02) Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-2 [8086:4003] SR-IOV state: both disabled and enabled can reproduce this issue. kernel version: v6.18 Reproduce steps: Boot up and execute suspend like systemctl suspend or rtcwake. Log: <1>[ 231.443607] BUG: kernel NULL pointer dereference, address: 0000000000000040 <1>[ 231.444052] #PF: supervisor read access in kernel mode <1>[ 231.444484] #PF: error_code(0x0000) - not-present page <6>[ 231.444913] PGD 0 P4D 0 <4>[ 231.445342] Oops: Oops: 0000 [#1] SMP NOPTI <4>[ 231.446635] RIP: 0010:netif_queue_set_napi+0xa/0x170 <4>[ 231.447067] Code: 31 f6 31 ff c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 c9 74 0b <48> 83 79 30 00 0f 84 39 01 00 00 55 41 89 d1 49 89 f8 89 f2 48 89 <4>[ 231.447513] RSP: 0018:ffffcc780fc078c0 EFLAGS: 00010202 <4>[ 231.447961] RAX: ffff8b848ca30400 RBX: ffff8b848caf2028 RCX: 0000000000000010 <4>[ 231.448443] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b848dbd4000 <4>[ 231.448896] RBP: ffffcc780fc078e8 R08: 0000000000000000 R09: 0000000000000000 <4>[ 231.449345] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 <4>[ 231.449817] R13: ffff8b848dbd4000 R14: ffff8b84833390c8 R15: 0000000000000000 <4>[ 231.450265] FS: 00007c7b29e9d740(0000) GS:ffff8b8c068e2000(0000) knlGS:0000000000000000 <4>[ 231.450715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 231.451179] CR2: 0000000000000040 CR3: 000000030626f004 CR4: 0000000000f72ef0 <4>[ 231.451629] PKRU: 55555554 <4>[ 231.452076] Call Trace: <4>[ 231.452549] <TASK> <4>[ 231.452996] ? ice_vsi_set_napi_queues+0x4d/0x110 [ice] <4>[ 231.453482] ice_resume+0xfd/0x220 [ice] <4>[ 231.453977] ? __pfx_pci_pm_resume+0x10/0x10 <4>[ 231.454425] pci_pm_resume+0x8c/0x140 <4>[ 231.454872] ? __pfx_pci_pm_resume+0x10/0x10 <4>[ 231.455347] dpm_run_callback+0x5f/0x160 <4>[ 231.455796] ? dpm_wait_for_superior+0x107/0x170 <4>[ 231.456244] device_resume+0x177/0x270 <4>[ 231.456708] dpm_resume+0x209/0x2f0 <4>[ 231.457151] dpm_resume_end+0x15/0x30 <4>[ 231.457596] suspend_devices_and_enter+0x1da/0x2b0 <4>[ 231.458054] enter_state+0x10e/0x570 Add defensive checks for both the ring pointer and its q_vector before dereferencing, allowing the system to resume successfully even when q_vectors are unmapped. Fixes: 2a5dc09 ("ice: move netif_queue_set_napi to rtnl-protected sections") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Since the commit 25c6a5a ("net: phy: micrel: Dynamically control external clock of KSZ PHY"), the clock of Micrel PHY has been enabled by phy_driver::resume() and disabled by phy_driver::suspend(). However, devm_clk_get_optional_enabled() is used in kszphy_probe(), so the clock will automatically be disabled when the device is unbound from the bus. Therefore, this could cause the clock to be disabled twice, resulting in clk driver warnings. For example, this issue can be reproduced on i.MX6ULL platform, and we can see the following logs when removing the FEC MAC drivers. $ echo 2188000.ethernet > /sys/bus/platform/drivers/fec/unbind $ echo 20b4000.ethernet > /sys/bus/platform/drivers/fec/unbind [ 109.758207] ------------[ cut here ]------------ [ 109.758240] WARNING: drivers/clk/clk.c:1188 at clk_core_disable+0xb4/0xd0, CPU#0: sh/639 [ 109.771011] enet2_ref already disabled [ 109.793359] Call trace: [ 109.822006] clk_core_disable from clk_disable+0x28/0x34 [ 109.827340] clk_disable from clk_disable_unprepare+0xc/0x18 [ 109.833029] clk_disable_unprepare from devm_clk_release+0x1c/0x28 [ 109.839241] devm_clk_release from devres_release_all+0x98/0x100 [ 109.845278] devres_release_all from device_unbind_cleanup+0xc/0x70 [ 109.851571] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4 [ 109.859170] device_release_driver_internal from bus_remove_device+0xbc/0xe4 [ 109.866243] bus_remove_device from device_del+0x140/0x458 [ 109.871757] device_del from phy_mdio_device_remove+0xc/0x24 [ 109.877452] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac [ 109.883918] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78 [ 109.890125] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158 [ 109.896076] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4 [ 109.962748] WARNING: drivers/clk/clk.c:1047 at clk_core_unprepare+0xfc/0x13c, CPU#0: sh/639 [ 109.975805] enet2_ref already unprepared [ 110.002866] Call trace: [ 110.031758] clk_core_unprepare from clk_unprepare+0x24/0x2c [ 110.037440] clk_unprepare from devm_clk_release+0x1c/0x28 [ 110.042957] devm_clk_release from devres_release_all+0x98/0x100 [ 110.048989] devres_release_all from device_unbind_cleanup+0xc/0x70 [ 110.055280] device_unbind_cleanup from device_release_driver_internal+0x1a4/0x1f4 [ 110.062877] device_release_driver_internal from bus_remove_device+0xbc/0xe4 [ 110.069950] bus_remove_device from device_del+0x140/0x458 [ 110.075469] device_del from phy_mdio_device_remove+0xc/0x24 [ 110.081165] phy_mdio_device_remove from mdiobus_unregister+0x40/0xac [ 110.087632] mdiobus_unregister from fec_enet_mii_remove+0x40/0x78 [ 110.093836] fec_enet_mii_remove from fec_drv_remove+0x4c/0x158 [ 110.099782] fec_drv_remove from device_release_driver_internal+0x17c/0x1f4 After analyzing the process of removing the FEC driver, as shown below, it can be seen that the clock was disabled twice by the PHY driver. fec_drv_remove() --> fec_enet_close() --> phy_stop() --> phy_suspend() --> kszphy_suspend() #1 The clock is disabled --> fec_enet_mii_remove() --> mdiobus_unregister() --> phy_mdio_device_remove() --> device_del() --> devm_clk_release() #2 The clock is disabled again Therefore, devm_clk_get_optional() is used to fix the above issue. And to avoid the issue mentioned by the commit 9853294 ("net: phy: micrel: use devm_clk_get_optional_enabled for the rmii-ref clock"), the clock is enabled by clk_prepare_enable() to get the correct clock rate. Fixes: 25c6a5a ("net: phy: micrel: Dynamically control external clock of KSZ PHY") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260126081544.983517-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

HCA CAP structure is allocated in mlx5_hca_caps_alloc(). mlx5_mdev_init() mlx5_hca_caps_alloc() And HCA CAP is read from the device in mlx5_init_one(). The vhca_id's debugfs file is published even before above two operations are done. Due to this when user reads the vhca id before the initialization, following call trace is observed. Fix this by deferring debugfs publication until the HCA CAP is allocated and read from the device. BUG: kernel NULL pointer dereference, address: 0000000000000004 PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 23 UID: 0 PID: 6605 Comm: cat Kdump: loaded Not tainted 6.18.0-rc7-sf+ qualcomm-linux#110 PREEMPT(full) Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016 RIP: 0010:vhca_id_show+0x17/0x30 [mlx5_core] Code: cb 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 70 48 c7 c6 45 f0 12 c1 48 8b 80 70 03 00 00 <8b> 50 04 0f ca 0f b7 d2 e8 8c 82 47 cb 31 c0 c3 cc cc cc cc 0f 1f RSP: 0018:ffffd37f4f337d40 EFLAGS: 00010203 RAX: 0000000000000000 RBX: ffff8f18445c9b40 RCX: 0000000000000001 RDX: ffff8f1109825180 RSI: ffffffffc112f045 RDI: ffff8f18445c9b40 RBP: 0000000000000000 R08: 0000645eac0d2928 R09: 0000000000000006 R10: ffffd37f4f337d48 R11: 0000000000000000 R12: ffffd37f4f337dd8 R13: ffffd37f4f337db0 R14: ffff8f18445c9b68 R15: 0000000000000001 FS: 00007f3eea099580(0000) GS:ffff8f2090f1f000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000004 CR3: 00000008b64e4006 CR4: 00000000003726f0 Call Trace: <TASK> seq_read_iter+0x11f/0x4f0 ? _raw_spin_unlock+0x15/0x30 ? do_anonymous_page+0x104/0x810 seq_read+0xf6/0x120 ? srso_alias_untrain_ret+0x1/0x10 full_proxy_read+0x5c/0x90 vfs_read+0xad/0x320 ? handle_mm_fault+0x1ab/0x290 ksys_read+0x52/0xd0 do_syscall_64+0x61/0x11e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: dd3dd72 ("net/mlx5: Expose vhca_id to debugfs") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1769503961-124173-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

…alled from deferred_grow_zone() Commit 3acb913 ("mm/mm_init: use deferred_init_memmap_chunk() in deferred_grow_zone()") made deferred_grow_zone() call deferred_init_memmap_chunk() within a pgdat_resize_lock() critical section with irqs disabled. It did check for irqs_disabled() in deferred_init_memmap_chunk() to avoid calling cond_resched(). For a PREEMPT_RT kernel build, however, spin_lock_irqsave() does not disable interrupt but rcu_read_lock() is called. This leads to the following bug report. BUG: sleeping function called from invalid context at mm/mm_init.c:2091 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 3 locks held by swapper/0/1: #0: ffff80008471b7a0 (sched_domains_mutex){+.+.}-{4:4}, at: sched_domains_mutex_lock+0x28/0x40 #1: ffff003bdfffef48 (&pgdat->node_size_lock){+.+.}-{3:3}, at: deferred_grow_zone+0x140/0x278 #2: ffff800084acf600 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1b4/0x408 CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.19.0-rc6-test #1 PREEMPT_{RT,(full) } Tainted: [W]=WARN Call trace: show_stack+0x20/0x38 (C) dump_stack_lvl+0xdc/0xf8 dump_stack+0x1c/0x28 __might_resched+0x384/0x530 deferred_init_memmap_chunk+0x560/0x688 deferred_grow_zone+0x190/0x278 _deferred_grow_zone+0x18/0x30 get_page_from_freelist+0x780/0xf78 __alloc_frozen_pages_noprof+0x1dc/0x348 alloc_slab_page+0x30/0x110 allocate_slab+0x98/0x2a0 new_slab+0x4c/0x80 ___slab_alloc+0x5a4/0x770 __slab_alloc.constprop.0+0x88/0x1e0 __kmalloc_node_noprof+0x2c0/0x598 __sdt_alloc+0x3b8/0x728 build_sched_domains+0xe0/0x1260 sched_init_domains+0x14c/0x1c8 sched_init_smp+0x9c/0x1d0 kernel_init_freeable+0x218/0x358 kernel_init+0x28/0x208 ret_from_fork+0x10/0x20 Fix it adding a new argument to deferred_init_memmap_chunk() to explicitly tell it if cond_resched() is allowed or not instead of relying on some current state information which may vary depending on the exact kernel configuration options that are enabled. Link: https://lkml.kernel.org/r/20260122184343.546627-1-longman@redhat.com Fixes: 3acb913 ("mm/mm_init: use deferred_init_memmap_chunk() in deferred_grow_zone()") Signed-off-by: Waiman Long <longman@redhat.com> Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: <stable@vger.kernrl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

There was a lockdep warning in sprd_gpio: [ 6.258269][T329@C6] [ BUG: Invalid wait context ] [ 6.258270][T329@C6] 6.18.0-android17-0-g30527ad7aaae-ab00009-4k #1 Tainted: G W OE [ 6.258272][T329@C6] ----------------------------- [ 6.258273][T329@C6] modprobe/329 is trying to lock: [ 6.258275][T329@C6] ffffff8081c91690 (&sprd_gpio->lock){....}-{3:3}, at: sprd_gpio_irq_unmask+0x4c/0xa4 [gpio_sprd] [ 6.258282][T329@C6] other info that might help us debug this: [ 6.258283][T329@C6] context-{5:5} [ 6.258285][T329@C6] 3 locks held by modprobe/329: [ 6.258286][T329@C6] #0: ffffff808baca108 (&dev->mutex){....}-{4:4}, at: __driver_attach+0xc4/0x204 [ 6.258295][T329@C6] #1: ffffff80965e7240 (request_class#4){+.+.}-{4:4}, at: __setup_irq+0x1cc/0x82c [ 6.258304][T329@C6] #2: ffffff80965e70c8 (lock_class#4){....}-{2:2}, at: __setup_irq+0x21c/0x82c [ 6.258313][T329@C6] stack backtrace: [ 6.258314][T329@C6] CPU: 6 UID: 0 PID: 329 Comm: modprobe Tainted: G W OE 6.18.0-android17-0-g30527ad7aaae-ab00009-4k #1 PREEMPT 3ad5b0f45741a16e5838da790706e16ceb6717df [ 6.258316][T329@C6] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 6.258317][T329@C6] Hardware name: Unisoc UMS9632-base Board (DT) [ 6.258318][T329@C6] Call trace: [ 6.258318][T329@C6] show_stack+0x20/0x30 (C) [ 6.258321][T329@C6] __dump_stack+0x28/0x3c [ 6.258324][T329@C6] dump_stack_lvl+0xac/0xf0 [ 6.258326][T329@C6] dump_stack+0x18/0x3c [ 6.258329][T329@C6] __lock_acquire+0x824/0x2c28 [ 6.258331][T329@C6] lock_acquire+0x148/0x2cc [ 6.258333][T329@C6] _raw_spin_lock_irqsave+0x6c/0xb4 [ 6.258334][T329@C6] sprd_gpio_irq_unmask+0x4c/0xa4 [gpio_sprd 814535e93c6d8e0853c45c02eab0fa88a9da6487] [ 6.258337][T329@C6] irq_startup+0x238/0x350 [ 6.258340][T329@C6] __setup_irq+0x504/0x82c [ 6.258342][T329@C6] request_threaded_irq+0x118/0x184 [ 6.258344][T329@C6] devm_request_threaded_irq+0x94/0x120 [ 6.258347][T329@C6] sc8546_init_irq+0x114/0x170 [sc8546_charger 223586ccafc27439f7db4f95b0c8e6e882349a99] [ 6.258352][T329@C6] sc8546_charger_probe+0x53c/0x5a0 [sc8546_charger 223586ccafc27439f7db4f95b0c8e6e882349a99] [ 6.258358][T329@C6] i2c_device_probe+0x2c8/0x350 [ 6.258361][T329@C6] really_probe+0x1a8/0x46c [ 6.258363][T329@C6] __driver_probe_device+0xa4/0x10c [ 6.258366][T329@C6] driver_probe_device+0x44/0x1b4 [ 6.258369][T329@C6] __driver_attach+0xd0/0x204 [ 6.258371][T329@C6] bus_for_each_dev+0x10c/0x168 [ 6.258373][T329@C6] driver_attach+0x2c/0x3c [ 6.258376][T329@C6] bus_add_driver+0x154/0x29c [ 6.258378][T329@C6] driver_register+0x70/0x10c [ 6.258381][T329@C6] i2c_register_driver+0x48/0xc8 [ 6.258384][T329@C6] init_module+0x28/0xfd8 [sc8546_charger 223586ccafc27439f7db4f95b0c8e6e882349a99] [ 6.258389][T329@C6] do_one_initcall+0x128/0x42c [ 6.258392][T329@C6] do_init_module+0x60/0x254 [ 6.258395][T329@C6] load_module+0x1054/0x1220 [ 6.258397][T329@C6] __arm64_sys_finit_module+0x240/0x35c [ 6.258400][T329@C6] invoke_syscall+0x60/0xec [ 6.258402][T329@C6] el0_svc_common+0xb0/0xe4 [ 6.258405][T329@C6] do_el0_svc+0x24/0x30 [ 6.258407][T329@C6] el0_svc+0x54/0x1c4 [ 6.258409][T329@C6] el0t_64_sync_handler+0x68/0xdc [ 6.258411][T329@C6] el0t_64_sync+0x1c4/0x1c8 This is because the spin_lock would change to rt_mutex in PREEMPT_RT, however the sprd_gpio->lock would use in hard-irq, this is unsafe. So change the spin_lock_t to raw_spin_lock_t to use the spinlock in hard-irq. Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20260126094209.9855-1-xuewen.yan@unisoc.com [Bartosz: tweaked the commit message] Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

An issue was triggered: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 15 UID: 0 PID: 658 Comm: bash Tainted: 6.19.0-rc6-next-2026012 Tainted: [O]=OOT_MODULE Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), RIP: 0010:strcmp+0x10/0x30 RSP: 0018:ffffc900017f7dc0 EFLAGS: 00000246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888107cd4358 RDX: 0000000019f73907 RSI: ffffffff82cc381a RDI: 0000000000000000 RBP: ffff8881016bef0d R08: 000000006c0e7145 R09: 0000000056c0e714 R10: 0000000000000001 R11: ffff888107cd4358 R12: 0007ffffffffffff R13: ffff888101399200 R14: ffff888100fcb360 R15: 0007ffffffffffff CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000105c79000 CR4: 00000000000006f0 Call Trace: <TASK> dmemcg_limit_write.constprop.0+0x16d/0x390 ? __pfx_set_resource_max+0x10/0x10 kernfs_fop_write_iter+0x14e/0x200 vfs_write+0x367/0x510 ksys_write+0x66/0xe0 do_syscall_64+0x6b/0x390 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f42697e1887 It was trriggered setting max without limitation, the command is like: "echo test/region0 > dmem.max". To fix this issue, add check whether options is valid after parsing the region_name. Fixes: b168ed4 ("kernel/cgroup: Add "dmem" memory accounting cgroup") Cc: stable@vger.kernel.org # v6.14+ Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>

When deassigning a KVM_IRQFD, don't clobber the irqfd's copy of the IRQ's routing entry as doing so breaks kvm_arch_irq_bypass_del_producer() on x86 and arm64, which explicitly look for KVM_IRQ_ROUTING_MSI. Instead, to handle a concurrent routing update, verify that the irqfd is still active before consuming the routing information. As evidenced by the x86 and arm64 bugs, and another bug in kvm_arch_update_irqfd_routing() (see below), clobbering the entry type without notifying arch code is surprising and error prone. As a bonus, checking that the irqfd is active provides a convenient location for documenting _why_ KVM must not consume the routing entry for an irqfd that is in the process of being deassigned: once the irqfd is deleted from the list (which happens *before* the eventfd is detached), it will no longer receive updates via kvm_irq_routing_update(), and so KVM could deliver an event using stale routing information (relative to KVM_SET_GSI_ROUTING returning to userspace). As an even better bonus, explicitly checking for the irqfd being active fixes a similar bug to the one the clobbering is trying to prevent: if an irqfd is deactivated, and then its routing is changed, kvm_irq_routing_update() won't invoke kvm_arch_update_irqfd_routing() (because the irqfd isn't in the list). And so if the irqfd is in bypass mode, IRQs will continue to be posted using the old routing information. As for kvm_arch_irq_bypass_del_producer(), clobbering the routing type results in KVM incorrectly keeping the IRQ in bypass mode, which is especially problematic on AMD as KVM tracks IRQs that are being posted to a vCPU in a list whose lifetime is tied to the irqfd. Without the help of KASAN to detect use-after-free, the most common sympton on AMD is a NULL pointer deref in amd_iommu_update_ga() due to the memory for irqfd structure being re-allocated and zeroed, resulting in irqfd->irq_bypass_data being NULL when read by avic_update_iommu_vcpu_affinity(): BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 40cf2b9067 P4D 40cf2b9067 PUD 408362a067 PMD 0 Oops: Oops: 0000 [#1] SMP CPU: 6 UID: 0 PID: 40383 Comm: vfio_irq_test Tainted: G U W O 6.19.0-smp--5dddc257e6b2-irqfd #31 NONE Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025 RIP: 0010:amd_iommu_update_ga+0x19/0xe0 Call Trace: <TASK> avic_update_iommu_vcpu_affinity+0x3d/0x90 [kvm_amd] __avic_vcpu_load+0xf4/0x130 [kvm_amd] kvm_arch_vcpu_load+0x89/0x210 [kvm] vcpu_load+0x30/0x40 [kvm] kvm_arch_vcpu_ioctl_run+0x45/0x620 [kvm] kvm_vcpu_ioctl+0x571/0x6a0 [kvm] __se_sys_ioctl+0x6d/0xb0 do_syscall_64+0x6f/0x9d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x46893b </TASK> ---[ end trace 0000000000000000 ]--- If AVIC is inhibited when the irfd is deassigned, the bug will manifest as list corruption, e.g. on the next irqfd assignment. list_add corruption. next->prev should be prev (ffff8d474d5cd588), but was 0000000000000000. (next=ffff8d8658f86530). ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:31! Oops: invalid opcode: 0000 [#1] SMP CPU: 128 UID: 0 PID: 80818 Comm: vfio_irq_test Tainted: G U W O 6.19.0-smp--f19dc4d680ba-irqfd #28 NONE Tainted: [U]=USER, [W]=WARN, [O]=OOT_MODULE Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.78.2-0 09/05/2025 RIP: 0010:__list_add_valid_or_report+0x97/0xc0 Call Trace: <TASK> avic_pi_update_irte+0x28e/0x2b0 [kvm_amd] kvm_pi_update_irte+0xbf/0x190 [kvm] kvm_arch_irq_bypass_add_producer+0x72/0x90 [kvm] irq_bypass_register_consumer+0xcd/0x170 [irqbypass] kvm_irqfd+0x4c6/0x540 [kvm] kvm_vm_ioctl+0x118/0x5d0 [kvm] __se_sys_ioctl+0x6d/0xb0 do_syscall_64+0x6f/0x9d0 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> ---[ end trace 0000000000000000 ]--- On Intel and arm64, the bug is less noisy, as the end result is that the device keeps posting IRQs to the vCPU even after it's been deassigned. Note, the worst of the breakage can be traced back to commit cb21073 ("KVM: Pass new routing entries and irqfd when updating IRTEs"), as before that commit KVM would pull the routing information from the per-VM routing table. But as above, similar bugs have existed since support for IRQ bypass was added. E.g. if a routing change finished before irq_shutdown() invoked kvm_arch_irq_bypass_del_producer(), VMX and SVM would see stale routing information and potentially leave the irqfd in bypass mode. Alternatively, x86 could be fixed by explicitly checking irq_bypass_vcpu instead of irq_entry.type in kvm_arch_irq_bypass_del_producer(), and arm64 could be modified to utilize irq_bypass_vcpu in a similar manner. But (a) that wouldn't fix the routing updates bug, and (b) fixing core code doesn't preclude x86 (or arm64) from adding such code as a sanity check (spoiler alert). Fixes: f70c20a ("KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'") Fixes: cb21073 ("KVM: Pass new routing entries and irqfd when updating IRTEs") Fixes: a0d7e2f ("KVM: arm64: vgic-v4: Only attempt vLPI mapping for actual MSIs") Cc: stable@vger.kernel.org Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20260113174606.104978-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>

Fix race condition where PTP periodic work runs while VSI is being rebuilt, accessing NULL vsi->rx_rings. The sequence was: 1. ice_ptp_prepare_for_reset() cancels PTP work 2. ice_ptp_rebuild() immediately queues PTP work 3. VSI rebuild happens AFTER ice_ptp_rebuild() 4. PTP work runs and accesses NULL vsi->rx_rings Fix: Keep PTP work cancelled during rebuild, only queue it after VSI rebuild completes in ice_rebuild(). Added ice_ptp_queue_work() helper function to encapsulate the logic for queuing PTP work, ensuring it's only queued when PTP is supported and the state is ICE_PTP_READY. Error log: [ 121.392544] ice 0000:60:00.1: PTP reset successful [ 121.392692] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 121.392712] #PF: supervisor read access in kernel mode [ 121.392720] #PF: error_code(0x0000) - not-present page [ 121.392727] PGD 0 [ 121.392734] Oops: Oops: 0000 [#1] SMP NOPTI [ 121.392746] CPU: 8 UID: 0 PID: 1005 Comm: ice-ptp-0000:60 Tainted: G S 6.19.0-rc6+ #4 PREEMPT(voluntary) [ 121.392761] Tainted: [S]=CPU_OUT_OF_SPEC [ 121.392773] RIP: 0010:ice_ptp_update_cached_phctime+0xbf/0x150 [ice] [ 121.393042] Call Trace: [ 121.393047] <TASK> [ 121.393055] ice_ptp_periodic_work+0x69/0x180 [ice] [ 121.393202] kthread_worker_fn+0xa2/0x260 [ 121.393216] ? __pfx_ice_ptp_periodic_work+0x10/0x10 [ice] [ 121.393359] ? __pfx_kthread_worker_fn+0x10/0x10 [ 121.393371] kthread+0x10d/0x230 [ 121.393382] ? __pfx_kthread+0x10/0x10 [ 121.393393] ret_from_fork+0x273/0x2b0 [ 121.393407] ? __pfx_kthread+0x10/0x10 [ 121.393417] ret_from_fork_asm+0x1a/0x30 [ 121.393432] </TASK> Fixes: 803bef8 ("ice: factor out ice_ptp_rebuild_owner()") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

syzbot reported a kernel BUG in fib6_add_rt2node() when adding an IPv6 route. [0] Commit f72514b ("ipv6: clear RA flags when adding a static route") introduced logic to clear RTF_ADDRCONF from existing routes when a static route with the same nexthop is added. However, this causes a problem when the existing route has a gateway. When RTF_ADDRCONF is cleared from a route that has a gateway, that route becomes eligible for ECMP, i.e. rt6_qualify_for_ecmp() returns true. The issue is that this route was never added to the fib6_siblings list. This leads to a mismatch between the following counts: - The sibling count computed by iterating fib6_next chain, which includes the newly ECMP-eligible route - The actual siblings in fib6_siblings list, which does not include that route When a subsequent ECMP route is added, fib6_add_rt2node() hits BUG_ON(sibling->fib6_nsiblings != rt->fib6_nsiblings) because the counts don't match. Fix this by only clearing RTF_ADDRCONF when the existing route does not have a gateway. Routes without a gateway cannot qualify for ECMP anyway (rt6_qualify_for_ecmp() requires fib_nh_gw_family), so clearing RTF_ADDRCONF on them is safe and matches the original intent of the commit. [0]: kernel BUG at net/ipv6/ip6_fib.c:1217! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 UID: 0 PID: 6010 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 RIP: 0010:fib6_add_rt2node+0x3433/0x3470 net/ipv6/ip6_fib.c:1217 [...] Call Trace: <TASK> fib6_add+0x8da/0x18a0 net/ipv6/ip6_fib.c:1532 __ip6_ins_rt net/ipv6/route.c:1351 [inline] ip6_route_add+0xde/0x1b0 net/ipv6/route.c:3946 ipv6_route_ioctl+0x35c/0x480 net/ipv6/route.c:4571 inet6_ioctl+0x219/0x280 net/ipv6/af_inet6.c:577 sock_do_ioctl+0xdc/0x300 net/socket.c:1245 sock_ioctl+0x576/0x790 net/socket.c:1366 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: f72514b ("ipv6: clear RA flags when adding a static route") Reported-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=cb809def1baaac68ab92 Tested-by: syzbot+cb809def1baaac68ab92@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260204095837.1285552-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

This reverts commit d930ffa. According to Thomas, commit d930ffa ("drm/gma500: use drm_crtc_vblank_crtc()") breaks the driver with a NULL-ptr oops on startup. This is because the IRQ initialization in gma_irq_install() now uses CRTCs that are only allocated later in psb_modeset_init(). Stack trace is below. Revert. Go back to the drawing board. [ 65.831766] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 [#1] SMP KASAN NOPTI [ 65.832114] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f] [ 65.832232] CPU: 1 UID: 0 PID: 296 Comm: (udev-worker) Tainted: G E 6.19.0-rc6-1-default+ #4622 PREEMPT(voluntary) [ 65.832376] Tainted: [E]=UNSIGNED_MODULE [ 65.832448] Hardware name: /DN2800MT, BIOS MTCDT10N.86A.0164.2012.1213.1024 12/13/2012 [ 65.832543] RIP: 0010:drm_crtc_vblank_crtc+0x24/0xd0 [ 65.832652] Code: 90 90 90 90 90 90 0f 1f 44 00 00 48 89 f8 48 81 c7 18 01 00 00 48 83 ec 10 48 ba 00 00 00 00 00 fc ff df 48 89 f9 48 c1 e9 03 <0f> b6 14 11 84 d2 74 05 80 fa 03 7e 58 48 89 c6 8b 90 18 01 00 00 [ 65.832820] RSP: 0018:ffff88800c8f7688 EFLAGS: 00010006 [ 65.832919] RAX: fffffffffffffff0 RBX: ffff88800fff4928 RCX: 0000000000000021 [ 65.833011] RDX: dffffc0000000000 RSI: ffffc90000978130 RDI: 0000000000000108 [ 65.833107] RBP: ffffed1001ffea03 R08: 0000000000000000 R09: ffffed100191eec7 [ 65.833199] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8880014480c8 [ 65.833289] R13: dffffc0000000000 R14: fffffffffffffff0 R15: ffff88800fff4000 [ 65.833380] FS: 00007fe53d4d5d80(0000) GS:ffff888148dd8000(0000) knlGS:0000000000000000 [ 65.833488] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 65.833575] CR2: 00007fac707420b8 CR3: 000000000ebd1000 CR4: 00000000000006f0 [ 65.833668] Call Trace: [ 65.833735] <TASK> [ 65.833808] gma_irq_preinstall+0x190/0x3e0 [gma500_gfx] [ 65.834054] gma_irq_install+0xb2/0x240 [gma500_gfx] [ 65.834282] psb_driver_load+0x7b2/0x1090 [gma500_gfx] [ 65.834516] ? __pfx_psb_driver_load+0x10/0x10 [gma500_gfx] [ 65.834726] ? ksize+0x1d/0x40 [ 65.834817] ? drmm_add_final_kfree+0x3b/0xb0 [ 65.834935] ? __pfx_psb_pci_probe+0x10/0x10 [gma500_gfx] [ 65.835164] psb_pci_probe+0xc8/0x150 [gma500_gfx] [ 65.835384] local_pci_probe+0xd5/0x190 [ 65.835492] pci_call_probe+0x167/0x4b0 [ 65.835594] ? __pfx_pci_call_probe+0x10/0x10 [ 65.835693] ? local_clock+0x11/0x30 [ 65.835808] ? __pfx___driver_attach+0x10/0x10 [ 65.835915] ? do_raw_spin_unlock+0x55/0x230 [ 65.836014] ? pci_match_device+0x303/0x790 [ 65.836124] ? pci_match_device+0x386/0x790 [ 65.836226] ? __pfx_pci_assign_irq+0x10/0x10 [ 65.836320] ? kernfs_create_link+0x16a/0x230 [ 65.836418] ? do_raw_spin_unlock+0x55/0x230 [ 65.836526] ? __pfx___driver_attach+0x10/0x10 [ 65.836626] pci_device_probe+0x175/0x2c0 [ 65.836735] call_driver_probe+0x64/0x1e0 [ 65.836842] really_probe+0x194/0x740 [ 65.836951] ? __pfx___driver_attach+0x10/0x10 [ 65.837053] __driver_probe_device+0x18c/0x3a0 [ 65.837163] ? __pfx___driver_attach+0x10/0x10 [ 65.837262] driver_probe_device+0x4a/0x120 [ 65.837369] __driver_attach+0x19c/0x550 [ 65.837474] ? __pfx___driver_attach+0x10/0x10 [ 65.837575] bus_for_each_dev+0xe6/0x150 [ 65.837669] ? local_clock+0x11/0x30 [ 65.837770] ? __pfx_bus_for_each_dev+0x10/0x10 [ 65.837891] bus_add_driver+0x2af/0x4f0 [ 65.838000] ? __pfx_psb_init+0x10/0x10 [gma500_gfx] [ 65.838236] driver_register+0x19f/0x3a0 [ 65.838342] ? rcu_is_watching+0x11/0xb0 [ 65.838446] do_one_initcall+0xb5/0x3a0 [ 65.838546] ? __pfx_do_one_initcall+0x10/0x10 [ 65.838644] ? __kasan_slab_alloc+0x2c/0x70 [ 65.838741] ? rcu_is_watching+0x11/0xb0 [ 65.838837] ? __kmalloc_cache_noprof+0x3e8/0x6e0 [ 65.838937] ? klp_module_coming+0x1a0/0x2e0 [ 65.839033] ? do_init_module+0x85/0x7f0 [ 65.839126] ? kasan_unpoison+0x40/0x70 [ 65.839230] do_init_module+0x26e/0x7f0 [ 65.839341] ? __pfx_do_init_module+0x10/0x10 [ 65.839450] init_module_from_file+0x13f/0x160 [ 65.839549] ? __pfx_init_module_from_file+0x10/0x10 [ 65.839651] ? __lock_acquire+0x578/0xae0 [ 65.839791] ? do_raw_spin_unlock+0x55/0x230 [ 65.839886] ? idempotent_init_module+0x585/0x720 [ 65.839993] idempotent_init_module+0x1ff/0x720 [ 65.840097] ? __pfx_cred_has_capability.isra.0+0x10/0x10 [ 65.840211] ? __pfx_idempotent_init_module+0x10/0x10 Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Closes: https://lore.kernel.org/r/5aec1964-072c-4335-8f37-35e6efb4910e@suse.de Fixes: d930ffa ("drm/gma500: use drm_crtc_vblank_crtc()") Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Tested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Link: https://patch.msgid.link/20260130151319.31264-1-jani.nikula@intel.com

Fix PROCMAP_QUERY to fetch optional build ID only after dropping mmap_lock or per-VMA lock, whichever was used to lock VMA under question, to avoid deadlock reported by syzbot: -> #1 (&mm->mmap_lock){++++}-{4:4}: __might_fault+0xed/0x170 _copy_to_iter+0x118/0x1720 copy_page_to_iter+0x12d/0x1e0 filemap_read+0x720/0x10a0 blkdev_read_iter+0x2b5/0x4e0 vfs_read+0x7f4/0xae0 ksys_read+0x12a/0x250 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (&sb->s_type->i_mutex_key#8){++++}-{4:4}: __lock_acquire+0x1509/0x26d0 lock_acquire+0x185/0x340 down_read+0x98/0x490 blkdev_read_iter+0x2a7/0x4e0 __kernel_read+0x39a/0xa90 freader_fetch+0x1d5/0xa80 __build_id_parse.isra.0+0xea/0x6a0 do_procmap_query+0xd75/0x1050 procfs_procmap_ioctl+0x7a/0xb0 __x64_sys_ioctl+0x18e/0x210 do_syscall_64+0xcb/0xf80 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&mm->mmap_lock); lock(&sb->s_type->i_mutex_key#8); lock(&mm->mmap_lock); rlock(&sb->s_type->i_mutex_key#8); *** DEADLOCK *** This seems to be exacerbated (as we haven't seen these syzbot reports before that) by the recent: 777a856 ("lib/buildid: use __kernel_read() for sleepable context") To make this safe, we need to grab file refcount while VMA is still locked, but other than that everything is pretty straightforward. Internal build_id_parse() API assumes VMA is passed, but it only needs the underlying file reference, so just add another variant build_id_parse_file() that expects file passed directly. [akpm@linux-foundation.org: fix up kerneldoc] Link: https://lkml.kernel.org/r/20260129215340.3742283-1-andrii@kernel.org Fixes: ed5d583 ("fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reported-by: <syzbot+4e70c8e0a2017b432f7a@syzkaller.appspotmail.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Tested-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Oneway transactions sent to frozen targets via binder_proc_transaction() return a BR_TRANSACTION_PENDING_FROZEN error but they are still treated as successful since the target is expected to thaw at some point. It is then not safe to access 't' after BR_TRANSACTION_PENDING_FROZEN errors as the transaction could have been consumed by the now thawed target. This is the case for binder_netlink_report() which derreferences 't' after a pending frozen error, as pointed out by the following KASAN report: ================================================================== BUG: KASAN: slab-use-after-free in binder_netlink_report.isra.0+0x694/0x6c8 Read of size 8 at addr ffff00000f98ba38 by task binder-util/522 CPU: 4 UID: 0 PID: 522 Comm: binder-util Not tainted 6.19.0-rc6-00015-gc03e9c42ae8f #1 PREEMPT Hardware name: linux,dummy-virt (DT) Call trace: binder_netlink_report.isra.0+0x694/0x6c8 binder_transaction+0x66e4/0x79b8 binder_thread_write+0xab4/0x4440 binder_ioctl+0x1fd4/0x2940 [...] Allocated by task 522: __kmalloc_cache_noprof+0x17c/0x50c binder_transaction+0x584/0x79b8 binder_thread_write+0xab4/0x4440 binder_ioctl+0x1fd4/0x2940 [...] Freed by task 488: kfree+0x1d0/0x420 binder_free_transaction+0x150/0x234 binder_thread_read+0x2d08/0x3ce4 binder_ioctl+0x488/0x2940 [...] ================================================================== Instead, make a transaction copy so the data can be safely accessed by binder_netlink_report() after a pending frozen error. While here, add a comment about not using t->buffer in binder_netlink_report(). Cc: stable@vger.kernel.org Fixes: 6374034 ("binder: introduce transaction reports via netlink") Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260122180203.1502637-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

In ath12k_mac_op_link_sta_statistics(), the atomic context scope introduced by dp_lock also covers firmware stats request. Since that request could block, below issue is hit: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:575 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6866, name: iw preempt_count: 201, expected: 0 RCU nest depth: 0, expected: 0 3 locks held by iw/6866: #0:[...] #1:[...] #2: ffff9748f43230c8 (&dp->dp_lock){+.-.}-{3:3}, at: ath12k_mac_op_link_sta_statistics+0xc6/0x380 [ath12k] Preemption disabled at: [<ffffffffc0349656>] ath12k_mac_op_link_sta_statistics+0xc6/0x380 [ath12k] Call Trace: <TASK> show_stack dump_stack_lvl dump_stack __might_resched.cold __might_sleep __mutex_lock mutex_lock_nested ath12k_mac_get_fw_stats ath12k_mac_op_link_sta_statistics </TASK> Since firmware stats request doesn't require protection from dp_lock, move it outside to fix this issue. While moving, also refine that code hunk to make function parameters get populated when really necessary. Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3 Signed-off-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Reviewed-by: Vasanthakumar Thiagarajan <vasanthakumar.thiagarajan@oss.qualcomm.com> Link: https://patch.msgid.link/20251119-ath12k-ng-sleep-in-atomic-v1-1-5d1a726597db@oss.qualcomm.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>

Imran Shaik and others added 5 commits May 19, 2025 15:15

Merge remote-tracking branch tech/bsp/clk into integ

4846538

Merge remote-tracking branch tech/bsp/pinctrl into integ

1de231a

sgaud-quic force-pushed the qcom-next-staging branch 11 times, most recently from 8e30800 to e615cd2 Compare May 23, 2025 12:33

sgaud-quic force-pushed the qcom-next-staging branch 8 times, most recently from 37f9356 to a80388e Compare May 28, 2025 06:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrating kernel-topics to qcom-next #1

Integrating kernel-topics to qcom-next #1
sgaud-quic wants to merge 5 commits intoqcom-next-stagingfrom
integ

sgaud-quic commented May 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

sgaud-quic commented May 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants