Free htab element out of bucket lock #8327

When bpf_timer is used in an LRU hash map, calling check_and_free_fields() in htab_lru_map_delete_node() invokes bpf_timer_cancel_and_free() to free the bpf_timer. If the timer is running on another CPU and PREEMPT_RT is enabled, hrtimer_cancel() invokes hrtimer_cancel_wait_running(), which tries to acquire a spin-lock. However, htab_lru_map_delete_node() already holds a raw-spin-lock, so this violates the lockdep rule and may trigger the "BUG: scheduling while atomic" warning.

Fix the issue by moving the invocation of check_and_free_fields() out of the bucket lock.

Signed-off-by: Hou Tao <houtao1@huawei.com>
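The locking pattern described in this commit message can be sketched in user space. This is only an illustration, not the kernel code: `struct bucket`, `struct elem`, `bucket_insert()`, `bucket_delete()` and `free_fields()` are hypothetical names, and a pthread mutex stands in for the raw-spin-lock bucket lock. The point is that only the unlink happens while the lock is held, and the cleanup that might block runs after the unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these names are kernel APIs. */
struct elem {
    int key;
    struct elem *next;
};

struct bucket {
    pthread_mutex_t lock;   /* stands in for the raw-spin-lock bucket lock */
    struct elem *head;
    bool locked;            /* lets the sketch check where cleanup runs */
};

static int freed_under_lock;   /* cleanups that ran with the lock held */

static void bucket_init(struct bucket *b)
{
    pthread_mutex_init(&b->lock, NULL);
    b->head = NULL;
    b->locked = false;
}

static bool bucket_insert(struct bucket *b, int key)
{
    struct elem *e = malloc(sizeof(*e));

    if (!e)
        return false;
    e->key = key;
    pthread_mutex_lock(&b->lock);
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
    return true;
}

/* Stands in for check_and_free_fields(): may block, so it must run unlocked. */
static void free_fields(struct bucket *b, struct elem *e)
{
    if (b->locked)
        freed_under_lock++;
    free(e);
}

/* Unlink under the lock, drop the lock, then free the element. */
static bool bucket_delete(struct bucket *b, int key)
{
    struct elem **pp, *victim = NULL;

    pthread_mutex_lock(&b->lock);
    b->locked = true;
    for (pp = &b->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->key == key) {
            victim = *pp;
            *pp = victim->next;   /* only the unlink happens under the lock */
            break;
        }
    }
    b->locked = false;
    pthread_mutex_unlock(&b->lock);

    if (victim)
        free_fields(b, victim);   /* potentially blocking work, lock dropped */
    return victim != NULL;
}
```

In this sketch the element is unreachable from the bucket once the lock is dropped, which is why freeing it afterwards does not race with other users of the same bucket.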

Use a goto statement to bail out early when the target element is not found, instead of using a large else branch to handle the more likely case. This change doesn't affect functionality and simply makes the code cleaner.

Signed-off-by: Hou Tao <houtao1@huawei.com>
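As a rough illustration of the style this commit message describes, the sketch below handles the "not found" case with an early `goto` so the likely case stays at the top indentation level. The `struct table` type, `table_take()` and the `lock_depth` counter are hypothetical and exist only for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_SLOTS 4

/* Hypothetical table type used only for illustration. */
struct table {
    int keys[NR_SLOTS];
    int vals[NR_SLOTS];
    int used[NR_SLOTS];
    int lock_depth;   /* stands in for the bucket lock */
};

/* Look up @key, copy its value to @val and delete the entry. */
static int table_take(struct table *t, int key, int *val)
{
    size_t i, slot = NR_SLOTS;
    int ret = 0;

    t->lock_depth++;                  /* "lock" */
    for (i = 0; i < NR_SLOTS; i++) {
        if (t->used[i] && t->keys[i] == key) {
            slot = i;
            break;
        }
    }
    if (slot == NR_SLOTS) {
        ret = -ENOENT;                /* unlikely case leaves early */
        goto out_unlock;
    }

    /* Likely case continues without an enclosing else branch. */
    *val = t->vals[slot];
    t->used[slot] = 0;

out_unlock:
    t->lock_depth--;                  /* "unlock" runs on every path */
    return ret;
}
```

Both paths share the single unlock at the label, so the early exit cannot leave the lock held.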

Freeing the special fields in a map value may acquire a spin-lock (e.g., when freeing a bpf_timer). However, the lookup_and_delete_elem procedure already holds a raw-spin-lock, which violates the lockdep rule. The running context of __htab_map_lookup_and_delete_elem() has already disabled migration, so it is OK to invoke free_htab_elem() after unlocking the bucket lock.

Fix the potential problem by freeing the element after unlocking the bucket lock in __htab_map_lookup_and_delete_elem().

Signed-off-by: Hou Tao <houtao1@huawei.com>

During the update procedure, when overwriting an element in a pre-allocated htab, the freeing of old_element is protected by the bucket lock. The bucket lock is necessary because old_element has already been stashed in htab->extra_elems after alloc_htab_elem() returns. If old_element were freed after the bucket lock is unlocked, the stashed element could be reused by a concurrent update procedure, and the freeing of old_element would run concurrently with its reuse. However, the invocation of check_and_free_fields() may acquire a spin-lock, which violates the lockdep rule because its caller already holds a raw-spin-lock (the bucket lock). The following warning is reported when this happens:

  BUG: scheduling while atomic: test_progs/676/0x00000003
  3 locks held by test_progs/676:
  #0: ffffffff864b0240 (rcu_read_lock_trace){....}-{0:0}, at: bpf_prog_test_run_syscall+0x2c0/0x830
  #1: ffff88810e961188 (&htab->lockdep_key){....}-{2:2}, at: htab_map_update_elem+0x306/0x1500
  #2: ffff8881f4eac1b8 (&base->softirq_expiry_lock){....}-{2:2}, at: hrtimer_cancel_wait_running+0xe9/0x1b0
  Modules linked in: bpf_testmod(O)
  Preemption disabled at:
  [<ffffffff817837a3>] htab_map_update_elem+0x293/0x1500
  CPU: 0 UID: 0 PID: 676 Comm: test_progs Tainted: G ... 6.12.0+ #11
  Tainted: [W]=WARN, [O]=OOT_MODULE
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)...
  Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x70
  dump_stack+0x10/0x20
  __schedule_bug+0x120/0x170
  __schedule+0x300c/0x4800
  schedule_rtlock+0x37/0x60
  rtlock_slowlock_locked+0x6d9/0x54c0
  rt_spin_lock+0x168/0x230
  hrtimer_cancel_wait_running+0xe9/0x1b0
  hrtimer_cancel+0x24/0x30
  bpf_timer_delete_work+0x1d/0x40
  bpf_timer_cancel_and_free+0x5e/0x80
  bpf_obj_free_fields+0x262/0x4a0
  check_and_free_fields+0x1d0/0x280
  htab_map_update_elem+0x7fc/0x1500
  bpf_prog_9f90bc20768e0cb9_overwrite_cb+0x3f/0x43
  bpf_prog_ea601c4649694dbd_overwrite_timer+0x5d/0x7e
  bpf_prog_test_run_syscall+0x322/0x830
  __sys_bpf+0x135d/0x3ca0
  __x64_sys_bpf+0x75/0xb0
  x64_sys_call+0x1b5/0xa10
  do_syscall_64+0x3b/0xc0
  entry_SYSCALL_64_after_hwframe+0x4b/0x53
  ...
  </TASK>

It seems feasible to break the reuse and refill of per-cpu extra_elems into two independent parts: reuse the per-cpu extra_elems with the bucket lock held, and refill old_element as per-cpu extra_elems after the bucket lock is unlocked. However, that would make concurrent overwrite procedures on the same CPU return an unexpected -E2BIG error when the map is full.

Therefore, the patch fixes the lock problem by breaking the cancelling of bpf_timer into two steps:

1) use hrtimer_try_to_cancel() and check its return value
2) if the timer is running, use hrtimer_cancel() through a kworker to cancel it again

Considering that the current implementation of hrtimer_cancel() will either spin on the current CPU or try to acquire a softirq_expiry_lock that is already held when the timer is running, the steps above are reasonable. However, the approach also has a downside: when the timer is running, the cancelling of the timer is delayed when releasing the last map uref. The delay is also fixable (e.g., break the cancelling of bpf timer into two parts: one part in locked scope, another one in unlocked scope), so it can be revised later if necessary.

It is a bit hard to decide the right fix tag. One reason is that the problem depends on PREEMPT_RT, which is enabled in v6.12. Considering that the softirq_expiry_lock lock has existed since v5.4 and bpf_timer was introduced in v5.15, the bpf_timer commit is used in the Fixes tag, and an extra Depends-on tag is added to state the dependency on PREEMPT_RT.

Fixes: b00628b ("bpf: Introduce bpf timers.")
Depends-on: v6.12 with PREEMPT_RT enabled
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Closes: https://lore.kernel.org/bpf/20241106084527.4gPrMnHt@linutronix.de
Signed-off-by: Hou Tao <houtao1@huawei.com>
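The two-step cancellation described in this commit message can be modelled in user space as below. This is a minimal sketch under stated assumptions, not the patch itself: `struct fake_timer`, `fake_try_to_cancel()`, `cancel_and_free_nowait()` and `run_deferred()` are hypothetical names. `fake_try_to_cancel()` follows the return convention of hrtimer_try_to_cancel() (0 when the timer is not active, 1 when it was active, -1 when its callback is currently running), and a simple list stands in for the kworker queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum timer_state { TIMER_IDLE, TIMER_ARMED, TIMER_CALLBACK_RUNNING };

/* Hypothetical model of a timer; not a kernel structure. */
struct fake_timer {
    enum timer_state state;
    bool freed;
    struct fake_timer *next_deferred;
};

static struct fake_timer *deferred_head;   /* stands in for the kworker queue */

/* Never waits: 0 = not active, 1 = was active, -1 = callback running. */
static int fake_try_to_cancel(struct fake_timer *t)
{
    switch (t->state) {
    case TIMER_CALLBACK_RUNNING:
        return -1;
    case TIMER_ARMED:
        t->state = TIMER_IDLE;
        return 1;
    default:
        return 0;
    }
}

/* Step 1: may run with the bucket lock held, so it must not wait. */
static bool cancel_and_free_nowait(struct fake_timer *t)
{
    if (fake_try_to_cancel(t) >= 0) {
        t->freed = true;              /* safe to free right away */
        return true;
    }
    t->next_deferred = deferred_head; /* callback running: defer the cancel */
    deferred_head = t;
    return false;
}

/* Step 2: runs later in a context that is allowed to wait. */
static int run_deferred(void)
{
    int handled = 0;

    while (deferred_head) {
        struct fake_timer *t = deferred_head;

        deferred_head = t->next_deferred;
        /* A blocking cancel would wait for the callback here; the
         * sketch models the end of that wait by marking it idle. */
        t->state = TIMER_IDLE;
        t->freed = true;
        handled++;
    }
    return handled;
}
```

The sketch also makes the stated downside visible: a timer whose callback is running is not freed until the deferred step runs.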

kernel-patches-daemon-bpf · 2025-01-14T22:25:44Z

Upstream branch: be339dd
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=923620
version: 2

The main purpose of the test is to demonstrate the lock problem when freeing a bpf_timer under PREEMPT_RT. When bpf_timer_cancel_and_free() frees a bpf_timer that is running on another CPU, hrtimer_cancel() tries to acquire a spin-lock (namely softirq_expiry_lock), but the freeing procedure already holds a raw-spin-lock.

The test first creates two threads: one to start timers and the other to free timers. The start-timers thread starts the timers and, once they have all been started, wakes up the free-timers thread to free them. After freeing, the free-timers thread wakes up the start-timers thread to complete the current iteration. A loop of 10 iterations is used.

Signed-off-by: Hou Tao <houtao1@huawei.com>
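The handshake between the two threads can be sketched with pthreads as follows. This is an assumption-laden illustration rather than the selftest: `struct handshake`, `start_timers()`, `free_timers()` and `run_handshake()` are hypothetical names, counters stand in for actually starting and freeing bpf timers, and only the iteration count of 10 comes from the description.

```c
#include <assert.h>
#include <pthread.h>

#define NR_ITER 10   /* iteration count taken from the description */

enum phase { PHASE_START, PHASE_FREE, PHASE_DONE };

/* Hypothetical shared state for the two threads. */
struct handshake {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    enum phase phase;
    int started;   /* placeholder for "timers started" */
    int freed;     /* placeholder for "timers freed" */
};

static void *start_timers(void *arg)
{
    struct handshake *h = arg;

    pthread_mutex_lock(&h->lock);
    for (int i = 0; i < NR_ITER; i++) {
        h->started++;                   /* start the timers */
        h->phase = PHASE_FREE;          /* wake the free-timers thread */
        pthread_cond_broadcast(&h->cond);
        while (h->phase != PHASE_START) /* wait until freeing is done */
            pthread_cond_wait(&h->cond, &h->lock);
    }
    h->phase = PHASE_DONE;              /* tell the other thread to exit */
    pthread_cond_broadcast(&h->cond);
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

static void *free_timers(void *arg)
{
    struct handshake *h = arg;

    pthread_mutex_lock(&h->lock);
    for (;;) {
        while (h->phase == PHASE_START)
            pthread_cond_wait(&h->cond, &h->lock);
        if (h->phase == PHASE_DONE)
            break;
        h->freed++;                     /* free the timers */
        h->phase = PHASE_START;         /* wake the start-timers thread */
        pthread_cond_broadcast(&h->cond);
    }
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Runs both threads to completion; returns 0 on success, -1 on error. */
static int run_handshake(struct handshake *h)
{
    pthread_t starter, freer;

    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->phase = PHASE_START;
    h->started = h->freed = 0;

    if (pthread_create(&freer, NULL, free_timers, h))
        return -1;
    if (pthread_create(&starter, NULL, start_timers, h)) {
        pthread_mutex_lock(&h->lock);
        h->phase = PHASE_DONE;          /* release the waiting thread */
        pthread_cond_broadcast(&h->cond);
        pthread_mutex_unlock(&h->lock);
        pthread_join(freer, NULL);
        return -1;
    }
    pthread_join(starter, NULL);
    pthread_join(freer, NULL);
    return 0;
}
```

Because each thread only proceeds after the other has flipped the phase, every iteration pairs exactly one start step with one free step.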

kernel-patches-daemon-bpf · 2025-01-15T01:43:11Z

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=923620 expired. Closing PR.

kernel-patches-daemon-bpf bot added new bpf-next V1 V1-ci-fail labels Jan 7, 2025

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 2410e16 to 28f14d5 Compare January 8, 2025 15:11

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 9953d27 to 50a8c81 Compare January 8, 2025 15:11

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 28f14d5 to 9632415 Compare January 8, 2025 17:36

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 50a8c81 to 7d51d29 Compare January 8, 2025 17:36

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 9632415 to 21a4297 Compare January 8, 2025 17:42

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 7d51d29 to 3c294a7 Compare January 8, 2025 17:42

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 21a4297 to 36a754d Compare January 9, 2025 02:14

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 3c294a7 to 245a16e Compare January 9, 2025 02:16

kernel-patches-daemon-bpf bot added V2 and removed V1 V1-ci-fail labels Jan 9, 2025

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 245a16e to 250d8cb Compare January 9, 2025 06:10

kernel-patches-daemon-bpf bot added the V2-ci-fail label Jan 9, 2025

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 36a754d to b27feb5 Compare January 10, 2025 03:09

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 250d8cb to ab02356 Compare January 10, 2025 03:11

kernel-patches-daemon-bpf bot added V2-ci-pass and removed V2-ci-fail labels Jan 10, 2025

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from b27feb5 to ffc879e Compare January 10, 2025 21:30

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from ab02356 to 10d3195 Compare January 10, 2025 21:34

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from ffc879e to 77d4ead Compare January 10, 2025 22:22

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 10d3195 to eb922a6 Compare January 10, 2025 22:25

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 77d4ead to 483693f Compare January 10, 2025 22:41

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from eb922a6 to b0d3cba Compare January 10, 2025 22:44

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from 483693f to d01d9d5 Compare January 11, 2025 01:36

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from b0d3cba to 16a551c Compare January 11, 2025 01:38

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from d01d9d5 to f1e85bb Compare January 13, 2025 19:42

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 16a551c to 88d0f74 Compare January 13, 2025 19:43

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch from f1e85bb to a9b229c Compare January 14, 2025 22:24

Hou Tao added 4 commits January 14, 2025 14:25

kernel-patches-daemon-bpf bot force-pushed the series/922809=>bpf-next branch from 88d0f74 to 3c23b31 Compare January 14, 2025 22:25

kernel-patches-daemon-bpf bot added changes-requested and removed new labels Jan 15, 2025

kernel-patches-daemon-bpf bot closed this Jan 15, 2025

kernel-patches-daemon-bpf bot deleted the series/922809=>bpf-next branch January 17, 2025 09:57
