@kernel-patches-bot

Pull request for series with
subject: bpf: use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=204281

@kernel-patches-bot
Author

Master branch: ba5f4cf
series: https://patchwork.ozlabs.org/project/netdev/list/?series=204281
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.ozlabs.org/project/netdev/list/?series=204281, error message:
Cmd('git') failed due to: exit code(128)
cmdline: git am -3
stdout: 'Applying: bpf: use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
Using index info to reconstruct a base tree...
Patch failed at 0001 bpf: use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
stderr: 'error: patch failed: kernel/bpf/percpu_freelist.c:40
error: kernel/bpf/percpu_freelist.c: patch does not apply
error: Did you hand edit your patch?
It does not apply to blobs recorded in its index.
hint: Use 'git am --show-current-patch' to see the failed patch'

@kernel-patches-bot
Author

Master branch: 1fd17c8
series: https://patchwork.ozlabs.org/project/netdev/list/?series=204281
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.ozlabs.org/project/netdev/list/?series=204281
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am -3
  stdout: 'Applying: bpf: use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
Using index info to reconstruct a base tree...
Patch failed at 0001 bpf: use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: patch failed: kernel/bpf/percpu_freelist.c:40
error: kernel/bpf/percpu_freelist.c: patch does not apply
error: Did you hand edit your patch?
It does not apply to blobs recorded in its index.
hint: Use 'git am --show-current-patch' to see the failed patch'

conflict:


kernel-patches-bot deleted the series/204281=>bpf-next branch September 30, 2020 16:13
kernel-patches-bot pushed a commit that referenced this pull request May 12, 2022
kvm->arch.arm_pmu is set when userspace attempts to set the first PMU
attribute. As certain attributes are mandatory, arm_pmu ends up always
being set to a valid arm_pmu, otherwise KVM will refuse to run the VCPU.
However, this only happens if the VCPU has the PMU feature. If the VCPU
doesn't have the feature bit set, kvm->arch.arm_pmu will be left
uninitialized and equal to NULL.

KVM doesn't do ID register emulation for 32-bit guests and accesses to the
PMU registers aren't gated by the pmu_visibility() function. This is done
to prevent injecting unexpected undefined exceptions in guests which have
detected the presence of a hardware PMU. But even though the VCPU feature
is missing, KVM still attempts to emulate certain aspects of the PMU when
PMU registers are accessed. This leads to a NULL pointer dereference like
this one, which happens on an odroid-c4 board when running the
kvm-unit-tests pmu-cycle-counter test with kvmtool and without the PMU
feature being set:

[  454.402699] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000150
[  454.405865] Mem abort info:
[  454.408596]   ESR = 0x96000004
[  454.411638]   EC = 0x25: DABT (current EL), IL = 32 bits
[  454.416901]   SET = 0, FnV = 0
[  454.419909]   EA = 0, S1PTW = 0
[  454.423010]   FSC = 0x04: level 0 translation fault
[  454.427841] Data abort info:
[  454.430687]   ISV = 0, ISS = 0x00000004
[  454.434484]   CM = 0, WnR = 0
[  454.437404] user pgtable: 4k pages, 48-bit VAs, pgdp=000000000c924000
[  454.443800] [0000000000000150] pgd=0000000000000000, p4d=0000000000000000
[  454.450528] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[  454.456036] Modules linked in:
[  454.459053] CPU: 1 PID: 267 Comm: kvm-vcpu-0 Not tainted 5.18.0-rc4 #113
[  454.465697] Hardware name: Hardkernel ODROID-C4 (DT)
[  454.470612] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  454.477512] pc : kvm_pmu_event_mask.isra.0+0x14/0x74
[  454.482427] lr : kvm_pmu_set_counter_event_type+0x2c/0x80
[  454.487775] sp : ffff80000a9839c0
[  454.491050] x29: ffff80000a9839c0 x28: ffff000000a83a00 x27: 0000000000000000
[  454.498127] x26: 0000000000000000 x25: 0000000000000000 x24: ffff00000a510000
[  454.505198] x23: ffff000000a83a00 x22: ffff000003b01000 x21: 0000000000000000
[  454.512271] x20: 000000000000001f x19: 00000000000003ff x18: 0000000000000000
[  454.519343] x17: 000000008003fe98 x16: 0000000000000000 x15: 0000000000000000
[  454.526416] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  454.533489] x11: 000000008003fdbc x10: 0000000000009d20 x9 : 000000000000001b
[  454.540561] x8 : 0000000000000000 x7 : 0000000000000d00 x6 : 0000000000009d00
[  454.547633] x5 : 0000000000000037 x4 : 0000000000009d00 x3 : 0d09000000000000
[  454.554705] x2 : 000000000000001f x1 : 0000000000000000 x0 : 0000000000000000
[  454.561779] Call trace:
[  454.564191]  kvm_pmu_event_mask.isra.0+0x14/0x74
[  454.568764]  kvm_pmu_set_counter_event_type+0x2c/0x80
[  454.573766]  access_pmu_evtyper+0x128/0x170
[  454.577905]  perform_access+0x34/0x80
[  454.581527]  kvm_handle_cp_32+0x13c/0x160
[  454.585495]  kvm_handle_cp15_32+0x1c/0x30
[  454.589462]  handle_exit+0x70/0x180
[  454.592912]  kvm_arch_vcpu_ioctl_run+0x1c4/0x5e0
[  454.597485]  kvm_vcpu_ioctl+0x23c/0x940
[  454.601280]  __arm64_sys_ioctl+0xa8/0xf0
[  454.605160]  invoke_syscall+0x48/0x114
[  454.608869]  el0_svc_common.constprop.0+0xd4/0xfc
[  454.613527]  do_el0_svc+0x28/0x90
[  454.616803]  el0_svc+0x34/0xb0
[  454.619822]  el0t_64_sync_handler+0xa4/0x130
[  454.624049]  el0t_64_sync+0x18c/0x190
[  454.627675] Code: a9be7bfd 910003fd f9000bf3 52807ff3 (b9415001)
[  454.633714] ---[ end trace 0000000000000000 ]---

In this particular case, Linux hasn't detected the presence of a hardware
PMU because the PMU node is missing from the DTB, so userspace would have
been unable to set the VCPU PMU feature even if it attempted it. What
happens is that the 32-bit guest reads ID_DFR0, which advertises the
presence of the PMU, and when it tries to program a counter, it triggers
the NULL pointer dereference because kvm->arch.arm_pmu is NULL.

kvm->arch.arm_pmu was introduced by commit 46b1878 ("KVM: arm64:
Keep a per-VM pointer to the default PMU"). Until that commit, this
error would be triggered instead:

[   73.388140] ------------[ cut here ]------------
[   73.388189] Unknown PMU version 0
[   73.390420] WARNING: CPU: 1 PID: 264 at arch/arm64/kvm/pmu-emul.c:36 kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.399821] Modules linked in:
[   73.402835] CPU: 1 PID: 264 Comm: kvm-vcpu-0 Not tainted 5.17.0 #114
[   73.409132] Hardware name: Hardkernel ODROID-C4 (DT)
[   73.414048] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   73.420948] pc : kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.425863] lr : kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.430779] sp : ffff80000a8db9b0
[   73.434055] x29: ffff80000a8db9b0 x28: ffff000000dbaac0 x27: 0000000000000000
[   73.441131] x26: ffff000000dbaac0 x25: 00000000c600000d x24: 0000000000180720
[   73.448203] x23: ffff800009ffbe10 x22: ffff00000b612000 x21: 0000000000000000
[   73.455276] x20: 000000000000001f x19: 0000000000000000 x18: ffffffffffffffff
[   73.462348] x17: 000000008003fe98 x16: 0000000000000000 x15: 0720072007200720
[   73.469420] x14: 0720072007200720 x13: ffff800009d32488 x12: 00000000000004e6
[   73.476493] x11: 00000000000001a2 x10: ffff800009d32488 x9 : ffff800009d32488
[   73.483565] x8 : 00000000ffffefff x7 : ffff800009d8a488 x6 : ffff800009d8a488
[   73.490638] x5 : ffff0000f461a9d8 x4 : 0000000000000000 x3 : 0000000000000001
[   73.497710] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000000dbaac0
[   73.504784] Call trace:
[   73.507195]  kvm_pmu_event_mask.isra.0+0x6c/0x74
[   73.511768]  kvm_pmu_set_counter_event_type+0x2c/0x80
[   73.516770]  access_pmu_evtyper+0x128/0x16c
[   73.520910]  perform_access+0x34/0x80
[   73.524532]  kvm_handle_cp_32+0x13c/0x160
[   73.528500]  kvm_handle_cp15_32+0x1c/0x30
[   73.532467]  handle_exit+0x70/0x180
[   73.535917]  kvm_arch_vcpu_ioctl_run+0x20c/0x6e0
[   73.540489]  kvm_vcpu_ioctl+0x2b8/0x9e0
[   73.544283]  __arm64_sys_ioctl+0xa8/0xf0
[   73.548165]  invoke_syscall+0x48/0x114
[   73.551874]  el0_svc_common.constprop.0+0xd4/0xfc
[   73.556531]  do_el0_svc+0x28/0x90
[   73.559808]  el0_svc+0x28/0x80
[   73.562826]  el0t_64_sync_handler+0xa4/0x130
[   73.567054]  el0t_64_sync+0x1a0/0x1a4
[   73.570676] ---[ end trace 0000000000000000 ]---
[   73.575382] kvm: pmu event creation failed -2

The root cause remains the same: kvm->arch.pmuver was never set to
something sensible because the VCPU feature itself was never set.

The odroid-c4 is somewhat of a special case, because Linux doesn't probe
the PMU. But the above errors can easily be reproduced on any hardware,
with or without a PMU driver, as long as userspace doesn't set the PMU
feature.

Work around the fact that KVM advertises a PMU even when the VCPU feature
is not set by gating all PMU emulation on the feature. The guest can still
access the registers without KVM injecting an undefined exception.
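
The gating described above can be sketched in userspace C. This is a minimal model, not the actual KVM change: the struct names, the `has_pmu_feature` flag, and the mask values are hypothetical stand-ins chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the KVM structures named above. */
struct arm_pmu_model {
    uint32_t pmuver;
};

struct kvm_model {
    struct arm_pmu_model *arm_pmu; /* stays NULL if no PMU attribute is ever set */
};

struct vcpu_model {
    struct kvm_model *kvm;
    bool has_pmu_feature; /* models the VCPU PMU feature bit */
};

/* Dereferences arm_pmu unconditionally, like the faulting helper in the trace. */
static uint64_t pmu_event_mask(const struct kvm_model *kvm)
{
    /* Illustrative values only: a wider mask for newer PMU versions. */
    return kvm->arm_pmu->pmuver >= 4 ? 0xffff : 0x3ff;
}

/*
 * Emulated write to an event-type register.
 * Returns true if the write was emulated, false if it was ignored because
 * the VCPU lacks the PMU feature. No exception is injected in either case.
 */
static bool set_counter_event_type(struct vcpu_model *vcpu, uint64_t data,
                                   uint64_t *reg)
{
    if (!vcpu->has_pmu_feature)
        return false; /* gate: never reach the arm_pmu dereference */

    *reg = data & pmu_event_mask(vcpu->kvm);
    return true;
}
```

In this model a write from a VCPU without the feature is silently ignored, so `arm_pmu` is never dereferenced while it is NULL.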

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220425145530.723858-1-alexandru.elisei@arm.com
kernel-patches-daemon-bpf bot pushed a commit that referenced this pull request Mar 2, 2024
…th LTO kernel

In my locally built clang LTO kernel (enabling CONFIG_LTO and
CONFIG_LTO_CLANG_THIN), the kprobe_multi_bench_attach/kernel subtest
failed like this:
  test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec
  test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec
  libbpf: prog 'test_kprobe_empty': failed to attach: No such process
  test_kprobe_multi_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts unexpected error: -3
  #114/1   kprobe_multi_bench_attach/kernel:FAIL

Multiple symbols in /sys/kernel/debug/tracing/available_filter_functions
are renamed in kallsyms due to cross-file inlining. One example is the
  static function __access_remote_vm in mm/memory.c.
In a non-LTO kernel, we have the following call stack:
  ptrace_access_vm (global, kernel/ptrace.c)
    access_remote_vm (global, mm/memory.c)
      __access_remote_vm (static, mm/memory.c)

With an LTO kernel, access_remote_vm() may be inlined into
ptrace_access_vm(), so we end up with the following call stack:
  ptrace_access_vm (global, kernel/ptrace.c)
    __access_remote_vm (static, mm/memory.c)
The compiler renames __access_remote_vm to __access_remote_vm.llvm.<hash>
to prevent potential name collisions.

This patch removes __access_remote_vm and other similar functions from
kprobe_multi_attach by checking whether a symbol such as __access_remote_vm
is missing from kallsyms on an LTO kernel. The test succeeds after this change:
  #114/1   kprobe_multi_bench_attach/kernel:OK
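
The check described above could look roughly like the following sketch. It is not the selftest's code: `in_kallsyms` and `filter_attachable` are hypothetical helpers, and the symbol lists are passed in as plain arrays instead of being read from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `name` appears verbatim in the list of kallsyms names. */
static bool in_kallsyms(const char *name, const char *const *kallsyms, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, kallsyms[i]) == 0)
            return true;
    return false;
}

/*
 * Compact `syms` in place so that it keeps only the names that exist
 * verbatim in kallsyms. A name that LTO renamed to "<name>.llvm.<hash>"
 * no longer matches and is dropped. Returns the number of names kept.
 */
static size_t filter_attachable(const char **syms, size_t nsyms,
                                const char *const *kallsyms, size_t nkallsyms)
{
    size_t kept = 0;

    for (size_t i = 0; i < nsyms; i++)
        if (in_kallsyms(syms[i], kallsyms, nkallsyms))
            syms[kept++] = syms[i];
    return kept;
}
```

A linear scan keeps the sketch short. With the many thousands of names in a real kallsyms, a hash set or sorted lookup would be the more practical choice.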

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Apr 11, 2025
After ieee80211_do_stop(), SKBs from the vif's txq could still be processed.
Indeed, another concurrent vif schedule_and_wake_txq call could cause
those packets to be dequeued (see ieee80211_handle_wake_tx_queue())
without checking the sdata's current state.

Because vif.drv_priv is now cleared in this function, this could lead to
a driver crash.

For example, in ath12k, ahvif is stored in vif.drv_priv. Thus, if
ath12k_mac_op_tx() is called after ieee80211_do_stop(), ahvif->ah can be
NULL, causing the ath12k_warn(ahvif->ah,...) call in this function to
trigger the NULL deref below.

  Unable to handle kernel paging request at virtual address dfffffc000000001
  KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
  batman_adv: bat0: Interface deactivated: brbh1337
  Mem abort info:
    ESR = 0x0000000096000004
    EC = 0x25: DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
    FSC = 0x04: level 0 translation fault
  Data abort info:
    ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
    CM = 0, WnR = 0, TnD = 0, TagAccess = 0
    GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
  [dfffffc000000001] address between user and kernel address ranges
  Internal error: Oops: 0000000096000004 [#1] SMP
  CPU: 1 UID: 0 PID: 978 Comm: lbd Not tainted 6.13.0-g633f875b8f1e #114
  Hardware name: HW (DT)
  pstate: 10000005 (nzcV daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : ath12k_mac_op_tx+0x6cc/0x29b8 [ath12k]
  lr : ath12k_mac_op_tx+0x174/0x29b8 [ath12k]
  sp : ffffffc086ace450
  x29: ffffffc086ace450 x28: 0000000000000000 x27: 1ffffff810d59ca4
  x26: ffffff801d05f7c0 x25: 0000000000000000 x24: 000000004000001e
  x23: ffffff8009ce4926 x22: ffffff801f9c0800 x21: ffffff801d05f7f0
  x20: ffffff8034a19f40 x19: 0000000000000000 x18: ffffff801f9c0958
  x17: ffffff800bc0a504 x16: dfffffc000000000 x15: ffffffc086ace4f8
  x14: ffffff801d05f83c x13: 0000000000000000 x12: ffffffb003a0bf03
  x11: 0000000000000000 x10: ffffffb003a0bf02 x9 : ffffff8034a19f40
  x8 : ffffff801d05f818 x7 : 1ffffff0069433dc x6 : ffffff8034a19ee0
  x5 : ffffff801d05f7f0 x4 : 0000000000000000 x3 : 0000000000000001
  x2 : 0000000000000000 x1 : dfffffc000000000 x0 : 0000000000000008
  Call trace:
   ath12k_mac_op_tx+0x6cc/0x29b8 [ath12k] (P)
   ieee80211_handle_wake_tx_queue+0x16c/0x260
   ieee80211_queue_skb+0xeec/0x1d20
   ieee80211_tx+0x200/0x2c8
   ieee80211_xmit+0x22c/0x338
   __ieee80211_subif_start_xmit+0x7e8/0xc60
   ieee80211_subif_start_xmit+0xc4/0xee0
   __ieee80211_subif_start_xmit_8023.isra.0+0x854/0x17a0
   ieee80211_subif_start_xmit_8023+0x124/0x488
   dev_hard_start_xmit+0x160/0x5a8
   __dev_queue_xmit+0x6f8/0x3120
   br_dev_queue_push_xmit+0x120/0x4a8
   __br_forward+0xe4/0x2b0
   deliver_clone+0x5c/0xd0
   br_flood+0x398/0x580
   br_dev_xmit+0x454/0x9f8
   dev_hard_start_xmit+0x160/0x5a8
   __dev_queue_xmit+0x6f8/0x3120
   ip6_finish_output2+0xc28/0x1b60
   __ip6_finish_output+0x38c/0x638
   ip6_output+0x1b4/0x338
   ip6_local_out+0x7c/0xa8
   ip6_send_skb+0x7c/0x1b0
   ip6_push_pending_frames+0x94/0xd0
   rawv6_sendmsg+0x1a98/0x2898
   inet_sendmsg+0x94/0xe0
   __sys_sendto+0x1e4/0x308
   __arm64_sys_sendto+0xc4/0x140
   do_el0_svc+0x110/0x280
   el0_svc+0x20/0x60
   el0t_64_sync_handler+0x104/0x138
   el0t_64_sync+0x154/0x158

To avoid that, empty the vif's txq in ieee80211_do_stop() so that no packet
can be dequeued after ieee80211_do_stop() (new packets cannot be queued
because SDATA_STATE_RUNNING is cleared at this point).
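
The ordering argument can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not mac80211 code: `struct vif_model` and its helpers are hypothetical, `running` stands in for SDATA_STATE_RUNNING, and packets are plain integers.

```c
#include <stdbool.h>
#include <stddef.h>

#define TXQ_CAP 8 /* arbitrary capacity for the sketch */

struct vif_model {
    bool running;     /* stands in for SDATA_STATE_RUNNING */
    void *drv_priv;   /* driver private data, cleared on stop */
    int txq[TXQ_CAP]; /* FIFO of queued packet ids */
    size_t head;
    size_t len;
};

/* New packets are accepted only while the interface is running. */
static bool vif_enqueue(struct vif_model *vif, int pkt)
{
    if (!vif->running || vif->len == TXQ_CAP)
        return false;
    vif->txq[(vif->head + vif->len) % TXQ_CAP] = pkt;
    vif->len++;
    return true;
}

/* Deliberately does not check `running`, mirroring the wake path above. */
static bool vif_dequeue(struct vif_model *vif, int *pkt)
{
    if (vif->len == 0)
        return false;
    *pkt = vif->txq[vif->head];
    vif->head = (vif->head + 1) % TXQ_CAP;
    vif->len--;
    return true;
}

static void vif_stop(struct vif_model *vif)
{
    vif->running = false; /* no new packets can be queued from here on */
    vif->head = 0;
    vif->len = 0;         /* purge: nothing is left to dequeue */
    vif->drv_priv = NULL; /* safe now: no packet can reach the driver */
}
```

Because the queue is emptied before `drv_priv` is cleared, a late dequeue finds nothing and the driver's transmit path is never entered with a NULL private pointer.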

Fixes: ba8c3d6 ("mac80211: add an intermediate software queue implementation")
Signed-off-by: Remi Pommarel <repk@triplefau.lt>
Link: https://patch.msgid.link/ff7849e268562456274213c0476e09481a48f489.1742833382.git.repk@triplefau.lt
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 19, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage: in BPF's reuseport_array for
example, 'struct reuseport_array' has an array 'struct sock __rcu *ptrs[]'
whose members must be cleared on socket close via now-broken references
from sk->sk_user_data. This leads to subtle memory corruption and lock
issues that result in kernel hangs and panics while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 19, 2025
The Fixes: commit made use of the lower 3 bits of (void *)sk->sk_user_data
for flags, and refactored to simplify adding even more.

This change immediately broke 32-bit usage, where pointer-sized objects are
only 4-byte aligned and so leave just 2 low bits free: in BPF's
reuseport_array for example, 'struct reuseport_array' has an array
'struct sock __rcu *ptrs[]' whose members must be cleared on socket close
via now-broken references from sk->sk_user_data. This leads to subtle
memory corruption and lock issues that result in kernel hangs and panics
while running BPF selftests:

root@qemu-armhf:/usr/libexec/kselftests-bpf# test_progs -a select_reuseport
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_config:PASS:netns_new 0 nsec
#356/1   select_reuseport/reuseport_sockarray IPv4/TCP LOOPBACK test_err_inner_map:OK
[...]
------------[ cut here ]------------
WARNING: CPU: 0 PID: 87 at kernel/locking/lockdep.c:238 __lock_acquire+0xac0/0xd1c
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: bpf_testmod(OE) bpf_preload
CPU: 0 UID: 0 PID: 87 Comm: test_progs Tainted: G           OE       6.17.0-rc1-00233-ge37b36224f81-dirty #114 NONE
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Generic DT based system
Call trace:
 dump_backtrace from show_stack+0x20/0x24
 r7:c01e2ebc r6:00000080 r5:60010093 r4:c14d3d80
 show_stack from dump_stack_lvl+0x90/0xc0
 dump_stack_lvl from dump_stack+0x18/0x1c
 r7:c01e2ebc r6:00000009 r5:000000ee r4:c14c5bc4
 dump_stack from __warn+0x8c/0x1b4
 __warn from warn_slowpath_fmt+0x130/0x1a4
 r8:c01e2ebc r7:c14bd144 r6:c14c5bc4 r5:c3cad400 r4:c1cf8a04
 warn_slowpath_fmt from __lock_acquire+0xac0/0xd1c
 r8:c2896b50 r7:00000000 r6:c58863b8 r5:c3cad400 r4:c3cadcc0
 __lock_acquire from lock_acquire.part.0+0xbc/0x240
 r10:00000000 r9:1c0ed000 r8:00000000 r7:60010013 r6:c1b902f0 r5:c1b902f0
 r4:df865cd0
 lock_acquire.part.0 from lock_acquire+0x90/0x168
 r10:c5886100 r9:c46a6c04 r8:00000000 r7:00000000 r6:00000000 r5:00000000
 r4:c58863b8
 lock_acquire from _raw_write_lock_bh+0x54/0x90
 r9:c46a6c04 r8:00000000 r7:00000055 r6:c58863b8 r5:c58863a8 r4:c0394774
 _raw_write_lock_bh from bpf_fd_reuseport_array_update_elem+0x16c/0x26c
 r6:c59a4000 r5:c5191400 r4:c58863a8
 bpf_fd_reuseport_array_update_elem from bpf_map_update_value+0x454/0x5dc
 r10:c329a901 r9:c329a900 r8:c1cf72f0 r7:c3cad400 r6:c595dc00 r5:00000000
 r4:00000000
 bpf_map_update_value from map_update_elem+0x210/0x430
 r10:c329a901 r9:00000004 r8:c595df40 r7:df865ec0 r6:c329a900 r5:c46a6c00
 r4:c46a6cf8
 map_update_elem from __sys_bpf+0x594/0xc94
 r10:00000000 r9:befb18b0 r8:00000051 r7:00000000 r6:00000002 r5:df865eb0
 r4:00000020
 __sys_bpf from sys_bpf+0x34/0x3c
 r10:00000182 r9:c3cad400 r8:c0100234 r7:00000182 r6:00000002 r5:befb18b0
 r4:00000020
 sys_bpf from ret_fast_syscall+0x0/0x1c
Exception stack(0xdf865fa8 to 0xdf865ff0)
5fa0:                   00000020 befb18b0 00000002 befb18b0 00000020 00000000
5fc0: 00000020 befb18b0 00000002 00000182 00839395 b6fa3ce0 00000000 012ac774
5fe0: befb1880 befb1870 00863133 b6ec3312
irq event stamp: 260676
hardirqs last  enabled at (260676): [<c0149fac>] __local_bh_enable_ip+0xc4/0x1b0
hardirqs last disabled at (260675): [<c014a024>] __local_bh_enable_ip+0x13c/0x1b0
softirqs last  enabled at (260668): [<c0a1c31c>] release_sock+0x94/0x98
softirqs last disabled at (260674): [<c03946f4>] bpf_fd_reuseport_array_update_elem+0xec/0x26c
---[ end trace 0000000000000000 ]---
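
The corruption behind a splat like this can be reproduced in miniature in userspace. The sketch below is an illustration, not the kernel code: it models 32-bit addresses as `uint32_t`, assumes 4-byte pointer slots as on armhf, and uses hypothetical helper names (`slot_addr`, `old_tag`, `old_untag`, `old_looks_like_psock`). Only the three flag values come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme quoted in the commit message: three independent flag bits. */
#define SK_USER_DATA_NOCOPY 1u
#define SK_USER_DATA_BPF    2u
#define SK_USER_DATA_PSOCK  4u
#define OLD_FLAG_MASK \
	(SK_USER_DATA_NOCOPY | SK_USER_DATA_BPF | SK_USER_DATA_PSOCK)

/* Model a 32-bit address so the sketch behaves the same on any host. */
typedef uint32_t addr32_t;

/* Hypothetical helper: address of slot idx in an array of 4-byte pointers. */
static addr32_t slot_addr(addr32_t base, uint32_t idx)
{
	assert((base & 3u) == 0);	/* base itself is 4-byte aligned */
	return base + idx * 4u;
}

/* Store flags in the low bits of an address. */
static addr32_t old_tag(addr32_t addr, uint32_t flags)
{
	return addr | flags;
}

/* Recover the address by clearing all three low bits. */
static addr32_t old_untag(addr32_t tagged)
{
	return tagged & ~(addr32_t)OLD_FLAG_MASK;
}

/* Nonzero if the tagged value appears to carry the PSOCK flag. */
static int old_looks_like_psock(addr32_t tagged)
{
	return (tagged & SK_USER_DATA_PSOCK) != 0;
}
```

With an arbitrary base of `0x1000`, slot 1 lives at `0x1004`. Tagging it with `NOCOPY | BPF` gives `0x1007`, which untags to `0x1000` (slot 0, the wrong element) and also appears to have the PSOCK bit set. Even-numbered slots happen to be 8-byte aligned and survive the round trip, which would be consistent with the failures being intermittent.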

Reviewing kernel usage of sk->sk_user_data and the current flag bits:

    #define SK_USER_DATA_NOCOPY    1UL
    #define SK_USER_DATA_BPF       2UL
    #define SK_USER_DATA_PSOCK     4UL

reveals that SK_USER_DATA_PSOCK and SK_USER_DATA_BPF both imply
SK_USER_DATA_NOCOPY, and suggests we can instead use an equivalent
2-bit enum like:

    enum sk_user_data {
        SK_USER_DATA_NONE       = 0,
        SK_USER_DATA_NOCOPY     = 1,
        SK_USER_DATA_BPF        = 2,
        SK_USER_DATA_PSOCK      = 3,
    };

Implement this to fix the pointer corruption, and update related call
signatures and comments to clarify the change from multiple flag bits to
an enum value, with a note highlighting the 2-bit limitation.
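
As a rough userspace sketch of the 2-bit encoding described above (not the actual patch): the enum values are taken from the commit message, while `TAG_MASK`, `tag`, `untag`, `kind_of` and `is_nocopy` are hypothetical names, and addresses are again modelled as `uint32_t`. Treating every non-NONE value as "no copy" is an assumption drawn from the statement that BPF and PSOCK both imply NOCOPY.

```c
#include <assert.h>
#include <stdint.h>

/* Enum values as proposed in the commit message. */
enum sk_user_data {
	SK_USER_DATA_NONE   = 0,
	SK_USER_DATA_NOCOPY = 1,
	SK_USER_DATA_BPF    = 2,
	SK_USER_DATA_PSOCK  = 3,
};

/* Hypothetical mask: only the two bits that 4-byte alignment leaves free. */
#define TAG_MASK 3u

typedef uint32_t addr32_t;

/* Pack one enum value into the low two bits of a 4-byte-aligned address. */
static addr32_t tag(addr32_t addr, enum sk_user_data kind)
{
	assert((addr & TAG_MASK) == 0);
	return addr | (addr32_t)kind;
}

/* Recover the address; bit 2 belongs to the address and is left alone. */
static addr32_t untag(addr32_t tagged)
{
	return tagged & ~(addr32_t)TAG_MASK;
}

/* Recover the enum value. */
static enum sk_user_data kind_of(addr32_t tagged)
{
	return (enum sk_user_data)(tagged & TAG_MASK);
}

/* Assumption: every value other than NONE implies "do not copy". */
static int is_nocopy(addr32_t tagged)
{
	return kind_of(tagged) != SK_USER_DATA_NONE;
}
```

Here an address such as `0x1004` tagged with `SK_USER_DATA_BPF` becomes `0x1006` and untags back to `0x1004`. The trade-off is that the values are mutually exclusive and all four 2-bit codes are already used, which is the limitation the commit message calls out.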

Fixes: 2a01337 ("net: fix refcount bug in sk_psock_get (2)")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 22, 2025
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 23, 2025
guidosarducci added a commit to guidosarducci/bpf-ci that referenced this pull request Sep 24, 2025