tools: bpftool: support creating and dumping outer maps #7
(hash-of-maps or array-of-maps), bpftool does not allow it. The only reason for this seems to be historical. Lookup support for outer maps was added in commit 14dc6f0 ("bpf: Add syscall lookup support for fd array and htab"), and although the relevant code in bpftool had not been merged yet, I suspect it had already been written with the assumption that user space could not read outer maps.

Let's remove the restriction; dump for outer maps works with no further change.

Reported-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
---
 tools/bpf/bpftool/map.c | 4 ----
 1 file changed, 4 deletions(-)
hash-of-map in bpftool. This is because the kernel needs an inner_map_fd to collect metadata on the inner maps to be supported by the new map, but bpftool does not provide a way to pass this file descriptor.

Add a new optional "inner_map" keyword that can be used to pass a reference to a map, retrieve a fd to that map, and pass it as the inner_map_fd.

Add related documentation and bash completion. Note that we can reference the inner map by its name, meaning the keyword "name" can appear several times with different meanings (mandatory outer map name, and possibly a name to use to find the inner_map_fd). The bash completion will offer it just once, and will not suggest "name" on the following command:

    # bpftool map create /sys/fs/bpf/my_outer_map type hash_of_maps \
          inner_map name my_inner_map [TAB]

Fixing that specific case seems too convoluted. Completion will work as expected, however, if the outer map name comes first and the "inner_map name ..." is passed second.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
---
 .../bpf/bpftool/Documentation/bpftool-map.rst | 10 +++-
 tools/bpf/bpftool/bash-completion/bpftool     | 22 ++++++++-
 tools/bpf/bpftool/map.c                       | 48 +++++++++++++------
 3 files changed, 62 insertions(+), 18 deletions(-)
Master branch: 95cec14 patch https://patchwork.ozlabs.org/project/netdev/patch/20200904161313.29535-2-quentin@isovalent.com/ applied successfully
At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=199591 expired. Closing PR.
errors like:

  error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*): Looks like the BPF stack limit of 512 bytes is exceeded. Please move large on stack variables into BPF per-cpu array map.

The error is triggered by the following LLVM patch:
  https://reviews.llvm.org/D87134

For example, the following code is from test_sysctl_loop1.c:

  static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
  {
    volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";
    ...
  }

Without the above LLVM patch, the compiler optimized the load of the string (59 bytes long) into 7 64-bit loads, 1 8-bit load and 1 16-bit load, occupying 64 bytes of stack. With the above LLVM patch, the compiler only uses 8-bit loads, but the subregister is 32-bit, so the stack requirement becomes 4 * 59 = 236 bytes. Together with other data on the stack, the total stack size exceeds 512 bytes, hence the compiler complains and quits.

Removing the "volatile" keyword, or changing "volatile" to "const"/"static const", does not fix the issue: the string is then put in the .rodata.str1.1 section, which libbpf did not process, and libbpf errors out with

  libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
  libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name' in section '.rodata.str1.1'

Defining the string constant as a global variable fixes the issue, as it puts the string constant in the '.rodata' section, which is recognized by libbpf. In the future, when libbpf can process '.rodata.str*.*' properly, the global definition can be changed back to a local definition.

Defining tcp_mem_name as a global, however, triggered a verifier failure.

  ./test_progs -n 7/21
  libbpf: load bpf program failed: Permission denied
  libbpf: -- BEGIN DUMP LOG ---
  libbpf: invalid stack off=0 size=1
  verification time 6975 usec
  stack depth 160+64
  processed 889 insns (limit 1000000) max_states_per_insn 4 total_states 14 peak_states 14 mark_read 10
  libbpf: -- END LOG --
  libbpf: failed to load program 'sysctl_tcp_mem'
  libbpf: failed to load object 'test_sysctl_loop2.o'
  test_bpf_verif_scale:FAIL:114
  #7/21 test_sysctl_loop2.o:FAIL

This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code like

  const char tcp_mem_name[] = "<...long string...>";
  ...
  char name[64];
  ...
  for (i = 0; i < sizeof(tcp_mem_name); ++i)
      if (name[i] != tcp_mem_name[i])
          return 0;

In the above code, if sizeof(tcp_mem_name) > 64, the name[i] access may be out of bounds. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and 79 for test_sysctl_loop2.c.

Without the promotion-to-global change, the old compiler generates code where the overflowed stack access happens to read a valid value, hiding the bpf program bug. With the promotion-to-global change, the code is different: the previous loading of constants onto the stack is gone, "name" occupies stack[-64:0], and the overflowing access triggers a verifier error. To fix the issue, adjust the "name" buffer size properly.

Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 2 +-
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

Changelog:
  v1 -> v2:
    . The tcp_mem_name change actually triggers a verifier failure due to a bpf program bug. Fixing the bpf program bug can make the test pass with both old and latest llvm. (Alexei)
errors like:

  error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*): Looks like the BPF stack limit of 512 bytes is exceeded. Please move large on stack variables into BPF per-cpu array map.

The error is triggered by the following LLVM patch:
  https://reviews.llvm.org/D87134

For example, the following code is from test_sysctl_loop1.c:

  static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
  {
    volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";
    ...
  }

Without the above LLVM patch, the compiler optimized the load of the string (59 bytes long) into 7 64-bit loads, 1 8-bit load and 1 16-bit load, occupying 64 bytes of stack. With the above LLVM patch, the compiler only uses 8-bit loads, but the subregister is 32-bit, so the stack requirement becomes 4 * 59 = 236 bytes. Together with other data on the stack, the total stack size exceeds 512 bytes, hence the compiler complains and quits.

Removing the "volatile" keyword, or changing "volatile" to "const"/"static const", does not fix the issue: the string is then put in the .rodata.str1.1 section, which libbpf did not process, and libbpf errors out with

  libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
  libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name' in section '.rodata.str1.1'

Defining the string constant as a global variable fixes the issue, as it puts the string constant in the '.rodata' section, which is recognized by libbpf. In the future, when libbpf can process '.rodata.str*.*' properly, the global definition can be changed back to a local definition.

Defining tcp_mem_name as a global, however, triggered a verifier failure.

  ./test_progs -n 7/21
  libbpf: load bpf program failed: Permission denied
  libbpf: -- BEGIN DUMP LOG ---
  libbpf: invalid stack off=0 size=1
  verification time 6975 usec
  stack depth 160+64
  processed 889 insns (limit 1000000) max_states_per_insn 4 total_states 14 peak_states 14 mark_read 10
  libbpf: -- END LOG --
  libbpf: failed to load program 'sysctl_tcp_mem'
  libbpf: failed to load object 'test_sysctl_loop2.o'
  test_bpf_verif_scale:FAIL:114
  #7/21 test_sysctl_loop2.o:FAIL

This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code like

  const char tcp_mem_name[] = "<...long string...>";
  ...
  char name[64];
  ...
  for (i = 0; i < sizeof(tcp_mem_name); ++i)
      if (name[i] != tcp_mem_name[i])
          return 0;

In the above code, if sizeof(tcp_mem_name) > 64, the name[i] access may be out of bounds. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and 79 for test_sysctl_loop2.c.

Without the promotion-to-global change, the old compiler generates code where the overflowed stack access happens to read a valid value, hiding the bpf program bug. With the promotion-to-global change, the code is different: the previous loading of constants onto the stack is gone, "name" occupies stack[-64:0], and the overflowing access triggers a verifier error. To fix the issue, adjust the "name" buffer size properly.

Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 4 ++--
 tools/testing/selftests/bpf/progs/test_sysctl_loop2.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

Changelog:
  v2 -> v3:
    . Use sizeof(tcp_mem_name) instead of a hardcoded value for the local buf "name". (Andrii)
  v1 -> v2:
    . The tcp_mem_name change actually triggers a verifier failure due to a bpf program bug. Fixing the bpf program bug can make the test pass with both old and latest llvm. (Alexei)
Andrii reported that with latest clang, when building selftests, we have errors like:

  error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*): Looks like the BPF stack limit of 512 bytes is exceeded. Please move large on stack variables into BPF per-cpu array map.

The error is triggered by the following LLVM patch:
  https://reviews.llvm.org/D87134

For example, the following code is from test_sysctl_loop1.c:

  static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
  {
    volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";
    ...
  }

Without the above LLVM patch, the compiler optimized the load of the string (59 bytes long) into 7 64-bit loads, 1 8-bit load and 1 16-bit load, occupying 64 bytes of stack. With the above LLVM patch, the compiler only uses 8-bit loads, but the subregister is 32-bit, so the stack requirement becomes 4 * 59 = 236 bytes. Together with other data on the stack, the total stack size exceeds 512 bytes, hence the compiler complains and quits.

Removing the "volatile" keyword, or changing "volatile" to "const"/"static const", does not fix the issue: the string is then put in the .rodata.str1.1 section, which libbpf did not process, and libbpf errors out with

  libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
  libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name' in section '.rodata.str1.1'

Defining the string constant as a global variable fixes the issue, as it puts the string constant in the '.rodata' section, which is recognized by libbpf. In the future, when libbpf can process '.rodata.str*.*' properly, the global definition can be changed back to a local definition.

Defining tcp_mem_name as a global, however, triggered a verifier failure.

  ./test_progs -n 7/21
  libbpf: load bpf program failed: Permission denied
  libbpf: -- BEGIN DUMP LOG ---
  libbpf: invalid stack off=0 size=1
  verification time 6975 usec
  stack depth 160+64
  processed 889 insns (limit 1000000) max_states_per_insn 4 total_states 14 peak_states 14 mark_read 10
  libbpf: -- END LOG --
  libbpf: failed to load program 'sysctl_tcp_mem'
  libbpf: failed to load object 'test_sysctl_loop2.o'
  test_bpf_verif_scale:FAIL:114
  #7/21 test_sysctl_loop2.o:FAIL

This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code like

  const char tcp_mem_name[] = "<...long string...>";
  ...
  char name[64];
  ...
  for (i = 0; i < sizeof(tcp_mem_name); ++i)
      if (name[i] != tcp_mem_name[i])
          return 0;

In the above code, if sizeof(tcp_mem_name) > 64, the name[i] access may be out of bounds. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and 79 for test_sysctl_loop2.c.

Without the promotion-to-global change, the old compiler generates code where the overflowed stack access happens to read a valid value, hiding the bpf program bug. With the promotion-to-global change, the code is different: the previous loading of constants onto the stack is gone, "name" occupies stack[-64:0], and the overflowing access triggers a verifier error. To fix the issue, adjust the "name" buffer size properly.

Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200909171542.3673449-1-yhs@fb.com
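The out-of-bounds read described in this commit message, and the idea of sizing the local buffer from the string it is compared against, can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the selftest code: `is_tcp_mem_sketch()` and its `src`/`src_len` parameters are hypothetical stand-ins for the BPF program and for whatever fills `name` there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* String taken from the commit message; a file-scope constant lands in .rodata. */
static const char tcp_mem_name[] =
	"net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";

/*
 * Hypothetical helper: copy up to sizeof(name) bytes from src, then compare
 * byte by byte. Because name is sized from tcp_mem_name, the loop bound
 * sizeof(tcp_mem_name) can never index past the end of name, whatever the
 * length of the constant is.
 */
static int is_tcp_mem_sketch(const char *src, size_t src_len)
{
	char name[sizeof(tcp_mem_name)];
	size_t i;

	memset(name, 0, sizeof(name));
	if (src_len > sizeof(name))
		src_len = sizeof(name);
	memcpy(name, src, src_len);

	for (i = 0; i < sizeof(tcp_mem_name); ++i)
		if (name[i] != tcp_mem_name[i])
			return 0;
	return 1;
}
```

With a fixed `char name[64]` the same loop is safe only while the constant is at most 64 bytes, which holds for the 59-byte string but not for the 79-byte one mentioned above.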
I got the following lockdep splat while testing:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc7-00172-g021118712e59 #932 Not tainted
  ------------------------------------------------------
  btrfs/229626 is trying to acquire lock:
  ffffffff828513f0 (cpu_hotplug_lock){++++}-{0:0}, at: alloc_workqueue+0x378/0x450

  but task is already holding lock:
  ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #7 (&fs_info->scrub_lock){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_scrub_dev+0x11c/0x630
	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
	 btrfs_ioctl+0x2799/0x30a0
	 ksys_ioctl+0x83/0xc0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #6 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_run_dev_stats+0x49/0x480
	 commit_cowonly_roots+0xb5/0x2a0
	 btrfs_commit_transaction+0x516/0xa60
	 sync_filesystem+0x6b/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0xe/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x29/0x60
	 cleanup_mnt+0xb8/0x140
	 task_work_run+0x6d/0xb0
	 __prepare_exit_to_usermode+0x1cc/0x1e0
	 do_syscall_64+0x5c/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #5 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_commit_transaction+0x4bb/0xa60
	 sync_filesystem+0x6b/0x90
	 generic_shutdown_super+0x22/0x100
	 kill_anon_super+0xe/0x30
	 btrfs_kill_super+0x12/0x20
	 deactivate_locked_super+0x29/0x60
	 cleanup_mnt+0xb8/0x140
	 task_work_run+0x6d/0xb0
	 __prepare_exit_to_usermode+0x1cc/0x1e0
	 do_syscall_64+0x5c/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #4 (&fs_info->reloc_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 btrfs_record_root_in_trans+0x43/0x70
	 start_transaction+0xd1/0x5d0
	 btrfs_dirty_inode+0x42/0xd0
	 touch_atime+0xa1/0xd0
	 btrfs_file_mmap+0x3f/0x60
	 mmap_region+0x3a4/0x640
	 do_mmap+0x376/0x580
	 vm_mmap_pgoff+0xd5/0x120
	 ksys_mmap_pgoff+0x193/0x230
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #3 (&mm->mmap_lock#2){++++}-{3:3}:
	 __might_fault+0x68/0x90
	 _copy_to_user+0x1e/0x80
	 perf_read+0x141/0x2c0
	 vfs_read+0xad/0x1b0
	 ksys_read+0x5f/0xe0
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #2 (&cpuctx_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 perf_event_init_cpu+0x88/0x150
	 perf_event_init+0x1db/0x20b
	 start_kernel+0x3ae/0x53c
	 secondary_startup_64+0xa4/0xb0

  -> #1 (pmus_lock){+.+.}-{3:3}:
	 __mutex_lock+0x9f/0x930
	 perf_event_init_cpu+0x4f/0x150
	 cpuhp_invoke_callback+0xb1/0x900
	 _cpu_up.constprop.26+0x9f/0x130
	 cpu_up+0x7b/0xc0
	 bringup_nonboot_cpus+0x4f/0x60
	 smp_init+0x26/0x71
	 kernel_init_freeable+0x110/0x258
	 kernel_init+0xa/0x103
	 ret_from_fork+0x1f/0x30

  -> #0 (cpu_hotplug_lock){++++}-{0:0}:
	 __lock_acquire+0x1272/0x2310
	 lock_acquire+0x9e/0x360
	 cpus_read_lock+0x39/0xb0
	 alloc_workqueue+0x378/0x450
	 __btrfs_alloc_workqueue+0x15d/0x200
	 btrfs_alloc_workqueue+0x51/0x160
	 scrub_workers_get+0x5a/0x170
	 btrfs_scrub_dev+0x18c/0x630
	 btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
	 btrfs_ioctl+0x2799/0x30a0
	 ksys_ioctl+0x83/0xc0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x50/0x90
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  other info that might help us debug this:

  Chain exists of:
    cpu_hotplug_lock --> &fs_devs->device_list_mutex --> &fs_info->scrub_lock

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(&fs_info->scrub_lock);
				 lock(&fs_devs->device_list_mutex);
				 lock(&fs_info->scrub_lock);
    lock(cpu_hotplug_lock);

   *** DEADLOCK ***

  2 locks held by btrfs/229626:
   #0: ffff88bfe8bb86e0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_scrub_dev+0xbd/0x630
   #1: ffff889dd3889518 (&fs_info->scrub_lock){+.+.}-{3:3}, at: btrfs_scrub_dev+0x11c/0x630

  stack backtrace:
  CPU: 15 PID: 229626 Comm: btrfs Kdump: loaded Not tainted 5.8.0-rc7-00172-g021118712e59 #932
  Hardware name: Quanta Tioga Pass Single Side 01-0030993006/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
  Call Trace:
   dump_stack+0x78/0xa0
   check_noncircular+0x165/0x180
   __lock_acquire+0x1272/0x2310
   lock_acquire+0x9e/0x360
   ? alloc_workqueue+0x378/0x450
   cpus_read_lock+0x39/0xb0
   ? alloc_workqueue+0x378/0x450
   alloc_workqueue+0x378/0x450
   ? rcu_read_lock_sched_held+0x52/0x80
   __btrfs_alloc_workqueue+0x15d/0x200
   btrfs_alloc_workqueue+0x51/0x160
   scrub_workers_get+0x5a/0x170
   btrfs_scrub_dev+0x18c/0x630
   ? start_transaction+0xd1/0x5d0
   btrfs_dev_replace_by_ioctl.cold.21+0x10a/0x1d4
   btrfs_ioctl+0x2799/0x30a0
   ? do_sigaction+0x102/0x250
   ? lockdep_hardirqs_on_prepare+0xca/0x160
   ? _raw_spin_unlock_irq+0x24/0x30
   ? trace_hardirqs_on+0x1c/0xe0
   ? _raw_spin_unlock_irq+0x24/0x30
   ? do_sigaction+0x102/0x250
   ? ksys_ioctl+0x83/0xc0
   ksys_ioctl+0x83/0xc0
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x50/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happens because we're allocating the scrub workqueues under the scrub and device list mutex, which brings in a whole host of other dependencies.

Because the work queue allocation is done with GFP_KERNEL, it can trigger reclaim, which can lead to a transaction commit, which in turn needs the device_list_mutex, so it can lead to a deadlock. That is a different problem, which this fix also solves.

Fix this by moving the actual allocation outside of the scrub lock, and then only take the lock once we're ready to actually assign them to the fs_info. We'll now have to clean up the workqueues in a few more places, so I've added a helper to do the refcount dance to safely free the workqueues.

CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
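The "allocate outside the lock, take the lock only to assign" pattern described in this commit message can be sketched in user-space C with a pthread mutex. This is a rough sketch, not the btrfs code: `struct fs_state`, `struct workers` and both `*_sketch()` functions are hypothetical names, `calloc()` stands in for the workqueue allocation, and a plain int stands in for the kernel refcount.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a scrub workqueue and for fs_info. */
struct workers { int id; };

struct fs_state {
	pthread_mutex_t scrub_lock;
	int workers_refcnt;
	struct workers *scrub_workers;
};

/* Returns 0 on success, -1 if the allocation failed. */
static int scrub_workers_get_sketch(struct fs_state *fs)
{
	struct workers *w;

	/* Fast path: the workers already exist, just take a reference. */
	pthread_mutex_lock(&fs->scrub_lock);
	if (fs->workers_refcnt > 0) {
		fs->workers_refcnt++;
		pthread_mutex_unlock(&fs->scrub_lock);
		return 0;
	}
	pthread_mutex_unlock(&fs->scrub_lock);

	/* Allocate with no lock held, so the allocator cannot deadlock on us. */
	w = calloc(1, sizeof(*w));
	if (!w)
		return -1;

	/* Take the lock only to publish the result. */
	pthread_mutex_lock(&fs->scrub_lock);
	if (fs->workers_refcnt == 0) {
		fs->scrub_workers = w;
		w = NULL;
	}
	fs->workers_refcnt++;
	pthread_mutex_unlock(&fs->scrub_lock);

	free(w);	/* someone else won the race; free(NULL) is a no-op */
	return 0;
}

/* Drop a reference; the last put detaches under the lock and frees outside it. */
static void scrub_workers_put_sketch(struct fs_state *fs)
{
	struct workers *w = NULL;

	pthread_mutex_lock(&fs->scrub_lock);
	if (--fs->workers_refcnt == 0) {
		w = fs->scrub_workers;
		fs->scrub_workers = NULL;
	}
	pthread_mutex_unlock(&fs->scrub_lock);

	free(w);
}
```

The cost of this shape is that an allocation may be thrown away when two callers race, which is why a cleanup helper becomes necessary on more paths.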
…s metrics" test Linux 5.9 introduced perf test case "Parse and process metrics" and on s390 this test case always dumps core: [root@t35lp67 perf]# ./perf test -vvvv -F 67 67: Parse and process metrics : --- start --- metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC parsing metric: inst_retired.any / cpu_clk_unhalted.thread Segmentation fault (core dumped) [root@t35lp67 perf]# I debugged this core dump and gdb shows this call chain: (gdb) where #0 0x000003ffabc3192a in __strnlen_c_1 () from /lib64/libc.so.6 #1 0x000003ffabc293de in strcasestr () from /lib64/libc.so.6 #2 0x0000000001102ba2 in match_metric(list=0x1e6ea20 "inst_retired.any", n=<optimized out>) at util/metricgroup.c:368 #3 find_metric (map=<optimized out>, map=<optimized out>, metric=0x1e6ea20 "inst_retired.any") at util/metricgroup.c:765 #4 __resolve_metric (ids=0x0, map=<optimized out>, metric_list=0x0, metric_no_group=<optimized out>, m=<optimized out>) at util/metricgroup.c:844 #5 resolve_metric (ids=0x0, map=0x0, metric_list=0x0, metric_no_group=<optimized out>) at util/metricgroup.c:881 #6 metricgroup__add_metric (metric=<optimized out>, metric_no_group=metric_no_group@entry=false, events=<optimized out>, events@entry=0x3ffd84fb878, metric_list=0x0, metric_list@entry=0x3ffd84fb868, map=0x0) at util/metricgroup.c:943 #7 0x00000000011034ae in metricgroup__add_metric_list (map=0x13f9828 <map>, metric_list=0x3ffd84fb868, events=0x3ffd84fb878, metric_no_group=<optimized out>, list=<optimized out>) at util/metricgroup.c:988 #8 parse_groups (perf_evlist=perf_evlist@entry=0x1e70260, str=str@entry=0x12f34b2 "IPC", metric_no_group=<optimized out>, metric_no_merge=<optimized out>, fake_pmu=fake_pmu@entry=0x1462f18 <perf_pmu.fake>, metric_events=0x3ffd84fba58, map=0x1) at util/metricgroup.c:1040 #9 0x0000000001103eb2 in metricgroup__parse_groups_test( evlist=evlist@entry=0x1e70260, map=map@entry=0x13f9828 <map>, str=str@entry=0x12f34b2 "IPC", metric_no_group=metric_no_group@entry=false, 
metric_no_merge=metric_no_merge@entry=false, metric_events=0x3ffd84fba58) at util/metricgroup.c:1082
#10 0x00000000010c84d8 in __compute_metric (ratio2=0x0, name2=0x0, ratio1=<synthetic pointer>, name1=0x12f34b2 "IPC", vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:159
#11 compute_metric (ratio=<synthetic pointer>, vals=0x3ffd84fbad8, name=0x12f34b2 "IPC") at tests/parse-metric.c:189
#12 test_ipc () at tests/parse-metric.c:208
.....
..... omitted many more lines

This test case was added with commit 218ca91 ("perf tests: Add parse metric test for frontend metric").

When I compile with make DEBUG=y it works fine and I do not get a core dump.

It turned out that the above listed function call chain worked on a struct pmu_event array which requires a trailing element with zeroes, which was missing. The macro map_for_each_event() loops over that array and tests for members metric_expr/metric_name/metric_group being non-NULL.

Adding this element fixes the issue.

Output after:

  [root@t35lp46 perf]# ./perf test 67
  67: Parse and process metrics : Ok
  [root@t35lp46 perf]#

Committer notes:

As Ian remarks, this is not s390 specific:

<quote Ian>
This also shows up with address sanitizer on all architectures (perhaps change the patch title) and perhaps add a "Fixes: <commit>" tag.
================================================================= ==4718==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55c93b4d59e8 at pc 0x55c93a1541e2 bp 0x7ffd24327c60 sp 0x7ffd24327c58 READ of size 8 at 0x55c93b4d59e8 thread T0 #0 0x55c93a1541e1 in find_metric tools/perf/util/metricgroup.c:764:2 #1 0x55c93a153e6c in __resolve_metric tools/perf/util/metricgroup.c:844:9 #2 0x55c93a152f18 in resolve_metric tools/perf/util/metricgroup.c:881:9 #3 0x55c93a1528db in metricgroup__add_metric tools/perf/util/metricgroup.c:943:9 #4 0x55c93a151996 in metricgroup__add_metric_list tools/perf/util/metricgroup.c:988:9 #5 0x55c93a1511b9 in parse_groups tools/perf/util/metricgroup.c:1040:8 #6 0x55c93a1513e1 in metricgroup__parse_groups_test tools/perf/util/metricgroup.c:1082:9 #7 0x55c93a0108ae in __compute_metric tools/perf/tests/parse-metric.c:159:8 #8 0x55c93a010744 in compute_metric tools/perf/tests/parse-metric.c:189:9 #9 0x55c93a00f5ee in test_ipc tools/perf/tests/parse-metric.c:208:2 #10 0x55c93a00f1e8 in test__parse_metric tools/perf/tests/parse-metric.c:345:2 #11 0x55c939fd7202 in run_test tools/perf/tests/builtin-test.c:410:9 #12 0x55c939fd6736 in test_and_print tools/perf/tests/builtin-test.c:440:9 #13 0x55c939fd58c3 in __cmd_test tools/perf/tests/builtin-test.c:661:4 #14 0x55c939fd4e02 in cmd_test tools/perf/tests/builtin-test.c:807:9 #15 0x55c939e4763d in run_builtin tools/perf/perf.c:313:11 #16 0x55c939e46475 in handle_internal_command tools/perf/perf.c:365:8 #17 0x55c939e4737e in run_argv tools/perf/perf.c:409:2 #18 0x55c939e45f7e in main tools/perf/perf.c:539:3 0x55c93b4d59e8 is located 0 bytes to the right of global variable 'pme_test' defined in 'tools/perf/tests/parse-metric.c:17:25' (0x55c93b4d54a0) of size 1352 SUMMARY: AddressSanitizer: global-buffer-overflow tools/perf/util/metricgroup.c:764:2 in find_metric Shadow bytes around the buggy address: 0x0ab9a7692ae0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692af0: 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0ab9a7692b30: 00 00 00 00 00 00 00 00 00 00 00 00 00[f9]f9 f9 0x0ab9a7692b40: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b50: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x0ab9a7692b60: f9 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 00 00 0x0ab9a7692b70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0ab9a7692b80: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc </quote> I'm also adding the missing "Fixes" tag and setting just .name to NULL, as doing it that way is more compact (the compiler will zero out everything else) and the table iterators look for .name being NULL as the sentinel marking the end of the table. Fixes: 0a507af ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200825071211.16959-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
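The role of the trailing element can be shown with a small self-contained table walk. This is a sketch under stated assumptions: `struct event_sketch` is a hypothetical, trimmed-down stand-in for perf's struct pmu_event, `find_metric_sketch()` is not the perf function, and the second table entry is invented for illustration. Only the IPC expression comes from the test output quoted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct pmu_event. */
struct event_sketch {
	const char *name;
	const char *metric_name;
	const char *metric_expr;
};

static const struct event_sketch table_sketch[] = {
	{ .name = "ipc", .metric_name = "IPC",
	  .metric_expr = "inst_retired.any / cpu_clk_unhalted.thread" },
	{ .name = "example", .metric_name = "EXAMPLE",	/* invented entry */
	  .metric_expr = "a / b" },
	{ .name = NULL },	/* sentinel: all other members are zero-initialized */
};

/*
 * Walk the table until the sentinel. Without the last entry above, this
 * loop would read past the end of the array, which is the overflow that
 * AddressSanitizer reports.
 */
static const struct event_sketch *
find_metric_sketch(const struct event_sketch *tbl, const char *metric)
{
	for (; tbl->name; tbl++)
		if (tbl->metric_name && !strcmp(tbl->metric_name, metric))
			return tbl;
	return NULL;
}
```

A lookup for a metric that is not in the table is the case that depends on the sentinel, since it is the only one that walks the whole array.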
Krzysztof Kozlowski says:

====================
nfc: s3fwrn5: Few cleanups

Changes since v2:
1. Fix dtschema ID after rename (patch 1/8).
2. Apply patch 9/9 (defconfig change).

Changes since v1:
1. Rename dtschema file and add additionalProperties:false, as Rob suggested,
2. Add Marek's tested-by,
3. New patches: #4, #5, #6, #7 and #9.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Commit

  b972fdb ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")

didn't clear all the information from the scanned system and, more specifically, left ghes_hw.num_dimms at its previous value.

On a second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the leftover num_dimms value, which is not 0, and thus the 0 check in enumerate_dimms() would get bypassed and it would go directly to the pointer deref:

  d = &hw->dimms[hw->num_dimms];

which is, of course, NULL:

  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0002) - not-present page
  PGD 0 P4D 0
  Oops: 0002 [#1] PREEMPT SMP
  CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  RIP: 0010:enumerate_dimms.cold+0x7b/0x375

Reset the whole ghes_hw on driver unregister so that no stale values are used on a second system scan.

Fixes: b972fdb ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()")
Cc: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200911164817.GA19320@zn.tnic
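The stale-counter problem and the "reset everything on unregister" fix can be sketched as follows. This is a user-space illustration, not the driver code: `hw_sketch`, `add_dimm_sketch()`, `unregister_sketch()` and `MAX_DIMMS_SKETCH` are hypothetical names, and the allocate-on-first-entry logic is an assumption modelled on the description of the 0 check above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIMMS_SKETCH 8	/* hypothetical capacity */

struct dimm_sketch { int handle; };

/* Hypothetical stand-in for the driver-global ghes_hw state. */
static struct {
	struct dimm_sketch *dimms;
	unsigned int num_dimms;
} hw_sketch;

/* The array is allocated only while num_dimms is still 0. */
static int add_dimm_sketch(int handle)
{
	if (!hw_sketch.num_dimms) {
		hw_sketch.dimms = calloc(MAX_DIMMS_SKETCH, sizeof(*hw_sketch.dimms));
		if (!hw_sketch.dimms)
			return -1;
	}
	if (hw_sketch.num_dimms >= MAX_DIMMS_SKETCH)
		return -1;
	hw_sketch.dimms[hw_sketch.num_dimms++].handle = handle;
	return 0;
}

/*
 * Free the array and reset the whole structure. Freeing without clearing
 * num_dimms would make the next add_dimm_sketch() skip the allocation and
 * write through a dangling or NULL pointer.
 */
static void unregister_sketch(void)
{
	free(hw_sketch.dimms);
	memset(&hw_sketch, 0, sizeof(hw_sketch));
}
```

Clearing the structure as a whole, rather than field by field, means a member added later cannot be forgotten in the teardown path.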
The aliases were never released, causing the following leaks:

  Indirect leak of 1224 byte(s) in 9 object(s) allocated from:
    #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628)
    #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322
    #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778
    #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295
    #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367
    #5 0x56332c76a09b in run_test tests/builtin-test.c:410
    #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440
    #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695
    #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807
    #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 956a783 ("perf test: Test pmu-events aliases")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The evsel->unit borrows a pointer from a pmu event or alias instead of owning a string. But the tool event (duration_time) passed the result of strdup(), which caused a leak.

It was found by ASAN during the metric test:

  Direct leak of 210 byte(s) in 70 object(s) allocated from:
    #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414
    #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414
    #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439
    #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096
    #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141
    #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406
    #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393
    #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415
    #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498
    #10 0x559fbbc0109b in run_test tests/builtin-test.c:410
    #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440
    #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695
    #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807
    #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
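The ownership mismatch behind this leak can be sketched in a few lines. This is an illustration only: `struct evsel_sketch` and both functions are hypothetical, and the "ns" literal is an assumed unit string, not something taken from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for an evsel whose unit pointer is borrowed. */
struct evsel_sketch {
	const char *unit;	/* borrowed: the destructor never frees it */
};

/*
 * Borrowing setter: point at storage with static lifetime. Storing the
 * result of strdup() here instead would leak, because nothing that
 * destroys an evsel_sketch ever calls free() on unit.
 */
static void set_unit_borrowed_sketch(struct evsel_sketch *e)
{
	e->unit = "ns";	/* assumed unit string */
}

/* Destructor that matches the borrowed convention: unit is left alone. */
static void evsel_sketch_exit(struct evsel_sketch *e)
{
	e->unit = NULL;
}
```

Either convention works as long as every writer and the destructor agree; the leak comes from one writer treating the field as owned while everything else treats it as borrowed.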
The test_generic_metric() failed to release entries in the pctx. ASAN reported the following leak (and more):

  Direct leak of 128 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14)
    #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497)
    #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111
    #4 0x55f7e7341667 in expr__add_ref util/expr.c:120
    #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783
    #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858
    #7 0x55f7e712390b in compute_single tests/parse-metric.c:128
    #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180
    #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196
    #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295
    #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355
    #12 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 6d432c4 ("perf tools: Add test_generic_metric function")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
metricgroup__add_metric() can find multiple matches for a metric group, and any of them may fail. It can also fail midway, as in resolve_metric(), even for a single metric. In those cases, the intermediate list and ids are leaked, for example:

  Direct leak of 3 byte(s) in 1 object(s) allocated from:
    #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5)
    #1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683
    #2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906
    #3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940
    #4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993
    #5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045
    #6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087
    #7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164
    #8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196
    #9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318
    #10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356
    #11 0x55f7e70be09b in run_test tests/builtin-test.c:410
    #12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440
    #13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661
    #14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807
    #15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: 83de0b7 ("perf metric: Collect referenced metrics in struct metric_ref_node")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following leaks were detected by ASAN:

  Indirect leak of 360 byte(s) in 9 object(s) allocated from:
    #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e)
    #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333
    #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59
    #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73
    #4 0x560578e07045 in test__pmu tests/pmu.c:155
    #5 0x560578de109b in run_test tests/builtin-test.c:410
    #6 0x560578de109b in test_and_print tests/builtin-test.c:440
    #7 0x560578de401a in __cmd_test tests/builtin-test.c:661
    #8 0x560578de401a in cmd_test tests/builtin-test.c:807
    #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312
    #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364
    #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408
    #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538
    #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308

Fixes: cff7f95 ("perf tests: Move pmu tests into separate object")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20200915031819.386559-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andrii Nakryiko says:

====================
This patch set introduces a new set of BTF APIs in libbpf that make it convenient to produce BTF types and strings. These APIs will allow libbpf to do more intrusive modifications of a program's BTF (by rewriting it, at least as of right now), which is necessary for the upcoming libbpf static linking. But they are complete and generic, so they can be adopted by anyone who needs to produce BTF type information.

One such example outside of libbpf is pahole, which was converted to these APIs completely (locally, pending landing of these changes in libbpf). The conversion reduces the amount of custom pahole code needed and brings nice savings in memory usage (about 370MB reduction at peak for my kernel configuration) and even in BTF deduplication time (one second reduction, 23.7s -> 22.7s). Memory savings are due to avoiding pahole's own copy of "uncompressed" raw BTF data. Time reduction comes from faster string search and deduplication by relying on a hashmap instead of the BST used by pahole's own code. Consequently, these APIs are already tested on real-world complicated kernel BTF, but there is also a pretty extensive selftest doing extra validations.

Selftests in patch #3 add a set of generic ASSERT_{EQ,STREQ,ERR,OK} macros that are useful for writing shorter and less repetitive selftests. I decided to keep them local to that selftest for now, but if they prove to be useful in more contexts we should move them to test_progs.h. A few more macros (e.g., inequality tests) are probably necessary to have a more complete set.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>

v2->v3:
  - resending original patches #7-9 as patches #1-3 due to merge conflict;

v1->v2:
  - fixed comments (John);
  - renamed btf__append_xxx() into btf__add_xxx() (Alexei);
  - added btf__find_str() in addition to btf__add_str();
  - btf__new_empty() now sets kernel FD to -1 initially.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ido Schimmel says:

====================
drop_monitor: Convert to use devlink tracepoint

Drop monitor is able to monitor both software and hardware originated drops. Software drops are monitored by having drop monitor register its probe on the 'kfree_skb' tracepoint. Hardware originated drops are monitored by having devlink call into drop monitor whenever it receives a dropped packet from the underlying hardware.

This patch set converts drop monitor to monitor both software and hardware originated drops in the same way - by registering its probe on the relevant tracepoint.

In addition to drop monitor being more consistent, it is now also possible to build drop monitor as a module instead of as a builtin and still monitor hardware originated drops. Initially, CONFIG_NET_DEVLINK implied CONFIG_NET_DROP_MONITOR, but after commit def2fbf ("kconfig: allow symbols implied by y to become m") we can have CONFIG_NET_DEVLINK=y and CONFIG_NET_DROP_MONITOR=m and hardware originated drops will not be monitored.

Patch set overview:

Patch #1 adds a tracepoint in devlink for trap reports.

Patch #2 prepares probe functions in drop monitor for the new tracepoint.

Patch #3 converts drop monitor to use the new tracepoint.

Patches #4-#6 perform cleanups after the conversion.

Patch #7 adds a test case for drop monitor. Both software originated drops and hardware originated drops (using netdevsim) are tested.

Tested:

| CONFIG_NET_DEVLINK | CONFIG_NET_DROP_MONITOR | Build | SW drops | HW drops |
| -------------------|-------------------------|-------|----------|----------|
|          y         |            y            |   v   |     v    |     v    |
|          y         |            m            |   v   |     v    |     v    |
|          y         |            n            |   v   |     x    |     x    |
|          n         |            y            |   v   |     v    |     x    |
|          n         |            m            |   v   |     v    |     x    |
|          n         |            n            |   v   |     x    |     x    |
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
With the latest llvm trunk, bpf programs under the samples/bpf directory that use CORE may hit the following error:

  LLVM ERROR: Cannot select: intrinsic %llvm.preserve.struct.access.index
  PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
  Stack dump:
  0. Program arguments: llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o
  1. Running pass 'Function Pass Manager' on module '<stdin>'.
  2. Running pass 'BPF DAG->DAG Pattern Instruction Selection' on function '@bpf_prog1'
   #0 0x000000000183c26c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x183c26c)
   ...
   #7 0x00000000017c375e (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x17c375e)
   #8 0x00000000016a75c5 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16a75c5)
   #9 0x00000000016ab4f8 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/data/users/yhs/work/llvm-project/llvm/build.cur/install/bin/llc+0x16ab4f8)
   ...
  Aborted (core dumped) | llc -march=bpf -filetype=obj -o samples/bpf/test_probe_write_user_kern.o

The cause is llvm change https://reviews.llvm.org/D87153, which moved the CORE relocation global generation from the beginning of target dependent optimization (llc) to the beginning of target independent optimization (opt).

Since samples/bpf programs do not use vmlinux.h and their clang compilation uses the native architecture, we need to adjust the arch triple at the opt level for CORE relocation global generation to be done properly. Otherwise, the above error appears.

This patch fixes the issue by introducing opt and llvm-dis into the compilation chain, which does proper CORE relocation global generation as well as O2 level optimization.

Tested with llvm10, llvm11 and trunk/llvm12.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush() generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC, which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4 follows; a similar problem also exists upstream:

  crash> bt 2091206
  PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
   #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
   #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
   #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
   #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
   #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
   #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
   #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
   #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
   #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
   #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
  #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
  #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
  #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
  #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
  #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
  #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
  #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
  #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
  #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def284 ("xfs: don't allow log IO to be throttled"), the metadata submitted by xlog_write_iclog() should not be throttled. But because of the dm layer, throttling the flush_bio indirectly causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes wbt_should_throttle() return false and so avoids wbt_wait().

Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
Reviewed-by: Tianxiang Peng <txpeng@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
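
The effect of adding REQ_IDLE to the flush bio can be illustrated with a small user-space model of the throttling decision. This is only a sketch: the flag bit values and the should_throttle() helper are hypothetical stand-ins, written on the assumption that wbt_should_throttle() exempts writes that carry both REQ_SYNC and REQ_IDLE. It is not the kernel's code, and it treats every non-write as unthrottled for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; the kernel's values differ. */
enum {
	OP_WRITE       = 1u << 0,
	F_REQ_SYNC     = 1u << 1,
	F_REQ_IDLE     = 1u << 2,
	F_REQ_PREFLUSH = 1u << 3,
};

/*
 * Simplified model of the decision described in the commit message:
 * a write is throttled unless it carries both SYNC and IDLE.
 * Non-writes are not modelled and are reported as not throttled.
 */
bool should_throttle(unsigned int opf)
{
	const unsigned int exempt = F_REQ_SYNC | F_REQ_IDLE;

	if (!(opf & OP_WRITE))
		return false;
	if ((opf & exempt) == exempt)
		return false;
	return true;
}
```

Under this model, a flush bio built as OP_WRITE | F_REQ_PREFLUSH | F_REQ_SYNC is throttled, while the same bio with F_REQ_IDLE added is not, which is the behaviour the fix relies on.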
As reported by CVE-2025-29481 [1], it is possible to corrupt a BPF ELF file such that arbitrary BPF instructions are loaded by libbpf. This can be done by setting a symbol (BPF program) section offset to a large (unsigned) number such that <section start + symbol offset> overflows and points before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The CVE report in [1] also provides a corrupted BPF ELF which can be used as a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program end is within the section bounds. There used to be a check `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was removed by commit 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions").

Put the above condition back to bpf_object__init_prog to make sure that the program start is also within the bounds of the section to avoid the potential buffer overflow.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Reported-by: lmarch2 <2524158037@qq.com>
Cc: stable@vger.kernel.org
Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://www.cve.org/CVERecord?id=CVE-2025-29481
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
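
To see why checking only the program end is insufficient, the arithmetic from the reproducer can be replayed in a few lines of C. This is a minimal sketch, not the libbpf code: end_only_check() and start_and_end_check() are hypothetical helper names, and 64-bit unsigned arithmetic stands in for size_t. The second helper is written with a subtraction so that it cannot wrap itself, which is an assumption about how one might phrase the restored condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checks only that the program end lies within the section.
 * The sum is unsigned, so it wraps silently for a huge sec_off. */
bool end_only_check(uint64_t sec_off, uint64_t prog_sz, uint64_t sec_sz)
{
	return sec_off + prog_sz <= sec_sz;
}

/* Additionally requires the program start to lie within the section,
 * then compares the size against the remaining space (no addition,
 * hence no wraparound). */
bool start_and_end_check(uint64_t sec_off, uint64_t prog_sz, uint64_t sec_sz)
{
	if (sec_off >= sec_sz)
		return false;
	return prog_sz <= sec_sz - sec_off;
}
```

With the values from the readelf output (section size 0x68, symbol value 0xffffffffffffffb8, symbol size 104), the sum wraps to 0x20, so the end-only check accepts the program while the start-and-end check rejects it.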
As shown in [1], it is possible to corrupt a BPF ELF file such that arbitrary BPF instructions are loaded by libbpf. This can be done by setting a symbol (BPF program) section offset to a large (unsigned) number such that <section start + symbol offset> overflows and points before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start        sec_start        prog_end        sec_end
        |                |                 |              |
        v                v                 v              v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will point before the actual memory where section 2 is allocated.

This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program end is within the section bounds. There used to be a check `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was removed by commit 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions").

Add a check for detecting the overflow of `sec_off + prog_sz` to bpf_object__init_prog to fix this issue.

[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Reported-by: lmarch2 <2524158037@qq.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
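
Detecting the overflow of `sec_off + prog_sz` directly relies on a property of unsigned addition: if the sum wraps, the result is smaller than either operand. The sketch below shows one way such a check could look. The names add_wraps() and prog_range_ok() are hypothetical and uint64_t stands in for size_t; the exact form of the check in libbpf may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned addition wrapped if and only if the sum is below an operand. */
bool add_wraps(uint64_t a, uint64_t b)
{
	return a + b < a;
}

/* Accept a program only if [sec_off, sec_off + prog_sz) is computed
 * without wraparound and ends inside the section. */
bool prog_range_ok(uint64_t sec_off, uint64_t prog_sz, uint64_t sec_sz)
{
	if (add_wraps(sec_off, prog_sz))
		return false;
	return sec_off + prog_sz <= sec_sz;
}
```

For the reproducer's symbol (value 0xffffffffffffffb8, size 104, section size 0x68) the addition wraps, so prog_range_ok() rejects it, while a program at offset 0 of the same size is still accepted.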
As shown in [1], it is possible to corrupt a BPF ELF file such that arbitrary BPF instructions are loaded by libbpf. This can be done by setting a symbol (BPF program) section offset to a large (unsigned) number such that <section start + symbol offset> overflows and points before the section data in the memory. Consider the situation below where: - prog_start = sec_start + symbol_offset <-- size_t overflow here - prog_end = prog_start + prog_size prog_start sec_start prog_end sec_end | | | | v v v v .....................|################################|............ The report in [1] also provides a corrupted BPF ELF which can be used as a reproducer: $ readelf -S crash Section Headers: [Nr] Name Type Address Offset Size EntSize Flags Link Info Align ... [ 2] uretprobe.mu[...] PROGBITS 0000000000000000 00000040 0000000000000068 0000000000000000 AX 0 0 8 $ readelf -s crash Symbol table '.symtab' contains 8 entries: Num: Value Size Type Bind Vis Ndx Name ... 6: ffffffffffffffb8 104 FUNC GLOBAL DEFAULT 2 handle_tp Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will point before the actual memory where section 2 is allocated. 
This is also reported by AddressSanitizer: ================================================================= ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490 READ of size 104 at 0x7c7302fe0000 thread T0 #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76) #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856 #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928 #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930 #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067 #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090 #6 0x000000400c16 in main /poc/poc.c:8 #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4) #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667) #9 0x000000400b34 in _start (/poc/poc+0x400b34) 0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8) allocated by thread T0 here: #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b) #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600) #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018) #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740 The problem here is that currently, libbpf only checks that the program end is within the section bounds. There used to be a check `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was removed by commit 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions"). Add a check for detecting the overflow of `sec_off + prog_sz` to bpf_object__init_prog to fix this issue. 
[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md Reported-by: lmarch2 <2524158037@qq.com> Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions") Signed-off-by: Viktor Malik <vmalik@redhat.com> Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
As shown in [1], it is possible to corrupt a BPF ELF file such that arbitrary BPF instructions are loaded by libbpf. This can be done by setting a symbol (BPF program) section offset to a large (unsigned) number such that <section start + symbol offset> overflows and points before the section data in the memory.

Consider the situation below where:
- prog_start = sec_start + symbol_offset    <-- size_t overflow here
- prog_end   = prog_start + prog_size

    prog_start       sec_start        prog_end         sec_end
        |                |                |               |
        v                v                v               v
    .....................|################################|............

The report in [1] also provides a corrupted BPF ELF which can be used as a reproducer:

    $ readelf -S crash
    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
    ...
      [ 2] uretprobe.mu[...] PROGBITS         0000000000000000  00000040
           0000000000000068  0000000000000000  AX       0     0     8

    $ readelf -s crash
    Symbol table '.symtab' contains 8 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
    ...
         6: ffffffffffffffb8   104 FUNC    GLOBAL DEFAULT    2 handle_tp

Here, the handle_tp prog has section offset ffffffffffffffb8, i.e. will point before the actual memory where section 2 is allocated.
This is also reported by AddressSanitizer:

    =================================================================
    ==1232==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c7302fe0000 at pc 0x7fc3046e4b77 bp 0x7ffe64677cd0 sp 0x7ffe64677490
    READ of size 104 at 0x7c7302fe0000 thread T0
        #0 0x7fc3046e4b76 in memcpy (/lib64/libasan.so.8+0xe4b76)
        #1 0x00000040df3e in bpf_object__init_prog /src/libbpf/src/libbpf.c:856
        #2 0x00000040df3e in bpf_object__add_programs /src/libbpf/src/libbpf.c:928
        #3 0x00000040df3e in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3930
        #4 0x00000040df3e in bpf_object_open /src/libbpf/src/libbpf.c:8067
        #5 0x00000040f176 in bpf_object__open_file /src/libbpf/src/libbpf.c:8090
        #6 0x000000400c16 in main /poc/poc.c:8
        #7 0x7fc3043d25b4 in __libc_start_call_main (/lib64/libc.so.6+0x35b4)
        #8 0x7fc3043d2667 in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3667)
        #9 0x000000400b34 in _start (/poc/poc+0x400b34)

    0x7c7302fe0000 is located 64 bytes before 104-byte region [0x7c7302fe0040,0x7c7302fe00a8)
    allocated by thread T0 here:
        #0 0x7fc3046e716b in malloc (/lib64/libasan.so.8+0xe716b)
        #1 0x7fc3045ee600 in __libelf_set_rawdata_wrlock (/lib64/libelf.so.1+0xb600)
        #2 0x7fc3045ef018 in __elf_getdata_rdlock (/lib64/libelf.so.1+0xc018)
        #3 0x00000040642f in elf_sec_data /src/libbpf/src/libbpf.c:3740

The problem here is that currently, libbpf only checks that the program end is within the section bounds. There used to be a check `while (sec_off < sec_sz)` in bpf_object__add_programs, however, it was removed by commit 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions"). Add a check for detecting the overflow of `sec_off + prog_sz` to bpf_object__init_prog to fix this issue.
[1] https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md

Fixes: 6245947 ("libbpf: Allow gaps in BPF program sections to support overriden weak functions")
Reported-by: lmarch2 <2524158037@qq.com>
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Link: https://github.com/lmarch2/poc/blob/main/libbpf/libbpf.md
Link: https://lore.kernel.org/bpf/20250415155014.397603-1-vmalik@redhat.com
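The overflow described in the commit message above can be illustrated with a small sketch. This is not the libbpf code: the helper names are hypothetical, and `uint64_t` stands in for a 64-bit `size_t`. It contrasts a naive end-of-range check, which wraps around for the reproducer's values, with a check arranged so that no addition can overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Naive check (hypothetical, for contrast): only verifies that the program
 * end lies within the section. sec_off + prog_sz can wrap around, so a huge
 * sec_off may still pass.
 */
static bool prog_end_in_section_naive(uint64_t sec_sz, uint64_t sec_off,
				      uint64_t prog_sz)
{
	return sec_off + prog_sz <= sec_sz;
}

/*
 * Overflow-safe check (hypothetical helper, not the actual patch): returns
 * true only if [sec_off, sec_off + prog_sz) lies entirely inside a section
 * of sec_sz bytes. The subtraction cannot underflow because sec_off is
 * compared against sec_sz first.
 */
static bool prog_range_in_section(uint64_t sec_sz, uint64_t sec_off,
				  uint64_t prog_sz)
{
	if (sec_off > sec_sz)
		return false;
	return prog_sz <= sec_sz - sec_off;
}
```

With the reproducer's values (section size 0x68, symbol value 0xffffffffffffffb8, program size 104), the naive sum wraps to 0x20 and passes, while the second helper rejects the program.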
Without CONFIG_DRM_XE_GPUSVM set, GPU SVM is not initialized, so the warning below is triggered. Gate the flush work code on the config option to avoid it:
"
[ 453.132028] ------------[ cut here ]------------
[ 453.132527] WARNING: CPU: 9 PID: 4491 at kernel/workqueue.c:4205 __flush_work+0x379/0x3a0
[ 453.133355] Modules linked in: xe drm_ttm_helper ttm gpu_sched drm_buddy drm_suballoc_helper drm_gpuvm drm_exec
[ 453.134352] CPU: 9 UID: 0 PID: 4491 Comm: xe_exec_mix_mod Tainted: G U W 6.15.0-rc3+ #7 PREEMPT(full)
[ 453.135405] Tainted: [U]=USER, [W]=WARN
...
[ 453.136921] RIP: 0010:__flush_work+0x379/0x3a0
[ 453.137417] Code: 8b 45 00 48 8b 55 08 89 c7 48 c1 e8 04 83 e7 08 83 e0 0f 83 cf 02 89 c6 48 0f ba 6d 00 03 e9 d5 fe ff ff 0f 0b e9 db fd ff ff <0f> 0b 45 31 e4 e9 d1 fd ff ff 0f 0b e9 03 ff ff ff 0f 0b e9 d6 fe
[ 453.139250] RSP: 0018:ffffc90000c67b18 EFLAGS: 00010246
[ 453.139782] RAX: 0000000000000000 RBX: ffff888108a24000 RCX: 0000000000002000
[ 453.140521] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8881016d61c8
[ 453.141253] RBP: ffff8881016d61c8 R08: 0000000000000000 R09: 0000000000000000
[ 453.141985] R10: 0000000000000000 R11: 0000000008a24000 R12: 0000000000000001
[ 453.142709] R13: 0000000000000002 R14: 0000000000000000 R15: ffff888107db8c00
[ 453.143450] FS: 00007f44853d4c80(0000) GS:ffff8882f469b000(0000) knlGS:0000000000000000
[ 453.144276] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 453.144853] CR2: 00007f4487629228 CR3: 00000001016aa000 CR4: 00000000000406f0
[ 453.145594] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 453.146320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 453.147061] Call Trace:
[ 453.147336] <TASK>
[ 453.147579] ? tick_nohz_tick_stopped+0xd/0x30
[ 453.148067] ? xas_load+0x9/0xb0
[ 453.148435] ? xa_load+0x6f/0xb0
[ 453.148781] __xe_vm_bind_ioctl+0xbd5/0x1500 [xe]
[ 453.149338] ? dev_printk_emit+0x48/0x70
[ 453.149762] ? _dev_printk+0x57/0x80
[ 453.150148] ? drm_ioctl+0x17c/0x440
[ 453.150544] ? __drm_dev_vprintk+0x36/0x90
[ 453.150983] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[ 453.151575] ? drm_ioctl_kernel+0x9f/0xf0
[ 453.151998] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[ 453.152560] drm_ioctl_kernel+0x9f/0xf0
[ 453.152968] drm_ioctl+0x20f/0x440
[ 453.153332] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe]
[ 453.153893] ? ioctl_has_perm.constprop.0.isra.0+0xae/0x100
[ 453.154489] ? memory_bm_test_bit+0x5/0x60
[ 453.154935] xe_drm_ioctl+0x47/0x70 [xe]
[ 453.155419] __x64_sys_ioctl+0x8d/0xc0
[ 453.155824] do_syscall_64+0x47/0x110
[ 453.156228] entry_SYSCALL_64_after_hwframe+0x76/0x7e
"

v2 (Matt):
  refine commit message to have more details
  add Fixes tag
  move the code to xe_svm.h, which already has the config
  remove a blank line per codestyle suggestion

Fixes: 63f6e48 ("drm/xe: Add SVM garbage collector")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250502170052.1787973-1-shuicheng.lin@intel.com
(cherry picked from commit 9d80698)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
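The general pattern the xe commit message describes, placing the config-dependent code behind a helper that compiles to a no-op when the feature is off, might look roughly like the sketch below. Every name here (`demo_vm`, `demo_svm_flush`, `DEMO_CONFIG_GPUSVM`) is a hypothetical stand-in and is not taken from the xe driver.

```c
/* Hypothetical stand-ins; none of these names come from the xe driver. */
struct demo_vm {
	int flush_calls; /* counts how often the flush really ran */
};

#ifdef DEMO_CONFIG_GPUSVM
/* Feature built in: the work item was initialized, so flushing is valid. */
static inline void demo_svm_flush(struct demo_vm *vm)
{
	vm->flush_calls++;
}
#else
/* Feature compiled out: nothing was initialized, so do nothing. */
static inline void demo_svm_flush(struct demo_vm *vm)
{
	(void)vm;
}
#endif

/* The caller stays free of #ifdefs and simply calls the helper. */
static void demo_vm_bind(struct demo_vm *vm)
{
	demo_svm_flush(vm);
}
```

Keeping the `#ifdef` in one header-style helper means call sites never touch state that was not set up when the option is disabled.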
Two sites in the ATM MPOA code assume that the fetched net_device object is of lec type. However, both of them only check that the device name starts with the string "lec". A malicious user can exploit this by creating another device whose name starts with that pattern, thereby causing type confusion. For example, creating a *team* interface named lecX, binding that interface and sending messages results in a crash like the one below:

[ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[ 18.454253] #PF: supervisor instruction fetch in kernel mode
[ 18.455058] #PF: error_code(0x0011) - permissions violation
[ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[ 18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[ 18.456921] RIP: 0010:0xffff888005702a70
[ 18.457151] Code: .....
[ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 18.462368] Call Trace:
[ 18.462518] <TASK>
[ 18.462645] ? __die_body+0x64/0xb0
[ 18.462856] ? page_fault_oops+0x353/0x3e0
[ 18.463092] ? exc_page_fault+0xaf/0xd0
[ 18.463322] ? asm_exc_page_fault+0x22/0x30
[ 18.463589] ? msg_from_mpoad+0x431/0x9d0
[ 18.463820] ? vcc_sendmsg+0x165/0x3b0
[ 18.464031] vcc_sendmsg+0x20a/0x3b0
[ 18.464238] ? wake_bit_function+0x80/0x80
[ 18.464511] __sys_sendto+0x38c/0x3a0
[ 18.464729] ? percpu_counter_add_batch+0x87/0xb0
[ 18.465002] __x64_sys_sendto+0x22/0x30
[ 18.465219] do_syscall_64+0x6c/0xa0
[ 18.465465] ? preempt_count_add+0x54/0xb0
[ 18.465697] ? up_read+0x37/0x80
[ 18.465883] ? do_user_addr_fault+0x25e/0x5b0
[ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0
[ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 18.466727] RIP: 0033:0x785e61be4407
[ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98

There are several ways to validate the net_device object correctly. For example, xgbe_netdev_event() checks the `netdev_ops` field, and clip_device_event() checks the `type` field. Since the related variable `lec_netdev_ops` is not defined in the same file, introduce another type value, `ARPHRD_ATM_LANE`, for a simple and correct check.

By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
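The difference between the name-prefix check and a type-field check can be sketched in user space as follows. This is only an illustration: `struct fake_net_device` is a stand-in for the kernel's `struct net_device`, and the numeric value chosen for `FAKE_ARPHRD_ATM_LANE` is a placeholder, not the constant introduced by the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for struct net_device; only the fields needed for the sketch. */
struct fake_net_device {
	char name[16];
	unsigned short type;
};

#define FAKE_ARPHRD_ETHER    1      /* type an Ethernet-like device reports */
#define FAKE_ARPHRD_ATM_LANE 0xF00D /* placeholder value, assumption only */

/* Name-based check: any device called "lec..." passes, whatever it is. */
static bool looks_like_lec_by_name(const struct fake_net_device *dev)
{
	return strncmp(dev->name, "lec", 3) == 0;
}

/* Type-based check: only devices that declare the LANE type pass. */
static bool is_lec_by_type(const struct fake_net_device *dev)
{
	return dev->type == FAKE_ARPHRD_ATM_LANE;
}
```

A device named "lec0" with an Ethernet type passes the first check but fails the second, which is the confusion the commit message describes.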
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in the ATM MPOA code that assume the fetched net_device object is of LEC type. However, both of them only check that the device name starts with the "lec" prefix. A malicious user can exploit this by creating another device whose name starts with that prefix, thereby causing type confusion. For example, creating a *team* interface named lecX, binding that interface and sending messages triggers a crash like the one below:

[ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[ 18.454253] #PF: supervisor instruction fetch in kernel mode
[ 18.455058] #PF: error_code(0x0011) - permissions violation
[ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[ 18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[ 18.456921] RIP: 0010:0xffff888005702a70
[ 18.457151] Code: .....
[ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 18.462368] Call Trace:
[ 18.462518] <TASK>
[ 18.462645] ? __die_body+0x64/0xb0
[ 18.462856] ? page_fault_oops+0x353/0x3e0
[ 18.463092] ? exc_page_fault+0xaf/0xd0
[ 18.463322] ? asm_exc_page_fault+0x22/0x30
[ 18.463589] ? msg_from_mpoad+0x431/0x9d0
[ 18.463820] ? vcc_sendmsg+0x165/0x3b0
[ 18.464031] vcc_sendmsg+0x20a/0x3b0
[ 18.464238] ? wake_bit_function+0x80/0x80
[ 18.464511] __sys_sendto+0x38c/0x3a0
[ 18.464729] ? percpu_counter_add_batch+0x87/0xb0
[ 18.465002] __x64_sys_sendto+0x22/0x30
[ 18.465219] do_syscall_64+0x6c/0xa0
[ 18.465465] ? preempt_count_add+0x54/0xb0
[ 18.465697] ? up_read+0x37/0x80
[ 18.465883] ? do_user_addr_fault+0x25e/0x5b0
[ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0
[ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 18.466727] RIP: 0033:0x785e61be4407
[ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98

There are several ways to validate the net_device object correctly. For example, xgbe_netdev_event() checks the `netdev_ops` field, and clip_device_event() checks the `type` field. Because the relevant variable `lec_netdev_ops` is not defined in the same file, introduce a new type value, `ARPHRD_ATM_LANE`, for a simple and correct check.

By the way, this bug dates back to pre-git history (2.3.15), so the Fixes tag references the first commit in the git history for tracking.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
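The difference between the name-prefix check and the `type`-field check described above can be illustrated with a small user-space sketch. This is not the patch itself: `struct mock_net_device`, the `MOCK_*` constants and both helper functions are hypothetical stand-ins, and the numeric values are placeholders rather than the values used in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for kernel definitions; values are placeholders. */
#define MOCK_IFNAMSIZ        16
#define MOCK_ARPHRD_ETHER    1
#define MOCK_ARPHRD_ATM_LANE 0x1234 /* illustrative only, not the patch's value */

/* Minimal mock of the two net_device fields relevant to the check. */
struct mock_net_device {
    char name[MOCK_IFNAMSIZ];
    unsigned short type;
};

/* Name-prefix check: passes for ANY device whose name starts with "lec",
 * so a user who controls the device name can satisfy it. */
static bool looks_like_lec_by_name(const struct mock_net_device *dev)
{
    assert(dev != NULL);
    return strncmp(dev->name, "lec", 3) == 0;
}

/* Type-field check: passes only when the device carries the dedicated
 * type value, which a renamed device of another kind does not have. */
static bool is_lec_by_type(const struct mock_net_device *dev)
{
    assert(dev != NULL);
    return dev->type == MOCK_ARPHRD_ATM_LANE;
}
```

In this sketch, a device named `lec0` whose type is `MOCK_ARPHRD_ETHER` (standing in for the renamed *team* interface) passes `looks_like_lec_by_name()` but fails `is_lec_by_type()`, which is the property the commit message relies on.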
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in atm mpoa code that believe the fetched object net_device is of lec type. However, both of them do just name checking to ensure that the device name starts with "lec" pattern string. That is, malicious user can hijack this by creating another device starting with that pattern, thereby causing type confusion. For example, create a *team* interface with lecX name, bind that interface and send messages will get a crash like below: [ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0) [ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70 [ 18.454253] #PF: supervisor instruction fetch in kernel mode [ 18.455058] #PF: error_code(0x0011) - permissions violation [ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3 [ 18.455725] Oops: 0011 [kernel-patches#1] PREEMPT SMP PTI [ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 kernel-patches#7 [ 18.456921] RIP: 0010:0xffff888005702a70 [ 18.457151] Code: ..... [ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286 [ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b [ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000 [ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000 [ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000 [ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000 [ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 [ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0 [ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 18.462368] Call Trace: [ 18.462518] <TASK> [ 18.462645] ? __die_body+0x64/0xb0 [ 18.462856] ? page_fault_oops+0x353/0x3e0 [ 18.463092] ? 
exc_page_fault+0xaf/0xd0 [ 18.463322] ? asm_exc_page_fault+0x22/0x30 [ 18.463589] ? msg_from_mpoad+0x431/0x9d0 [ 18.463820] ? vcc_sendmsg+0x165/0x3b0 [ 18.464031] vcc_sendmsg+0x20a/0x3b0 [ 18.464238] ? wake_bit_function+0x80/0x80 [ 18.464511] __sys_sendto+0x38c/0x3a0 [ 18.464729] ? percpu_counter_add_batch+0x87/0xb0 [ 18.465002] __x64_sys_sendto+0x22/0x30 [ 18.465219] do_syscall_64+0x6c/0xa0 [ 18.465465] ? preempt_count_add+0x54/0xb0 [ 18.465697] ? up_read+0x37/0x80 [ 18.465883] ? do_user_addr_fault+0x25e/0x5b0 [ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0 [ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 18.466727] RIP: 0033:0x785e61be4407 [ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407 [ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003 [ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000 [ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98 Correctly validating the net_device object has several methods. For example, function xgbe_netdev_event() checks `netdev_ops` field, function clip_device_event() checks `type` field. Considering the related variable `lec_netdev_ops` is not defined in the same file, so introduce another type value `ARPHRD_ATM_LANE` for a simple and correct check. By the way, this bug dates back to pre-git history (2.3.15), hence use the first reference for tracking. Signed-off-by: Lin Ma <linma@zju.edu.cn> Fixes: 1da177e ("Linux-2.6.12-rc2") Signed-off-by: NipaLocal <nipa@local>
There are two sites in the atm mpoa code that assume the fetched net_device object is of lec type. However, both only check that the device name starts with the "lec" pattern string. A malicious user can therefore create another device whose name starts with that pattern and cause type confusion.

For example, creating a *team* interface with a lecX name, binding that interface, and sending messages results in a crash like the one below:

[ 18.450000] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 18.452366] BUG: unable to handle page fault for address: ffff888005702a70
[ 18.454253] #PF: supervisor instruction fetch in kernel mode
[ 18.455058] #PF: error_code(0x0011) - permissions violation
[ 18.455366] PGD 3801067 P4D 3801067 PUD 3802067 PMD 80000000056000e3
[ 18.455725] Oops: 0011 [#1] PREEMPT SMP PTI
[ 18.455966] CPU: 0 PID: 130 Comm: trigger Not tainted 6.1.90 #7
[ 18.456921] RIP: 0010:0xffff888005702a70
[ 18.457151] Code: .....
[ 18.458168] RSP: 0018:ffffc90000677bf8 EFLAGS: 00010286
[ 18.458461] RAX: ffff888005702a70 RBX: ffff888005702000 RCX: 000000000000001b
[ 18.458850] RDX: ffffc90000677c10 RSI: ffff88800565e0a8 RDI: ffff888005702000
[ 18.459248] RBP: ffffc90000677c68 R08: 0000000000000000 R09: 0000000000000000
[ 18.459644] R10: 0000000000000000 R11: ffff888005702a70 R12: ffff88800556c000
[ 18.460033] R13: ffff888005964900 R14: ffff8880054b4000 R15: ffff8880054b5000
[ 18.460425] FS: 0000785e61b5a740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[ 18.460872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 18.461183] CR2: ffff888005702a70 CR3: 00000000054c2000 CR4: 00000000000006f0
[ 18.461580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 18.461974] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 18.462368] Call Trace:
[ 18.462518] <TASK>
[ 18.462645] ? __die_body+0x64/0xb0
[ 18.462856] ? page_fault_oops+0x353/0x3e0
[ 18.463092] ? exc_page_fault+0xaf/0xd0
[ 18.463322] ? asm_exc_page_fault+0x22/0x30
[ 18.463589] ? msg_from_mpoad+0x431/0x9d0
[ 18.463820] ? vcc_sendmsg+0x165/0x3b0
[ 18.464031] vcc_sendmsg+0x20a/0x3b0
[ 18.464238] ? wake_bit_function+0x80/0x80
[ 18.464511] __sys_sendto+0x38c/0x3a0
[ 18.464729] ? percpu_counter_add_batch+0x87/0xb0
[ 18.465002] __x64_sys_sendto+0x22/0x30
[ 18.465219] do_syscall_64+0x6c/0xa0
[ 18.465465] ? preempt_count_add+0x54/0xb0
[ 18.465697] ? up_read+0x37/0x80
[ 18.465883] ? do_user_addr_fault+0x25e/0x5b0
[ 18.466126] ? exit_to_user_mode_prepare+0x12/0xb0
[ 18.466435] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 18.466727] RIP: 0033:0x785e61be4407
[ 18.467948] RSP: 002b:00007ffe61ae2150 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[ 18.468368] RAX: ffffffffffffffda RBX: 0000785e61b5a740 RCX: 0000785e61be4407
[ 18.468758] RDX: 000000000000019c RSI: 00007ffe61ae21c0 RDI: 0000000000000003
[ 18.469149] RBP: 00007ffe61ae2370 R08: 0000000000000000 R09: 0000000000000000
[ 18.469542] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 18.469936] R13: 00007ffe61ae2498 R14: 0000785e61d74000 R15: 000057bddcbabd98

There are several ways to validate the net_device object correctly. For example, xgbe_netdev_event() checks the `netdev_ops` field, and clip_device_event() checks the `type` field. Because the related variable `lec_netdev_ops` is not defined in the same file, introduce another type value, `ARPHRD_ATM_LANE`, for a simple and correct check.

This bug dates back to pre-git history (2.3.15), so the Fixes tag references the first commit in the git history for tracking.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 1da177e ("Linux-2.6.12-rc2")
Signed-off-by: NipaLocal <nipa@local>
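The difference between the name-based check and the type-based check described in the commit message above can be sketched in user space. This is only an illustration under stated assumptions: `struct fake_netdev`, both helper functions, and the `FAKE_ARPHRD_*` constants are hypothetical stand-ins, not the kernel's `struct net_device` or its real `ARPHRD_*` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the two net_device fields that matter here. */
struct fake_netdev {
	char name[16];
	unsigned short type;	/* hardware type tag, in the spirit of ARPHRD_* */
};

/* Illustrative values only; the real constants are defined by the kernel. */
#define FAKE_ARPHRD_ETHER	1
#define FAKE_ARPHRD_ATM_LANE	0x7001

/* Name-prefix check: accepts any device whose name starts with "lec". */
static bool looks_like_lec_by_name(const struct fake_netdev *dev)
{
	assert(dev != NULL);
	return strncmp(dev->name, "lec", 3) == 0;
}

/* Type check: accepts only devices carrying the LANE type tag. */
static bool is_lec_by_type(const struct fake_netdev *dev)
{
	assert(dev != NULL);
	return dev->type == FAKE_ARPHRD_ATM_LANE;
}
```

A device named "lec0" that carries a different type tag passes the first check but fails the second. The name is chosen by whoever creates the interface, whereas in this sketch the type tag is assumed to be set only by the driver that owns the device.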
A crash in conntrack was reported while trying to unlink the conntrack entry from the hash bucket list:

    [exception RIP: __nf_ct_delete_from_lists+172]
    [..]
 #7 [ff539b5a2b043aa0] nf_ct_delete at ffffffffc124d421 [nf_conntrack]
 #8 [ff539b5a2b043ad0] nf_ct_gc_expired at ffffffffc124d999 [nf_conntrack]
 #9 [ff539b5a2b043ae0] __nf_conntrack_find_get at ffffffffc124efbc [nf_conntrack]
    [..]

The nf_conn struct is marked as allocated from slab but appears to be in a partially initialised state:

 ct hlist pointer is garbage; looks like the ct hash value (hence crash).
 ct->status is equal to IPS_CONFIRMED|IPS_DYING, which is expected
 ct->timeout is 30000 (=30s), which is unexpected.

Everything else looks like a normal udp conntrack entry. If we ignore ct->status and pretend it's 0, the entry matches those that are newly allocated but not yet inserted into the hash:
 - ct hlist pointers are overloaded and store/cache the raw tuple hash
 - ct->timeout matches the relative time expected for a new udp flow rather than the absolute 'jiffies' value.

If it were not for the presence of IPS_CONFIRMED, __nf_conntrack_find_get() would have skipped the entry.

Theory is that we did hit the following race:

cpu x                   cpu y                   cpu z
 found entry E          found entry E
 E is expired           <preemption>
 nf_ct_delete()
 return E to rcu slab
                                                init_conntrack
                                                E is re-inited,
                                                ct->status set to 0
                                                reply tuplehash hnnode.pprev
                                                stores hash value.

cpu y found E right before it was deleted on cpu x. E is now re-inited on cpu z. cpu y was preempted before checking for expiry and/or confirm bit.

                                                ->refcnt set to 1
                                                E now owned by skb
                                                ->timeout set to 30000

If cpu y were to resume now, it would observe E as expired but would skip E due to missing CONFIRMED bit.

                                                nf_conntrack_confirm gets called
                                                sets: ct->status |= CONFIRMED
                                                This is wrong: E is not yet added
                                                to hashtable.

cpu y resumes, it observes E as expired but CONFIRMED:

                        <resumes>
                        nf_ct_expired()
                         -> yes (ct->timeout is 30s)
                        confirmed bit set.

cpu y will try to delete E from the hashtable:

                        nf_ct_delete() -> set DYING bit
                        __nf_ct_delete_from_lists

Even this scenario doesn't guarantee a crash: cpu z still holds the table bucket lock(s) so y blocks:

                        wait for spinlock held by z

                                                CONFIRMED is set but there is no
                                                guarantee ct will be added to hash:
                                                "chaintoolong" or "clash resolution"
                                                logic both skip the insert step.
                                                reply hnnode.pprev still stores the
                                                hash value.

                                                unlocks spinlock
                                                return NF_DROP
                        <unblocks, then
                         crashes on hlist_nulls_del_rcu pprev>

In case CPU z does insert the entry into the hashtable, cpu y will unlink E again right away but no crash occurs.

Without the 'cpu y' race, the 'garbage' hlist is of no consequence: ct refcnt remains at 1, eventually the skb will be free'd and E gets destroyed via: nf_conntrack_put -> nf_conntrack_destroy -> nf_ct_destroy.

To resolve this, move the IPS_CONFIRMED assignment after the table insertion but before the unlock.

Pablo points out that the confirm-bit-store could be reordered to happen before the hlist add or the timeout fixup, so switch to set_bit and a before_atomic memory barrier to prevent this.

It doesn't matter if other CPUs can observe a newly inserted entry right before the CONFIRMED bit was set: such an event cannot be distinguished from the above "E is the old incarnation" case: the entry will be skipped.

Also change nf_ct_should_gc() to first check the confirmed bit.

The gc sequence is:
 1. Check if entry has expired, if not skip to next entry
 2. Obtain a reference to the expired entry.
 3. Call nf_ct_should_gc() to double-check step 1.

nf_ct_should_gc() is thus called only for entries that already failed an expiry check. After this patch, once the confirmed bit check passes, ct->timeout has been altered to reflect the absolute 'best before' date instead of a relative time. Step 3 will therefore not remove the entry.

Without this change to nf_ct_should_gc() we could still get this sequence:

 1. Check if entry has expired.
 2. Obtain a reference.
 3. Call nf_ct_should_gc() to double-check step 1:
    4 - entry is still observed as expired
    5 - meanwhile, ct->timeout is corrected to absolute value on other CPU and confirm bit gets set
    6 - confirm bit is seen
    7 - valid entry is removed again

First do check 6), then 4) so the gc expiry check always picks up either confirmed bit unset (entry gets skipped) or expiry re-check failure for re-inited conntrack objects.

This change cannot be backported to releases before 5.19. Without commit 8a75a2c ("netfilter: conntrack: remove unconfirmed list"), the "|= IPS_CONFIRMED" line cannot be moved without further changes.

Cc: Razvan Cojocaru <rzvncj@gmail.com>
Link: https://lore.kernel.org/netfilter-devel/20250627142758.25664-1-fw@strlen.de/
Link: https://lore.kernel.org/netfilter-devel/4239da15-83ff-4ca4-939d-faef283471bb@gmail.com/
Fixes: 1397af5 ("netfilter: conntrack: remove the percpu dying list")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NipaLocal <nipa@local>
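The ordering argument in the conntrack commit message above (fix up the timeout and insert first, publish the confirmed bit last, and have the gc re-check test the confirmed bit before the expiry) can be sketched with C11 atomics. This is a user-space analogue under stated assumptions, not the kernel code: `struct fake_ct` and all `fake_*` names are hypothetical, and a release/acquire pair stands in for the set_bit plus before_atomic barrier that the commit message mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_CONFIRMED_BIT 0x1u

/* Hypothetical stand-in for the conntrack fields discussed above. */
struct fake_ct {
	atomic_uint status;
	uint32_t timeout;	/* relative until confirmed, absolute afterwards */
	bool in_table;		/* stands in for membership in the hash table */
};

/* Fresh (or re-initialised) entry: no status bits, relative timeout. */
static void fake_init(struct fake_ct *ct, uint32_t relative_timeout)
{
	atomic_store_explicit(&ct->status, 0, memory_order_relaxed);
	ct->timeout = relative_timeout;
	ct->in_table = false;
}

/* Confirm: fix up the timeout, insert, and only then publish the bit. */
static void fake_confirm(struct fake_ct *ct, uint32_t now)
{
	ct->timeout += now;	/* relative -> absolute */
	ct->in_table = true;
	/* Release: the two stores above are visible before the bit is. */
	atomic_fetch_or_explicit(&ct->status, FAKE_CONFIRMED_BIT,
				 memory_order_release);
}

/* Wrap-safe expiry test on the raw timeout value. */
static bool fake_expired(const struct fake_ct *ct, uint32_t now)
{
	return (int32_t)(ct->timeout - now) <= 0;
}

/* gc re-check: confirmed bit first, expiry second. */
static bool fake_should_gc(struct fake_ct *ct, uint32_t now)
{
	unsigned int status =
		atomic_load_explicit(&ct->status, memory_order_acquire);

	if (!(status & FAKE_CONFIRMED_BIT))
		return false;	/* not published yet: skip the entry */
	return fake_expired(ct, now);
}
```

In this sketch an unconfirmed entry with a relative timeout looks expired to the raw expiry test, but the re-check skips it because the confirmed bit is tested first. Once the bit is visible, the acquire load guarantees that the absolute timeout is visible as well.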
Pull request for series with
subject: tools: bpftool: support creating and dumping outer maps
version: 1
url: https://patchwork.ozlabs.org/project/netdev/list/?series=199591