-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
branch reorganization and cleanup #9
Comments
current plan look ok?
|
Lots of commits lack Signed-off-by tags. |
FYI - I have edited the original issue description The important branches are now:
TODO for @MichaelZhuxx or @tekkamanninja:
|
@MichaelZhuxx please delete the |
Sure, Done |
…nect It's possible to trigger NULL pointer dereference by local unprivileged user, when calling getsockname() after failed bind() (e.g. the bind fails because LLCP_SAP_MAX used as SAP): BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 1 PID: 426 Comm: llcp_sock_getna Not tainted 5.13.0-rc2-next-20210521+ #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1 04/01/2014 Call Trace: llcp_sock_getname+0xb1/0xe0 __sys_getpeername+0x95/0xc0 ? lockdep_hardirqs_on_prepare+0xd5/0x180 ? syscall_enter_from_user_mode+0x1c/0x40 __x64_sys_getpeername+0x11/0x20 do_syscall_64+0x36/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae This can be reproduced with Syzkaller C repro (bind followed by getpeername): https://syzkaller.appspot.com/x/repro.c?x=14def446e00000 Cc: <stable@vger.kernel.org> Fixes: d646960 ("NFC: Initial LLCP support") Reported-by: syzbot+80fb126e7f7d8b1a5914@syzkaller.appspotmail.com Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20210531072138.5219-1-krzysztof.kozlowski@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
ASan reported a memory leak caused by info_linear not being deallocated. The info_linear was allocated during in perf_event__synthesize_one_bpf_prog(). This patch adds the corresponding free() when bpf_prog_info_node is freed in perf_env__purge_bpf(). $ sudo ./perf record -- sleep 5 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.025 MB perf.data (8 samples) ] ================================================================= ==297735==ERROR: LeakSanitizer: detected memory leaks Direct leak of 7688 byte(s) in 19 object(s) allocated from: #0 0x4f420f in malloc (/home/user/linux/tools/perf/perf+0x4f420f) #1 0xc06a74 in bpf_program__get_prog_info_linear /home/user/linux/tools/lib/bpf/libbpf.c:11113:16 #2 0xb426fe in perf_event__synthesize_one_bpf_prog /home/user/linux/tools/perf/util/bpf-event.c:191:16 #3 0xb42008 in perf_event__synthesize_bpf_events /home/user/linux/tools/perf/util/bpf-event.c:410:9 #4 0x594596 in record__synthesize /home/user/linux/tools/perf/builtin-record.c:1490:8 #5 0x58c9ac in __cmd_record /home/user/linux/tools/perf/builtin-record.c:1798:8 #6 0x58990b in cmd_record /home/user/linux/tools/perf/builtin-record.c:2901:8 #7 0x7b2a20 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b12ff in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2583 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0d79 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7fa357ef6b74 in __libc_start_main /usr/src/debug/glibc-2.33-8.fc34.x86_64/csu/../csu/libc-start.c:332:16 Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20210602224024.300485-1-rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Loading and then unloading module g_dpgp on a VM that does not support the driver currently throws a WARN_ON message because the port has not been initialized. Removing an unused driver is a valid use-case and the WARN_ON kernel warning is a bit excessive, so remove it. Cleans up: [27654.638698] ------------[ cut here ]------------ [27654.638705] WARNING: CPU: 6 PID: 2956336 at drivers/usb/gadget/function/u_serial.c:1201 gserial_free_line+0x7c/0x90 [u_serial] [27654.638728] Modules linked in: g_dbgp(-) u_serial usb_f_tcm target_core_mod libcomposite udc_core vmw_vmci mcb i2c_nforce2 i2c_amd756 nfit cx8800 videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 cx88xx tveeprom videobuf2_common videodev mc ccp hid_generic hid intel_ishtp cros_ec mc13xxx_core vfio_mdev mdev i915 i2c_algo_bit kvm ppdev parport zatm eni suni uPD98402 atm rio_scan binder_linux hwmon_vid video ipmi_devintf ipmi_msghandler zstd nls_utf8 decnet qrtr ns sctp ip6_udp_tunnel udp_tunnel fcrypt pcbc nhc_udp nhc_ipv6 nhc_routing nhc_mobility nhc_hop nhc_dest nhc_fragment 6lowpan ts_kmp dccp_ipv6 dccp_ipv4 dccp snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq_dummy snd_seq snd_seq_device xen_front_pgdir_shbuf binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common snd_hda_codec_generic ledtrig_audio snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd rapl soundcore joydev input_leds mac_hid serio_raw efi_pstore [27654.638880] qemu_fw_cfg sch_fq_codel msr virtio_rng autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear qxl drm_ttm_helper crct10dif_pclmul ttm drm_kms_helper syscopyarea sysfillrect sysimgblt virtio_net fb_sys_fops cec net_failover rc_core ahci psmouse drm libahci lpc_ich virtio_blk failover [last unloaded: u_ether] [27654.638949] CPU: 6 PID: 2956336 Comm: modprobe Tainted: P O 5.13.0-9-generic #9 [27654.638956] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [27654.638969] RIP: 0010:gserial_free_line+0x7c/0x90 [u_serial] [27654.638981] Code: 20 00 00 00 00 e8 74 1a ba c9 4c 89 e7 e8 8c fe ff ff 48 8b 3d 75 3b 00 00 44 89 f6 e8 3d 7c 69 c9 5b 41 5c 41 5d 41 5e 5d c3 <0f> 0b 4c 89 ef e8 4a 1a ba c9 5b 41 5c 41 5d 41 5e 5d c3 90 0f 1f [27654.638986] RSP: 0018:ffffba0b81403da0 EFLAGS: 00010246 [27654.638992] RAX: 0000000000000000 RBX: ffffffffc0eaf6a0 RCX: 0000000000000000 [27654.638996] RDX: ffff8e21c0cac8c0 RSI: 0000000000000006 RDI: ffffffffc0eaf6a0 [27654.639000] RBP: ffffba0b81403dc0 R08: ffffba0b81403de0 R09: fefefefefefefeff [27654.639003] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [27654.639006] R13: ffffffffc0eaf6a0 R14: 0000000000000000 R15: 0000000000000000 [27654.639010] FS: 00007faa1935e740(0000) GS:ffff8e223bd80000(0000) knlGS:0000000000000000 [27654.639015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [27654.639019] CR2: 00007ffc840cd4e8 CR3: 000000000e1ac006 CR4: 0000000000370ee0 [27654.639028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [27654.639031] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [27654.639035] Call Trace: [27654.639044] dbgp_exit+0x1c/0xa1a [g_dbgp] [27654.639054] __do_sys_delete_module.constprop.0+0x144/0x260 [27654.639066] ? call_rcu+0xe/0x10 [27654.639073] __x64_sys_delete_module+0x12/0x20 [27654.639081] do_syscall_64+0x61/0xb0 [27654.639092] ? exit_to_user_mode_loop+0xec/0x160 [27654.639098] ? exit_to_user_mode_prepare+0x37/0xb0 [27654.639104] ? syscall_exit_to_user_mode+0x27/0x50 [27654.639110] ? __x64_sys_close+0x12/0x40 [27654.639119] ? do_syscall_64+0x6e/0xb0 [27654.639126] ? exit_to_user_mode_prepare+0x37/0xb0 [27654.639132] ? syscall_exit_to_user_mode+0x27/0x50 [27654.639137] ? __x64_sys_newfstatat+0x1e/0x20 [27654.639146] ? do_syscall_64+0x6e/0xb0 [27654.639154] ? exc_page_fault+0x8f/0x170 [27654.639159] ? asm_exc_page_fault+0x8/0x30 [27654.639166] entry_SYSCALL_64_after_hwframe+0x44/0xae [27654.639173] RIP: 0033:0x7faa194a4b2b [27654.639179] Code: 73 01 c3 48 8b 0d 3d 73 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d 73 0c 00 f7 d8 64 89 01 48 [27654.639185] RSP: 002b:00007ffc840d0578 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [27654.639191] RAX: ffffffffffffffda RBX: 000056060f9f4e70 RCX: 00007faa194a4b2b [27654.639194] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000056060f9f4ed8 [27654.639197] RBP: 000056060f9f4e70 R08: 0000000000000000 R09: 0000000000000000 [27654.639200] R10: 00007faa1951eac0 R11: 0000000000000206 R12: 000056060f9f4ed8 [27654.639203] R13: 0000000000000000 R14: 000056060f9f4ed8 R15: 00007ffc840d06c8 [27654.639219] ---[ end trace 8dd0ea0bb32ce94a ]--- Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210701144305.110078-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Enabling the framebuffer leads to a system hang. Running, as a debug hack, the store_pan() function in drivers/video/fbdev/core/fbsysfs.c without taking the console_lock, allows to see the crash backtrace on the serial line. ~ # echo 0 0 > /sys/class/graphics/fb0/pan [ 9.719414] Unhandled exception: IPSR = 00000005 LR = fffffff1 [ 9.726937] CPU: 0 PID: 49 Comm: sh Not tainted 5.13.0-rc5 #9 [ 9.733008] Hardware name: STM32 (Device Tree Support) [ 9.738296] PC is at clk_gate_is_enabled+0x0/0x28 [ 9.743426] LR is at stm32f4_pll_div_set_rate+0xf/0x38 [ 9.748857] pc : [<0011e4be>] lr : [<0011f9e3>] psr: 0100000b [ 9.755373] sp : 00bc7be0 ip : 00000000 fp : 001f3ac4 [ 9.760812] r10: 002610d0 r9 : 01efe920 r8 : 00540560 [ 9.766269] r7 : 02e7ddb0 r6 : 0173eed8 r5 : 00000000 r4 : 004027c0 [ 9.773081] r3 : 0011e4bf r2 : 02e7ddb0 r1 : 0173eed8 r0 : 1d3267b8 [ 9.779911] xPSR: 0100000b [ 9.782719] CPU: 0 PID: 49 Comm: sh Not tainted 5.13.0-rc5 #9 [ 9.788791] Hardware name: STM32 (Device Tree Support) [ 9.794120] [<0000afa1>] (unwind_backtrace) from [<0000a33f>] (show_stack+0xb/0xc) [ 9.802421] [<0000a33f>] (show_stack) from [<0000a8df>] (__invalid_entry+0x4b/0x4c) The `pll_num' field in the post_div_data configuration contained a wrong value which also referenced an uninitialized hardware clock when clk_register_pll_div() was called. Fixes: 517633e ("clk: stm32f4: Add post divisor for I2S & SAI PLLs") Signed-off-by: Dario Binacchi <dariobin@libero.it> Reviewed-by: Gabriel Fernandez <gabriel.fernandez@st.com> Link: https://lore.kernel.org/r/20210725160725.10788-1-dariobin@libero.it Signed-off-by: Stephen Boyd <sboyd@kernel.org>
[ Upstream commit 24b5b19 ] Enabling the framebuffer leads to a system hang. Running, as a debug hack, the store_pan() function in drivers/video/fbdev/core/fbsysfs.c without taking the console_lock, allows to see the crash backtrace on the serial line. ~ # echo 0 0 > /sys/class/graphics/fb0/pan [ 9.719414] Unhandled exception: IPSR = 00000005 LR = fffffff1 [ 9.726937] CPU: 0 PID: 49 Comm: sh Not tainted 5.13.0-rc5 #9 [ 9.733008] Hardware name: STM32 (Device Tree Support) [ 9.738296] PC is at clk_gate_is_enabled+0x0/0x28 [ 9.743426] LR is at stm32f4_pll_div_set_rate+0xf/0x38 [ 9.748857] pc : [<0011e4be>] lr : [<0011f9e3>] psr: 0100000b [ 9.755373] sp : 00bc7be0 ip : 00000000 fp : 001f3ac4 [ 9.760812] r10: 002610d0 r9 : 01efe920 r8 : 00540560 [ 9.766269] r7 : 02e7ddb0 r6 : 0173eed8 r5 : 00000000 r4 : 004027c0 [ 9.773081] r3 : 0011e4bf r2 : 02e7ddb0 r1 : 0173eed8 r0 : 1d3267b8 [ 9.779911] xPSR: 0100000b [ 9.782719] CPU: 0 PID: 49 Comm: sh Not tainted 5.13.0-rc5 #9 [ 9.788791] Hardware name: STM32 (Device Tree Support) [ 9.794120] [<0000afa1>] (unwind_backtrace) from [<0000a33f>] (show_stack+0xb/0xc) [ 9.802421] [<0000a33f>] (show_stack) from [<0000a8df>] (__invalid_entry+0x4b/0x4c) The `pll_num' field in the post_div_data configuration contained a wrong value which also referenced an uninitialized hardware clock when clk_register_pll_div() was called. Fixes: 517633e ("clk: stm32f4: Add post divisor for I2S & SAI PLLs") Signed-off-by: Dario Binacchi <dariobin@libero.it> Reviewed-by: Gabriel Fernandez <gabriel.fernandez@st.com> Link: https://lore.kernel.org/r/20210725160725.10788-1-dariobin@libero.it Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 #9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit 1fb7d06, it adds the srcline property into the struct, but not initializing it everywhere needed. Committer notes: Now I see, when using --ignore-callees=do_idle we end up here at line 2189 in add_callchain_ip(): 2181 if (al.sym != NULL) { 2182 if (perf_hpp_list.parent && !*parent && 2183 symbol__match_regex(al.sym, &parent_regex)) 2184 *parent = al.sym; 2185 else if (have_ignore_callees && root_al && 2186 symbol__match_regex(al.sym, &ignore_callees_regex)) { 2187 /* Treat this symbol as the root, 2188 forgetting its callees. */ 2189 *root_al = al; 2190 callchain_cursor_reset(cursor); 2191 } 2192 } And the al that doesn't have the ->srcline field initialized will be copied to the root_al, so then, back to: 1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, 1212 int max_stack_depth, void *arg) 1213 { 1214 int err, err2; 1215 struct map *alm = NULL; 1216 1217 if (al) 1218 alm = map__get(al->map); 1219 1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent, 1221 iter->evsel, al, max_stack_depth); 1222 if (err) { 1223 map__put(alm); 1224 return err; 1225 } 1226 1227 err = iter->ops->prepare_entry(iter, al); 1228 if (err) 1229 goto out; 1230 1231 err = iter->ops->add_single_entry(iter, al); 1232 if (err) 1233 goto out; 1234 That al at line 1221 is what hist_entry_iter__add() (called from sample__resolve_callchain()) saw as 'root_al', and then: iter->ops->add_single_entry(iter, al); will go on with al->srcline with a bogus value, I'll add the above sequence to the cset and apply, thanks! Signed-off-by: Michael Petlan <mpetlan@redhat.com> CC: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries") Link: https //lore.kernel.org/r/20210719145332.29747-1-mpetlan@redhat.com Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
FD uses xyarray__entry that may return NULL if an index is out of bounds. If NULL is returned then a segv happens as FD unconditionally dereferences the pointer. This was happening in a case of with perf iostat as shown below. The fix is to make FD an "int*" rather than an int and handle the NULL case as either invalid input or a closed fd. $ sudo gdb --args perf stat --iostat list ... Breakpoint 1, perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50 50 { (gdb) bt #0 perf_evsel__alloc_fd (evsel=0x5555560951a0, ncpus=1, nthreads=1) at evsel.c:50 #1 0x000055555585c188 in evsel__open_cpu (evsel=0x5555560951a0, cpus=0x555556093410, threads=0x555556086fb0, start_cpu=0, end_cpu=1) at util/evsel.c:1792 #2 0x000055555585cfb2 in evsel__open (evsel=0x5555560951a0, cpus=0x0, threads=0x555556086fb0) at util/evsel.c:2045 #3 0x000055555585d0db in evsel__open_per_thread (evsel=0x5555560951a0, threads=0x555556086fb0) at util/evsel.c:2065 #4 0x00005555558ece64 in create_perf_stat_counter (evsel=0x5555560951a0, config=0x555555c34700 <stat_config>, target=0x555555c2f1c0 <target>, cpu=0) at util/stat.c:590 #5 0x000055555578e927 in __run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0) at builtin-stat.c:833 #6 0x000055555578f3c6 in run_perf_stat (argc=1, argv=0x7fffffffe4a0, run_idx=0) at builtin-stat.c:1048 #7 0x0000555555792ee5 in cmd_stat (argc=1, argv=0x7fffffffe4a0) at builtin-stat.c:2534 #8 0x0000555555835ed3 in run_builtin (p=0x555555c3f540 <commands+288>, argc=3, argv=0x7fffffffe4a0) at perf.c:313 #9 0x0000555555836154 in handle_internal_command (argc=3, argv=0x7fffffffe4a0) at perf.c:365 #10 0x000055555583629f in run_argv (argcp=0x7fffffffe2ec, argv=0x7fffffffe2e0) at perf.c:409 #11 0x0000555555836692 in main (argc=3, argv=0x7fffffffe4a0) at perf.c:539 ... (gdb) c Continuing. Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (uncore_iio_0/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/). /bin/dmesg | grep -i perf may provide additional information. Program received signal SIGSEGV, Segmentation fault. 0x00005555559b03ea in perf_evsel__close_fd_cpu (evsel=0x5555560951a0, cpu=1) at evsel.c:166 166 if (FD(evsel, cpu, thread) >= 0) v3. fixes a bug in perf_evsel__run_ioctl where the sense of a branch was backward. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210918054440.2350466-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
commit 57f0ff0 upstream. It's later supposed to be either a correct address or NULL. Without the initialization, it may contain an undefined value which results in the following segmentation fault: # perf top --sort comm -g --ignore-callees=do_idle terminates with: #0 0x00007ffff56b7685 in __strlen_avx2 () from /lib64/libc.so.6 #1 0x00007ffff55e3802 in strdup () from /lib64/libc.so.6 #2 0x00005555558cb139 in hist_entry__init (callchain_size=<optimized out>, sample_self=true, template=0x7fffde7fb110, he=0x7fffd801c250) at util/hist.c:489 #3 hist_entry__new (template=template@entry=0x7fffde7fb110, sample_self=sample_self@entry=true) at util/hist.c:564 #4 0x00005555558cb4ba in hists__findnew_entry (hists=hists@entry=0x5555561d9e38, entry=entry@entry=0x7fffde7fb110, al=al@entry=0x7fffde7fb420, sample_self=sample_self@entry=true) at util/hist.c:657 #5 0x00005555558cba1b in __hists__add_entry (hists=hists@entry=0x5555561d9e38, al=0x7fffde7fb420, sym_parent=<optimized out>, bi=bi@entry=0x0, mi=mi@entry=0x0, sample=sample@entry=0x7fffde7fb4b0, sample_self=true, ops=0x0, block_info=0x0) at util/hist.c:288 #6 0x00005555558cbb70 in hists__add_entry (sample_self=true, sample=0x7fffde7fb4b0, mi=0x0, bi=0x0, sym_parent=<optimized out>, al=<optimized out>, hists=0x5555561d9e38) at util/hist.c:1056 #7 iter_add_single_cumulative_entry (iter=0x7fffde7fb460, al=<optimized out>) at util/hist.c:1056 #8 0x00005555558cc8a4 in hist_entry_iter__add (iter=iter@entry=0x7fffde7fb460, al=al@entry=0x7fffde7fb420, max_stack_depth=<optimized out>, arg=arg@entry=0x7fffffff7db0) at util/hist.c:1231 #9 0x00005555557cdc9a in perf_event__process_sample (machine=<optimized out>, sample=0x7fffde7fb4b0, evsel=<optimized out>, event=<optimized out>, tool=0x7fffffff7db0) at builtin-top.c:842 #10 deliver_event (qe=<optimized out>, qevent=<optimized out>) at builtin-top.c:1202 #11 0x00005555558a9318 in do_flush (show_progress=false, oe=0x7fffffff80e0) at util/ordered-events.c:244 #12 __ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP, timestamp=timestamp@entry=0) at util/ordered-events.c:323 #13 0x00005555558a9789 in __ordered_events__flush (timestamp=<optimized out>, how=<optimized out>, oe=<optimized out>) at util/ordered-events.c:339 #14 ordered_events__flush (how=OE_FLUSH__TOP, oe=0x7fffffff80e0) at util/ordered-events.c:341 #15 ordered_events__flush (oe=oe@entry=0x7fffffff80e0, how=how@entry=OE_FLUSH__TOP) at util/ordered-events.c:339 #16 0x00005555557cd631 in process_thread (arg=0x7fffffff7db0) at builtin-top.c:1114 #17 0x00007ffff7bb817a in start_thread () from /lib64/libpthread.so.0 #18 0x00007ffff5656dc3 in clone () from /lib64/libc.so.6 If you look at the frame #2, the code is: 488 if (he->srcline) { 489 he->srcline = strdup(he->srcline); 490 if (he->srcline == NULL) 491 goto err_rawdata; 492 } If he->srcline is not NULL (it is not NULL if it is uninitialized rubbish), it gets strdupped and strdupping a rubbish random string causes the problem. Also, if you look at the commit 1fb7d06, it adds the srcline property into the struct, but not initializing it everywhere needed. Committer notes: Now I see, when using --ignore-callees=do_idle we end up here at line 2189 in add_callchain_ip(): 2181 if (al.sym != NULL) { 2182 if (perf_hpp_list.parent && !*parent && 2183 symbol__match_regex(al.sym, &parent_regex)) 2184 *parent = al.sym; 2185 else if (have_ignore_callees && root_al && 2186 symbol__match_regex(al.sym, &ignore_callees_regex)) { 2187 /* Treat this symbol as the root, 2188 forgetting its callees. */ 2189 *root_al = al; 2190 callchain_cursor_reset(cursor); 2191 } 2192 } And the al that doesn't have the ->srcline field initialized will be copied to the root_al, so then, back to: 1211 int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al, 1212 int max_stack_depth, void *arg) 1213 { 1214 int err, err2; 1215 struct map *alm = NULL; 1216 1217 if (al) 1218 alm = map__get(al->map); 1219 1220 err = sample__resolve_callchain(iter->sample, &callchain_cursor, &iter->parent, 1221 iter->evsel, al, max_stack_depth); 1222 if (err) { 1223 map__put(alm); 1224 return err; 1225 } 1226 1227 err = iter->ops->prepare_entry(iter, al); 1228 if (err) 1229 goto out; 1230 1231 err = iter->ops->add_single_entry(iter, al); 1232 if (err) 1233 goto out; 1234 That al at line 1221 is what hist_entry_iter__add() (called from sample__resolve_callchain()) saw as 'root_al', and then: iter->ops->add_single_entry(iter, al); will go on with al->srcline with a bogus value, I'll add the above sequence to the cset and apply, thanks! Signed-off-by: Michael Petlan <mpetlan@redhat.com> CC: Milian Wolff <milian.wolff@kdab.com> Cc: Jiri Olsa <jolsa@redhat.com> Fixes: 1fb7d06 ("perf report Use srcline from callchain for hist entries") Link: https //lore.kernel.org/r/20210719145332.29747-1-mpetlan@redhat.com Reported-by: Juri Lelli <jlelli@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a cell has 'nbits' equal to a multiple of BITS_PER_BYTE the logic *p &= GENMASK((cell->nbits%BITS_PER_BYTE) - 1, 0); will become undefined behavior because nbits modulo BITS_PER_BYTE is 0, and we subtract one from that making a large number that is then shifted more than the number of bits that fit into an unsigned long. UBSAN reports this problem: UBSAN: shift-out-of-bounds in drivers/nvmem/core.c:1386:8 shift exponent 64 is too large for 64-bit type 'unsigned long' CPU: 6 PID: 7 Comm: kworker/u16:0 Not tainted 5.15.0-rc3+ #9 Hardware name: Google Lazor (rev3+) with KB Backlight (DT) Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace+0x0/0x170 show_stack+0x24/0x30 dump_stack_lvl+0x64/0x7c dump_stack+0x18/0x38 ubsan_epilogue+0x10/0x54 __ubsan_handle_shift_out_of_bounds+0x180/0x194 __nvmem_cell_read+0x1ec/0x21c nvmem_cell_read+0x58/0x94 nvmem_cell_read_variable_common+0x4c/0xb0 nvmem_cell_read_variable_le_u32+0x40/0x100 a6xx_gpu_init+0x170/0x2f4 adreno_bind+0x174/0x284 component_bind_all+0xf0/0x264 msm_drm_bind+0x1d8/0x7a0 try_to_bring_up_master+0x164/0x1ac __component_add+0xbc/0x13c component_add+0x20/0x2c dp_display_probe+0x340/0x384 platform_probe+0xc0/0x100 really_probe+0x110/0x304 __driver_probe_device+0xb8/0x120 driver_probe_device+0x4c/0xfc __device_attach_driver+0xb0/0x128 bus_for_each_drv+0x90/0xdc __device_attach+0xc8/0x174 device_initial_probe+0x20/0x2c bus_probe_device+0x40/0xa4 deferred_probe_work_func+0x7c/0xb8 process_one_work+0x128/0x21c process_scheduled_works+0x40/0x54 worker_thread+0x1ec/0x2a8 kthread+0x138/0x158 ret_from_fork+0x10/0x20 Fix it by making sure there are any bits to mask out. Fixes: 69aba79 ("nvmem: Add a simple NVMEM framework for consumers") Cc: Douglas Anderson <dianders@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20211013124511.18726-1-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e6e22e6 upstream. System crash was seen when I/O was run against an NVMe target and aborts were occurring. Crash stack is: -- relevant crash stack -- BUG: kernel NULL pointer dereference, address: 0000000000000010 : #6 [ffffae1f8666bdd0] page_fault at ffffffffa740122e [exception RIP: qla_nvme_abort_work+339] RIP: ffffffffc0f592e3 RSP: ffffae1f8666be80 RFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff9b581fc8af80 RCX: ffffffffc0f83bd0 RDX: 0000000000000001 RSI: ffff9b5839c6c7c8 RDI: 0000000008000000 RBP: ffff9b6832f85000 R8: ffffffffc0f68160 R9: ffffffffc0f70652 R10: ffffae1f862ffdc8 R11: 0000000000000300 R12: 000000000000010d R13: 0000000000000000 R14: ffff9b5839cea000 R15: 0ffff9b583fab170 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffae1f8666be98] process_one_work at ffffffffa6aba184 #8 [ffffae1f8666bed8] worker_thread at ffffffffa6aba39d #9 [ffffae1f8666bf10] kthread at ffffffffa6ac06ed The crash was due to a stale SRB structure access after it was aborted. Fix the issue by removing stale access. Link: https://lore.kernel.org/r/20210908164622.19240-5-njavali@marvell.com Fixes: 2cabf10 ("scsi: qla2xxx: Fix hang on NVMe command timeouts") Cc: stable@vger.kernel.org Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5ec0a6f ] Host crashes when pci_enable_atomic_ops_to_root() is called for VFs with virtual buses. The virtual buses added to SR-IOV have bus->self set to NULL and host crashes due to this. PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash" ... #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6 #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417 #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14 #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace [exception RIP: pcie_capability_read_dword+28] RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000 RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000 RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8 R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6 #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re] #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re] Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit in Device Control 2 is reserved for VFs. The PF value applies to all associated VFs. Return -EINVAL if pci_enable_atomic_ops_to_root() is called for a VF. Link: https://lore.kernel.org/r/1631354585-16597-1-git-send-email-selvin.xavier@broadcom.com Fixes: 35f5ace ("RDMA/bnxt_re: Enable global atomic ops if platform supports") Fixes: 430a236 ("PCI: Add pci_enable_atomic_ops_to_root()") Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f1797e4 ] module_alloc() first tries to allocate module text within 24 bits direct jump from kernel text, and tries a wider allocation if first one fails. When first allocation fails the following is observed in kernel logs: vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null) CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G W 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4 [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c [e2a53c10] [c00338c0] module_alloc+0xa0/0xac [e2a53c40] [c027a368] load_module+0x2ae0/0x8148 [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130 [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28 ... Add __GFP_NOWARN flag to first allocation so that no warning appears when it fails. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Fixes: 2ec13df ("powerpc/modules: Load modules closer to kernel text") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit af11dee ] ================================================================================ UBSAN: shift-out-of-bounds in arch/powerpc/mm/kasan/book3s_32.c:22:23 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.5-gentoo-PowerMacG4 #9 Call Trace: [c214be60] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable) [c214be80] [c0b99288] ubsan_epilogue+0x10/0x5c [c214be90] [c0b98fe0] __ubsan_handle_shift_out_of_bounds+0x94/0x138 [c214bf00] [c1c0f010] kasan_init_region+0xd8/0x26c [c214bf30] [c1c0ed84] kasan_init+0xc0/0x198 [c214bf70] [c1c08024] setup_arch+0x18/0x54c [c214bfc0] [c1c037f0] start_kernel+0x90/0x33c [c214bff0] [00003610] 0x3610 ================================================================================ This happens when the directly mapped memory is a power of 2. Fix it by checking the shift and set the result to 0 when shift is -1 Fixes: 7974c47 ("powerpc/32s: Implement dedicated kasan_init_region()") Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215169 Link: https://lore.kernel.org/r/15cbc3439d4ad988b225e2119ec99502a5cc6ad3.1638261744.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit df1f679 ] KeyWest i2c @0xf8001003 irq 42 /uni-n@f8000000/i2c@f8001000 BUG: key c2d00cbc has not been registered! ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:4801 lockdep_init_map_type+0x4c0/0xb4c Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.5-gentoo-PowerMacG4 #9 NIP: c01a9428 LR: c01a9428 CTR: 00000000 REGS: e1033cf0 TRAP: 0700 Not tainted (5.15.5-gentoo-PowerMacG4) MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24002002 XER: 00000000 GPR00: c01a9428 e1033db0 c2d1cf20 00000016 00000004 00000001 c01c0630 e1033a73 GPR08: 00000000 00000000 00000000 e1033db0 24002004 00000000 f8729377 00000003 GPR16: c1829a9c 00000000 18305357 c1416fc0 c1416f80 c006ac60 c2d00ca8 c1416f00 GPR24: 00000000 c21586f0 c2160000 00000000 c2d00cbc c2170000 c216e1a0 c2160000 NIP [c01a9428] lockdep_init_map_type+0x4c0/0xb4c LR [c01a9428] lockdep_init_map_type+0x4c0/0xb4c Call Trace: [e1033db0] [c01a9428] lockdep_init_map_type+0x4c0/0xb4c (unreliable) [e1033df0] [c1c177b8] kw_i2c_add+0x334/0x424 [e1033e20] [c1c18294] pmac_i2c_init+0x9ec/0xa9c [e1033e80] [c1c1a790] smp_core99_probe+0xbc/0x35c [e1033eb0] [c1c03cb0] kernel_init_freeable+0x190/0x5a4 [e1033f10] [c000946c] kernel_init+0x28/0x154 [e1033f30] [c0035148] ret_from_kernel_thread+0x14/0x1c Add missing lockdep_register_key() Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/69e4f55565bb45ebb0843977801b245af0c666fe.1638264741.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>
When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
Fix an error detected by memory sanitizer: ``` ==4033==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55fb0fbedfc7 in read_alias_info tools/perf/util/pmu.c:457:6 #1 0x55fb0fbea339 in check_info_data tools/perf/util/pmu.c:1434:2 #2 0x55fb0fbea339 in perf_pmu__check_alias tools/perf/util/pmu.c:1504:9 #3 0x55fb0fbdca85 in parse_events_add_pmu tools/perf/util/parse-events.c:1429:32 #4 0x55fb0f965230 in parse_events_parse tools/perf/util/parse-events.y:299:6 #5 0x55fb0fbdf6b2 in parse_events__scanner tools/perf/util/parse-events.c:1822:8 #6 0x55fb0fbdf8c1 in __parse_events tools/perf/util/parse-events.c:2094:8 #7 0x55fb0fa8ffa9 in parse_events tools/perf/util/parse-events.h:41:9 #8 0x55fb0fa8ffa9 in test_event tools/perf/tests/parse-events.c:2393:8 #9 0x55fb0fa8f458 in test__pmu_events tools/perf/tests/parse-events.c:2551:15 #10 0x55fb0fa6d93f in run_test tools/perf/tests/builtin-test.c:242:9 #11 0x55fb0fa6d93f in test_and_print tools/perf/tests/builtin-test.c:271:8 #12 0x55fb0fa6d082 in __cmd_test tools/perf/tests/builtin-test.c:442:5 #13 0x55fb0fa6d082 in cmd_test tools/perf/tests/builtin-test.c:564:9 #14 0x55fb0f942720 in run_builtin tools/perf/perf.c:322:11 #15 0x55fb0f942486 in handle_internal_command tools/perf/perf.c:375:8 #16 0x55fb0f941dab in run_argv tools/perf/perf.c:419:2 #17 0x55fb0f941dab in main tools/perf/perf.c:535:3 ``` Fixes: 7b723db ("perf pmu: Be lazy about loading event info files from sysfs") Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230914022425.1489035-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The following call trace shows a deadlock issue due to recursive locking of mutex "device_mutex". First lock acquire is in target_for_each_device() and second in target_free_device(). PID: 148266 TASK: ffff8be21ffb5d00 CPU: 10 COMMAND: "iscsi_ttx" #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224 #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7 #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3 #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod] #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod] #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod] #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod] #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod] #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod] #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod] #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod] #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod] #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364 Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
commit 830a2a4 upstream. If a BTI exception is taken from EL1, the entry code will treat this as an unhandled exception and will panic() the kernel. This is inconsistent with the way we handle FPAC exceptions, which have a dedicated handler and only necessarily kill the thread from which the exception was taken from, and we don't log all the information that could be relevant to debug the issue. The code in do_bti() has: BUG_ON(!user_mode(regs)); ... and it seems like the intent was to call this for EL1 BTI exceptions, as with FPAC, but this was omitted due to an oversight. This patch adds separate EL0 and EL1 BTI exception handlers, with the latter calling die() directly to report the original context the BTI exception was taken from. This matches our handling of FPAC exceptions. Prior to this patch, a BTI failure is reported as: | Unhandled 64-bit el1h sync exception on CPU0, ESR 0x0000000034000002 -- BTI | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00131-g7d937ff0221d-dirty starfive-tech#9 | Hardware name: linux,dummy-virt (DT) | pstate: 20400809 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=-c) | pc : test_bti_callee+0x4/0x10 | lr : test_bti_caller+0x1c/0x28 | sp : ffff80000800bdf0 | x29: ffff80000800bdf0 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: ffff80000a2b8000 x22: 0000000000000000 x21: 0000000000000000 | x20: ffff8000099fa5b0 x19: ffff800009ff7000 x18: fffffbfffda37000 | x17: 3120676e696d7573 x16: 7361202c6e6f6974 x15: 0000000041a90000 | x14: 0040000000000041 x13: 0040000000000001 x12: ffff000001a90000 | x11: fffffbfffda37480 x10: 0068000000000703 x9 : 0001000040000000 | x8 : 0000000000090000 x7 : 0068000000000f03 x6 : 0060000000000f83 | x5 : ffff80000a2b6000 x4 : ffff0000028d0000 x3 : ffff800009f78378 | x2 : 0000000000000000 x1 : 0000000040210000 x0 : ffff8000080257e4 | Kernel panic - not syncing: Unhandled exception | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00131-g7d937ff0221d-dirty starfive-tech#9 | Hardware name: linux,dummy-virt (DT) | Call trace: | dump_backtrace.part.0+0xcc/0xe0 | show_stack+0x18/0x5c | dump_stack_lvl+0x64/0x80 | dump_stack+0x18/0x34 | panic+0x170/0x360 | arm64_exit_nmi.isra.0+0x0/0x80 | el1h_64_sync_handler+0x64/0xd0 | el1h_64_sync+0x64/0x68 | test_bti_callee+0x4/0x10 | smp_cpus_done+0xb0/0xbc | smp_init+0x7c/0x8c | kernel_init_freeable+0x128/0x28c | kernel_init+0x28/0x13c | ret_from_fork+0x10/0x20 With this patch applied, a BTI failure is reported as: | Internal error: Oops - BTI: 0000000034000002 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc3-00132-g0ad98265d582-dirty starfive-tech#8 | Hardware name: linux,dummy-virt (DT) | pstate: 20400809 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=-c) | pc : test_bti_callee+0x4/0x10 | lr : test_bti_caller+0x1c/0x28 | sp : ffff80000800bdf0 | x29: ffff80000800bdf0 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: ffff80000a2b8000 x22: 0000000000000000 x21: 0000000000000000 | x20: ffff8000099fa5b0 x19: ffff800009ff7000 x18: fffffbfffda37000 | x17: 3120676e696d7573 x16: 7361202c6e6f6974 x15: 0000000041a90000 | x14: 0040000000000041 x13: 0040000000000001 x12: ffff000001a90000 | x11: fffffbfffda37480 x10: 0068000000000703 x9 : 0001000040000000 | x8 : 0000000000090000 x7 : 0068000000000f03 x6 : 0060000000000f83 | x5 : ffff80000a2b6000 x4 : ffff0000028d0000 x3 : ffff800009f78378 | x2 : 0000000000000000 x1 : 0000000040210000 x0 : ffff800008025804 | Call trace: | test_bti_callee+0x4/0x10 | smp_cpus_done+0xb0/0xbc | smp_init+0x7c/0x8c | kernel_init_freeable+0x128/0x28c | kernel_init+0x28/0x13c | ret_from_fork+0x10/0x20 | Code: d50323bf d53cd040 d65f03c0 d503233f (d50323bf) Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Amit Daniel Kachhap <amit.kachhap@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220913101732.3925290-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915 |in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd |Preemption disabled at:[<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc] |CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9 |Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014 | ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002 | 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008 | ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000 |Call Trace: | [<ffffffff815bdeb5>] dump_stack+0x4f/0x9e | [<ffffffff81073c86>] __might_sleep+0xe6/0x150 | [<ffffffff815c3d84>] rt_spin_lock+0x24/0x50 | [<ffffffffa06beec0>] svc_xprt_do_enqueue+0x80/0x230 [sunrpc] | [<ffffffffa06bf0bb>] svc_xprt_received+0x4b/0xc0 [sunrpc] | [<ffffffffa06c03ed>] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc] | [<ffffffffa06b2693>] svc_addsock+0x143/0x200 [sunrpc] | [<ffffffffa072e69c>] write_ports+0x28c/0x340 [nfsd] | [<ffffffffa072d2ac>] nfsctl_transaction_write+0x4c/0x80 [nfsd] | [<ffffffff8117ee83>] vfs_write+0xb3/0x1d0 | [<ffffffff8117f889>] SyS_write+0x49/0xb0 | [<ffffffff815c4556>] system_call_fastpath+0x16/0x1b Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
…f-times' Eduard Zingerman says: ==================== verify callbacks as if they are called unknown number of times This series updates verifier logic for callback functions handling. Current master simulates callback body execution exactly once, which leads to verifier not detecting unsafe programs like below: static int unsafe_on_zero_iter_cb(__u32 idx, struct num_context *ctx) { ctx->i = 0; return 0; } SEC("?raw_tp") int unsafe_on_zero_iter(void *unused) { struct num_context loop_ctx = { .i = 32 }; __u8 choice_arr[2] = { 0, 1 }; bpf_loop(100, unsafe_on_zero_iter_cb, &loop_ctx, 0); return choice_arr[loop_ctx.i]; } This was reported previously in [0]. The basic idea of the fix is to schedule callback entry state for verification in env->head until some identical, previously visited state in current DFS state traversal is found. Same logic as with open coded iterators, and builds on top recent fixes [1] for those. The series is structured as follows: - patches #1,2,3 update strobemeta, xdp_synproxy selftests and bpf_loop_bench benchmark to allow convergence of the bpf_loop callback states; - patches #4,5 just shuffle the code a bit; - patch #6 is the main part of the series; - patch #7 adds test cases for #6; - patch #8 extend patch #6 with same speculative scalar widening logic, as used for open coded iterators; - patch #9 adds test cases for #8; - patch #10 extends patch #6 to track maximal number of callback executions specifically for bpf_loop(); - patch #11 adds test cases for #10. Veristat results comparing this series to master+patches #1,2,3 using selftests show the following difference: File Program States (A) States (B) States (DIFF) ------------------------- ------------- ---------- ---------- ------------- bpf_loop_bench.bpf.o benchmark 1 2 +1 (+100.00%) pyperf600_bpf_loop.bpf.o on_event 322 407 +85 (+26.40%) strobemeta_bpf_loop.bpf.o on_event 113 151 +38 (+33.63%) xdp_synproxy_kern.bpf.o syncookie_tc 341 291 -50 (-14.66%) xdp_synproxy_kern.bpf.o syncookie_xdp 344 301 -43 (-12.50%) Veristat results comparing this series to master using Tetragon BPF files [2] also show some differences. States diff varies from +2% to +15% on 23 programs out of 186, no new failures. Changelog: - V3 [5] -> V4, changes suggested by Andrii: - validate mark_chain_precision() result in patch #10; - renaming s/cumulative_callback_depth/callback_unroll_depth/. - V2 [4] -> V3: - fixes in expected log messages for test cases: - callback_result_precise; - parent_callee_saved_reg_precise_with_callback; - parent_stack_slot_precise_with_callback; - renamings (suggested by Alexei): - s/callback_iter_depth/cumulative_callback_depth/ - s/is_callback_iter_next/calls_callback/ - s/mark_callback_iter_next/mark_calls_callback/ - prepare_func_exit() updated to exit with -EFAULT when callee->in_callback_fn is true but calls_callback() is not true for callsite; - test case 'bpf_loop_iter_limit_nested' rewritten to use return value check instead of verifier log message checks (suggested by Alexei). - V1 [3] -> V2, changes suggested by Andrii: - small changes for error handling code in __check_func_call(); - callback body processing log is now matched in relevant verifier_subprog_precision.c tests; - R1 passed to bpf_loop() is now always marked as precise; - log level 2 message for bpf_loop() iteration termination instead of iteration depth messages; - __no_msg macro removed; - bpf_loop_iter_limit_nested updated to avoid using __no_msg; - commit message for patch #3 updated according to Alexei's request. [0] https://lore.kernel.org/bpf/CA+vRuzPChFNXmouzGG+wsy=6eMcfr1mFG0F3g7rbg-sedGKW3w@mail.gmail.com/ [1] https://lore.kernel.org/bpf/20231024000917.12153-1-eddyz87@gmail.com/ [2] git@github.com:cilium/tetragon.git [3] https://lore.kernel.org/bpf/20231116021803.9982-1-eddyz87@gmail.com/T/#t [4] https://lore.kernel.org/bpf/20231118013355.7943-1-eddyz87@gmail.com/T/#t [5] https://lore.kernel.org/bpf/20231120225945.11741-1-eddyz87@gmail.com/T/#t ==================== Link: https://lore.kernel.org/r/20231121020701.26440-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
[ Upstream commit a84fbf2 ] Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX. ``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 starfive-tech#3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 starfive-tech#4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 starfive-tech#5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 starfive-tech#6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 starfive-tech#7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 starfive-tech#8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 starfive-tech#9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 starfive-tech#10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 starfive-tech#11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 starfive-tech#12 0x558265bb9389 in main tools/perf/perf.c:537:3 ``` The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed. Fixes: 8a96f45 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
When scanning namespaces, it is possible to get valid data from the first call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second call in nvme_update_ns_info_block(). In particular, if the NSID becomes inactive between the two commands, a storage device may return a buffer filled with zero as per 4.1.5.1. In this case, we can get a kernel crash due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will be set to zero. PID: 326 TASK: ffff95fec3cd8000 CPU: 29 COMMAND: "kworker/u98:10" #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926 [exception RIP: blk_stack_limits+434] RIP: ffffffff92191872 RSP: ffffad8f8702fc80 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff95efa0c91800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 00000000ffffffff R8: ffff95fec7df35a8 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff95fed33c09a8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core] #9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core] This happened when the check for valid data was moved out of nvme_identify_ns() into one of the callers. Fix this by checking in both callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186 Fixes: 0dd6fff ("nvme: bring back auto-removal of deleted namespaces during sequential scan") Cc: stable@vger.kernel.org Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
commit d8b90d6 upstream. When scanning namespaces, it is possible to get valid data from the first call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second call in nvme_update_ns_info_block(). In particular, if the NSID becomes inactive between the two commands, a storage device may return a buffer filled with zero as per 4.1.5.1. In this case, we can get a kernel crash due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will be set to zero. PID: 326 TASK: ffff95fec3cd8000 CPU: 29 COMMAND: "kworker/u98:10" #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788 starfive-tech#3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb starfive-tech#4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce starfive-tech#5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595 starfive-tech#6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6 starfive-tech#7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926 [exception RIP: blk_stack_limits+434] RIP: ffffffff92191872 RSP: ffffad8f8702fc80 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff95efa0c91800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 RBP: 00000000ffffffff R8: ffff95fec7df35a8 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff95fed33c09a8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 starfive-tech#8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core] starfive-tech#9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core] This happened when the check for valid data was moved out of nvme_identify_ns() into one of the callers. Fix this by checking in both callers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186 Fixes: 0dd6fff ("nvme: bring back auto-removal of deleted namespaces during sequential scan") Cc: stable@vger.kernel.org Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f starfive-tech#3 [fffffe0000549eb8] do_nmi at ffffffff81079582 starfive-tech#4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- starfive-tech#5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b starfive-tech#6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 starfive-tech#7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 starfive-tech#8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] starfive-tech#9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] starfive-tech#10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] starfive-tech#11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] starfive-tech#12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb starfive-tech#13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 starfive-tech#14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 starfive-tech#15 [ffff88aa841efb88] device_del at ffffffff82179d23 starfive-tech#16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] starfive-tech#17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] starfive-tech#18 [ffff88aa841efde8] process_one_work at ffffffff811c589a starfive-tech#19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff starfive-tech#20 [ffff88aa841eff10] kthread at ffffffff811d87a0 starfive-tech#21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f starfive-tech#3 [fffffe0000549eb8] do_nmi at ffffffff81079582 starfive-tech#4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- starfive-tech#5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b starfive-tech#6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 starfive-tech#7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 starfive-tech#8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] starfive-tech#9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] starfive-tech#10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] starfive-tech#11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] starfive-tech#12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb starfive-tech#13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 starfive-tech#14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 starfive-tech#15 [ffff88aa841efb88] device_del at ffffffff82179d23 starfive-tech#16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] starfive-tech#17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] starfive-tech#18 [ffff88aa841efde8] process_one_work at ffffffff811c589a starfive-tech#19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff starfive-tech#20 [ffff88aa841eff10] kthread at ffffffff811d87a0 starfive-tech#21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
…ux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2023-12-04 This series provides bug fixes to mlx5 driver. V1->V2: - Drop commit #9 ("net/mlx5e: Forbid devlink reload if IPSec rules are offloaded"), we are working on a better fix Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Validate @smb->WordCount to avoid reading off the end of @smb and thus causing the following KASAN splat: BUG: KASAN: slab-out-of-bounds in smbCalcSize+0x32/0x40 [cifs] Read of size 2 at addr ffff88801c024ec5 by task cifsd/1328 CPU: 1 PID: 1328 Comm: cifsd Not tainted 6.7.0-rc5 #9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? smbCalcSize+0x32/0x40 [cifs] ? smbCalcSize+0x32/0x40 [cifs] kasan_check_range+0x105/0x1b0 smbCalcSize+0x32/0x40 [cifs] checkSMB+0x162/0x370 [cifs] ? __pfx_checkSMB+0x10/0x10 [cifs] cifs_handle_standard+0xbc/0x2f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_demultiplex_thread+0xed1/0x1360 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? __pfx_lock_release+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? mark_held_locks+0x1a/0x90 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kthread_parkme+0xce/0xf0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x18d/0x1d0 ? kthread+0xdb/0x1d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This fixes CVE-2023-6606. Reported-by: j51569436@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218218 Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
[ Upstream commit b35858b ] Validate @smb->WordCount to avoid reading off the end of @smb and thus causing the following KASAN splat: BUG: KASAN: slab-out-of-bounds in smbCalcSize+0x32/0x40 [cifs] Read of size 2 at addr ffff88801c024ec5 by task cifsd/1328 CPU: 1 PID: 1328 Comm: cifsd Not tainted 6.7.0-rc5 starfive-tech#9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? smbCalcSize+0x32/0x40 [cifs] ? smbCalcSize+0x32/0x40 [cifs] kasan_check_range+0x105/0x1b0 smbCalcSize+0x32/0x40 [cifs] checkSMB+0x162/0x370 [cifs] ? __pfx_checkSMB+0x10/0x10 [cifs] cifs_handle_standard+0xbc/0x2f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_demultiplex_thread+0xed1/0x1360 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? __pfx_lock_release+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? mark_held_locks+0x1a/0x90 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kthread_parkme+0xce/0xf0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x18d/0x1d0 ? kthread+0xdb/0x1d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This fixes CVE-2023-6606. Reported-by: j51569436@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218218 Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit b35858b upstream. Validate @smb->WordCount to avoid reading off the end of @smb and thus causing the following KASAN splat: BUG: KASAN: slab-out-of-bounds in smbCalcSize+0x32/0x40 [cifs] Read of size 2 at addr ffff88801c024ec5 by task cifsd/1328 CPU: 1 PID: 1328 Comm: cifsd Not tainted 6.7.0-rc5 starfive-tech#9 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4a/0x80 print_report+0xcf/0x650 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __phys_addr+0x46/0x90 kasan_report+0xd8/0x110 ? smbCalcSize+0x32/0x40 [cifs] ? smbCalcSize+0x32/0x40 [cifs] kasan_check_range+0x105/0x1b0 smbCalcSize+0x32/0x40 [cifs] checkSMB+0x162/0x370 [cifs] ? __pfx_checkSMB+0x10/0x10 [cifs] cifs_handle_standard+0xbc/0x2f0 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 cifs_demultiplex_thread+0xed1/0x1360 [cifs] ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] ? srso_alias_return_thunk+0x5/0xfbef5 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? __pfx_lock_release+0x10/0x10 ? srso_alias_return_thunk+0x5/0xfbef5 ? mark_held_locks+0x1a/0x90 ? lockdep_hardirqs_on_prepare+0x136/0x210 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kthread_parkme+0xce/0xf0 ? __pfx_cifs_demultiplex_thread+0x10/0x10 [cifs] kthread+0x18d/0x1d0 ? kthread+0xdb/0x1d0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x34/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This fixes CVE-2023-6606. Reported-by: j51569436@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218218 Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fc3a553 ] An issue occurred while reading an ELF file in libbpf.c during fuzzing: Program received signal SIGSEGV, Segmentation fault. 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 4206 in libbpf.c (gdb) bt #0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 #1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706 #2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437 starfive-tech#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497 starfive-tech#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16 starfive-tech#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one () starfive-tech#6 0x000000000087ad92 in tracing::span::Span::in_scope () starfive-tech#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir () starfive-tech#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} () starfive-tech#9 0x00000000005f2601 in main () (gdb) scn_data was null at this code(tools/lib/bpf/src/libbpf.c): if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) { The scn_data is derived from the code above: scn = elf_sec_by_idx(obj, sec_idx); scn_data = elf_sec_data(obj, scn); relo_sec_name = elf_sec_str(obj, shdr->sh_name); sec_name = elf_sec_name(obj, scn); if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL return -EINVAL; In certain special scenarios, such as reading a malformed ELF file, it is possible that scn_data may be a null pointer Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com> Signed-off-by: Xin Liu <liuxin350@huawei.com> Signed-off-by: Changye Wu <wuchangye@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231221033947.154564-1-liuxin350@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc3a553 ] An issue occurred while reading an ELF file in libbpf.c during fuzzing: Program received signal SIGSEGV, Segmentation fault. 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 4206 in libbpf.c (gdb) bt #0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 #1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706 #2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437 starfive-tech#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497 starfive-tech#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16 starfive-tech#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one () starfive-tech#6 0x000000000087ad92 in tracing::span::Span::in_scope () starfive-tech#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir () starfive-tech#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} () starfive-tech#9 0x00000000005f2601 in main () (gdb) scn_data was null at this code(tools/lib/bpf/src/libbpf.c): if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) { The scn_data is derived from the code above: scn = elf_sec_by_idx(obj, sec_idx); scn_data = elf_sec_data(obj, scn); relo_sec_name = elf_sec_str(obj, shdr->sh_name); sec_name = elf_sec_name(obj, scn); if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL return -EINVAL; In certain special scenarios, such as reading a malformed ELF file, it is possible that scn_data may be a null pointer Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com> Signed-off-by: Xin Liu <liuxin350@huawei.com> Signed-off-by: Changye Wu <wuchangye@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231221033947.154564-1-liuxin350@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
…nel range [ Upstream commit e2ceb1d ] Because of what seems to be a typo, a 6Ghz-only phy for which the BDF does not allow the 7115Mhz channel will fail to register: WARNING: CPU: 2 PID: 106 at net/wireless/core.c:907 wiphy_register+0x914/0x954 Modules linked in: ath11k_pci sbsa_gwdt CPU: 2 PID: 106 Comm: kworker/u8:5 Not tainted 6.3.0-rc7-next-20230418-00549-g1e096a17625a-dirty starfive-tech#9 Hardware name: Freebox V7R Board (DT) Workqueue: ath11k_qmi_driver_event ath11k_qmi_driver_event_work pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : wiphy_register+0x914/0x954 lr : ieee80211_register_hw+0x67c/0xc10 sp : ffffff800b123aa0 x29: ffffff800b123aa0 x28: 0000000000000000 x27: 0000000000000000 x26: 0000000000000000 x25: 0000000000000006 x24: ffffffc008d51418 x23: ffffffc008cb0838 x22: ffffff80176c2460 x21: 0000000000000168 x20: ffffff80176c0000 x19: ffffff80176c03e0 x18: 0000000000000014 x17: 00000000cbef338c x16: 00000000d2a26f21 x15: 00000000ad6bb85f x14: 0000000000000020 x13: 0000000000000020 x12: 00000000ffffffbd x11: 0000000000000208 x10: 00000000fffffdf7 x9 : ffffffc009394718 x8 : ffffff80176c0528 x7 : 000000007fffffff x6 : 0000000000000006 x5 : 0000000000000005 x4 : ffffff800b304284 x3 : ffffff800b304284 x2 : ffffff800b304d98 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: wiphy_register+0x914/0x954 ieee80211_register_hw+0x67c/0xc10 ath11k_mac_register+0x7c4/0xe10 ath11k_core_qmi_firmware_ready+0x1f4/0x570 ath11k_qmi_driver_event_work+0x198/0x590 process_one_work+0x1b8/0x328 worker_thread+0x6c/0x414 kthread+0x100/0x104 ret_from_fork+0x10/0x20 ---[ end trace 0000000000000000 ]--- ath11k_pci 0002:01:00.0: ieee80211 registration failed: -22 ath11k_pci 0002:01:00.0: failed register the radio with mac80211: -22 ath11k_pci 0002:01:00.0: failed to create pdev core: -22 Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20230421145445.2612280-1-mbizon@freebox.fr Signed-off-by: Sasha Levin <sashal@kernel.org>
…lblished(). [ Upstream commit 66b60b0 ] syzkaller reported a warning [0] in inet_csk_destroy_sock() with no repro. WARN_ON(inet_sk(sk)->inet_num && !inet_csk(sk)->icsk_bind_hash); However, the syzkaller's log hinted that connect() failed just before the warning due to FAULT_INJECTION. [1] When connect() is called for an unbound socket, we search for an available ephemeral port. If a bhash bucket exists for the port, we call __inet_check_established() or __inet6_check_established() to check if the bucket is reusable. If reusable, we add the socket into ehash and set inet_sk(sk)->inet_num. Later, we look up the corresponding bhash2 bucket and try to allocate it if it does not exist. Although it rarely occurs in real use, if the allocation fails, we must revert the changes by check_established(). Otherwise, an unconnected socket could illegally occupy an ehash entry. Note that we do not put tw back into ehash because sk might have already responded to a packet for tw and it would be better to free tw earlier under such memory presure. [0]: WARNING: CPU: 0 PID: 350830 at net/ipv4/inet_connection_sock.c:1193 inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) Modules linked in: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) Code: 41 5c 41 5d 41 5e e9 2d 4a 3d fd e8 28 4a 3d fd 48 89 ef e8 f0 cd 7d ff 5b 5d 41 5c 41 5d 41 5e e9 13 4a 3d fd e8 0e 4a 3d fd <0f> 0b e9 61 fe ff ff e8 02 4a 3d fd 4c 89 e7 be 03 00 00 00 e8 05 RSP: 0018:ffffc9000b21fd38 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000009e78 RCX: ffffffff840bae40 RDX: ffff88806e46c600 RSI: ffffffff840bb012 RDI: ffff88811755cca8 RBP: ffff88811755c880 R08: 0000000000000003 R09: 0000000000000000 R10: 0000000000009e78 R11: 0000000000000000 R12: ffff88811755c8e0 R13: ffff88811755c892 R14: ffff88811755c918 R15: 0000000000000000 FS: 00007f03e5243800(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32f21000 CR3: 0000000112ffe001 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <TASK> ? inet_csk_destroy_sock (net/ipv4/inet_connection_sock.c:1193) dccp_close (net/dccp/proto.c:1078) inet_release (net/ipv4/af_inet.c:434) __sock_release (net/socket.c:660) sock_close (net/socket.c:1423) __fput (fs/file_table.c:377) __fput_sync (fs/file_table.c:462) __x64_sys_close (fs/open.c:1557 fs/open.c:1539 fs/open.c:1539) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f03e53852bb Code: 03 00 00 00 0f 05 48 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 43 c9 f5 ff 8b 7c 24 0c 41 89 c0 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 35 44 89 c7 89 44 24 0c e8 a1 c9 f5 ff 8b 44 RSP: 002b:00000000005dfba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f03e53852bb RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000167c R10: 0000000008a79680 R11: 0000000000000293 R12: 00007f03e4e43000 R13: 00007f03e4e43170 R14: 00007f03e4e43178 R15: 00007f03e4e43170 </TASK> [1]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 0 PID: 350833 Comm: syz-executor.1 Not tainted 6.7.0-12272-g2121c43f88f5 starfive-tech#9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153) should_failslab (mm/slub.c:3748) kmem_cache_alloc (mm/slub.c:3763 mm/slub.c:3842 mm/slub.c:3867) inet_bind2_bucket_create (net/ipv4/inet_hashtables.c:135) __inet_hash_connect (net/ipv4/inet_hashtables.c:1100) dccp_v4_connect (net/dccp/ipv4.c:116) __inet_stream_connect (net/ipv4/af_inet.c:676) inet_stream_connect (net/ipv4/af_inet.c:747) __sys_connect_file (net/socket.c:2048 (discriminator 2)) __sys_connect (net/socket.c:2065) __x64_sys_connect (net/socket.c:2072) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f03e5284e5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007f03e4641cc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007f03e5284e5d RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 000000000000000b R14: 00007f03e52e5530 R15: 0000000000000000 </TASK> Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: 28044fc ("net: Add a bhash2 table hashed by port and address") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5e6e080 ] Since flush bios are implemented as writes with no data and the preflush flag per Christoph's comment [1]. And we need to change it in rnbd accordingly. Otherwise, I got splatting when create fs from rnbd client. [ 464.028545] ------------[ cut here ]------------ [ 464.028553] WARNING: CPU: 0 PID: 65 at block/blk-core.c:751 submit_bio_noacct+0x32c/0x5d0 [ ... ] [ 464.028668] CPU: 0 PID: 65 Comm: kworker/0:1H Tainted: G OE 6.4.0-rc1 starfive-tech#9 [ 464.028671] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014 [ 464.028673] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [ 464.028717] RIP: 0010:submit_bio_noacct+0x32c/0x5d0 [ 464.028720] Code: 03 0f 85 51 fe ff ff 48 8b 43 18 8b 88 04 03 00 00 85 c9 0f 85 3f fe ff ff e9 be fd ff ff 0f b6 d0 3c 0d 74 26 83 fa 01 74 21 <0f> 0b b8 0a 00 00 00 e9 56 fd ff ff 4c 89 e7 e8 70 a1 03 00 84 c0 [ 464.028722] RSP: 0018:ffffaf3680b57c68 EFLAGS: 00010202 [ 464.028724] RAX: 0000000000060802 RBX: ffffa09dcc18bf00 RCX: 0000000000000000 [ 464.028726] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffffa09dde081d00 [ 464.028727] RBP: ffffaf3680b57c98 R08: ffffa09dde081d00 R09: ffffa09e38327200 [ 464.028729] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa09dde081d00 [ 464.028730] R13: ffffa09dcb06e1e8 R14: 0000000000000000 R15: 0000000000200000 [ 464.028733] FS: 0000000000000000(0000) GS:ffffa09e3bc00000(0000) knlGS:0000000000000000 [ 464.028735] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 464.028736] CR2: 000055a4e8206c40 CR3: 0000000119f06000 CR4: 00000000003506f0 [ 464.028738] Call Trace: [ 464.028740] <TASK> [ 464.028746] submit_bio+0x1b/0x80 [ 464.028748] rnbd_srv_rdma_ev+0x50d/0x10c0 [rnbd_server] [ 464.028754] ? percpu_ref_get_many.constprop.0+0x55/0x140 [rtrs_server] [ 464.028760] ? __this_cpu_preempt_check+0x13/0x20 [ 464.028769] process_io_req+0x1dc/0x450 [rtrs_server] [ 464.028775] rtrs_srv_inv_rkey_done+0x67/0xb0 [rtrs_server] [ 464.028780] __ib_process_cq+0xbc/0x1f0 [ib_core] [ 464.028793] ib_cq_poll_work+0x2b/0xa0 [ib_core] [ 464.028804] process_one_work+0x2a9/0x580 [1]. https://lore.kernel.org/all/ZFHgefWofVt24tRl@infradead.org/ Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230512034631.28686-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 4be9075 upstream. The driver creates /sys/kernel/debug/dri/0/mob_ttm even when the corresponding ttm_resource_manager is not allocated. This leads to a crash when trying to read from this file. Add a check to create mob_ttm, system_mob_ttm, and gmr_ttm debug file only when the corresponding ttm_resource_manager is allocated. crash> bt PID: 3133409 TASK: ffff8fe4834a5000 CPU: 3 COMMAND: "grep" #0 [ffffb954506b3b20] machine_kexec at ffffffffb2a6bec3 #1 [ffffb954506b3b78] __crash_kexec at ffffffffb2bb598a #2 [ffffb954506b3c38] crash_kexec at ffffffffb2bb68c1 starfive-tech#3 [ffffb954506b3c50] oops_end at ffffffffb2a2a9b1 starfive-tech#4 [ffffb954506b3c70] no_context at ffffffffb2a7e913 starfive-tech#5 [ffffb954506b3cc8] __bad_area_nosemaphore at ffffffffb2a7ec8c starfive-tech#6 [ffffb954506b3d10] do_page_fault at ffffffffb2a7f887 starfive-tech#7 [ffffb954506b3d40] page_fault at ffffffffb360116e [exception RIP: ttm_resource_manager_debug+0x11] RIP: ffffffffc04afd11 RSP: ffffb954506b3df0 RFLAGS: 00010246 RAX: ffff8fe41a6d1200 RBX: 0000000000000000 RCX: 0000000000000940 RDX: 0000000000000000 RSI: ffffffffc04b4338 RDI: 0000000000000000 RBP: ffffb954506b3e08 R8: ffff8fee3ffad000 R9: 0000000000000000 R10: ffff8fe41a76a000 R11: 0000000000000001 R12: 00000000ffffffff R13: 0000000000000001 R14: ffff8fe5bb6f3900 R15: ffff8fe41a6d1200 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 starfive-tech#8 [ffffb954506b3e00] ttm_resource_manager_show at ffffffffc04afde7 [ttm] starfive-tech#9 [ffffb954506b3e30] seq_read at ffffffffb2d8f9f3 RIP: 00007f4c4eda8985 RSP: 00007ffdbba9e9f8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 000000000037e000 RCX: 00007f4c4eda8985 RDX: 000000000037e000 RSI: 00007f4c41573000 RDI: 0000000000000003 RBP: 000000000037e000 R8: 0000000000000000 R9: 000000000037fe30 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4c41573000 R13: 0000000000000003 R14: 00007f4c41572010 R15: 0000000000000003 ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Fixes: af4a25b ("drm/vmwgfx: Add debugfs entries for various ttm resource managers") Cc: <stable@vger.kernel.org> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240312093551.196609-1-jfalempe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e starfive-tech#3 [fffffe00003fced0] do_nmi at ffffffff8922660d starfive-tech#4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 starfive-tech#5 [ffffa655314979e8] io_serial_in at ffffffff89792594 starfive-tech#6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 starfive-tech#7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 starfive-tech#8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 starfive-tech#9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 starfive-tech#10 [ffffa65531497ac8] console_unlock at ffffffff89316124 starfive-tech#11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 starfive-tech#12 [ffffa65531497b68] printk at ffffffff89318306 starfive-tech#13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 starfive-tech#14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] starfive-tech#15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] starfive-tech#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] starfive-tech#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] starfive-tech#18 [ffffa65531497f10] kthread at ffffffff892d2e72 starfive-tech#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f8bbc07 ] vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected. net_ratelimit mechanism can be used to limit the dumping rate. PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980" #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e starfive-tech#3 [fffffe00003fced0] do_nmi at ffffffff8922660d starfive-tech#4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663 [exception RIP: io_serial_in+20] RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002 RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000 RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0 RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020 R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 starfive-tech#5 [ffffa655314979e8] io_serial_in at ffffffff89792594 starfive-tech#6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470 starfive-tech#7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6 starfive-tech#8 [ffffa65531497a20] uart_console_write at ffffffff8978b605 starfive-tech#9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558 starfive-tech#10 [ffffa65531497ac8] console_unlock at ffffffff89316124 starfive-tech#11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07 starfive-tech#12 [ffffa65531497b68] printk at ffffffff89318306 starfive-tech#13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765 starfive-tech#14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun] starfive-tech#15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun] starfive-tech#16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net] starfive-tech#17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost] starfive-tech#18 [ffffa65531497f10] kthread at ffffffff892d2e72 starfive-tech#19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 #1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) #2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) starfive-tech#3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 starfive-tech#4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 starfive-tech#5 0x55c94045c776 in ui_browser__show ui/browser.c:288 starfive-tech#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 starfive-tech#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 starfive-tech#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 starfive-tech#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 starfive-tech#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 starfive-tech#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 starfive-tech#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 starfive-tech#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 starfive-tech#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 starfive-tech#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 starfive-tech#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 starfive-tech#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 starfive-tech#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 starfive-tech#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 starfive-tech#20 0x55c9400e53ad in main tools/perf/perf.c:561 starfive-tech#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 starfive-tech#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 starfive-tech#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 769e6a1 ] ui_browser__show() is capturing the input title that is stack allocated memory in hist_browser__run(). Avoid a use after return by strdup-ing the string. Committer notes: Further explanation from Ian Rogers: My command line using tui is: $ sudo bash -c 'rm /tmp/asan.log*; export ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a sleep 1; /tmp/perf/perf mem report' I then go to the perf annotate view and quit. This triggers the asan error (from the log file): ``` ==1254591==ERROR: AddressSanitizer: stack-use-after-return on address 0x7f2813331920 at pc 0x7f28180 65991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10 READ of size 80 at 0x7f2813331920 thread T0 #0 0x7f2818065990 in __interceptor_strlen ../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461 starfive-tech#1 0x7f2817698251 in SLsmg_write_wrapped_string (/lib/x86_64-linux-gnu/libslang.so.2+0x98251) starfive-tech#2 0x7f28176984b9 in SLsmg_write_nstring (/lib/x86_64-linux-gnu/libslang.so.2+0x984b9) starfive-tech#3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60 starfive-tech#4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266 starfive-tech#5 0x55c94045c776 in ui_browser__show ui/browser.c:288 starfive-tech#6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206 starfive-tech#7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458 starfive-tech#8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412 starfive-tech#9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527 starfive-tech#10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613 starfive-tech#11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661 starfive-tech#12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671 starfive-tech#13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141 starfive-tech#14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805 starfive-tech#15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374 starfive-tech#16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516 starfive-tech#17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350 starfive-tech#18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403 starfive-tech#19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447 starfive-tech#20 0x55c9400e53ad in main tools/perf/perf.c:561 starfive-tech#21 0x7f28170456c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 starfive-tech#22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360 starfive-tech#23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId: 84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93) Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746 This frame has 1 object(s): [32, 192) 'title' (line 747) <== Memory access at offset 32 is inside this variable HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork ``` hist_browser__run isn't on the stack so the asan error looks legit. There's no clean init/exit on struct ui_browser so I may be trading a use-after-return for a memory leak, but that seems look a good trade anyway. Fixes: 05e8b08 ("perf ui browser: Stop using 'self'") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
…PLES event" commit 5b3cde1 upstream. This reverts commit 7d1405c. This causes segfaults in some cases, as reported by Milian: ``` sudo /usr/bin/perf record -z --call-graph dwarf -e cycles -e raw_syscalls:sys_enter ls ... [ perf record: Woken up 3 times to write data ] malloc(): invalid next size (unsorted) Aborted ``` Backtrace with GDB + debuginfod: ``` malloc(): invalid next size (unsorted) Thread 1 "perf" received signal SIGABRT, Aborted. __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c 44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; (gdb) bt #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 starfive-tech#1 0x00007ffff6ea8eb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78 starfive-tech#2 0x00007ffff6e50a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/ raise.c:26 starfive-tech#3 0x00007ffff6e384c3 in __GI_abort () at abort.c:79 starfive-tech#4 0x00007ffff6e39354 in __libc_message_impl (fmt=fmt@entry=0x7ffff6fc22ea "%s\n") at ../sysdeps/posix/libc_fatal.c:132 starfive-tech#5 0x00007ffff6eb3085 in malloc_printerr (str=str@entry=0x7ffff6fc5850 "malloc(): invalid next size (unsorted)") at malloc.c:5772 starfive-tech#6 0x00007ffff6eb657c in _int_malloc (av=av@entry=0x7ffff6ff6ac0 <main_arena>, bytes=bytes@entry=368) at malloc.c:4081 starfive-tech#7 0x00007ffff6eb877e in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3754 starfive-tech#8 0x000055555569bdb6 in perf_session.do_write_header () starfive-tech#9 0x00005555555a373a in __cmd_record.constprop.0 () starfive-tech#10 0x00005555555a6846 in cmd_record () starfive-tech#11 0x000055555564db7f in run_builtin () starfive-tech#12 0x000055555558ed77 in main () ``` Valgrind memcheck: ``` ==45136== Invalid write of size 8 ==45136== at 0x2B38A5: perf_event__synthesize_id_sample (in /usr/bin/perf) ==45136== by 0x157069: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ==45136== Syscall param write(buf) points to unaddressable byte(s) ==45136== at 0x575953D: __libc_write (write.c:26) ==45136== by 0x575953D: write (write.c:24) ==45136== by 0x35761F: ion (in /usr/bin/perf) ==45136== by 0x357778: writen (in /usr/bin/perf) ==45136== by 0x1548F7: record__write (in /usr/bin/perf) ==45136== by 0x15708A: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== Address 0x6a866a8 is 0 bytes after a block of size 40 alloc'd ==45136== at 0x4849BF3: calloc (vg_replace_malloc.c:1675) ==45136== by 0x3574AB: zalloc (in /usr/bin/perf) ==45136== by 0x1570E0: __cmd_record.constprop.0 (in /usr/bin/perf) ==45136== by 0x15A845: cmd_record (in /usr/bin/perf) ==45136== by 0x201B7E: run_builtin (in /usr/bin/perf) ==45136== by 0x142D76: main (in /usr/bin/perf) ==45136== ----- Closes: https://lore.kernel.org/linux-perf-users/23879991.0LEYPuXRzz@milian-workstation/ Reported-by: Milian Wolff <milian.wolff@kdab.com> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@kernel.org # 6.8+ Link: https://lore.kernel.org/lkml/Zl9ksOlHJHnKM70p@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d274c1 upstream. We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [starfive-tech#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 starfive-tech#6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) starfive-tech#1 btrfs_drop_extents (fs/btrfs/file.c:411:4) starfive-tech#2 log_one_extent (fs/btrfs/tree-log.c:4732:9) starfive-tech#3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) starfive-tech#4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) starfive-tech#5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) starfive-tech#6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) starfive-tech#7 btrfs_sync_file (fs/btrfs/file.c:1933:8) starfive-tech#8 vfs_fsync_range (fs/sync.c:188:9) starfive-tech#9 vfs_fsync (fs/sync.c:202:9) starfive-tech#10 do_fsync (fs/sync.c:212:9) starfive-tech#11 __do_sys_fdatasync (fs/sync.c:225:9) starfive-tech#12 __se_sys_fdatasync (fs/sync.c:223:1) starfive-tech#13 __x64_sys_fdatasync (fs/sync.c:223:1) starfive-tech#14 do_syscall_x64 (arch/x86/entry/common.c:52:14) starfive-tech#15 do_syscall_64 (arch/x86/entry/common.c:83:7) starfive-tech#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 starfive-tech#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] starfive-tech#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] starfive-tech#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] starfive-tech#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] starfive-tech#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] starfive-tech#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] starfive-tech#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] starfive-tech#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] starfive-tech#11 dio_complete at ffffffff8c2b9fa7 starfive-tech#12 do_blockdev_direct_IO at ffffffff8c2bc09f starfive-tech#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] starfive-tech#14 generic_file_direct_write at ffffffff8c1dcf14 starfive-tech#15 __generic_file_write_iter at ffffffff8c1dd07b starfive-tech#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] starfive-tech#17 aio_write at ffffffff8c2cc72e starfive-tech#18 kmem_cache_alloc at ffffffff8c248dde starfive-tech#19 do_io_submit at ffffffff8c2ccada starfive-tech#20 do_syscall_64 at ffffffff8c004984 starfive-tech#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9d274c1 upstream. We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 starfive-tech#6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) starfive-tech#3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) starfive-tech#4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) starfive-tech#5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) starfive-tech#6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) starfive-tech#7 btrfs_sync_file (fs/btrfs/file.c:1933:8) starfive-tech#8 vfs_fsync_range (fs/sync.c:188:9) starfive-tech#9 vfs_fsync (fs/sync.c:202:9) starfive-tech#10 do_fsync (fs/sync.c:212:9) starfive-tech#11 __do_sys_fdatasync (fs/sync.c:225:9) starfive-tech#12 __se_sys_fdatasync (fs/sync.c:223:1) starfive-tech#13 __x64_sys_fdatasync (fs/sync.c:223:1) starfive-tech#14 do_syscall_x64 (arch/x86/entry/common.c:52:14) starfive-tech#15 do_syscall_64 (arch/x86/entry/common.c:83:7) starfive-tech#16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be346c1 upstream. The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents. Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem. To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written(). Heming Zhao said: ------ PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error" PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA" #0 machine_kexec at ffffffff8c069932 #1 __crash_kexec at ffffffff8c1338fa #2 panic at ffffffff8c1d69b9 starfive-tech#3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2] starfive-tech#4 __ocfs2_abort at ffffffffc0c88387 [ocfs2] starfive-tech#5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2] starfive-tech#6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2] starfive-tech#7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2] starfive-tech#8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2] starfive-tech#9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2] starfive-tech#10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2] starfive-tech#11 dio_complete at ffffffff8c2b9fa7 starfive-tech#12 do_blockdev_direct_IO at ffffffff8c2bc09f starfive-tech#13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2] starfive-tech#14 generic_file_direct_write at ffffffff8c1dcf14 starfive-tech#15 __generic_file_write_iter at ffffffff8c1dd07b starfive-tech#16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2] starfive-tech#17 aio_write at ffffffff8c2cc72e starfive-tech#18 kmem_cache_alloc at ffffffff8c248dde starfive-tech#19 do_io_submit at ffffffff8c2ccada starfive-tech#20 do_syscall_64 at ffffffff8c004984 starfive-tech#21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
@tekkamanninja please reorganize this repository so that:
starfive-tech/linux/master
from torvalds/linux.git/masterstarfive-tech/linux/Fedora
withstarfive-tech/linux/fedora-vic-7100_5.10.6
and make it defaultstarfive-tech/linux/esmil-starlight
from @esmil branch https://github.com/esmil/linux/tree/starlightThe text was updated successfully, but these errors were encountered: