-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kernel.dmesg_restrict
not working with journalctl
#6
Comments
By default, if your user is part of the You may check the groups allowed to read the system journal (and thus
|
…_map [ Upstream commit 39df730 ] Detected via gcc's ASan: Direct leak of 2048 byte(s) in 64 object(s) allocated from: 6 #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370) 7 #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43 8 #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85 9 #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250 10 #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382 11 #5 0x556b0f0e3231 in print_events util/parse-events.c:2514 12 #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 13 #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 14 #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 15 #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 16 #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520 17 #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct") Link: http://lkml.kernel.org/r/20190316080556.3075-3-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54569ba ] Detected with gcc's ASan: Direct leak of 66 byte(s) in 5 object(s) allocated from: #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x560c8761034d in collect_config util/config.c:597 #2 0x560c8760d9cb in get_value util/config.c:169 #3 0x560c8760dfd7 in perf_parse_file util/config.c:285 #4 0x560c8760e0d2 in perf_config_from_file util/config.c:476 #5 0x560c876108fd in perf_config_set__init util/config.c:661 #6 0x560c87610c72 in perf_config_set__new util/config.c:709 #7 0x560c87610d2f in perf_config__init util/config.c:718 #8 0x560c87610e5d in perf_config util/config.c:730 #9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442 #10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Taeung Song <treeze.taeung@gmail.com> Fixes: 20105ca ("perf config: Introduce perf_config_set class") Link: http://lkml.kernel.org/r/20190316080556.3075-6-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bde851 ] Detected with gcc's ASan: Direct leak of 4356 byte(s) in 120 object(s) allocated from: #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215 #2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339 #3 0x55719af66272 in print_events util/parse-events.c:2542 #4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 #5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520 #9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 40218da ("perf list: Show SDT and pre-cached events") Link: http://lkml.kernel.org/r/20190316080556.3075-7-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42dfa45 ] Using gcc's ASan, Changbin reports: ================================================================= ==7494==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e5330a5e in zalloc util/util.h:23 #2 0x5625e5330a9b in perf_counts__new util/counts.c:10 #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #7 0x5625e51528e6 in run_test tests/builtin-test.c:358 #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #10 0x5625e515572f in cmd_test tests/builtin-test.c:722 #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e532560d in zalloc util/util.h:23 #2 0x5625e532566b in xyarray__new util/xyarray.c:10 #3 0x5625e5330aba in perf_counts__new util/counts.c:15 #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #8 0x5625e51528e6 in run_test tests/builtin-test.c:358 #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #11 0x5625e515572f in cmd_test tests/builtin-test.c:722 #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) His patch took care of evsel->prev_raw_counts, but the above backtraces are about evsel->counts, so fix that instead. Reported-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
…_event_on_all_cpus test [ Upstream commit 93faa52 ] ================================================================= ==7497==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) #1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45 #2 0x5625e5326703 in cpu_map__read util/cpumap.c:103 #3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120 #4 0x5625e5326915 in cpu_map__new util/cpumap.c:135 #5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36 #6 0x5625e51528e6 in run_test tests/builtin-test.c:358 #7 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #9 0x5625e515572f in cmd_test tests/builtin-test.c:722 #10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object") Link: http://lkml.kernel.org/r/20190316080556.3075-15-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f97a899 ] ================================================================= ==7506==ERROR: LeakSanitizer: detected memory leaks Direct leak of 13 byte(s) in 3 object(s) allocated from: #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x5625e53aaef0 in expr__find_other util/expr.y:221 #2 0x5625e51bcd3f in test__expr tests/expr.c:52 #3 0x5625e51528e6 in run_test tests/builtin-test.c:358 #4 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #6 0x5625e515572f in cmd_test tests/builtin-test.c:722 #7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON") Link: http://lkml.kernel.org/r/20190316080556.3075-16-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d982b33 ] ================================================================= ==20875==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1160 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x55bd50005599 in zalloc util/util.h:23 #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327 #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216 #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69 #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358 #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388 #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583 #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722 #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520 #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 19 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c34a838 ] Clients may submit a new requests from the completion callback context. The driver was not prepared to receive a request in this state because it already held the request queue lock and a recursive lock error is triggered. Now all completions are queued up until we are ready to drop the queue lock and then delivered. The fault was triggered by TCP over an IPsec connection in the LTP test suite: LTP: starting tcp4_ipsec02 (tcp_ipsec.sh -p ah -m transport -s "100 1000 65535") BUG: spinlock recursion on CPU#1, genload/943 lock: 0xbf3c3094, .magic: dead4ead, .owner: genload/943, .owner_cpu: 1 CPU: 1 PID: 943 Comm: genload Tainted: G O 4.9.62-axis5-devel #6 Hardware name: Axis ARTPEC-6 Platform (unwind_backtrace) from [<8010d134>] (show_stack+0x18/0x1c) (show_stack) from [<803a289c>] (dump_stack+0x84/0x98) (dump_stack) from [<8016e164>] (do_raw_spin_lock+0x124/0x128) (do_raw_spin_lock) from [<804de1a4>] (artpec6_crypto_submit+0x2c/0xa0) (artpec6_crypto_submit) from [<804def38>] (artpec6_crypto_prepare_submit_hash+0xd0/0x54c) (artpec6_crypto_prepare_submit_hash) from [<7f3165f0>] (ah_output+0x2a4/0x3dc [ah4]) (ah_output [ah4]) from [<805df9bc>] (xfrm_output_resume+0x178/0x4a4) (xfrm_output_resume) from [<805d283c>] (xfrm4_output+0xac/0xbc) (xfrm4_output) from [<80587928>] (ip_queue_xmit+0x140/0x3b4) (ip_queue_xmit) from [<805a13b4>] (tcp_transmit_skb+0x4c4/0x95c) (tcp_transmit_skb) from [<8059f218>] (tcp_rcv_state_process+0xdf4/0xdfc) (tcp_rcv_state_process) from [<805a7530>] (tcp_v4_do_rcv+0x64/0x1ac) (tcp_v4_do_rcv) from [<805a9724>] (tcp_v4_rcv+0xa34/0xb74) (tcp_v4_rcv) from [<80581d34>] (ip_local_deliver_finish+0x78/0x2b0) (ip_local_deliver_finish) from [<8058259c>] (ip_local_deliver+0xe4/0x104) (ip_local_deliver) from [<805d23ec>] (xfrm4_transport_finish+0xf4/0x144) (xfrm4_transport_finish) from [<805df564>] (xfrm_input+0x4f4/0x74c) (xfrm_input) from [<804de420>] (artpec6_crypto_task+0x208/0x38c) (artpec6_crypto_task) from [<801271b0>] (tasklet_action+0x60/0xec) (tasklet_action) from [<801266d4>] (__do_softirq+0xcc/0x3a4) (__do_softirq) from [<80126d20>] (irq_exit+0xf4/0x15c) (irq_exit) from [<801741e8>] (__handle_domain_irq+0x68/0xbc) (__handle_domain_irq) from [<801014f0>] (gic_handle_irq+0x50/0x94) (gic_handle_irq) from [<80657370>] (__irq_usr+0x50/0x80) Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
…_map [ Upstream commit 39df730 ] Detected via gcc's ASan: Direct leak of 2048 byte(s) in 64 object(s) allocated from: 6 #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370) 7 #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43 8 #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85 9 #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250 10 #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382 11 #5 0x556b0f0e3231 in print_events util/parse-events.c:2514 12 #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 13 #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 14 #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 15 #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 16 #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520 17 #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 8989605 ("perf tools: Do not put a variable sized type not at the end of a struct") Link: http://lkml.kernel.org/r/20190316080556.3075-3-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 54569ba ] Detected with gcc's ASan: Direct leak of 66 byte(s) in 5 object(s) allocated from: #0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x560c8761034d in collect_config util/config.c:597 #2 0x560c8760d9cb in get_value util/config.c:169 #3 0x560c8760dfd7 in perf_parse_file util/config.c:285 #4 0x560c8760e0d2 in perf_config_from_file util/config.c:476 #5 0x560c876108fd in perf_config_set__init util/config.c:661 #6 0x560c87610c72 in perf_config_set__new util/config.c:709 #7 0x560c87610d2f in perf_config__init util/config.c:718 #8 0x560c87610e5d in perf_config util/config.c:730 #9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442 #10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Taeung Song <treeze.taeung@gmail.com> Fixes: 20105ca ("perf config: Introduce perf_config_set class") Link: http://lkml.kernel.org/r/20190316080556.3075-6-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8bde851 ] Detected with gcc's ASan: Direct leak of 4356 byte(s) in 120 object(s) allocated from: #0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215 #2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339 #3 0x55719af66272 in print_events util/parse-events.c:2542 #4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58 #5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520 #9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 40218da ("perf list: Show SDT and pre-cached events") Link: http://lkml.kernel.org/r/20190316080556.3075-7-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 42dfa45 ] Using gcc's ASan, Changbin reports: ================================================================= ==7494==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e5330a5e in zalloc util/util.h:23 #2 0x5625e5330a9b in perf_counts__new util/counts.c:10 #3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #7 0x5625e51528e6 in run_test tests/builtin-test.c:358 #8 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #10 0x5625e515572f in cmd_test tests/builtin-test.c:722 #11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 72 byte(s) in 1 object(s) allocated from: #0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x5625e532560d in zalloc util/util.h:23 #2 0x5625e532566b in xyarray__new util/xyarray.c:10 #3 0x5625e5330aba in perf_counts__new util/counts.c:15 #4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47 #5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505 #6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347 #7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47 #8 0x5625e51528e6 in run_test tests/builtin-test.c:358 #9 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #11 0x5625e515572f in cmd_test tests/builtin-test.c:722 #12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) His patch took care of evsel->prev_raw_counts, but the above backtraces are about evsel->counts, so fix that instead. Reported-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
…_event_on_all_cpus test [ Upstream commit 93faa52 ] ================================================================= ==7497==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f0333a88f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) #1 0x5625e5326213 in cpu_map__trim_new util/cpumap.c:45 #2 0x5625e5326703 in cpu_map__read util/cpumap.c:103 #3 0x5625e53267ef in cpu_map__read_all_cpu_map util/cpumap.c:120 #4 0x5625e5326915 in cpu_map__new util/cpumap.c:135 #5 0x5625e517b355 in test__openat_syscall_event_on_all_cpus tests/openat-syscall-all-cpus.c:36 #6 0x5625e51528e6 in run_test tests/builtin-test.c:358 #7 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #8 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #9 0x5625e515572f in cmd_test tests/builtin-test.c:722 #10 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #11 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #12 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #13 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #14 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: f30a79b ("perf tools: Add reference counting for cpu_map object") Link: http://lkml.kernel.org/r/20190316080556.3075-15-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f97a899 ] ================================================================= ==7506==ERROR: LeakSanitizer: detected memory leaks Direct leak of 13 byte(s) in 3 object(s) allocated from: #0 0x7f03339d6070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070) #1 0x5625e53aaef0 in expr__find_other util/expr.y:221 #2 0x5625e51bcd3f in test__expr tests/expr.c:52 #3 0x5625e51528e6 in run_test tests/builtin-test.c:358 #4 0x5625e5152baf in test_and_print tests/builtin-test.c:388 #5 0x5625e51543fe in __cmd_test tests/builtin-test.c:583 #6 0x5625e515572f in cmd_test tests/builtin-test.c:722 #7 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #8 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #9 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #10 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520 #11 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Signed-off-by: Changbin Du <changbin.du@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 0751673 ("perf tools: Add a simple expression parser for JSON") Link: http://lkml.kernel.org/r/20190316080556.3075-16-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d982b33 ] ================================================================= ==20875==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1160 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138) #1 0x55bd50005599 in zalloc util/util.h:23 #2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327 #3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216 #4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69 #5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358 #6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388 #7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583 #8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722 #9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302 #10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354 #11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398 #12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520 #13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) Indirect leak of 19 byte(s) in 1 object(s) allocated from: #0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30) #1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f) Signed-off-by: Changbin Du <changbin.du@gmail.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Fixes: 6a6cd11 ("perf test: Add test for the sched tracepoint format fields") Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c34a838 ] Clients may submit a new requests from the completion callback context. The driver was not prepared to receive a request in this state because it already held the request queue lock and a recursive lock error is triggered. Now all completions are queued up until we are ready to drop the queue lock and then delivered. The fault was triggered by TCP over an IPsec connection in the LTP test suite: LTP: starting tcp4_ipsec02 (tcp_ipsec.sh -p ah -m transport -s "100 1000 65535") BUG: spinlock recursion on CPU#1, genload/943 lock: 0xbf3c3094, .magic: dead4ead, .owner: genload/943, .owner_cpu: 1 CPU: 1 PID: 943 Comm: genload Tainted: G O 4.9.62-axis5-devel #6 Hardware name: Axis ARTPEC-6 Platform (unwind_backtrace) from [<8010d134>] (show_stack+0x18/0x1c) (show_stack) from [<803a289c>] (dump_stack+0x84/0x98) (dump_stack) from [<8016e164>] (do_raw_spin_lock+0x124/0x128) (do_raw_spin_lock) from [<804de1a4>] (artpec6_crypto_submit+0x2c/0xa0) (artpec6_crypto_submit) from [<804def38>] (artpec6_crypto_prepare_submit_hash+0xd0/0x54c) (artpec6_crypto_prepare_submit_hash) from [<7f3165f0>] (ah_output+0x2a4/0x3dc [ah4]) (ah_output [ah4]) from [<805df9bc>] (xfrm_output_resume+0x178/0x4a4) (xfrm_output_resume) from [<805d283c>] (xfrm4_output+0xac/0xbc) (xfrm4_output) from [<80587928>] (ip_queue_xmit+0x140/0x3b4) (ip_queue_xmit) from [<805a13b4>] (tcp_transmit_skb+0x4c4/0x95c) (tcp_transmit_skb) from [<8059f218>] (tcp_rcv_state_process+0xdf4/0xdfc) (tcp_rcv_state_process) from [<805a7530>] (tcp_v4_do_rcv+0x64/0x1ac) (tcp_v4_do_rcv) from [<805a9724>] (tcp_v4_rcv+0xa34/0xb74) (tcp_v4_rcv) from [<80581d34>] (ip_local_deliver_finish+0x78/0x2b0) (ip_local_deliver_finish) from [<8058259c>] (ip_local_deliver+0xe4/0x104) (ip_local_deliver) from [<805d23ec>] (xfrm4_transport_finish+0xf4/0x144) (xfrm4_transport_finish) from [<805df564>] (xfrm_input+0x4f4/0x74c) (xfrm_input) from [<804de420>] (artpec6_crypto_task+0x208/0x38c) (artpec6_crypto_task) from [<801271b0>] (tasklet_action+0x60/0xec) (tasklet_action) from [<801266d4>] (__do_softirq+0xcc/0x3a4) (__do_softirq) from [<80126d20>] (irq_exit+0xf4/0x15c) (irq_exit) from [<801741e8>] (__handle_domain_irq+0x68/0xbc) (__handle_domain_irq) from [<801014f0>] (gic_handle_irq+0x50/0x94) (gic_handle_irq) from [<80657370>] (__irq_usr+0x50/0x80) Signed-off-by: Lars Persson <larper@axis.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
@evict were all your questions and concerns ruled out? |
[ Upstream commit b9abbdf ] By calling maps__insert() we assume to get 2 references on the map, which we relese within maps__remove call. However if there's already same map name, we currently don't bump the reference and can crash, like: Program received signal SIGABRT, Aborted. 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6 #1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6 #2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6 #3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131 #5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148 #6 map__put (map=0x1224df0) at util/map.c:299 #7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953 #8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959 #9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65 #10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728 #11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741 #12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362 #13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243 #14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322 #15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8, ... Add the map to the list and getting the reference event if we find the map with same name. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Fixes: 1e62856 ("perf symbols: Fix slowness due to -ffunction-section") Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit b1c17a9 ] Various things in eBPF really require us to disable preemption before running an eBPF program. syzbot reported : BUG: assuming atomic context at net/core/flow_dissector.c:737 in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3 2 locks held by syz-executor.3/24710: #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850 #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822 CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 __cant_sleep kernel/sched/core.c:6165 [inline] __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142 bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737 __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853 skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline] skb_probe_transport_header include/linux/skbuff.h:2500 [inline] skb_probe_transport_header include/linux/skbuff.h:2493 [inline] tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037 call_write_iter include/linux/fs.h:1872 [inline] do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x184/0x610 fs/read_write.c:951 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015 do_writev+0x15b/0x330 fs/read_write.c:1058 __do_sys_writev fs/read_write.c:1131 [inline] __se_sys_writev fs/read_write.c:1128 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128 do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: d58e468 ("flow_dissector: implements flow dissector BPF hook") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Petar Penkov <ppenkov@google.com> Cc: Stanislav Fomichev <sdf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4b732a upstream. There is a race between cache device register and cache set unregister. For an already registered cache device, register_bcache will call bch_is_open to iterate through all cachesets and check every cache there. The race occurs if cache_set_free executes at the same time and clears the caches right before ca is dereferenced in bch_is_open_cache. To close the race, let's make sure the clean up work is protected by the bch_register_lock as well. This issue can be reproduced as follows, while true; do echo /dev/XXX> /sys/fs/bcache/register ; done& while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done & and results in the following oops, [ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998 [ +0.000457] #PF error: [normal kernel read fault] [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0 [ +0.000388] Oops: 0000 [#1] SMP PTI [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6 [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache] [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202 [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000 [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001 [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0 [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620 [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000 [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0 [ +0.000837] Call Trace: [ +0.000682] ? _cond_resched+0x10/0x20 [ +0.000691] ? __kmalloc+0x131/0x1b0 [ +0.000710] kernfs_fop_write+0xfa/0x170 [ +0.000733] __vfs_write+0x2e/0x190 [ +0.000688] ? inode_security+0x10/0x30 [ +0.000698] ? selinux_file_permission+0xd2/0x120 [ +0.000752] ? security_file_permission+0x2b/0x100 [ +0.000753] vfs_write+0xa8/0x1a0 [ +0.000676] ksys_write+0x4d/0xb0 [ +0.000699] do_syscall_64+0x3a/0xf0 [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4b732a upstream. There is a race between cache device register and cache set unregister. For an already registered cache device, register_bcache will call bch_is_open to iterate through all cachesets and check every cache there. The race occurs if cache_set_free executes at the same time and clears the caches right before ca is dereferenced in bch_is_open_cache. To close the race, let's make sure the clean up work is protected by the bch_register_lock as well. This issue can be reproduced as follows, while true; do echo /dev/XXX> /sys/fs/bcache/register ; done& while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done & and results in the following oops, [ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998 [ +0.000457] #PF error: [normal kernel read fault] [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0 [ +0.000388] Oops: 0000 [#1] SMP PTI [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6 [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache] [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202 [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000 [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001 [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0 [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620 [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000 [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0 [ +0.000837] Call Trace: [ +0.000682] ? _cond_resched+0x10/0x20 [ +0.000691] ? __kmalloc+0x131/0x1b0 [ +0.000710] kernfs_fop_write+0xfa/0x170 [ +0.000733] __vfs_write+0x2e/0x190 [ +0.000688] ? inode_security+0x10/0x30 [ +0.000698] ? selinux_file_permission+0xd2/0x120 [ +0.000752] ? security_file_permission+0x2b/0x100 [ +0.000753] vfs_write+0xa8/0x1a0 [ +0.000676] ksys_write+0x4d/0xb0 [ +0.000699] do_syscall_64+0x3a/0xf0 [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…and crashes commit 62d54f3 upstream. Send operates on read only trees and expects them to never change while it is using them. This is part of its initial design, and this expection is due to two different reasons: 1) When it was introduced, no operations were allowed to modifiy read-only subvolumes/snapshots (including defrag for example). 2) It keeps send from having an impact on other filesystem operations. Namely send does not need to keep locks on the trees nor needs to hold on to transaction handles and delay transaction commits. This ends up being a consequence of the former reason. However the deduplication feature was introduced later (on September 2013, while send was introduced in July 2012) and it allowed for deduplication with destination files that belong to read-only trees (subvolumes and snapshots). That means that having a send operation (either full or incremental) running in parallel with a deduplication that has the destination inode in one of the trees used by the send operation, can result in tree nodes and leaves getting freed and reused while send is using them. This problem is similar to the problem solved for the root nodes getting freed and reused when a snapshot is made against one tree that is currenly being used by a send operation, fixed in commits [1] and [2]. These commits explain in detail how the problem happens and the explanation is valid for any node or leaf that is not the root of a tree as well. This problem was also discussed and explained recently in a thread [3]. The problem is very easy to reproduce when using send with large trees (snapshots) and just a few concurrent deduplication operations that target files in the trees used by send. A stress test case is being sent for fstests that triggers the issue easily. The most common error to hit is the send ioctl return -EIO with the following messages in dmesg/syslog: [1631617.204075] BTRFS error (device sdc): did not find backref in send_root. inode=63292, offset=0, disk_byte=5228134400 found extent=5228134400 [1631633.251754] BTRFS error (device sdc): parent transid verify failed on 32243712 wanted 24 found 27 The first one is very easy to hit while the second one happens much less frequently, except for very large trees (in that test case, snapshots with 100000 files having large xattrs to get deep and wide trees). Less frequently, at least one BUG_ON can be hit: [1631742.130080] ------------[ cut here ]------------ [1631742.130625] kernel BUG at fs/btrfs/ctree.c:1806! [1631742.131188] invalid opcode: 0000 [#6] SMP DEBUG_PAGEALLOC PTI [1631742.131726] CPU: 1 PID: 13394 Comm: btrfs Tainted: G B D W 5.0.0-rc8-btrfs-next-45 #1 [1631742.132265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [1631742.133399] RIP: 0010:read_node_slot+0x122/0x130 [btrfs] (...) [1631742.135061] RSP: 0018:ffffb530021ebaa0 EFLAGS: 00010246 [1631742.135615] RAX: ffff93ac8912e000 RBX: 000000000000009d RCX: 0000000000000002 [1631742.136173] RDX: 000000000000009d RSI: ffff93ac564b0d08 RDI: ffff93ad5b48c000 [1631742.136759] RBP: ffffb530021ebb7d R08: 0000000000000001 R09: ffffb530021ebb7d [1631742.137324] R10: ffffb530021eba70 R11: 0000000000000000 R12: ffff93ac87d0a708 [1631742.137900] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 [1631742.138455] FS: 00007f4cdb1528c0(0000) GS:ffff93ad76a80000(0000) knlGS:0000000000000000 [1631742.139010] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1631742.139568] CR2: 00007f5acb3d0420 CR3: 000000012be3e006 CR4: 00000000003606e0 [1631742.140131] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1631742.140719] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1631742.141272] Call Trace: [1631742.141826] ? do_raw_spin_unlock+0x49/0xc0 [1631742.142390] tree_advance+0x173/0x1d0 [btrfs] [1631742.142948] btrfs_compare_trees+0x268/0x690 [btrfs] [1631742.143533] ? process_extent+0x1070/0x1070 [btrfs] [1631742.144088] btrfs_ioctl_send+0x1037/0x1270 [btrfs] [1631742.144645] _btrfs_ioctl_send+0x80/0x110 [btrfs] [1631742.145161] ? trace_sched_stick_numa+0xe0/0xe0 [1631742.145685] btrfs_ioctl+0x13fe/0x3120 [btrfs] [1631742.146179] ? account_entity_enqueue+0xd3/0x100 [1631742.146662] ? reweight_entity+0x154/0x1a0 [1631742.147135] ? update_curr+0x20/0x2a0 [1631742.147593] ? check_preempt_wakeup+0x103/0x250 [1631742.148053] ? do_vfs_ioctl+0xa2/0x6f0 [1631742.148510] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [1631742.148942] do_vfs_ioctl+0xa2/0x6f0 [1631742.149361] ? __fget+0x113/0x200 [1631742.149767] ksys_ioctl+0x70/0x80 [1631742.150159] __x64_sys_ioctl+0x16/0x20 [1631742.150543] do_syscall_64+0x60/0x1b0 [1631742.150931] entry_SYSCALL_64_after_hwframe+0x49/0xbe [1631742.151326] RIP: 0033:0x7f4cd9f5add7 (...) [1631742.152509] RSP: 002b:00007ffe91017708 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [1631742.152892] RAX: ffffffffffffffda RBX: 0000000000000105 RCX: 00007f4cd9f5add7 [1631742.153268] RDX: 00007ffe91017790 RSI: 0000000040489426 RDI: 0000000000000007 [1631742.153633] RBP: 0000000000000007 R08: 00007f4cd9e79700 R09: 00007f4cd9e79700 [1631742.153999] R10: 00007f4cd9e799d0 R11: 0000000000000202 R12: 0000000000000003 [1631742.154365] R13: 0000555dfae53020 R14: 0000000000000000 R15: 0000000000000001 (...) [1631742.156696] ---[ end trace 5dac9f96dcc3fd6b ]--- That BUG_ON happens because while send is using a node, that node is COWed by a concurrent deduplication, gets freed and gets reused as a leaf (because a transaction commit happened in between), so when it attempts to read a slot from the extent buffer, at ctree.c:read_node_slot(), the extent buffer contents were wiped out and it now matches a leaf (which can even belong to some other tree now), hitting the BUG_ON(level == 0). Fix this concurrency issue by not allowing send and deduplication to run in parallel if both operate on the same readonly trees, returning EAGAIN to user space and logging an exlicit warning in dmesg/syslog. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be6821f82c3cc36e026f5afd10249988852b35ea [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f2f0b394b54e2b159ef969a0b5274e9bbf82ff2 [3] https://lore.kernel.org/linux-btrfs/CAL3q7H7iqSEEyFaEtpRZw3cp613y+4k2Q8b4W7mweR3tZA05bQ@mail.gmail.com/ CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4b732a upstream. There is a race between cache device register and cache set unregister. For an already registered cache device, register_bcache will call bch_is_open to iterate through all cachesets and check every cache there. The race occurs if cache_set_free executes at the same time and clears the caches right before ca is dereferenced in bch_is_open_cache. To close the race, let's make sure the clean up work is protected by the bch_register_lock as well. This issue can be reproduced as follows, while true; do echo /dev/XXX> /sys/fs/bcache/register ; done& while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done & and results in the following oops, [ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998 [ +0.000457] #PF error: [normal kernel read fault] [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0 [ +0.000388] Oops: 0000 [#1] SMP PTI [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6 [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache] [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202 [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000 [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001 [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0 [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620 [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000 [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0 [ +0.000837] Call Trace: [ +0.000682] ? _cond_resched+0x10/0x20 [ +0.000691] ? __kmalloc+0x131/0x1b0 [ +0.000710] kernfs_fop_write+0xfa/0x170 [ +0.000733] __vfs_write+0x2e/0x190 [ +0.000688] ? inode_security+0x10/0x30 [ +0.000698] ? selinux_file_permission+0xd2/0x120 [ +0.000752] ? security_file_permission+0x2b/0x100 [ +0.000753] vfs_write+0xa8/0x1a0 [ +0.000676] ksys_write+0x4d/0xb0 [ +0.000699] do_syscall_64+0x3a/0xf0 [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yes they were, thank you! Also: is there any way I can donate money to the project? I really appreciate the work you've put into it. |
[ Upstream commit ff612ba ] We've been seeing the following sporadically throughout our fleet panic: kernel BUG at fs/btrfs/relocation.c:4584! netversion: 5.0-0 Backtrace: #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8 #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a #4 [ffffc90003adb9c0] do_trap at ffffffff81019114 #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0 #6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b [exception RIP: btrfs_reloc_cow_block+692] RIP: ffffffff8143b614 RSP: ffffc90003adbb68 RFLAGS: 00010246 RAX: fffffffffffffff7 RBX: ffff8806b9c32000 RCX: ffff8806aad00690 RDX: ffff880850b295e0 RSI: ffff8806b9c32000 RDI: ffff88084f205bd0 RBP: ffff880849415000 R8: ffffc90003adbbe0 R9: ffff88085ac90000 R10: ffff8805f7369140 R11: 0000000000000000 R12: ffff880850b295e0 R13: ffff88084f205bd0 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd #8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3 #9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c The way relocation moves data extents is by creating a reloc inode and preallocating extents in this inode and then copying the data into these preallocated extents. Once we've done this for all of our extents, we'll write out these dirty pages, which marks the extent written, and goes into btrfs_reloc_cow_block(). From here we get our current reloc_control, which _should_ match the reloc_control for the current block group we're relocating. However if we get an ENOSPC in this path at some point we'll bail out, never initiating writeback on this inode. Not a huge deal, unless we happen to be doing relocation on a different block group, and this block group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in btrfs_reloc_cow_block(), because we expect to be done modifying the data inode. We are in fact done modifying the metadata for the data inode we're currently using, but not the one from the failed block group, and thus we BUG_ON(). (This happens when writeback finishes for extents from the previous group, when we are at btrfs_finish_ordered_io() which updates the data reloc tree (inode item, drops/adds extent items, etc).) Fix this by writing out the reloc data inode always, and then breaking out of the loop after that point to keep from tripping this BUG_ON() later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add note from Filipe ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f80c5da ] This commit makes the kernel not send the next queued HCI command until a command complete arrives for the last HCI command sent to the controller. This change avoids a problem with some buggy controllers (seen on two SKUs of QCA9377) that send an extra command complete event for the previous command after the kernel had already sent a new HCI command to the controller. The problem was reproduced when starting an active scanning procedure, where an extra command complete event arrives for the LE_SET_RANDOM_ADDR command. When this happends the kernel ends up not processing the command complete for the following commmand, LE_SET_SCAN_PARAM, and ultimately behaving as if a passive scanning procedure was being performed, when in fact controller is performing an active scanning procedure. This makes it impossible to discover BLE devices as no device found events are sent to userspace. This problem is reproducible on 100% of the attempts on the affected controllers. The extra command complete event can be seen at timestamp 27.420131 on the btmon logs bellow. Bluetooth monitor ver 5.50 = Note: Linux version 5.0.0+ (x86_64) 0.352340 = Note: Bluetooth subsystem version 2.22 0.352343 = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0) [hci0] 0.352344 = Open Index: 80:C5:F2:8F:87:84 [hci0] 0.352345 = Index Info: 80:C5:F2:8F:87:84 (Qualcomm) [hci0] 0.352346 @ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.352347 @ MGMT Open: btmon (privileged) version 1.14 {0x0002} 0.352366 @ MGMT Open: btmgmt (privileged) version 1.14 {0x0003} 27.302164 @ MGMT Command: Start Discovery (0x0023) plen 1 {0x0003} [hci0] 27.302310 Address type: 0x06 LE Public LE Random < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #1 [hci0] 27.302496 Address: 15:60:F2:91:B2:24 (Non-Resolvable) > HCI Event: Command Complete (0x0e) plen 4 #2 [hci0] 27.419117 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #3 [hci0] 27.419244 Type: Active (0x01) Interval: 11.250 msec (0x0012) Window: 11.250 msec (0x0012) Own address type: Random (0x01) Filter policy: Accept all advertisement (0x00) > HCI Event: Command Complete (0x0e) plen 4 #4 [hci0] 27.420131 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #5 [hci0] 27.420259 Scanning: Enabled (0x01) Filter duplicates: Enabled (0x01) > HCI Event: Command Complete (0x0e) plen 4 #6 [hci0] 27.420969 LE Set Scan Parameters (0x08|0x000b) ncmd 1 Status: Success (0x00) > HCI Event: Command Complete (0x0e) plen 4 #7 [hci0] 27.421983 LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) @ MGMT Event: Command Complete (0x0001) plen 4 {0x0003} [hci0] 27.422059 Start Discovery (0x0023) plen 1 Status: Success (0x00) Address type: 0x06 LE Public LE Random @ MGMT Event: Discovering (0x0013) plen 2 {0x0003} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0002} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0001} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff612ba ] We've been seeing the following sporadically throughout our fleet panic: kernel BUG at fs/btrfs/relocation.c:4584! netversion: 5.0-0 Backtrace: #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8 #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a #4 [ffffc90003adb9c0] do_trap at ffffffff81019114 #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0 #6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b [exception RIP: btrfs_reloc_cow_block+692] RIP: ffffffff8143b614 RSP: ffffc90003adbb68 RFLAGS: 00010246 RAX: fffffffffffffff7 RBX: ffff8806b9c32000 RCX: ffff8806aad00690 RDX: ffff880850b295e0 RSI: ffff8806b9c32000 RDI: ffff88084f205bd0 RBP: ffff880849415000 R8: ffffc90003adbbe0 R9: ffff88085ac90000 R10: ffff8805f7369140 R11: 0000000000000000 R12: ffff880850b295e0 R13: ffff88084f205bd0 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd #8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3 #9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c The way relocation moves data extents is by creating a reloc inode and preallocating extents in this inode and then copying the data into these preallocated extents. Once we've done this for all of our extents, we'll write out these dirty pages, which marks the extent written, and goes into btrfs_reloc_cow_block(). From here we get our current reloc_control, which _should_ match the reloc_control for the current block group we're relocating. However if we get an ENOSPC in this path at some point we'll bail out, never initiating writeback on this inode. Not a huge deal, unless we happen to be doing relocation on a different block group, and this block group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in btrfs_reloc_cow_block(), because we expect to be done modifying the data inode. We are in fact done modifying the metadata for the data inode we're currently using, but not the one from the failed block group, and thus we BUG_ON(). (This happens when writeback finishes for extents from the previous group, when we are at btrfs_finish_ordered_io() which updates the data reloc tree (inode item, drops/adds extent items, etc).) Fix this by writing out the reloc data inode always, and then breaking out of the loop after that point to keep from tripping this BUG_ON() later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add note from Filipe ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f80c5da ] This commit makes the kernel not send the next queued HCI command until a command complete arrives for the last HCI command sent to the controller. This change avoids a problem with some buggy controllers (seen on two SKUs of QCA9377) that send an extra command complete event for the previous command after the kernel had already sent a new HCI command to the controller. The problem was reproduced when starting an active scanning procedure, where an extra command complete event arrives for the LE_SET_RANDOM_ADDR command. When this happends the kernel ends up not processing the command complete for the following commmand, LE_SET_SCAN_PARAM, and ultimately behaving as if a passive scanning procedure was being performed, when in fact controller is performing an active scanning procedure. This makes it impossible to discover BLE devices as no device found events are sent to userspace. This problem is reproducible on 100% of the attempts on the affected controllers. The extra command complete event can be seen at timestamp 27.420131 on the btmon logs bellow. Bluetooth monitor ver 5.50 = Note: Linux version 5.0.0+ (x86_64) 0.352340 = Note: Bluetooth subsystem version 2.22 0.352343 = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0) [hci0] 0.352344 = Open Index: 80:C5:F2:8F:87:84 [hci0] 0.352345 = Index Info: 80:C5:F2:8F:87:84 (Qualcomm) [hci0] 0.352346 @ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.352347 @ MGMT Open: btmon (privileged) version 1.14 {0x0002} 0.352366 @ MGMT Open: btmgmt (privileged) version 1.14 {0x0003} 27.302164 @ MGMT Command: Start Discovery (0x0023) plen 1 {0x0003} [hci0] 27.302310 Address type: 0x06 LE Public LE Random < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #1 [hci0] 27.302496 Address: 15:60:F2:91:B2:24 (Non-Resolvable) > HCI Event: Command Complete (0x0e) plen 4 #2 [hci0] 27.419117 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #3 [hci0] 27.419244 Type: Active (0x01) Interval: 11.250 msec (0x0012) Window: 11.250 msec (0x0012) Own address type: Random (0x01) Filter policy: Accept all advertisement (0x00) > HCI Event: Command Complete (0x0e) plen 4 #4 [hci0] 27.420131 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #5 [hci0] 27.420259 Scanning: Enabled (0x01) Filter duplicates: Enabled (0x01) > HCI Event: Command Complete (0x0e) plen 4 #6 [hci0] 27.420969 LE Set Scan Parameters (0x08|0x000b) ncmd 1 Status: Success (0x00) > HCI Event: Command Complete (0x0e) plen 4 #7 [hci0] 27.421983 LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) @ MGMT Event: Command Complete (0x0001) plen 4 {0x0003} [hci0] 27.422059 Start Discovery (0x0023) plen 1 Status: Success (0x00) Address type: 0x06 LE Public LE Random @ MGMT Event: Discovering (0x0013) plen 2 {0x0003} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0002} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0001} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ff612ba ] We've been seeing the following sporadically throughout our fleet panic: kernel BUG at fs/btrfs/relocation.c:4584! netversion: 5.0-0 Backtrace: #0 [ffffc90003adb880] machine_kexec at ffffffff81041da8 #1 [ffffc90003adb8c8] __crash_kexec at ffffffff8110396c #2 [ffffc90003adb988] crash_kexec at ffffffff811048ad #3 [ffffc90003adb9a0] oops_end at ffffffff8101c19a #4 [ffffc90003adb9c0] do_trap at ffffffff81019114 #5 [ffffc90003adba00] do_error_trap at ffffffff810195d0 #6 [ffffc90003adbab0] invalid_op at ffffffff81a00a9b [exception RIP: btrfs_reloc_cow_block+692] RIP: ffffffff8143b614 RSP: ffffc90003adbb68 RFLAGS: 00010246 RAX: fffffffffffffff7 RBX: ffff8806b9c32000 RCX: ffff8806aad00690 RDX: ffff880850b295e0 RSI: ffff8806b9c32000 RDI: ffff88084f205bd0 RBP: ffff880849415000 R8: ffffc90003adbbe0 R9: ffff88085ac90000 R10: ffff8805f7369140 R11: 0000000000000000 R12: ffff880850b295e0 R13: ffff88084f205bd0 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffc90003adbbb0] __btrfs_cow_block at ffffffff813bf1cd #8 [ffffc90003adbc28] btrfs_cow_block at ffffffff813bf4b3 #9 [ffffc90003adbc78] btrfs_search_slot at ffffffff813c2e6c The way relocation moves data extents is by creating a reloc inode and preallocating extents in this inode and then copying the data into these preallocated extents. Once we've done this for all of our extents, we'll write out these dirty pages, which marks the extent written, and goes into btrfs_reloc_cow_block(). From here we get our current reloc_control, which _should_ match the reloc_control for the current block group we're relocating. However if we get an ENOSPC in this path at some point we'll bail out, never initiating writeback on this inode. Not a huge deal, unless we happen to be doing relocation on a different block group, and this block group is now rc->stage == UPDATE_DATA_PTRS. This trips the BUG_ON() in btrfs_reloc_cow_block(), because we expect to be done modifying the data inode. We are in fact done modifying the metadata for the data inode we're currently using, but not the one from the failed block group, and thus we BUG_ON(). (This happens when writeback finishes for extents from the previous group, when we are at btrfs_finish_ordered_io() which updates the data reloc tree (inode item, drops/adds extent items, etc).) Fix this by writing out the reloc data inode always, and then breaking out of the loop after that point to keep from tripping this BUG_ON() later. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> [ add note from Filipe ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a848c29 ] On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit abaf1e0 upstream. While debugging a segfault on 'perf lock contention' without an available perf.data file I noticed that it was basically calling: perf_session__delete(ERR_PTR(-1)) Resulting in: (gdb) run lock contention Starting program: /root/bin/perf lock contention [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". failed to open perf.data: No such file or directory (try 'perf record' first) Initializing perf session failed Program received signal SIGSEGV, Segmentation fault. 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 2858 if (!session->auxtrace) (gdb) p session $1 = (struct perf_session *) 0xffffffffffffffff (gdb) bt #0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858 #1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300 #2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161 #3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604 #4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322 #5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375 #6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419 #7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535 (gdb) So just set it to NULL after using PTR_ERR(session) to decode the error as perf_session__delete(NULL) is supported. Fixes: eef4fee ("perf lock: Dynamically allocate lockhash_table") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Ross Zwisler <zwisler@chromium.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/lkml/ZN4R1AYfsD2J8lRs@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 18ad4df ] 1) initial state, three tasks: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | Λ | Λ | Λ | | | | | | V | V | V | bfqq1 bfqq2 bfqq3 process ref: 1 1 1 2) bfqq1 merged to bfqq2: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | | | Λ \--------------\| | | V V | bfqq1--------->bfqq2 bfqq3 process ref: 0 2 1 3) bfqq2 merged to bfqq3: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) here -> Λ | | \--------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3 process ref: 0 1 3 In this case, IO from Process 1 will get bfqq2 from BIC1 first, and then get bfqq3 through merge chain, and finially handle IO by bfqq3. Howerver, current code will think bfqq2 is owned by BIC1, like initial state, and set bfqq2->bic to BIC1. bfq_insert_request -> by Process 1 bfqq = bfq_init_rq(rq) bfqq = bfq_get_bfqq_handle_split bfqq = bic_to_bfqq -> get bfqq2 from BIC1 bfqq->ref++ rq->elv.priv[0] = bic rq->elv.priv[1] = bfqq if (bfqq_process_refs(bfqq) == 1) bfqq->bic = bic -> record BIC1 to bfqq2 __bfq_insert_request new_bfqq = bfq_setup_cooperator -> get bfqq3 from bfqq2->new_bfqq bfqq_request_freed(bfqq) new_bfqq->ref++ rq->elv.priv[1] = new_bfqq -> handle IO by bfqq3 Fix the problem by checking bfqq is from merge chain fist. And this might fix a following problem reported by our syzkaller(unreproducible): ================================================================== BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595 CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G L 6.6.0-07439-gba2303cacfda #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kblockd blk_mq_requeue_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x10d/0x610 mm/kasan/report.c:475 kasan_report+0x8e/0xc0 mm/kasan/report.c:588 bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757 bfq_init_rq block/bfq-iosched.c:6876 [inline] bfq_insert_request block/bfq-iosched.c:6254 [inline] bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304 blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593 blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 </TASK> Allocated by task 20776: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328 kasan_slab_alloc include/linux/kasan.h:188 [inline] slab_post_alloc_hook mm/slab.h:763 [inline] slab_alloc_node mm/slub.c:3458 [inline] kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503 ioc_create_icq block/blk-ioc.c:370 [inline] ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436 bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812 blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403 __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517 blk_mq_get_new_requests block/blk-mq.c:2940 [inline] blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042 __submit_bio+0x331/0x6f0 block/blk-core.c:624 __submit_bio_noacct_mq block/blk-core.c:703 [inline] submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732 submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826 xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958 xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619 xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Freed by task 946: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1815 [inline] slab_free_freelist_hook mm/slub.c:1841 [inline] slab_free mm/slub.c:3786 [inline] kmem_cache_free+0x118/0x6f0 mm/slub.c:3808 rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189 rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462 __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Second to last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 The buggy address belongs to the object at ffff888123839d68 which belongs to the cache bfq_io_cq of size 1360 The buggy address is located 336 bytes inside of freed 1360-byte region [ffff888123839d68, ffff88812383a2b8) The buggy address belongs to the physical page: page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838 head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488 raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 36eca89 ("block, bfq: add Early Queue Merge (EQM)") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 44d1745 upstream. Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock on x86 due to a chain of locks and SRCU synchronizations. Translating the below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the fairness of r/w semaphores). CPU0 CPU1 CPU2 1 lock(&kvm->slots_lock); 2 lock(&vcpu->mutex); 3 lock(&kvm->srcu); 4 lock(cpu_hotplug_lock); 5 lock(kvm_lock); 6 lock(&kvm->slots_lock); 7 lock(cpu_hotplug_lock); 8 sync(&kvm->srcu); Note, there are likely more potential deadlocks in KVM x86, e.g. the same pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with __kvmclock_cpufreq_notifier(): cpuhp_cpufreq_online() | -> cpufreq_online() | -> cpufreq_gov_performance_limits() | -> __cpufreq_driver_target() | -> __target_index() | -> cpufreq_freq_transition_begin() | -> cpufreq_notify_transition() | -> ... __kvmclock_cpufreq_notifier() But, actually triggering such deadlocks is beyond rare due to the combination of dependencies and timings involved. E.g. the cpufreq notifier is only used on older CPUs without a constant TSC, mucking with the NX hugepage mitigation while VMs are running is very uncommon, and doing so while also onlining/offlining a CPU (necessary to generate contention on cpu_hotplug_lock) would be even more unusual. The most robust solution to the general cpu_hotplug_lock issue is likely to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq notifier doesn't to take kvm_lock. For now, settle for fixing the most blatant deadlock, as switching to an RCU-protected list is a much more involved change, but add a comment in locking.rst to call out that care needs to be taken when walking holding kvm_lock and walking vm_list. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O ------------------------------------------------------ tee/35048 is trying to acquire lock: ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm] but task is already holding lock: ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (kvm_lock){+.+.}-{3:3}: __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 kvm_dev_ioctl+0x4fb/0xe50 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2e/0xb0 static_key_slow_inc+0x16/0x30 kvm_lapic_set_base+0x6a/0x1c0 [kvm] kvm_set_apic_base+0x8f/0xe0 [kvm] kvm_set_msr_common+0x9ae/0xf80 [kvm] vmx_set_msr+0xa54/0xbe0 [kvm_intel] __kvm_set_msr+0xb6/0x1a0 [kvm] kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm] kvm_vcpu_ioctl+0x485/0x5b0 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&kvm->srcu){.+.+}-{0:0}: __synchronize_srcu+0x44/0x1a0 synchronize_srcu_expedited+0x21/0x30 kvm_swap_active_memslots+0x110/0x1c0 [kvm] kvm_set_memslot+0x360/0x620 [kvm] __kvm_set_memory_region+0x27b/0x300 [kvm] kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm] kvm_vm_ioctl+0x295/0x650 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&kvm->slots_lock){+.+.}-{3:3}: __lock_acquire+0x15ef/0x2e30 lock_acquire+0xe0/0x260 __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 set_nx_huge_pages+0x179/0x1e0 [kvm] param_attr_store+0x93/0x100 module_attr_store+0x22/0x40 sysfs_kf_write+0x81/0xb0 kernfs_fop_write_iter+0x133/0x1d0 vfs_write+0x28d/0x380 ksys_write+0x70/0xe0 __x64_sys_write+0x1f/0x30 x64_sys_call+0x281b/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: Chao Gao <chao.gao@intel.com> Fixes: 0bf5049 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240830043600.127750-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac01c8c upstream. AddressSanitizer found a use-after-free bug in the symbol code which manifested as 'perf top' segfaulting. ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80 READ of size 1 at 0x60b00c48844b thread T193 #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310 #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286 #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614 #3 0x5650d804568f in __hists__add_entry util/hist.c:754 #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772 #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997 #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242 #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845 #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208 #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245 #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324 #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120 #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442 #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 When updating hist maps it's also necessary to update the hist symbol reference because the old one gets freed in map__put(). While this bug was probably introduced with 5c24b67 ("perf tools: Replace map->referenced & maps->removed_maps with map->refcnt"), the symbol objects were leaked until c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the bug was masked. Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") Reported-by: Yunzhao Li <yunzhao@cloudflare.com> Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com> Cc: Ian Rogers <irogers@google.com> Cc: kernel-team@cloudflare.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org # v5.13+ Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…tion to perf_sched__replay() [ Upstream commit c690786 ] The start_work_mutex and work_done_wait_mutex are used only for the 'perf sched replay'. Put their initialization in perf_sched__replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.197 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 14.952 MB perf.data (134165 samples) ] # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 65658 nsecs the run test took 999991 nsecs the sleep test took 1079324 nsecs nr_run_events: 42378 nr_sleep_events: 43102 nr_wakeup_events: 31852 target-less wakeups: 17 multi-target wakeups: 712 task 0 ( swapper: 0), nr_events: 10451 task 1 ( swapper: 1), nr_events: 3 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 717 ( sched-messaging: 74483), nr_events: 152 task 718 ( sched-messaging: 74484), nr_events: 1944 task 719 ( sched-messaging: 74485), nr_events: 73 task 720 ( sched-messaging: 74486), nr_events: 163 task 721 ( sched-messaging: 74487), nr_events: 942 task 722 ( sched-messaging: 74488), nr_events: 78 task 723 ( sched-messaging: 74489), nr_events: 1090 ------------------------------------------------------------ #1 : 1366.507, ravg: 1366.51, cpu: 7682.70 / 7682.70 #2 : 1410.072, ravg: 1370.86, cpu: 7723.88 / 7686.82 #3 : 1396.296, ravg: 1373.41, cpu: 7568.20 / 7674.96 #4 : 1381.019, ravg: 1374.17, cpu: 7531.81 / 7660.64 #5 : 1393.826, ravg: 1376.13, cpu: 7725.25 / 7667.11 #6 : 1401.581, ravg: 1378.68, cpu: 7594.82 / 7659.88 #7 : 1381.337, ravg: 1378.94, cpu: 7371.22 / 7631.01 #8 : 1373.842, ravg: 1378.43, cpu: 7894.92 / 7657.40 #9 : 1364.697, ravg: 1377.06, cpu: 7324.91 / 7624.15 #10 : 1363.613, ravg: 1375.72, cpu: 7209.55 / 7582.69 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-2-yangjihong1@huawei.com Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string") Signed-off-by: Sasha Levin <sashal@kernel.org>
…f_sched__{lat|map|replay}() [ Upstream commit bd2cdf2 ] The curr_pid and cpu_last_switched are used only for the 'perf sched replay/latency/map'. Put their initialization in perf_sched__{lat|map|replay () to reduce unnecessary actions in other commands. Simple functional testing: # perf sched record perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.209 [sec] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 16.456 MB perf.data (147907 samples) ] # perf sched lat ------------------------------------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end | ------------------------------------------------------------------------------------------------------------------------------------------- sched-messaging:(401) | 2990.699 ms | 38705 | avg: 0.661 ms | max: 67.046 ms | max start: 456532.624830 s | max end: 456532.691876 s qemu-system-x86:(7) | 179.764 ms | 2191 | avg: 0.152 ms | max: 21.857 ms | max start: 456532.576434 s | max end: 456532.598291 s sshd:48125 | 0.522 ms | 2 | avg: 0.037 ms | max: 0.046 ms | max start: 456532.514610 s | max end: 456532.514656 s <SNIP> ksoftirqd/11:82 | 0.063 ms | 1 | avg: 0.005 ms | max: 0.005 ms | max start: 456532.769366 s | max end: 456532.769371 s kworker/9:0-mm_:34624 | 0.233 ms | 20 | avg: 0.004 ms | max: 0.007 ms | max start: 456532.690804 s | max end: 456532.690812 s migration/13:93 | 0.000 ms | 1 | avg: 0.004 ms | max: 0.004 ms | max start: 456532.512669 s | max end: 456532.512674 s ----------------------------------------------------------------------------------------------------------------- TOTAL: | 3180.750 ms | 41368 | --------------------------------------------------- # echo $? 0 # perf sched map *A0 456532.510141 secs A0 => migration/0:15 *. 456532.510171 secs . => swapper:0 . *B0 456532.510261 secs B0 => migration/1:21 . *. 456532.510279 secs <SNIP> L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . . 456532.785979 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . . 456532.786054 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . . 456532.786127 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 . 456532.786197 secs L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 L7 *L7 456532.786270 secs # echo $? 0 # perf sched replay run measurement overhead: 108 nsecs sleep measurement overhead: 66473 nsecs the run test took 1000002 nsecs the sleep test took 1082686 nsecs nr_run_events: 49334 nr_sleep_events: 50054 nr_wakeup_events: 34701 target-less wakeups: 165 multi-target wakeups: 766 task 0 ( swapper: 0), nr_events: 15419 task 1 ( swapper: 1), nr_events: 1 task 2 ( swapper: 2), nr_events: 1 <SNIP> task 715 ( sched-messaging: 110248), nr_events: 1438 task 716 ( sched-messaging: 110249), nr_events: 512 task 717 ( sched-messaging: 110250), nr_events: 500 task 718 ( sched-messaging: 110251), nr_events: 537 task 719 ( sched-messaging: 110252), nr_events: 823 ------------------------------------------------------------ #1 : 1325.288, ravg: 1325.29, cpu: 7823.35 / 7823.35 #2 : 1363.606, ravg: 1329.12, cpu: 7655.53 / 7806.56 #3 : 1349.494, ravg: 1331.16, cpu: 7544.80 / 7780.39 #4 : 1311.488, ravg: 1329.19, cpu: 7495.13 / 7751.86 #5 : 1309.902, ravg: 1327.26, cpu: 7266.65 / 7703.34 #6 : 1309.535, ravg: 1325.49, cpu: 7843.86 / 7717.39 #7 : 1316.482, ravg: 1324.59, cpu: 7854.41 / 7731.09 #8 : 1366.604, ravg: 1328.79, cpu: 7955.81 / 7753.57 #9 : 1326.286, ravg: 1328.54, cpu: 7466.86 / 7724.90 #10 : 1356.653, ravg: 1331.35, cpu: 7566.60 / 7709.07 # echo $? 0 Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240206083228.172607-5-yangjihong1@huawei.com Stable-dep-of: 1a5efc9 ("libsubcmd: Don't free the usage string") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a848c29 ] On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18ad4df ] 1) initial state, three tasks: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | Λ | Λ | Λ | | | | | | V | V | V | bfqq1 bfqq2 bfqq3 process ref: 1 1 1 2) bfqq1 merged to bfqq2: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | | | Λ \--------------\| | | V V | bfqq1--------->bfqq2 bfqq3 process ref: 0 2 1 3) bfqq2 merged to bfqq3: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) here -> Λ | | \--------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3 process ref: 0 1 3 In this case, IO from Process 1 will get bfqq2 from BIC1 first, and then get bfqq3 through merge chain, and finially handle IO by bfqq3. Howerver, current code will think bfqq2 is owned by BIC1, like initial state, and set bfqq2->bic to BIC1. bfq_insert_request -> by Process 1 bfqq = bfq_init_rq(rq) bfqq = bfq_get_bfqq_handle_split bfqq = bic_to_bfqq -> get bfqq2 from BIC1 bfqq->ref++ rq->elv.priv[0] = bic rq->elv.priv[1] = bfqq if (bfqq_process_refs(bfqq) == 1) bfqq->bic = bic -> record BIC1 to bfqq2 __bfq_insert_request new_bfqq = bfq_setup_cooperator -> get bfqq3 from bfqq2->new_bfqq bfqq_request_freed(bfqq) new_bfqq->ref++ rq->elv.priv[1] = new_bfqq -> handle IO by bfqq3 Fix the problem by checking bfqq is from merge chain fist. And this might fix a following problem reported by our syzkaller(unreproducible): ================================================================== BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595 CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G L 6.6.0-07439-gba2303cacfda #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kblockd blk_mq_requeue_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x10d/0x610 mm/kasan/report.c:475 kasan_report+0x8e/0xc0 mm/kasan/report.c:588 bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757 bfq_init_rq block/bfq-iosched.c:6876 [inline] bfq_insert_request block/bfq-iosched.c:6254 [inline] bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304 blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593 blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 </TASK> Allocated by task 20776: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328 kasan_slab_alloc include/linux/kasan.h:188 [inline] slab_post_alloc_hook mm/slab.h:763 [inline] slab_alloc_node mm/slub.c:3458 [inline] kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503 ioc_create_icq block/blk-ioc.c:370 [inline] ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436 bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812 blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403 __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517 blk_mq_get_new_requests block/blk-mq.c:2940 [inline] blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042 __submit_bio+0x331/0x6f0 block/blk-core.c:624 __submit_bio_noacct_mq block/blk-core.c:703 [inline] submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732 submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826 xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958 xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619 xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Freed by task 946: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1815 [inline] slab_free_freelist_hook mm/slub.c:1841 [inline] slab_free mm/slub.c:3786 [inline] kmem_cache_free+0x118/0x6f0 mm/slub.c:3808 rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189 rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462 __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Second to last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 The buggy address belongs to the object at ffff888123839d68 which belongs to the cache bfq_io_cq of size 1360 The buggy address is located 336 bytes inside of freed 1360-byte region [ffff888123839d68, ffff88812383a2b8) The buggy address belongs to the physical page: page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838 head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488 raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 36eca89 ("block, bfq: add Early Queue Merge (EQM)") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 44d1745 upstream. Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock on x86 due to a chain of locks and SRCU synchronizations. Translating the below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the fairness of r/w semaphores). CPU0 CPU1 CPU2 1 lock(&kvm->slots_lock); 2 lock(&vcpu->mutex); 3 lock(&kvm->srcu); 4 lock(cpu_hotplug_lock); 5 lock(kvm_lock); 6 lock(&kvm->slots_lock); 7 lock(cpu_hotplug_lock); 8 sync(&kvm->srcu); Note, there are likely more potential deadlocks in KVM x86, e.g. the same pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with __kvmclock_cpufreq_notifier(): cpuhp_cpufreq_online() | -> cpufreq_online() | -> cpufreq_gov_performance_limits() | -> __cpufreq_driver_target() | -> __target_index() | -> cpufreq_freq_transition_begin() | -> cpufreq_notify_transition() | -> ... __kvmclock_cpufreq_notifier() But, actually triggering such deadlocks is beyond rare due to the combination of dependencies and timings involved. E.g. the cpufreq notifier is only used on older CPUs without a constant TSC, mucking with the NX hugepage mitigation while VMs are running is very uncommon, and doing so while also onlining/offlining a CPU (necessary to generate contention on cpu_hotplug_lock) would be even more unusual. The most robust solution to the general cpu_hotplug_lock issue is likely to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq notifier doesn't to take kvm_lock. For now, settle for fixing the most blatant deadlock, as switching to an RCU-protected list is a much more involved change, but add a comment in locking.rst to call out that care needs to be taken when walking holding kvm_lock and walking vm_list. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O ------------------------------------------------------ tee/35048 is trying to acquire lock: ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm] but task is already holding lock: ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (kvm_lock){+.+.}-{3:3}: __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 kvm_dev_ioctl+0x4fb/0xe50 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2e/0xb0 static_key_slow_inc+0x16/0x30 kvm_lapic_set_base+0x6a/0x1c0 [kvm] kvm_set_apic_base+0x8f/0xe0 [kvm] kvm_set_msr_common+0x9ae/0xf80 [kvm] vmx_set_msr+0xa54/0xbe0 [kvm_intel] __kvm_set_msr+0xb6/0x1a0 [kvm] kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm] kvm_vcpu_ioctl+0x485/0x5b0 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&kvm->srcu){.+.+}-{0:0}: __synchronize_srcu+0x44/0x1a0 synchronize_srcu_expedited+0x21/0x30 kvm_swap_active_memslots+0x110/0x1c0 [kvm] kvm_set_memslot+0x360/0x620 [kvm] __kvm_set_memory_region+0x27b/0x300 [kvm] kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm] kvm_vm_ioctl+0x295/0x650 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&kvm->slots_lock){+.+.}-{3:3}: __lock_acquire+0x15ef/0x2e30 lock_acquire+0xe0/0x260 __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 set_nx_huge_pages+0x179/0x1e0 [kvm] param_attr_store+0x93/0x100 module_attr_store+0x22/0x40 sysfs_kf_write+0x81/0xb0 kernfs_fop_write_iter+0x133/0x1d0 vfs_write+0x28d/0x380 ksys_write+0x70/0xe0 __x64_sys_write+0x1f/0x30 x64_sys_call+0x281b/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: Chao Gao <chao.gao@intel.com> Fixes: 0bf5049 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240830043600.127750-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 18ad4df ] 1) initial state, three tasks: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | Λ | Λ | Λ | | | | | | V | V | V | bfqq1 bfqq2 bfqq3 process ref: 1 1 1 2) bfqq1 merged to bfqq2: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | | | Λ \--------------\| | | V V | bfqq1--------->bfqq2 bfqq3 process ref: 0 2 1 3) bfqq2 merged to bfqq3: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) here -> Λ | | \--------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3 process ref: 0 1 3 In this case, IO from Process 1 will get bfqq2 from BIC1 first, and then get bfqq3 through merge chain, and finially handle IO by bfqq3. Howerver, current code will think bfqq2 is owned by BIC1, like initial state, and set bfqq2->bic to BIC1. bfq_insert_request -> by Process 1 bfqq = bfq_init_rq(rq) bfqq = bfq_get_bfqq_handle_split bfqq = bic_to_bfqq -> get bfqq2 from BIC1 bfqq->ref++ rq->elv.priv[0] = bic rq->elv.priv[1] = bfqq if (bfqq_process_refs(bfqq) == 1) bfqq->bic = bic -> record BIC1 to bfqq2 __bfq_insert_request new_bfqq = bfq_setup_cooperator -> get bfqq3 from bfqq2->new_bfqq bfqq_request_freed(bfqq) new_bfqq->ref++ rq->elv.priv[1] = new_bfqq -> handle IO by bfqq3 Fix the problem by checking bfqq is from merge chain fist. And this might fix a following problem reported by our syzkaller(unreproducible): ================================================================== BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595 CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G L 6.6.0-07439-gba2303cacfda #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kblockd blk_mq_requeue_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x10d/0x610 mm/kasan/report.c:475 kasan_report+0x8e/0xc0 mm/kasan/report.c:588 bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757 bfq_init_rq block/bfq-iosched.c:6876 [inline] bfq_insert_request block/bfq-iosched.c:6254 [inline] bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304 blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593 blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 </TASK> Allocated by task 20776: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328 kasan_slab_alloc include/linux/kasan.h:188 [inline] slab_post_alloc_hook mm/slab.h:763 [inline] slab_alloc_node mm/slub.c:3458 [inline] kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503 ioc_create_icq block/blk-ioc.c:370 [inline] ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436 bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812 blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403 __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517 blk_mq_get_new_requests block/blk-mq.c:2940 [inline] blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042 __submit_bio+0x331/0x6f0 block/blk-core.c:624 __submit_bio_noacct_mq block/blk-core.c:703 [inline] submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732 submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826 xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958 xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619 xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Freed by task 946: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1815 [inline] slab_free_freelist_hook mm/slub.c:1841 [inline] slab_free mm/slub.c:3786 [inline] kmem_cache_free+0x118/0x6f0 mm/slub.c:3808 rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189 rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462 __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Second to last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 The buggy address belongs to the object at ffff888123839d68 which belongs to the cache bfq_io_cq of size 1360 The buggy address is located 336 bytes inside of freed 1360-byte region [ffff888123839d68, ffff88812383a2b8) The buggy address belongs to the physical page: page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838 head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488 raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 36eca89 ("block, bfq: add Early Queue Merge (EQM)") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 44d1745 upstream. Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock on x86 due to a chain of locks and SRCU synchronizations. Translating the below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the fairness of r/w semaphores). CPU0 CPU1 CPU2 1 lock(&kvm->slots_lock); 2 lock(&vcpu->mutex); 3 lock(&kvm->srcu); 4 lock(cpu_hotplug_lock); 5 lock(kvm_lock); 6 lock(&kvm->slots_lock); 7 lock(cpu_hotplug_lock); 8 sync(&kvm->srcu); Note, there are likely more potential deadlocks in KVM x86, e.g. the same pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with __kvmclock_cpufreq_notifier(): cpuhp_cpufreq_online() | -> cpufreq_online() | -> cpufreq_gov_performance_limits() | -> __cpufreq_driver_target() | -> __target_index() | -> cpufreq_freq_transition_begin() | -> cpufreq_notify_transition() | -> ... __kvmclock_cpufreq_notifier() But, actually triggering such deadlocks is beyond rare due to the combination of dependencies and timings involved. E.g. the cpufreq notifier is only used on older CPUs without a constant TSC, mucking with the NX hugepage mitigation while VMs are running is very uncommon, and doing so while also onlining/offlining a CPU (necessary to generate contention on cpu_hotplug_lock) would be even more unusual. The most robust solution to the general cpu_hotplug_lock issue is likely to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq notifier doesn't to take kvm_lock. For now, settle for fixing the most blatant deadlock, as switching to an RCU-protected list is a much more involved change, but add a comment in locking.rst to call out that care needs to be taken when walking holding kvm_lock and walking vm_list. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O ------------------------------------------------------ tee/35048 is trying to acquire lock: ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm] but task is already holding lock: ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (kvm_lock){+.+.}-{3:3}: __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 kvm_dev_ioctl+0x4fb/0xe50 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (cpu_hotplug_lock){++++}-{0:0}: cpus_read_lock+0x2e/0xb0 static_key_slow_inc+0x16/0x30 kvm_lapic_set_base+0x6a/0x1c0 [kvm] kvm_set_apic_base+0x8f/0xe0 [kvm] kvm_set_msr_common+0x9ae/0xf80 [kvm] vmx_set_msr+0xa54/0xbe0 [kvm_intel] __kvm_set_msr+0xb6/0x1a0 [kvm] kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm] kvm_vcpu_ioctl+0x485/0x5b0 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&kvm->srcu){.+.+}-{0:0}: __synchronize_srcu+0x44/0x1a0 synchronize_srcu_expedited+0x21/0x30 kvm_swap_active_memslots+0x110/0x1c0 [kvm] kvm_set_memslot+0x360/0x620 [kvm] __kvm_set_memory_region+0x27b/0x300 [kvm] kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm] kvm_vm_ioctl+0x295/0x650 [kvm] __se_sys_ioctl+0x7b/0xd0 __x64_sys_ioctl+0x21/0x30 x64_sys_call+0x15d0/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (&kvm->slots_lock){+.+.}-{3:3}: __lock_acquire+0x15ef/0x2e30 lock_acquire+0xe0/0x260 __mutex_lock+0x6a/0xb40 mutex_lock_nested+0x1f/0x30 set_nx_huge_pages+0x179/0x1e0 [kvm] param_attr_store+0x93/0x100 module_attr_store+0x22/0x40 sysfs_kf_write+0x81/0xb0 kernfs_fop_write_iter+0x133/0x1d0 vfs_write+0x28d/0x380 ksys_write+0x70/0xe0 __x64_sys_write+0x1f/0x30 x64_sys_call+0x281b/0x2e60 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: Chao Gao <chao.gao@intel.com> Fixes: 0bf5049 ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock") Cc: stable@vger.kernel.org Reviewed-by: Kai Huang <kai.huang@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240830043600.127750-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac01c8c upstream. AddressSanitizer found a use-after-free bug in the symbol code which manifested as 'perf top' segfaulting. ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80 READ of size 1 at 0x60b00c48844b thread T193 #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310 #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286 #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614 #3 0x5650d804568f in __hists__add_entry util/hist.c:754 #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772 #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997 #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242 #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845 #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208 #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245 #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324 #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120 #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442 #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 When updating hist maps it's also necessary to update the hist symbol reference because the old one gets freed in map__put(). While this bug was probably introduced with 5c24b67 ("perf tools: Replace map->referenced & maps->removed_maps with map->refcnt"), the symbol objects were leaked until c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the bug was masked. Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") Reported-by: Yunzhao Li <yunzhao@cloudflare.com> Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com> Cc: Ian Rogers <irogers@google.com> Cc: kernel-team@cloudflare.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org # v5.13+ Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a848c29 ] On the node of an NFS client, some files saved in the mountpoint of the NFS server were copied to another location of the same NFS server. Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference crash with the following syslog: [232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116 [232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [232066.588586] Mem abort info: [232066.588701] ESR = 0x0000000096000007 [232066.588862] EC = 0x25: DABT (current EL), IL = 32 bits [232066.589084] SET = 0, FnV = 0 [232066.589216] EA = 0, S1PTW = 0 [232066.589340] FSC = 0x07: level 3 translation fault [232066.589559] Data abort info: [232066.589683] ISV = 0, ISS = 0x00000007 [232066.589842] CM = 0, WnR = 0 [232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400 [232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000 [232066.590757] Internal error: Oops: 96000007 [#1] SMP [232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2 [232066.591052] vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs [232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1 [232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06 [232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4] [232066.598595] sp : ffff8000f568fc70 [232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000 [232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001 [232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050 [232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000 [232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000 [232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6 [232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828 [232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a [232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058 [232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000 [232066.601636] Call trace: [232066.601749] nfs4_reclaim_open_state+0x220/0x800 [nfsv4] [232066.601998] nfs4_do_reclaim+0x1b8/0x28c [nfsv4] [232066.602218] nfs4_state_manager+0x928/0x10f0 [nfsv4] [232066.602455] nfs4_run_state_manager+0x78/0x1b0 [nfsv4] [232066.602690] kthread+0x110/0x114 [232066.602830] ret_from_fork+0x10/0x20 [232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00) [232066.603284] SMP: stopping secondary CPUs [232066.606936] Starting crashdump kernel... [232066.607146] Bye! Analysing the vmcore, we know that nfs4_copy_state listed by destination nfs_server->ss_copies was added by the field copies in handle_async_copy(), and we found a waiting copy process with the stack as: PID: 3511963 TASK: ffff710028b47e00 CPU: 0 COMMAND: "cp" #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4 #1 [ffff8001116ef760] __schedule at ffff800008dd0650 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4] #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4] #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4] #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4] The NULL-pointer dereference was due to nfs42_complete_copies() listed the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state. So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and the data accessed through this pointer was also incorrect. Generally, the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state(). When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED and copies are not deleted in nfs_server->ss_copies, the source state may be passed to the nfs42_complete_copies() process earlier, resulting in this crash scene finally. To solve this issue, we add a list_head nfs_server->ss_src_copies for a server-to-server copy specially. Fixes: 0e65a32 ("NFS: handle source server reboot") Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn> Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 3d04139 upstream. Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf1 ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mib.[ch], because commit 6982826 ("mptcp: fallback to TCP after SYN+MPC drops"), and commit 27069e7 ("mptcp: disable active MPTCP in case of blackhole") are linked to new features, not available in this version. Resolving the conflicts is easy, simply adding the new lines declaring the new "endpoint attempt" MIB entry. Also a conflict in protocol.h, because commit fce68b0 ("mptcp: add scheduled in mptcp_subflow_context") is not in this version, and changes the context by introducing 'scheduled' variable just before. Also a conflict in pm_netlink.c, because commit 3aa3624 ("mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket()") is not in this version, and refactor the function: that's fine, we can still set pm_listener before doing the 'listen()', taking 'ssock->sk' as 'ssk' is not defined before this refactoring. There is also a conflict because the context has been changed later in commit 69925a3 ("mptcp: ensure listener is unhashed before updating the sk status"). ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d04139 upstream. Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf1 ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mib.[ch], because commit 6982826 ("mptcp: fallback to TCP after SYN+MPC drops"), and commit 27069e7 ("mptcp: disable active MPTCP in case of blackhole") are linked to new features, not available in this version. Resolving the conflicts is easy, simply adding the new lines declaring the new "endpoint attempt" MIB entry. Also a conflict in protocol.h, because commit fce68b0 ("mptcp: add scheduled in mptcp_subflow_context") is not in this version, and changes the context by introducing 'scheduled' variable just before. Also a conflict in pm_netlink.c, because commit 3aa3624 ("mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket()") is not in this version, and refactor the function: that's fine, we can still set pm_listener before doing the 'listen()', taking 'ssock->sk' as 'ssk' is not defined before this refactoring. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…ation commit c728a95 upstream. When testing the XDP_REDIRECT function on the LS1028A platform, we found a very reproducible issue that the Tx frames can no longer be sent out even if XDP_REDIRECT is turned off. Specifically, if there is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on, the console may display some warnings like "timeout for tx ring #6 clear", and all redirected frames will be dropped, the detailed log is as follows. root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2 Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc) [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s 15.71 bulk-avg eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s 15.71 bulk-avg By analyzing the XDP_REDIRECT implementation of enetc driver, the driver will reconfigure Tx and Rx BD rings when a bpf program is installed or uninstalled, but there is no mechanisms to block the redirected frames when enetc driver reconfigures rings. Similarly, XDP_TX verdicts on received frames can also lead to frames being enqueued in the Tx rings. Because XDP ignores the state set by the netif_tx_wake_queue() API, so introduce the ENETC_TX_DOWN flag to suppress transmission of XDP frames. Fixes: c33bfaf ("net: enetc: set up XDP program under enetc_reconfigure()") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a93f2c upstream. The Tx BD rings are disabled first in enetc_stop() and the driver waits for them to become empty. This operation is not safe while the ring is actively transmitting frames, and will cause the ring to not be empty and hardware exception. As described in the NETC block guide, software should only disable an active Tx ring after all pending ring entries have been consumed (i.e. when PI = CI). Disabling a transmit ring that is actively processing BDs risks a HW-SW race hazard whereby a hardware resource becomes assigned to work on one or more ring entries only to have those entries be removed due to the ring becoming disabled. When testing XDP_REDIRECT feautre, although all frames were blocked from being put into Tx rings during ring reconfiguration, the similar warning log was still encountered: fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear The reason is that when there are still unsent frames in the Tx ring, disabling the Tx ring causes the remaining frames to be unable to be sent out. And the Tx ring cannot be restored, which means that even if the xdp program is uninstalled, the Tx frames cannot be sent out anymore. Therefore, correct the operation order in enect_start() and enect_stop(). Fixes: ff58fda ("net: enetc: prioritize ability to go down over packet processing") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d04139 upstream. Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf1 ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mib.[ch], because commit 6982826 ("mptcp: fallback to TCP after SYN+MPC drops"), and commit 27069e7 ("mptcp: disable active MPTCP in case of blackhole") are linked to new features, not available in this version. Resolving the conflicts is easy, simply adding the new lines declaring the new "endpoint attempt" MIB entry. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
…ation commit c728a95 upstream. When testing the XDP_REDIRECT function on the LS1028A platform, we found a very reproducible issue that the Tx frames can no longer be sent out even if XDP_REDIRECT is turned off. Specifically, if there is a lot of traffic on Rx direction, when XDP_REDIRECT is turned on, the console may display some warnings like "timeout for tx ring #6 clear", and all redirected frames will be dropped, the detailed log is as follows. root@ls1028ardb:~# ./xdp-bench redirect eno0 eno2 Redirecting from eno0 (ifindex 3; driver fsl_enetc) to eno2 (ifindex 4; driver fsl_enetc) [203.849809] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #5 clear [204.006051] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear [204.161944] fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear eno0->eno2 1420505 rx/s 1420590 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420590 drop/s 0 drv_err/s 15.71 bulk-avg eno0->eno2 1420484 rx/s 1420485 err,drop/s 0 xmit/s xmit eno0->eno2 0 xmit/s 1420485 drop/s 0 drv_err/s 15.71 bulk-avg By analyzing the XDP_REDIRECT implementation of enetc driver, the driver will reconfigure Tx and Rx BD rings when a bpf program is installed or uninstalled, but there is no mechanisms to block the redirected frames when enetc driver reconfigures rings. Similarly, XDP_TX verdicts on received frames can also lead to frames being enqueued in the Tx rings. Because XDP ignores the state set by the netif_tx_wake_queue() API, so introduce the ENETC_TX_DOWN flag to suppress transmission of XDP frames. Fixes: c33bfaf ("net: enetc: set up XDP program under enetc_reconfigure()") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0a93f2c upstream. The Tx BD rings are disabled first in enetc_stop() and the driver waits for them to become empty. This operation is not safe while the ring is actively transmitting frames, and will cause the ring to not be empty and hardware exception. As described in the NETC block guide, software should only disable an active Tx ring after all pending ring entries have been consumed (i.e. when PI = CI). Disabling a transmit ring that is actively processing BDs risks a HW-SW race hazard whereby a hardware resource becomes assigned to work on one or more ring entries only to have those entries be removed due to the ring becoming disabled. When testing XDP_REDIRECT feautre, although all frames were blocked from being put into Tx rings during ring reconfiguration, the similar warning log was still encountered: fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #6 clear fsl_enetc 0000:00:00.2 eno2: timeout for tx ring #7 clear The reason is that when there are still unsent frames in the Tx ring, disabling the Tx ring causes the remaining frames to be unable to be sent out. And the Tx ring cannot be restored, which means that even if the xdp program is uninstalled, the Tx frames cannot be sent out anymore. Therefore, correct the operation order in enect_start() and enect_stop(). Fixes: ff58fda ("net: enetc: prioritize ability to go down over packet processing") Cc: stable@vger.kernel.org Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241010092056.298128-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3d04139 upstream. Syzkaller reported a lockdep splat: ============================================ WARNING: possible recursive locking detected 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Not tainted -------------------------------------------- syz-executor364/5113 is trying to acquire lock: ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff8880449f1958 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 but task is already holding lock: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-slock-AF_INET); lock(k-slock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by syz-executor364/5113: #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #0: ffff8880449f0e18 (sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg+0x153/0x1b10 net/mptcp/protocol.c:1806 #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1607 [inline] #1: ffff88803fe39ad8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_sendmsg_fastopen+0x11f/0x530 net/mptcp/protocol.c:1727 #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #2: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x5f/0x1b80 net/ipv4/ip_output.c:470 #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #3: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x45f/0x1390 net/ipv4/ip_output.c:228 #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline] #4: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x33b/0x15b0 net/core/dev.c:6104 #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:326 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline] #5: ffffffff8e938320 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x230/0x5f0 net/ipv4/ip_input.c:232 #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline] #6: ffff88803fe3cb58 (k-slock-AF_INET){+.-.}-{2:2}, at: sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 stack backtrace: CPU: 0 UID: 0 PID: 5113 Comm: syz-executor364 Not tainted 6.11.0-rc6-syzkaller-00019-g67784a74e258 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 check_deadlock kernel/locking/lockdep.c:3061 [inline] validate_chain+0x15d3/0x5900 kernel/locking/lockdep.c:3855 __lock_acquire+0x137a/0x2040 kernel/locking/lockdep.c:5142 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5759 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline] _raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154 spin_lock include/linux/spinlock.h:351 [inline] sk_clone_lock+0x2cd/0xf40 net/core/sock.c:2328 mptcp_sk_clone_init+0x32/0x13c0 net/mptcp/protocol.c:3279 subflow_syn_recv_sock+0x931/0x1920 net/mptcp/subflow.c:874 tcp_check_req+0xfe4/0x1a20 net/ipv4/tcp_minisocks.c:853 tcp_v4_rcv+0x1c3e/0x37f0 net/ipv4/tcp_ipv4.c:2267 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5661 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6108 __napi_poll+0xcb/0x490 net/core/dev.c:6772 napi_poll net/core/dev.c:6841 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6963 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:908 [inline] __dev_queue_xmit+0x1763/0x3e90 net/core/dev.c:4450 dev_queue_xmit include/linux/netdevice.h:3105 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:235 ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:535 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6542 [inline] tcp_rcv_state_process+0x2c32/0x4570 net/ipv4/tcp_input.c:6729 tcp_v4_do_rcv+0x77d/0xc70 net/ipv4/tcp_ipv4.c:1934 sk_backlog_rcv include/net/sock.h:1111 [inline] __release_sock+0x214/0x350 net/core/sock.c:3004 release_sock+0x61/0x1f0 net/core/sock.c:3558 mptcp_sendmsg_fastopen+0x1ad/0x530 net/mptcp/protocol.c:1733 mptcp_sendmsg+0x1884/0x1b10 net/mptcp/protocol.c:1812 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2737 __do_sys_sendmmsg net/socket.c:2766 [inline] __se_sys_sendmmsg net/socket.c:2763 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2763 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f04fb13a6b9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 01 1a 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd651f42d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f04fb13a6b9 RDX: 0000000000000001 RSI: 0000000020000d00 RDI: 0000000000000004 RBP: 00007ffd651f4310 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000020000080 R11: 0000000000000246 R12: 00000000000f4240 R13: 00007f04fb187449 R14: 00007ffd651f42f4 R15: 00007ffd651f4300 </TASK> As noted by Cong Wang, the splat is false positive, but the code path leading to the report is an unexpected one: a client is attempting an MPC handshake towards the in-kernel listener created by the in-kernel PM for a port based signal endpoint. Such connection will be never accepted; many of them can make the listener queue full and preventing the creation of MPJ subflow via such listener - its intended role. Explicitly detect this scenario at initial-syn time and drop the incoming MPC request. Fixes: 1729cf1 ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Reported-by: syzbot+f4aacdfef2c6a6529c3e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f4aacdfef2c6a6529c3e Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-1-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mib.[ch], because commit 6982826 ("mptcp: fallback to TCP after SYN+MPC drops"), and commit 27069e7 ("mptcp: disable active MPTCP in case of blackhole") are linked to new features, not available in this version. Resolving the conflicts is easy, simply adding the new lines declaring the new "endpoint attempt" MIB entry. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac01c8c upstream. AddressSanitizer found a use-after-free bug in the symbol code which manifested as 'perf top' segfaulting. ==1238389==ERROR: AddressSanitizer: heap-use-after-free on address 0x60b00c48844b at pc 0x5650d8035961 bp 0x7f751aaecc90 sp 0x7f751aaecc80 READ of size 1 at 0x60b00c48844b thread T193 #0 0x5650d8035960 in _sort__sym_cmp util/sort.c:310 #1 0x5650d8043744 in hist_entry__cmp util/hist.c:1286 #2 0x5650d8043951 in hists__findnew_entry util/hist.c:614 #3 0x5650d804568f in __hists__add_entry util/hist.c:754 #4 0x5650d8045bf9 in hists__add_entry util/hist.c:772 #5 0x5650d8045df1 in iter_add_single_normal_entry util/hist.c:997 #6 0x5650d8043326 in hist_entry_iter__add util/hist.c:1242 #7 0x5650d7ceeefe in perf_event__process_sample /home/matt/src/linux/tools/perf/builtin-top.c:845 #8 0x5650d7ceeefe in deliver_event /home/matt/src/linux/tools/perf/builtin-top.c:1208 #9 0x5650d7fdb51b in do_flush util/ordered-events.c:245 #10 0x5650d7fdb51b in __ordered_events__flush util/ordered-events.c:324 #11 0x5650d7ced743 in process_thread /home/matt/src/linux/tools/perf/builtin-top.c:1120 #12 0x7f757ef1f133 in start_thread nptl/pthread_create.c:442 #13 0x7f757ef9f7db in clone3 ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 When updating hist maps it's also necessary to update the hist symbol reference because the old one gets freed in map__put(). While this bug was probably introduced with 5c24b67 ("perf tools: Replace map->referenced & maps->removed_maps with map->refcnt"), the symbol objects were leaked until c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") was merged so the bug was masked. Fixes: c087e94 ("perf machine: Fix refcount usage when processing PERF_RECORD_KSYMBOL") Reported-by: Yunzhao Li <yunzhao@cloudflare.com> Signed-off-by: Matt Fleming (Cloudflare) <matt@readmodwrite.com> Cc: Ian Rogers <irogers@google.com> Cc: kernel-team@cloudflare.com Cc: Namhyung Kim <namhyung@kernel.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: stable@vger.kernel.org # v5.13+ Link: https://lore.kernel.org/r/20240815142212.3834625-1-matt@readmodwrite.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 18ad4df ] 1) initial state, three tasks: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | Λ | Λ | Λ | | | | | | V | V | V | bfqq1 bfqq2 bfqq3 process ref: 1 1 1 2) bfqq1 merged to bfqq2: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | | | Λ \--------------\| | | V V | bfqq1--------->bfqq2 bfqq3 process ref: 0 2 1 3) bfqq2 merged to bfqq3: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) here -> Λ | | \--------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3 process ref: 0 1 3 In this case, IO from Process 1 will get bfqq2 from BIC1 first, and then get bfqq3 through merge chain, and finially handle IO by bfqq3. Howerver, current code will think bfqq2 is owned by BIC1, like initial state, and set bfqq2->bic to BIC1. bfq_insert_request -> by Process 1 bfqq = bfq_init_rq(rq) bfqq = bfq_get_bfqq_handle_split bfqq = bic_to_bfqq -> get bfqq2 from BIC1 bfqq->ref++ rq->elv.priv[0] = bic rq->elv.priv[1] = bfqq if (bfqq_process_refs(bfqq) == 1) bfqq->bic = bic -> record BIC1 to bfqq2 __bfq_insert_request new_bfqq = bfq_setup_cooperator -> get bfqq3 from bfqq2->new_bfqq bfqq_request_freed(bfqq) new_bfqq->ref++ rq->elv.priv[1] = new_bfqq -> handle IO by bfqq3 Fix the problem by checking bfqq is from merge chain fist. And this might fix a following problem reported by our syzkaller(unreproducible): ================================================================== BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595 CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G L 6.6.0-07439-gba2303cacfda #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kblockd blk_mq_requeue_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x10d/0x610 mm/kasan/report.c:475 kasan_report+0x8e/0xc0 mm/kasan/report.c:588 bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757 bfq_init_rq block/bfq-iosched.c:6876 [inline] bfq_insert_request block/bfq-iosched.c:6254 [inline] bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304 blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593 blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 </TASK> Allocated by task 20776: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328 kasan_slab_alloc include/linux/kasan.h:188 [inline] slab_post_alloc_hook mm/slab.h:763 [inline] slab_alloc_node mm/slub.c:3458 [inline] kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503 ioc_create_icq block/blk-ioc.c:370 [inline] ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436 bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812 blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403 __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517 blk_mq_get_new_requests block/blk-mq.c:2940 [inline] blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042 __submit_bio+0x331/0x6f0 block/blk-core.c:624 __submit_bio_noacct_mq block/blk-core.c:703 [inline] submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732 submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826 xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958 xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619 xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Freed by task 946: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1815 [inline] slab_free_freelist_hook mm/slub.c:1841 [inline] slab_free mm/slub.c:3786 [inline] kmem_cache_free+0x118/0x6f0 mm/slub.c:3808 rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189 rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462 __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Second to last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 The buggy address belongs to the object at ffff888123839d68 which belongs to the cache bfq_io_cq of size 1360 The buggy address is located 336 bytes inside of freed 1360-byte region [ffff888123839d68, ffff88812383a2b8) The buggy address belongs to the physical page: page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838 head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488 raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 36eca89 ("block, bfq: add Early Queue Merge (EQM)") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 18ad4df ] 1) initial state, three tasks: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | Λ | Λ | Λ | | | | | | V | V | V | bfqq1 bfqq2 bfqq3 process ref: 1 1 1 2) bfqq1 merged to bfqq2: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) | | | Λ \--------------\| | | V V | bfqq1--------->bfqq2 bfqq3 process ref: 0 2 1 3) bfqq2 merged to bfqq3: Process 1 Process 2 Process 3 (BIC1) (BIC2) (BIC3) here -> Λ | | \--------------\ \-------------\| V V bfqq1--------->bfqq2---------->bfqq3 process ref: 0 1 3 In this case, IO from Process 1 will get bfqq2 from BIC1 first, and then get bfqq3 through merge chain, and finially handle IO by bfqq3. Howerver, current code will think bfqq2 is owned by BIC1, like initial state, and set bfqq2->bic to BIC1. bfq_insert_request -> by Process 1 bfqq = bfq_init_rq(rq) bfqq = bfq_get_bfqq_handle_split bfqq = bic_to_bfqq -> get bfqq2 from BIC1 bfqq->ref++ rq->elv.priv[0] = bic rq->elv.priv[1] = bfqq if (bfqq_process_refs(bfqq) == 1) bfqq->bic = bic -> record BIC1 to bfqq2 __bfq_insert_request new_bfqq = bfq_setup_cooperator -> get bfqq3 from bfqq2->new_bfqq bfqq_request_freed(bfqq) new_bfqq->ref++ rq->elv.priv[1] = new_bfqq -> handle IO by bfqq3 Fix the problem by checking bfqq is from merge chain fist. And this might fix a following problem reported by our syzkaller(unreproducible): ================================================================== BUG: KASAN: slab-use-after-free in bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] BUG: KASAN: slab-use-after-free in bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] BUG: KASAN: slab-use-after-free in bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 Write of size 1 at addr ffff888123839eb8 by task kworker/0:1H/18595 CPU: 0 PID: 18595 Comm: kworker/0:1H Tainted: G L 6.6.0-07439-gba2303cacfda #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 Workqueue: kblockd blk_mq_requeue_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x91/0xf0 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:364 [inline] print_report+0x10d/0x610 mm/kasan/report.c:475 kasan_report+0x8e/0xc0 mm/kasan/report.c:588 bfq_do_early_stable_merge block/bfq-iosched.c:5692 [inline] bfq_do_or_sched_stable_merge block/bfq-iosched.c:5805 [inline] bfq_get_queue+0x25b0/0x2610 block/bfq-iosched.c:5889 bfq_get_bfqq_handle_split+0x169/0x5d0 block/bfq-iosched.c:6757 bfq_init_rq block/bfq-iosched.c:6876 [inline] bfq_insert_request block/bfq-iosched.c:6254 [inline] bfq_insert_requests+0x1112/0x5cf0 block/bfq-iosched.c:6304 blk_mq_insert_request+0x290/0x8d0 block/blk-mq.c:2593 blk_mq_requeue_work+0x6bc/0xa70 block/blk-mq.c:1502 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 </TASK> Allocated by task 20776: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 __kasan_slab_alloc+0x87/0x90 mm/kasan/common.c:328 kasan_slab_alloc include/linux/kasan.h:188 [inline] slab_post_alloc_hook mm/slab.h:763 [inline] slab_alloc_node mm/slub.c:3458 [inline] kmem_cache_alloc_node+0x1a4/0x6f0 mm/slub.c:3503 ioc_create_icq block/blk-ioc.c:370 [inline] ioc_find_get_icq+0x180/0xaa0 block/blk-ioc.c:436 bfq_prepare_request+0x39/0xf0 block/bfq-iosched.c:6812 blk_mq_rq_ctx_init.isra.7+0x6ac/0xa00 block/blk-mq.c:403 __blk_mq_alloc_requests+0xcc0/0x1070 block/blk-mq.c:517 blk_mq_get_new_requests block/blk-mq.c:2940 [inline] blk_mq_submit_bio+0x624/0x27c0 block/blk-mq.c:3042 __submit_bio+0x331/0x6f0 block/blk-core.c:624 __submit_bio_noacct_mq block/blk-core.c:703 [inline] submit_bio_noacct_nocheck+0x816/0xb40 block/blk-core.c:732 submit_bio_noacct+0x7a6/0x1b50 block/blk-core.c:826 xlog_write_iclog+0x7d5/0xa00 fs/xfs/xfs_log.c:1958 xlog_state_release_iclog+0x3b8/0x720 fs/xfs/xfs_log.c:619 xlog_cil_push_work+0x19c5/0x2270 fs/xfs/xfs_log_cil.c:1330 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Freed by task 946: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 kasan_set_track+0x25/0x30 mm/kasan/common.c:52 kasan_save_free_info+0x2b/0x50 mm/kasan/generic.c:522 ____kasan_slab_free mm/kasan/common.c:236 [inline] __kasan_slab_free+0x12c/0x1c0 mm/kasan/common.c:244 kasan_slab_free include/linux/kasan.h:164 [inline] slab_free_hook mm/slub.c:1815 [inline] slab_free_freelist_hook mm/slub.c:1841 [inline] slab_free mm/slub.c:3786 [inline] kmem_cache_free+0x118/0x6f0 mm/slub.c:3808 rcu_do_batch+0x35c/0xe30 kernel/rcu/tree.c:2189 rcu_core+0x819/0xd90 kernel/rcu/tree.c:2462 __do_softirq+0x1b0/0x7a2 kernel/softirq.c:553 Last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 Second to last potentially related work creation: kasan_save_stack+0x20/0x40 mm/kasan/common.c:45 __kasan_record_aux_stack+0xaf/0xc0 mm/kasan/generic.c:492 __call_rcu_common kernel/rcu/tree.c:2712 [inline] call_rcu+0xce/0x1020 kernel/rcu/tree.c:2826 ioc_destroy_icq+0x54c/0x830 block/blk-ioc.c:105 ioc_release_fn+0xf0/0x360 block/blk-ioc.c:124 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x432/0x13f0 kernel/workqueue.c:2700 worker_thread+0x6f2/0x1160 kernel/workqueue.c:2781 kthread+0x33c/0x440 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:305 The buggy address belongs to the object at ffff888123839d68 which belongs to the cache bfq_io_cq of size 1360 The buggy address is located 336 bytes inside of freed 1360-byte region [ffff888123839d68, ffff88812383a2b8) The buggy address belongs to the physical page: page:ffffea00048e0e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812383f588 pfn:0x123838 head:ffffea00048e0e00 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0x17ffffc0000a40(workingset|slab|head|node=0|zone=2|lastcpupid=0x1fffff) page_type: 0xffffffff() raw: 0017ffffc0000a40 ffff88810588c200 ffffea00048ffa10 ffff888105889488 raw: ffff88812383f588 0000000000150006 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888123839d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff888123839e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888123839f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888123839f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: 36eca89 ("block, bfq: add Early Queue Merge (EQM)") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240902130329.3787024-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cb7e509 ] If ufshcd_rtc_work calls ufshcd_rpm_put_sync() and the pm's usage_count is 0, we will enter the runtime suspend callback. However, the runtime suspend callback will wait to flush ufshcd_rtc_work, causing a deadlock. Replace ufshcd_rpm_put_sync() with ufshcd_rpm_put() to avoid the deadlock. Fixes: 6bf999e ("scsi: ufs: core: Add UFS RTC support") Cc: stable@vger.kernel.org #6.11.x Signed-off-by: Peter Wang <peter.wang@mediatek.com> Link: https://lore.kernel.org/r/20241024015453.21684-1-peter.wang@mediatek.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit cf96b8e upstream. ASan reports a memory leak caused by evlist not being deleted on exit in perf-report, perf-script and perf-data. The problem is caused by evlist->session not being deleted, which is allocated in perf_session__read_header, called in perf_session__new if perf_data is in read mode. In case of write mode, the session->evlist is filled by the caller. This patch solves the problem by calling evlist__delete in perf_session__delete if perf_data is in read mode. Changes in v2: - call evlist__delete from within perf_session__delete v1: https://lore.kernel.org/lkml/20210621234317.235545-1-rickyman7@gmail.com/ ASan report follows: $ ./perf script report flamegraph ================================================================= ==227640==ERROR: LeakSanitizer: detected memory leaks <SNIP unrelated> Indirect leak of 2704 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x7f999e in evlist__new /home/user/linux/tools/perf/util/evlist.c:77:26 #3 0x8ad938 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3797:20 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 568 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0x80ce88 in evsel__new_idx /home/user/linux/tools/perf/util/evsel.c:268:24 #3 0x8aed93 in evsel__new /home/user/linux/tools/perf/util/evsel.h:210:9 #4 0x8ae07e in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3853:11 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 264 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbe3e70 in xyarray__new /home/user/linux/tools/lib/perf/xyarray.c:10:23 #3 0xbd7754 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:361:21 #4 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #5 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #6 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #7 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #8 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #9 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #10 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #11 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #12 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 32 byte(s) in 1 object(s) allocated from: #0 0x4f4137 in calloc (/home/user/linux/tools/perf/perf+0x4f4137) #1 0xbe3d56 in zalloc /home/user/linux/tools/lib/perf/../../lib/zalloc.c:8:9 #2 0xbd77e0 in perf_evsel__alloc_id /home/user/linux/tools/lib/perf/evsel.c:365:14 #3 0x8ae201 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3871:7 #4 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #5 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #6 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #7 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #8 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #9 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #10 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #11 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) Indirect leak of 7 byte(s) in 1 object(s) allocated from: #0 0x4b8207 in strdup (/home/user/linux/tools/perf/perf+0x4b8207) #1 0x8b4459 in evlist__set_event_name /home/user/linux/tools/perf/util/header.c:2292:16 #2 0x89d862 in process_event_desc /home/user/linux/tools/perf/util/header.c:2313:3 #3 0x8af319 in perf_file_section__process /home/user/linux/tools/perf/util/header.c:3651:9 #4 0x8aa6e9 in perf_header__process_sections /home/user/linux/tools/perf/util/header.c:3427:9 #5 0x8ae3e7 in perf_session__read_header /home/user/linux/tools/perf/util/header.c:3886:2 #6 0x8ec714 in perf_session__open /home/user/linux/tools/perf/util/session.c:109:6 #7 0x8ebe83 in perf_session__new /home/user/linux/tools/perf/util/session.c:213:10 #8 0x60c6de in cmd_script /home/user/linux/tools/perf/builtin-script.c:3856:12 #9 0x7b2930 in run_builtin /home/user/linux/tools/perf/perf.c:313:11 #10 0x7b120f in handle_internal_command /home/user/linux/tools/perf/perf.c:365:8 #11 0x7b2493 in run_argv /home/user/linux/tools/perf/perf.c:409:2 #12 0x7b0c89 in main /home/user/linux/tools/perf/perf.c:539:3 #13 0x7f5260654b74 (/lib64/libc.so.6+0x27b74) SUMMARY: AddressSanitizer: 3728 byte(s) leaked in 7 allocation(s). Signed-off-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20210624231926.212208-1-rickyman7@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: stable@vger.kernel.org # 5.10.228 Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Version:
5.0.7.a-2-hardened
When running
dmesg
, the following occurs when attempting to read the kernel buffer:Ʋ dmesg INSERT dmesg: read kernel buffer failed: Operation not permitted
journalctl
however, is not restricted from reading the buffer:Since
systemd-journal
uses thesys_admin
capability, it is not restricted:Ʋ pscap | grep journal 1 29264 root systemd-journal chown, dac_override, dac_read_search, fowner, setgid, setuid, sys_ptrace, sys_admin, audit_control, mac_override, syslog, audit_read
Is there anyway we can restrict
journalctl
from accessingdmesg
? Thesystemd
project recommends removing the user from thesystemd-journald
group, but that does not work for my installation, as my user is not in that group.The text was updated successfully, but these errors were encountered: