[interop] watchdog: BUG: soft lockup - CPU#2 stuck for 22s #87
I think we hit some corner case that triggers an infinite loop in subflow_check_data_avail(). @cpaasch, could you please do a run without the automatic reboot on soft lockup and try enabling dynamic debug once it gets stuck?
echo 'file net/mptcp/* +fmp' > /sys/kernel/debug/dynamic_debug/control
If the build doesn't have dynamic debug, perhaps perf top could give some hints.
I think I got something useful:
@cpaasch thank you! The following tentative patch should fix the issue; could you please try it?
Yes, problem solved!!!
Christoph reported an infinite loop in the subflow receive path under stress conditions. If there are multiple subflows, each of them using a large send buffer, the delta between the sequence number used by MPTCP-level retransmission and the current msk->ack_seq can be greater than INT_MAX.

In the above scenario, when calling mptcp_subflow_discard_data(), such delta will be truncated to int and could result in a negative number: no bytes will be dropped, and subflow_check_data_avail() will try again to process the same packet, looping forever.

This change addresses the issue by expanding the 'limit' size to 64 bits, so that overflows are not possible anymore.

Closes: multipath-tcp/mptcp_net-next#87
Fixes: 6719331 ("mptcp: trigger msk processing even for OoO data")
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
linked to 1d39cd8
During testing of f263a81 ("bpf: Track subprog poke descriptors correctly and fix use-after-free") under various failure conditions, for example, when jit_subprogs() fails and tries to clean up the program to be run under the interpreter, we ran into the following freeze:

  [...]
  #127/8 tailcall_bpf2bpf_3:FAIL
  [...]
  [ 92.041251] BUG: KASAN: slab-out-of-bounds in ___bpf_prog_run+0x1b9d/0x2e20
  [ 92.042408] Read of size 8 at addr ffff88800da67f68 by task test_progs/682
  [ 92.043707]
  [ 92.044030] CPU: 1 PID: 682 Comm: test_progs Tainted: G O 5.13.0-53301-ge6c08cb33a30-dirty #87
  [ 92.045542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
  [ 92.046785] Call Trace:
  [ 92.047171] ? __bpf_prog_run_args64+0xc0/0xc0
  [ 92.047773] ? __bpf_prog_run_args32+0x8b/0xb0
  [ 92.048389] ? __bpf_prog_run_args64+0xc0/0xc0
  [ 92.049019] ? ktime_get+0x117/0x130
  [...] // few hundred [similar] lines more
  [ 92.659025] ? ktime_get+0x117/0x130
  [ 92.659845] ? __bpf_prog_run_args64+0xc0/0xc0
  [ 92.660738] ? __bpf_prog_run_args32+0x8b/0xb0
  [ 92.661528] ? __bpf_prog_run_args64+0xc0/0xc0
  [ 92.662378] ? print_usage_bug+0x50/0x50
  [ 92.663221] ? print_usage_bug+0x50/0x50
  [ 92.664077] ? bpf_ksym_find+0x9c/0xe0
  [ 92.664887] ? ktime_get+0x117/0x130
  [ 92.665624] ? kernel_text_address+0xf5/0x100
  [ 92.666529] ? __kernel_text_address+0xe/0x30
  [ 92.667725] ? unwind_get_return_address+0x2f/0x50
  [ 92.668854] ? ___bpf_prog_run+0x15d4/0x2e20
  [ 92.670185] ? ktime_get+0x117/0x130
  [ 92.671130] ? __bpf_prog_run_args64+0xc0/0xc0
  [ 92.672020] ? __bpf_prog_run_args32+0x8b/0xb0
  [ 92.672860] ? __bpf_prog_run_args64+0xc0/0xc0
  [ 92.675159] ? ktime_get+0x117/0x130
  [ 92.677074] ? lock_is_held_type+0xd5/0x130
  [ 92.678662] ? ___bpf_prog_run+0x15d4/0x2e20
  [ 92.680046] ? ktime_get+0x117/0x130
  [ 92.681285] ? __bpf_prog_run32+0x6b/0x90
  [ 92.682601] ? __bpf_prog_run64+0x90/0x90
  [ 92.683636] ? lock_downgrade+0x370/0x370
  [ 92.684647] ? mark_held_locks+0x44/0x90
  [ 92.685652] ? ktime_get+0x117/0x130
  [ 92.686752] ? lockdep_hardirqs_on+0x79/0x100
  [ 92.688004] ? ktime_get+0x117/0x130
  [ 92.688573] ? __cant_migrate+0x2b/0x80
  [ 92.689192] ? bpf_test_run+0x2f4/0x510
  [ 92.689869] ? bpf_test_timer_continue+0x1c0/0x1c0
  [ 92.690856] ? rcu_read_lock_bh_held+0x90/0x90
  [ 92.691506] ? __kasan_slab_alloc+0x61/0x80
  [ 92.692128] ? eth_type_trans+0x128/0x240
  [ 92.692737] ? __build_skb+0x46/0x50
  [ 92.693252] ? bpf_prog_test_run_skb+0x65e/0xc50
  [ 92.693954] ? bpf_prog_test_run_raw_tp+0x2d0/0x2d0
  [ 92.694639] ? __fget_light+0xa1/0x100
  [ 92.695162] ? bpf_prog_inc+0x23/0x30
  [ 92.695685] ? __sys_bpf+0xb40/0x2c80
  [ 92.696324] ? bpf_link_get_from_fd+0x90/0x90
  [ 92.697150] ? mark_held_locks+0x24/0x90
  [ 92.698007] ? lockdep_hardirqs_on_prepare+0x124/0x220
  [ 92.699045] ? finish_task_switch+0xe6/0x370
  [ 92.700072] ? lockdep_hardirqs_on+0x79/0x100
  [ 92.701233] ? finish_task_switch+0x11d/0x370
  [ 92.702264] ? __switch_to+0x2c0/0x740
  [ 92.703148] ? mark_held_locks+0x24/0x90
  [ 92.704155] ? __x64_sys_bpf+0x45/0x50
  [ 92.705146] ? do_syscall_64+0x35/0x80
  [ 92.706953] ? entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

It turns out that the program rejection from e411901 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT") is buggy, since env->prog->aux->tail_call_reachable is never true. Commit ebf7d1f ("bpf, x64: rework pro/epilogue and tailcall handling in JIT") added a tracker into check_max_stack_depth() which propagates the tail_call_reachable condition throughout the subprograms. This info is then assigned to the subprogram's func[i]->aux->tail_call_reachable. However, in the case of the rejection check upon JIT failure, env->prog->aux->tail_call_reachable is used. func[0]->aux->tail_call_reachable, which represents the main program's information, did not propagate this to the outer env->prog->aux, though. Add this propagation into check_max_stack_depth(), where it belongs, so that the check can be done reliably.

Fixes: ebf7d1f ("bpf, x64: rework pro/epilogue and tailcall handling in JIT")
Fixes: e411901 ("bpf: allow for tailcalls in BPF subprograms for x64 JIT")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/618c34e3163ad1a36b1e82377576a6081e182f25.1626123173.git.daniel@iogearbox.net
jira LE-1907
Rebuild_History Non-Buildable kernel-4.18.0-294.el8
commit-author Paolo Abeni <pabeni@redhat.com>
commit 1d39cd8
Empty-Commit: Cherry-Pick Conflicts during history rebuild.
Will be included in final tarball splat. Ref for failed cherry-pick at:
ciq/ciq_backports/kernel-4.18.0-294.el8/1d39cd8c.failed

Christoph reported an infinite loop in the subflow receive path under stress conditions. If there are multiple subflows, each of them using a large send buffer, the delta between the sequence number used by MPTCP-level retransmission and the current msk->ack_seq can be greater than INT_MAX.

In the above scenario, when calling mptcp_subflow_discard_data(), such delta will be truncated to int and could result in a negative number: no bytes will be dropped, and subflow_check_data_avail() will try again to process the same packet, looping forever.

This change addresses the issue by expanding the 'limit' size to 64 bits, so that overflows are not possible anymore.

Closes: multipath-tcp/mptcp_net-next#87
Fixes: 6719331 ("mptcp: trigger msk processing even for OoO data")
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1d39cd8)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

# Conflicts:
# net/mptcp/subflow.c
Running test "bug_cheng_sbuf" (yeah, weird name, historic reasons ;-)) from https://github.com/multipath-tcp/mptcp-scripts/blob/master/testing/testing.py#L555 with net-next as the server and multipath-tcp.org as the client. The test configures a huge rmem on the server and configures the client to create 16 subflows using the same pair of IP addresses. Then, the test starts an iperf.
HEAD is at:
ecc3d085058b ("mptcp: Enable MPTCP when IPPROTO_MPTCP is set") (HEAD) (10 minutes ago)
fdc664ab314a ("mptcp: call tcp_cleanup_rbuf on subflows") (12 hours ago)
35d6b46 ("DO-NOT-MERGE: mptcp: enabled by default") (tag: export/20200909T050747, mptcp_net-next/export) (24 hours ago)
6e7290b ("DO-NOT-MERGE: mptcp: use kmalloc on kasan build") (24 hours ago)
8fd4ed4 ("tcp: propagate MPTCP skb extensions on xmit splits") (24 hours ago)
691048c ("mptcp: use _fast lock version in __mptcp_move_skbs") (24 hours ago)
ef14371 ("mptcp: adjust mptcp receive buffer limit if subflow has larger one") (24 hours ago)
74acdfd ("mptcp: simult flow self-tests") (24 hours ago)
a1975d0 ("mptcp: allow picking different xmit subflows") (24 hours ago)
6855e98 ("mptcp: allow creating non-backup subflows") (24 hours ago)
2c61627 ("mptcp: move address attribute into mptcp_addr_info") (24 hours ago)
b347db3 ("mptcp: add OoO related mibs") (24 hours ago)
91baf67 ("mptcp: cleanup mptcp_subflow_discard_data()") (24 hours ago)
685dab9 ("mptcp: move ooo skbs into msk out of order queue.") (24 hours ago)
c97925c ("mptcp: introduce and use mptcp_try_coalesce()") (24 hours ago)
904204e ("mptcp: basic sndbuf autotuning") (24 hours ago)
5d28362 ("mptcp: trigger msk processing even for OoO data") (24 hours ago)
227d369 ("mptcp: set data_ready status bit in subflow_check_data_avail()") (24 hours ago)
6f2c3af ("mptcp: rethink 'is writable' conditional") (24 hours ago)
ff4e207 ("mptcp: add accept_subflow re-check") (24 hours ago)
1cde79a ("selftests: mptcp: add ADD_ADDR mibs check function") (24 hours ago)
70671b7 ("mptcp: add ADD_ADDR related mibs") (24 hours ago)
6d37e97 ("mptcp: send out ADD_ADDR with echo flag") (24 hours ago)
a20a96f ("mptcp: add the incoming RM_ADDR support") (24 hours ago)
7103b2d ("mptcp: add the outgoing RM_ADDR support") (24 hours ago)
328bd7f ("mptcp: rename addr_signal and the related functions") (24 hours ago)
e8d556d ("selftests/mptcp: Better delay & reordering configuration") (24 hours ago)
38119e7 ("bpf:selftests: add bpf_mptcp_sock() verifier tests") (24 hours ago)
43b5943 ("bpf:selftests: add MPTCP test base") (24 hours ago)
c3de50c ("bpf: add 'bpf_mptcp_sock' structure and helper") (24 hours ago)
31e2af1 ("mptcp: attach subflow socket to parent cgroup") (24 hours ago)
36440f0 ("bpf: expose is_mptcp flag to bpf_tcp_sock") (24 hours ago)
f5499c6 ("nfc: pn533/usb.c: fix spelling of "functions"") (mptcp_net-next/net-next) (26 hours ago)