Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sporadic failure of mptcp_join.sh selftest 13 #112

Closed
dcaratti opened this issue Nov 18, 2020 · 4 comments
Closed

sporadic failure of mptcp_join.sh selftest 13 #112

dcaratti opened this issue Nov 18, 2020 · 4 comments
Assignees
Labels

Comments

@dcaratti
Copy link
Contributor

dcaratti commented Nov 18, 2020

While backporting the support for "echo bit" to make ADD_ADDR "reliable", I noticed that some of the kselftests that check for the value of MPTcpExtEchoAdd often fail: I'm attaching pcap captures of the same test (number 13 in the current list), repeated twice on the same kernel.
In mp_join-01-ns1-0-FZiNdH.pcap the test failed as follows:

Created /tmp/tmp.6ll1l3yxsi (size 2 KB) containing data sent by client
Created /tmp/tmp.zo99WX2idt (size 2 KB) containing data sent by server
Capturing traffic for test 1 into mp_join-01-ns1-0-FZiNdH.pcap
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 65535 bytes
287 packets captured
290 packets received by filter
0 packets dropped by kernel
01 remove single address                syn[ ok ] - synack[ ok ] - ack[ ok ]
                                        add[ ok ] - echo  [fail] got 0 ADD_ADDR echo[s] expected 1
Server ns stats
MPTcpExtMPCapableSYNRX          1                  0.0
MPTcpExtMPCapableACKRX          1                  0.0
MPTcpExtMPTCPRetrans            14                 0.0
MPTcpExtMPJoinSynRx             1                  0.0
MPTcpExtMPJoinAckRx             1                  0.0
MPTcpExtDuplicateData           16                 0.0
MPTcpExtRmSubflow               1                  0.0
Client ns stats
MPTcpExtMPTCPRetrans            16                 0.0
MPTcpExtMPJoinSynAckRx          1                  0.0
MPTcpExtDuplicateData           14                 0.0
MPTcpExtAddAddr                 1                  0.0
MPTcpExtRmAddr                  1                  0.0
                                        rm [ ok ] - sf    [ ok ]

while in the other trace, namely mp_join-01-ns1-0-M8kYtD.pcap, the test passed.
Comparing the 2 captures, it's clear that the "echoed" ADD_ADDR is received correctly only if the peer transmitted it in a "pure ack" when TCP is in the ESTABLISHED state. However, it might happen that TCP subflows have no chance to send a pure ACK until they receive a FIN packet, like it happens in mp_join-01-ns1-0-FZiNdH.pcap. In this case, all the MPTCP options in the packet are currently discarded (here) by Linux.

This patch proved to fix the tests:

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6254,6 +6265,9 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
                         */
                        inet_csk_reset_keepalive_timer(sk, tmo);
                } else {
+                       if (sk_is_mptcp(sk) && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)
+                               mptcp_incoming_options(sk, skb);
+
                        tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
                        goto discard;
                }

but I'm unsure if we are allowed to do this (e.g. parsing the MPTCP options after sending FIN, and skipping the test of URG bit).

@dcaratti
Copy link
Contributor Author

issue112.tar.gz

@dcaratti
Copy link
Contributor Author

dcaratti commented Nov 18, 2020

I noticed that this behavior becomes more evident on kernel where the TCP "ping pong threshold" is 1. e.g. they don't have this commit:

commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c
Author: Wei Wang <weiwan@google.com>
Date:   Fri Jan 25 10:53:20 2019 -0800

    tcp: change pingpong threshold to 3

the issue can also be fixed by temporarily exiting the "pingpong" mode, when the ADD_ADDR is received, e.g. with something like:

--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -911,6 +922,8 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
                if (!mp_opt.echo) {
                        mptcp_pm_add_addr_received(msk, &addr);
                        MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_ADDADDR);
+                      /* printk("silly-fix\n");*/
+                       inet_csk_exit_pingpong_mode(sk);
                } else {
                        mptcp_pm_del_add_timer(msk, &addr);
                        MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_ECHOADD)

@dcaratti
Copy link
Contributor Author

dcaratti commented Nov 18, 2020

A third option might be to ignore the problem in real-life scenarios, and just fix the tests in a way that subflows are created with TCP_QUICKACK socket option for kselftests where the value of MPTcpExtEchoAdd is required to pass the test (but this would require at least implementation of the setsockopt() in the kernel)

@dcaratti
Copy link
Contributor Author

I still see sporadic failures in the "flush", but I see them independently from whether I keep commit 4a41f45 reverted or not. Hence, the issue can be considered a dup of #139

dcaratti pushed a commit to dcaratti/mptcp_net-next that referenced this issue Sep 2, 2021
Fix a warning from checkpatch.pl --strict:

 CHECK: Prefer kernel type 'u16' over 'uint16_t'
 multipath-tcp#112: FILE: server.c:112:
 +       uint16_t command;

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
jenkins-tessares pushed a commit that referenced this issue Apr 9, 2022
The BPF STX/LDX instruction uses offset relative to the FP to address
stack space. Since the BPF_FP locates at the top of the frame, the offset
is usually a negative number. However, arm64 str/ldr immediate instruction
requires that offset be a positive number.  Therefore, this patch tries to
convert the offsets.

The method is to find the negative offset furthest from the FP firstly.
Then add it to the FP, calculate a bottom position, called FPB, and then
adjust the offsets in other STR/LDX instructions relative to FPB.

FPB is saved using the callee-saved register x27 of arm64 which is not
used yet.

Before adjusting the offset, the patch checks every instruction to ensure
that the FP does not change in run-time. If the FP may change, no offset
is adjusted.

For example, for the following bpftrace command:

  bpftrace -e 'kprobe:do_sys_open { printf("opening: %s\n", str(arg1)); }'

Without this patch, jited code(fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   mov     x25, sp
  1c:   mov     x26, #0x0                       // #0
  20:   bti     j
  24:   sub     sp, sp, #0x90
  28:   add     x19, x0, #0x0
  2c:   mov     x0, #0x0                        // #0
  30:   mov     x10, #0xffffffffffffff78        // #-136
  34:   str     x0, [x25, x10]
  38:   mov     x10, #0xffffffffffffff80        // #-128
  3c:   str     x0, [x25, x10]
  40:   mov     x10, #0xffffffffffffff88        // #-120
  44:   str     x0, [x25, x10]
  48:   mov     x10, #0xffffffffffffff90        // #-112
  4c:   str     x0, [x25, x10]
  50:   mov     x10, #0xffffffffffffff98        // #-104
  54:   str     x0, [x25, x10]
  58:   mov     x10, #0xffffffffffffffa0        // #-96
  5c:   str     x0, [x25, x10]
  60:   mov     x10, #0xffffffffffffffa8        // #-88
  64:   str     x0, [x25, x10]
  68:   mov     x10, #0xffffffffffffffb0        // #-80
  6c:   str     x0, [x25, x10]
  70:   mov     x10, #0xffffffffffffffb8        // #-72
  74:   str     x0, [x25, x10]
  78:   mov     x10, #0xffffffffffffffc0        // #-64
  7c:   str     x0, [x25, x10]
  80:   mov     x10, #0xffffffffffffffc8        // #-56
  84:   str     x0, [x25, x10]
  88:   mov     x10, #0xffffffffffffffd0        // #-48
  8c:   str     x0, [x25, x10]
  90:   mov     x10, #0xffffffffffffffd8        // #-40
  94:   str     x0, [x25, x10]
  98:   mov     x10, #0xffffffffffffffe0        // #-32
  9c:   str     x0, [x25, x10]
  a0:   mov     x10, #0xffffffffffffffe8        // #-24
  a4:   str     x0, [x25, x10]
  a8:   mov     x10, #0xfffffffffffffff0        // #-16
  ac:   str     x0, [x25, x10]
  b0:   mov     x10, #0xfffffffffffffff8        // #-8
  b4:   str     x0, [x25, x10]
  b8:   mov     x10, #0x8                       // #8
  bc:   ldr     x2, [x19, x10]
  [...]

With this patch, jited code(fragment):

   0:   bti     c
   4:   stp     x29, x30, [sp, #-16]!
   8:   mov     x29, sp
   c:   stp     x19, x20, [sp, #-16]!
  10:   stp     x21, x22, [sp, #-16]!
  14:   stp     x25, x26, [sp, #-16]!
  18:   stp     x27, x28, [sp, #-16]!
  1c:   mov     x25, sp
  20:   sub     x27, x25, #0x88
  24:   mov     x26, #0x0                       // #0
  28:   bti     j
  2c:   sub     sp, sp, #0x90
  30:   add     x19, x0, #0x0
  34:   mov     x0, #0x0                        // #0
  38:   str     x0, [x27]
  3c:   str     x0, [x27, #8]
  40:   str     x0, [x27, #16]
  44:   str     x0, [x27, #24]
  48:   str     x0, [x27, #32]
  4c:   str     x0, [x27, #40]
  50:   str     x0, [x27, #48]
  54:   str     x0, [x27, #56]
  58:   str     x0, [x27, #64]
  5c:   str     x0, [x27, #72]
  60:   str     x0, [x27, #80]
  64:   str     x0, [x27, #88]
  68:   str     x0, [x27, #96]
  6c:   str     x0, [x27, #104]
  70:   str     x0, [x27, #112]
  74:   str     x0, [x27, #120]
  78:   str     x0, [x27, #128]
  7c:   ldr     x2, [x19, #8]
  [...]

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220321152852.2334294-4-xukuohai@huawei.com
matttbe pushed a commit that referenced this issue Dec 1, 2023
With latest upstream llvm18, the following test cases failed:

  $ ./test_progs -j
  #13/2    bpf_cookie/multi_kprobe_link_api:FAIL
  #13/3    bpf_cookie/multi_kprobe_attach_api:FAIL
  #13      bpf_cookie:FAIL
  #77      fentry_fexit:FAIL
  #78/1    fentry_test/fentry:FAIL
  #78      fentry_test:FAIL
  #82/1    fexit_test/fexit:FAIL
  #82      fexit_test:FAIL
  #112/1   kprobe_multi_test/skel_api:FAIL
  #112/2   kprobe_multi_test/link_api_addrs:FAIL
  [...]
  #112     kprobe_multi_test:FAIL
  #356/17  test_global_funcs/global_func17:FAIL
  #356     test_global_funcs:FAIL

Further analysis shows llvm upstream patch [1] is responsible for the above
failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c,
without [1], the asm code is:

  0000000000000400 <bpf_fentry_test7>:
     400: f3 0f 1e fa                   endbr64
     404: e8 00 00 00 00                callq   0x409 <bpf_fentry_test7+0x9>
     409: 48 89 f8                      movq    %rdi, %rax
     40c: c3                            retq
     40d: 0f 1f 00                      nopl    (%rax)

... and with [1], the asm code is:

  0000000000005d20 <bpf_fentry_test7.specialized.1>:
    5d20: e8 00 00 00 00                callq   0x5d25 <bpf_fentry_test7.specialized.1+0x5>
    5d25: c3                            retq

... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7>
and this caused test failures for #13/#77 etc. except #356.

For test case #356/17, with [1] (progs/test_global_func17.c)), the main prog
looks like:

  0000000000000000 <global_func17>:
       0:       b4 00 00 00 2a 00 00 00 w0 = 0x2a
       1:       95 00 00 00 00 00 00 00 exit

... which passed verification while the test itself expects a verification
failure.

Let us add 'barrier_var' style asm code in both places to prevent function
specialization which caused selftests failure.

  [1] llvm/llvm-project#72903

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants