Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

selftests: simult_flows: hit WARN_ON_ONCE(!mpext) #227

Closed
matttbe opened this issue Aug 23, 2021 · 10 comments
Closed

selftests: simult_flows: hit WARN_ON_ONCE(!mpext) #227

matttbe opened this issue Aug 23, 2021 · 10 comments
Assignees
Labels

Comments

@matttbe
Copy link
Member

matttbe commented Aug 23, 2021

Probably caused by 5d8855f ("mptcp: fix possible divide by zero", cc @pabeni ), my CI reported a WARN was emitted when running simult_flows with a non debug kernel:

00:14:38.347 # selftests: net/mptcp: simult_flows.sh
00:14:38.347 [  522.220675] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
00:14:38.347 [  522.250772] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
00:14:38.347 [  522.334127] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
00:14:38.348 # balanced bwidth                                   [  523.092794] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
00:14:38.348 4587 max 5005 [ OK ]
00:14:38.348 # balanced bwidth - reverse direction                 4593 max 5005 [ OK ]
00:14:38.348 # balanced bwidth with unbalanced delay             [  533.497217] ------------[ cut here ]------------
00:14:38.348 [  533.497948] WARNING: CPU: 1 PID: 12689 at net/mptcp/protocol.c:1362 mptcp_sendmsg_frag (net/mptcp/protocol.c:1362 (discriminator 1)) 
00:14:38.668 [  533.499334] Modules linked in: mptcp_diag inet_diag xt_bpf sch_netem mptcp_token_test mptcp_crypto_test kunit
00:14:38.668 [  533.500592] CPU: 1 PID: 12689 Comm: mptcp_connect Not tainted 5.14.0-rc6+ #1
00:14:38.668 [  533.501462] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
00:14:38.668 [  533.502515] RIP: 0010:mptcp_sendmsg_frag (net/mptcp/protocol.c:1362 (discriminator 1)) 
00:14:38.670 [ 533.503239] Code: 05 aa 0e e9 00 48 85 c0 75 33 41 b9 20 0a 00 00 e9 1e ff ff ff 4c 89 f7 4c 89 1c 24 e8 ce ed ff ff 4c 8b 1c 24 e9 5f fe ff ff <0f> 0b b8 ea ff ff ff e9 ba fc ff ff 0f 0b e9 b8 fe ff ff 4c 89 e7
00:14:38.689 All code
00:14:38.778 ========
00:14:38.778    0:	05 aa 0e e9 00       	add    $0xe90eaa,%eax
00:14:38.778    5:	48 85 c0             	test   %rax,%rax
00:14:38.778    8:	75 33                	jne    0x3d
00:14:38.778    a:	41 b9 20 0a 00 00    	mov    $0xa20,%r9d
00:14:38.778   10:	e9 1e ff ff ff       	jmpq   0xffffffffffffff33
00:14:38.778   15:	4c 89 f7             	mov    %r14,%rdi
00:14:38.778   18:	4c 89 1c 24          	mov    %r11,(%rsp)
00:14:38.778   1c:	e8 ce ed ff ff       	callq  0xffffffffffffedef
00:14:38.778   21:	4c 8b 1c 24          	mov    (%rsp),%r11
00:14:38.778   25:	e9 5f fe ff ff       	jmpq   0xfffffffffffffe89
00:14:38.778   2a:*	0f 0b                	ud2    		<-- trapping instruction
00:14:38.778   2c:	b8 ea ff ff ff       	mov    $0xffffffea,%eax
00:14:38.778   31:	e9 ba fc ff ff       	jmpq   0xfffffffffffffcf0
00:14:38.778   36:	0f 0b                	ud2    
00:14:38.778   38:	e9 b8 fe ff ff       	jmpq   0xfffffffffffffef5
00:14:38.778   3d:	4c 89 e7             	mov    %r12,%rdi
00:14:38.778 
00:14:38.778 Code starting with the faulting instruction
00:14:38.778 ===========================================
00:14:38.778    0:	0f 0b                	ud2    
00:14:38.778    2:	b8 ea ff ff ff       	mov    $0xffffffea,%eax
00:14:38.778    7:	e9 ba fc ff ff       	jmpq   0xfffffffffffffcc6
00:14:38.778    c:	0f 0b                	ud2    
00:14:38.778    e:	e9 b8 fe ff ff       	jmpq   0xfffffffffffffecb
00:14:38.778   13:	4c 89 e7             	mov    %r12,%rdi
00:14:38.778 [  533.505462] RSP: 0018:ffff9561016abc80 EFLAGS: 00010246
00:14:38.779 [  533.506177] RAX: ffff88830316d000 RBX: ffff9561016abcf4 RCX: 0000000000000001
00:14:38.780 [  533.507131] RDX: ffff88830484e000 RSI: ffff88830484e2c0 RDI: 0000000000001f78
00:14:38.780 [  533.508220] RBP: ffff88830a4aa300 R08: 0000000000000400 R09: 0000000000001f4b
00:14:38.780 [  533.509117] R10: ffff888302ffa388 R11: ffff88830a4aa468 R12: ffff88830a56ba00
00:14:38.780 [  533.510048] R13: ffff88830316d400 R14: ffff88830316d000 R15: 51013642878c5e54
00:14:38.780 [  533.510995] FS:  00007fd6e2b4f740(0000) GS:ffff88837dd00000(0000) knlGS:0000000000000000
00:14:38.780 [  533.512044] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
00:14:38.780 [  533.512799] CR2: 00007ffe44206b08 CR3: 0000000009688001 CR4: 0000000000370ee0
00:14:38.780 [  533.513650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
00:14:38.780 [  533.514491] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
00:14:38.780 [  533.515390] Call Trace:
00:14:38.781 [  533.515722] ? poll_select_finish (fs/select.c:211) 
00:14:39.040 [  533.516292] __mptcp_push_pending (net/mptcp/protocol.c:1544) 
00:14:39.284 [  533.516882] mptcp_sendmsg (net/mptcp/protocol.c:1754) 
00:14:39.542 [  533.517425] sock_sendmsg (net/socket.c:704) 
00:14:39.810 [  533.517865] sock_write_iter (net/socket.c:1058) 
00:14:40.078 [  533.518328] ? key_free_user_ns (security/keys/keyring.c:215) 
00:14:40.333 [  533.518818] new_sync_write (./include/linux/fs.h:2114) 
00:14:40.588 [  533.519279] ? key_free_user_ns (security/keys/keyring.c:215) 
00:14:40.590 [  533.519762] vfs_write (fs/read_write.c:605) 
00:14:40.857 [  533.520184] ksys_write (fs/read_write.c:658) 
00:14:41.111 [  533.520593] do_syscall_64 (arch/x86/entry/common.c:50) 
00:14:41.358 [  533.521039] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) 
00:14:41.606 [  533.521645] RIP: 0033:0x7fd6e2c631e7
00:14:41.606 [ 533.522090] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
00:14:41.608 All code
00:14:41.630 ========
00:14:41.630    0:	64 89 02             	mov    %eax,%fs:(%rdx)
00:14:41.630    3:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
00:14:41.630    a:	eb bb                	jmp    0xffffffffffffffc7
00:14:41.630    c:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)
00:14:41.630   13:	f3 0f 1e fa          	endbr64 
00:14:41.630   17:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
00:14:41.630   1e:	00 
00:14:41.630   1f:	85 c0                	test   %eax,%eax
00:14:41.630   21:	75 10                	jne    0x33
00:14:41.630   23:	b8 01 00 00 00       	mov    $0x1,%eax
00:14:41.630   28:	0f 05                	syscall 
00:14:41.630   2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
00:14:41.630   30:	77 51                	ja     0x83
00:14:41.630   32:	c3                   	retq   
00:14:41.630   33:	48 83 ec 28          	sub    $0x28,%rsp
00:14:41.630   37:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
00:14:41.630   3c:	48                   	rex.W
00:14:41.630   3d:	89                   	.byte 0x89
00:14:41.630   3e:	74 24                	je     0x64
00:14:41.630 
00:14:41.630 Code starting with the faulting instruction
00:14:41.630 ===========================================
00:14:41.630    0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
00:14:41.630    6:	77 51                	ja     0x59
00:14:41.630    8:	c3                   	retq   
00:14:41.630    9:	48 83 ec 28          	sub    $0x28,%rsp
00:14:41.630    d:	48 89 54 24 18       	mov    %rdx,0x18(%rsp)
00:14:41.630   12:	48                   	rex.W
00:14:41.630   13:	89                   	.byte 0x89
00:14:41.630   14:	74 24                	je     0x3a
00:14:41.630 [  533.524239] RSP: 002b:00007ffe4420ab28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
00:14:41.632 [  533.525219] RAX: ffffffffffffffda RBX: 0000000000000353 RCX: 00007fd6e2c631e7
00:14:41.632 [  533.526124] RDX: 0000000000002000 RSI: 00007ffe4420ab50 RDI: 0000000000000006
00:14:41.632 [  533.527028] RBP: 0000000000000006 R08: 0000000000000000 R09: 00007fd6e2d3d240
00:14:41.632 [  533.527953] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
00:14:41.632 [  533.528966] R13: 00007ffe4420ab48 R14: 00007ffe4420ab50 R15: 0000000000001f24
00:14:41.632 [  533.530141] ---[ end trace e9cc69bd168d1084 ]---
00:14:41.632 7117 max 5005  [ fail ]
00:14:41.632 # client exit code 0, server 0
00:14:41.632 # 
00:14:41.632 # netns ns3-0-DnIxa3 socket stat for 10003:
00:14:41.632 # State     Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
00:14:41.632 # TIME-WAIT 0      0           10.0.3.3:10003     10.0.1.1:52744 timer:(timewait,59sec,0)
00:14:41.632 # 	
00:14:41.632 # 
00:14:41.632 # netns ns1-0-DnIxa3 socket stat for 10003:
00:14:41.633 # State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
00:14:41.633 # -rw------- 1 root root 8388608 Aug 23 07:44 /tmp/tmp.aPhA5Z8U4c
00:14:41.633 # -rw------- 1 root root 8388608 Aug 23 07:44 /tmp/tmp.lslH0IKyFn
00:14:41.633 # -rw------- 1 root root 81920 Aug 23 07:44 /tmp/tmp.8Y4PTIHcpC
00:14:41.633 # -rw------- 1 root root 81920 Aug 23 07:44 /tmp/tmp.a7AnyRnVqp
00:14:41.633 # balanced bwidth with unbalanced delay - reverse direction  4585 max 5005 [ OK ]
00:14:41.633 # unbalanced bwidth                                   2940 max 3245 [ OK ]
00:14:41.633 # unbalanced bwidth - reverse direction               2935 max 3245 [ OK ]
00:14:41.633 # unbalanced bwidth with unbalanced delay             2945 max 3245 [ OK ]
00:14:41.633 # unbalanced bwidth with unbalanced delay - reverse direction  2935 max 3245 [ OK ]
00:14:41.633 # unbalanced bwidth with opposed, unbalanced delay    2980 max 3245 [ OK ]
00:14:41.633 # unbalanced bwidth with opposed, unbalanced delay - reverse direction  2951 max 3245 [ OK ]
00:14:41.634 not ok 5 selftests: net/mptcp: simult_flows.sh # exit=1

We hit this WARN in mptcp_sendmsg_frag():

mpext = skb_ext_find(tail, SKB_EXT_MPTCP);
if (WARN_ON_ONCE(!mpext)) {
/* should never reach here, stream corrupted */
return -EINVAL;
}

@matttbe matttbe added the bug label Aug 23, 2021
@matttbe
Copy link
Member Author

matttbe commented Aug 23, 2021

@pabeni
Copy link

pabeni commented Aug 23, 2021

yep, that is caused by the leak fix attempt.

Either:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index cd690794e22b..0341e89d9634 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1320,7 +1320,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
                }
        }
 
-       if (!can_collapse && !ssk->sk_tx_skb_cache &&
+       if (!ssk->sk_tx_skb_cache &&
            !mptcp_alloc_tx_skb(sk, ssk, info->data_lock_held))
                return 0;

or:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index cd690794e22b..736b49328c44 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1310,16 +1310,18 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
                 * SSN association set here
                 */
                mpext = skb_ext_find(skb, SKB_EXT_MPTCP);
-               can_collapse = (info->size_goal - skb->len > 0) &&
-                        mptcp_skb_can_collapse_to(data_seq, skb, mpext);
-               if (!can_collapse) {
+               if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
                        TCP_SKB_CB(skb)->eor = 1;
-               } else {
-                       size_bias = skb->len;
-                       avail_size = info->size_goal - skb->len;
+                       goto alloc_skb;
                }
+
+               can_collapse = (info->size_goal - skb->len > 0) &&
+                              (skb_shinfo(skb)->nr_frags <= sysctl_max_skb_frags);
+               if (can_collapse)
+                       avail_size = info->size_goal - skb->len;
        }
 
+alloc_skb:
        if (!can_collapse && !ssk->sk_tx_skb_cache &&
            !mptcp_alloc_tx_skb(sk, ssk, info->data_lock_held))
                return 0;

should address the issue. The first option is safer, but removes the allocation benefit brought by the culprit commit.
The second option is prone to at least a race: if sysctl_max_skb_frags is reduced between the newly introduced test and the following tcp_build_frag() call, the warn can still happen. But it also preserver the allocation benefit.

@matttbe
Copy link
Member Author

matttbe commented Aug 23, 2021

@pabeni: Thank you for the very quick reply!

I can try to reproduce the issue locally and check if your patches address the issue.
I guess the second one is better. Worst case, we have a warn, right?

@matttbe
Copy link
Member Author

matttbe commented Aug 23, 2021

@pabeni I have some issues with the second patch:

++ ./simult_flows.sh
[    4.598149] ip (166) used greatest stack depth: 12176 bytes left
[    4.944377] ip (177) used greatest stack depth: 12048 bytes left
[    5.406534] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[    5.489027] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[    5.716126] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[    5.946017] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[    6.060201] netem: version 1.3
# balanced bwidth                                   [    6.688510] ------------[ cut here ]------------
[    6.692008] WARNING: CPU: 0 PID: 225 at net/mptcp/protocol.c:1364 mptcp_sendmsg_frag (net/mptcp/protocol.c:1364 (discriminator 1))
[    6.698604] Modules linked in: sch_netem
[    6.701639] CPU: 0 PID: 225 Comm: mptcp_connect Not tainted 5.14.0-rc6+ #634
[    6.706859] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    6.713473] RIP: 0010:mptcp_sendmsg_frag (net/mptcp/protocol.c:1364 (discriminator 1))
[ 6.717167] Code: 00 e9 0f fc ff ff 48 8b 05 59 fd e8 00 48 85 c0 75 2b 41 b9 20 0a 00 00 e9 23 ff ff ff 4c 89 f7 e8 b1 ed ff ff e9 6b fe ff ff <0f> 0b b8 ea ff ff ff e9 29 fc ff ff 0f 0b e9 c6 fe ff ff 4c 89 ef
All code
========
   0:   00 e9                   add    %ch,%cl
   2:   0f fc ff                paddb  %mm7,%mm7
   5:   ff 48 8b                decl   -0x75(%rax)
   8:   05 59 fd e8 00          add    $0xe8fd59,%eax
   d:   48 85 c0                test   %rax,%rax
  10:   75 2b                   jne    0x3d
  12:   41 b9 20 0a 00 00       mov    $0xa20,%r9d
  18:   e9 23 ff ff ff          jmpq   0xffffffffffffff40
  1d:   4c 89 f7                mov    %r14,%rdi
  20:   e8 b1 ed ff ff          callq  0xffffffffffffedd6
  25:   e9 6b fe ff ff          jmpq   0xfffffffffffffe95
  2a:*  0f 0b                   ud2             <-- trapping instruction
  2c:   b8 ea ff ff ff          mov    $0xffffffea,%eax
  31:   e9 29 fc ff ff          jmpq   0xfffffffffffffc5f
  36:   0f 0b                   ud2
  38:   e9 c6 fe ff ff          jmpq   0xffffffffffffff03
  3d:   4c 89 ef                mov    %r13,%rdi
Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   b8 ea ff ff ff          mov    $0xffffffea,%eax
   7:   e9 29 fc ff ff          jmpq   0xfffffffffffffc35
   c:   0f 0b                   ud2
   e:   e9 c6 fe ff ff          jmpq   0xfffffffffffffed9
  13:   4c 89 ef                mov    %r13,%rdi
[    6.731723] RSP: 0018:ffffb3fb8020fc80 EFLAGS: 00010246
[    6.735655] RAX: ffff9d2441295000 RBX: ffffb3fb8020fcf4 RCX: 0000000000000001
[    6.740891] RDX: ffff9d2442dfb800 RSI: ffff9d2442dfbac0 RDI: 0000000000000028
[    6.746168] RBP: ffff9d2443469180 R08: 0000000000000400 R09: ffffb3fb8020fca8
[    6.751522] R10: 0000000000003ad8 R11: ffff9d2442f5c308 R12: ffff9d2441294e00
[    6.756737] R13: ffff9d2442ef8e80 R14: ffff9d2441295000 R15: 1efcac8b07b7d535
[    6.762058] FS:  00007f40228d2740(0000) GS:ffff9d24bdc00000(0000) knlGS:0000000000000000
[    6.768208] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.772825] CR2: 00007ffc1bd12438 CR3: 0000000002ffe000 CR4: 00000000003506f0
[    6.778131] Call Trace:
[    6.780119] __mptcp_push_pending (net/mptcp/protocol.c:1546)
[    6.783357] mptcp_sendmsg (net/mptcp/protocol.c:1756)
[    6.786226] sock_sendmsg (net/socket.c:704)
[    6.788838] sock_write_iter (net/socket.c:1058)
[    6.791745] ? key_free_user_ns (security/keys/keyring.c:215)
[    6.795214] new_sync_write (include/linux/fs.h:2114)
[    6.798621] ? key_free_user_ns (security/keys/keyring.c:215)
[    6.801590] vfs_write (fs/read_write.c:605)
[    6.804137] ksys_write (fs/read_write.c:658)
[    6.806638] do_syscall_64 (arch/x86/entry/common.c:50)
[    6.809327] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
[    6.812977] RIP: 0033:0x7f40229e61e7
[ 6.815632] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
All code
========
   0:   64 89 02                mov    %eax,%fs:(%rdx)
   3:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
   a:   eb bb                   jmp    0xffffffffffffffc7
   c:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  13:   f3 0f 1e fa             endbr64
  17:   64 8b 04 25 18 00 00    mov    %fs:0x18,%eax
  1e:   00
  1f:   85 c0                   test   %eax,%eax
  21:   75 10                   jne    0x33
  23:   b8 01 00 00 00          mov    $0x1,%eax
  28:   0f 05                   syscall
  2a:*  48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax         <-- trapping instruction
  30:   77 51                   ja     0x83
  32:   c3                      retq
  33:   48 83 ec 28             sub    $0x28,%rsp
  37:   48 89 54 24 18          mov    %rdx,0x18(%rsp)
  3c:   48                      rex.W
  3d:   89                      .byte 0x89
  3e:   74 24                   je     0x64
Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff       cmp    $0xfffffffffffff000,%rax
   6:   77 51                   ja     0x59
   8:   c3                      retq
   9:   48 83 ec 28             sub    $0x28,%rsp
   d:   48 89 54 24 18          mov    %rdx,0x18(%rsp)
  12:   48                      rex.W
  13:   89                      .byte 0x89
  14:   74 24                   je     0x3a
[    6.829616] RSP: 002b:00007ffc1bd16458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[    6.835162] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007f40229e61e7
[    6.840623] RDX: 0000000000002000 RSI: 00007ffc1bd16480 RDI: 0000000000000004
[    6.846281] RBP: 0000000000000004 R08: 0000000000000000 R09: 00007f4022ac0240
[    6.851623] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
[    6.856584] R13: 00007ffc1bd16478 R14: 00007ffc1bd16480 R15: 0000000000000064
[    6.861397] ---[ end trace 96d207c56867b87f ]---
copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30909 max 5005  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10001:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10001:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32560 Aug 23 09:13 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 32704 Aug 23 09:13 /tmp/tmp.N0Gp3r1OV1
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# balanced bwidth - reverse direction               copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
#  30347 max 5005  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10002:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10002:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 32656 Aug 23 09:14 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32560 Aug 23 09:14 /tmp/tmp.N0Gp3r1OV1
# balanced bwidth with unbalanced delay             copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30702 max 5005  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10003:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10003:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32704 Aug 23 09:14 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 32704 Aug 23 09:14 /tmp/tmp.N0Gp3r1OV1
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# balanced bwidth with unbalanced delay - reverse directioncopyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30733 max 5005  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10004:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10004:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 32664 Aug 23 09:15 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32560 Aug 23 09:15 /tmp/tmp.N0Gp3r1OV1
# unbalanced bwidth                                 copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30693 max 3245  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10005:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10005:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32560 Aug 23 09:15 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 32661 Aug 23 09:15 /tmp/tmp.N0Gp3r1OV1
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# unbalanced bwidth - reverse direction             copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30681 max 3245  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10006:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10006:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 32704 Aug 23 09:16 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32704 Aug 23 09:16 /tmp/tmp.N0Gp3r1OV1
# unbalanced bwidth with unbalanced delay           copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30717 max 3245  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10007:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10007:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32571 Aug 23 09:16 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 32656 Aug 23 09:16 /tmp/tmp.N0Gp3r1OV1
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# unbalanced bwidth with unbalanced delay - reverse directioncopyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30708 max 3245  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10008:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10008:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 26990 Aug 23 09:17 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32704 Aug 23 09:17 /tmp/tmp.N0Gp3r1OV1
# unbalanced bwidth with opposed, unbalanced delay  copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
#  30330 max 3245  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10009:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10009:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32656 Aug 23 09:18 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 24576 Aug 23 09:18 /tmp/tmp.N0Gp3r1OV1
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# unbalanced bwidth with opposed, unbalanced delay - reverse directioncopyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 4)
# copyfd_io_poll: poll timed out (events: POLLIN 1, POLLOUT 0)
#  30714 max 3245  [ fail ]
# client exit code 2, server 2
#
# netns ns3-0-OCmCyI socket stat for 10010:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-OCmCyI socket stat for 10010:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
# -rw------- 1 root root 32704 Aug 23 09:18 /tmp/tmp.MovwP2WfND
# -rw------- 1 root root 81920 Aug 23 09:13 /tmp/tmp.QmnwU0CaUI
# -rw------- 1 root root 8388608 Aug 23 09:13 /tmp/tmp.LNa64J5KOi
# -rw------- 1 root root   32704 Aug 23 09:18 /tmp/tmp.N0Gp3r1OV1
++ rc=1

@matttbe
Copy link
Member Author

matttbe commented Aug 23, 2021

Please note that the tests are running in a loop for a bit of time on my side with the first patch without issue.

@pabeni
Copy link

pabeni commented Aug 23, 2021

uhm... the 2nd patch was a bit rough. It needs a little more changes. This version:

--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1290,7 +1290,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	bool zero_window_probe = false;
 	struct mptcp_ext *mpext = NULL;
 	struct sk_buff *skb, *tail;
-	bool can_collapse = false;
+	bool must_collapse = false;
 	int size_bias = 0;
 	int avail_size;
 	size_t ret = 0;
@@ -1310,17 +1310,21 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		 * SSN association set here
 		 */
 		mpext = skb_ext_find(skb, SKB_EXT_MPTCP);
-		can_collapse = (info->size_goal - skb->len > 0) &&
-			 mptcp_skb_can_collapse_to(data_seq, skb, mpext);
-		if (!can_collapse) {
+		if (!mptcp_skb_can_collapse_to(data_seq, skb, mpext)) {
 			TCP_SKB_CB(skb)->eor = 1;
-		} else {
+			goto alloc_skb;
+		}
+
+		must_collapse = (info->size_goal - skb->len > 0) &&
+				(skb_shinfo(skb)->nr_frags < sysctl_max_skb_frags);
+		if (must_collapse) {
 			size_bias = skb->len;
 			avail_size = info->size_goal - skb->len;
 		}
 	}
 
-	if (!can_collapse && !ssk->sk_tx_skb_cache &&
+alloc_skb:
+	if (!must_collapse && !ssk->sk_tx_skb_cache &&
 	    !mptcp_alloc_tx_skb(sk, ssk, info->data_lock_held))
 		return 0;
 
@@ -1353,7 +1357,6 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 	if (skb == tail) {
 		TCP_SKB_CB(tail)->tcp_flags &= ~TCPHDR_PSH;
 		mpext->data_len += ret;
-		WARN_ON_ONCE(!can_collapse);
 		WARN_ON_ONCE(zero_window_probe);
 		goto out;
 	}

should be better - at least does not splat here after a few iterations.

@matttbe
Copy link
Member Author

matttbe commented Aug 23, 2021

@pabeni: thank you for this new version. It looks better indeed!

After 18 attempts, one test failed but I guess I was just unlucky:

++ ./simult_flows.sh
[  721.944124] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth1: link becomes ready
[  722.023847] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[  722.242131] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth1: link becomes ready
[  722.247296] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
# balanced bwidth                                   [  722.672761] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[  722.677886] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth2: link becomes ready
  4634 max 5005 [ OK ]
# balanced bwidth - reverse direction                 4624 max 5005 [ OK ]
# balanced bwidth with unbalanced delay               4621 max 5005 [ OK ]
# balanced bwidth with unbalanced delay - reverse direction  4624 max 5005 [ OK ]
# unbalanced bwidth                                   3036 max 3245 [ OK ]
# unbalanced bwidth - reverse direction               3028 max 3245 [ OK ]
# unbalanced bwidth with unbalanced delay             2998 max 3245 [ OK ]
# unbalanced bwidth with unbalanced delay - reverse direction  3247 max 3245  [ fail ]
# client exit code 0, server 0
#
# netns ns3-0-SHDeJI socket stat for 10008:
# State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
#
# netns ns1-0-SHDeJI socket stat for 10008:
# State     Recv-Q Send-Q    Local Address:Port  Peer Address:Port Process
# TIME-WAIT 0      0      10.0.2.1%ns1eth2:36117     10.0.3.3:10008 timer:(timewait,59sec,0)
#
# TIME-WAIT 0      0              10.0.1.1:35816     10.0.3.3:10008 timer:(timewait,59sec,0)
#
# -rw------- 1 root root 81920 Aug 23 15:57 /tmp/tmp.ZCqNTK2Ag0
# -rw------- 1 root root 81920 Aug 23 15:56 /tmp/tmp.ivNt1haoqy
# -rw------- 1 root root 8388608 Aug 23 15:56 /tmp/tmp.6dWBwJSljU
# -rw------- 1 root root 8388608 Aug 23 15:57 /tmp/tmp.aoelqmRH88
# unbalanced bwidth with opposed, unbalanced delay    3028 max 3245 [ OK ]
# unbalanced bwidth with opposed, unbalanced delay - reverse direction  2915 max 3245 [ OK ]

@matttbe
Copy link
Member Author

matttbe commented Aug 23, 2021

I did 85 more iterations without issues so I guess I was just unlucky here.

@pabeni
Copy link

pabeni commented Aug 24, 2021

@pabeni: thank you for this new version. It looks better indeed!

> After 18 attempts, one test failed but I guess I was just unlucky:
[...]
> # unbalanced bwidth with unbalanced delay - reverse direction  3247 max 3245  [ fail ]
> # client exit code 0, server 0
> #
> # netns ns3-0-SHDeJI socket stat for 10008:
> # State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
> #
> # netns ns1-0-SHDeJI socket stat for 10008:
> # State     Recv-Q Send-Q    Local Address:Port  Peer Address:Port Process
> # TIME-WAIT 0      0      10.0.2.1%ns1eth2:36117     10.0.3.3:10008 timer:(timewait,59sec,0)
> #
> # TIME-WAIT 0      0              10.0.1.1:35816     10.0.3.3:10008 timer:(timewait,59sec,0)
> #
> # -rw------- 1 root root 81920 Aug 23 15:57 /tmp/tmp.ZCqNTK2Ag0
> # -rw------- 1 root root 81920 Aug 23 15:56 /tmp/tmp.ivNt1haoqy
> # -rw------- 1 root root 8388608 Aug 23 15:56 /tmp/tmp.6dWBwJSljU
> # -rw------- 1 root root 8388608 Aug 23 15:57 /tmp/tmp.aoelqmRH88

This is really an instance of issue #137, and a quite unlucky one, since we are off by only 2ms ;)

@matttbe
Copy link
Member Author

matttbe commented Aug 24, 2021

Fixed thanks to Paolo's patches:

  • 56344a3: "squashed" in "mptcp: fix possible divide by zero"
  • fa7fdd3: tg:msg: update commit message after squash-to patch
  • Results: e4be381..d9f4bc9

Builds and tests have ran:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20210824T100358
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export/20210824T100358

@matttbe matttbe closed this as completed Aug 24, 2021
jenkins-tessares pushed a commit that referenced this issue Jul 20, 2023
Add a big batch of test coverage to assert all aspects of the tcx link API:

  # ./vmtest.sh -- ./test_progs -t tc_links
  [...]
  #225     tc_links_after:OK
  #226     tc_links_append:OK
  #227     tc_links_basic:OK
  #228     tc_links_before:OK
  #229     tc_links_chain_classic:OK
  #230     tc_links_dev_cleanup:OK
  #231     tc_links_invalid:OK
  #232     tc_links_prepend:OK
  #233     tc_links_replace:OK
  #234     tc_links_revision:OK
  Summary: 10/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20230719140858.13224-9-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants