MPTCP in multihoming doesn't announce all endpoints #331

Closed
vanyingenzi opened this issue Dec 20, 2022 · 17 comments
@vanyingenzi commented Dec 20, 2022

Hello,

I'm currently trying to study the behavior of MPTCP servers on multihomed hosts using only IPv6.
The setup is as follows:

[setup]

The execution on the server: ./iperf3 -s -p 80 -m
The execution on the client: ./iperf3 -c 2001:6a8:308f:9:0:82ff:fe68:e519 -p 80 -t 60 -m

Issues

  1. The server doesn't announce all its endpoints; it only announces one extra interface.
  2. When the client receives an announced address from the server, it doesn't create a fullmesh, even though the limits set with ip mptcp allow the kernel path manager to do so. → maybe not an issue? If yes, a new ticket will be created. See below

This is the output of ip mptcp monitor when I run iperf3 with mptcpize (I also tried the iperf3 implementation that supports MPTCP). I never reach 4 subflows; on certain executions with the same configuration, I get 3 subflows created.

>$ sudo ip mptcp monitor
[       CREATED] token=4349e85c remid=0 locid=0 saddr6=2001:6a8:308f:7:56e1:adff:fe69:1e34 daddr6=2001:6a8:308f:9:0:82ff:fe68:e519 sport=52312 dport=80
[   ESTABLISHED] token=4349e85c remid=0 locid=0 saddr6=2001:6a8:308f:7:56e1:adff:fe69:1e34 daddr6=2001:6a8:308f:9:0:82ff:fe68:e519 sport=52312 dport=80
[     ANNOUNCED] token=4349e85c remid=3 daddr6=2001:6a8:308f:10:0:83ff:fe00:2 dport=80
[SF_ESTABLISHED] token=4349e85c remid=3 locid=0 saddr6=2001:6a8:308f:7:56e1:adff:fe69:1e34 daddr6=2001:6a8:308f:10:0:83ff:fe00:2 sport=49039 dport=80 backup=0
[        CLOSED] token=4349e85c
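
(For reference, the mptcpize invocation mentioned above would typically look something like the line below; this is a sketch, the exact command line used is not shown in this report.)

>$ mptcpize run iperf3 -s -p 80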

Server Configuration

On the server, here is the output of a few commands giving more information about the setup:

>$ ip mptcp endpoint
2001:6a8:308f:9:0:82ff:fe68:e519 id 1 signal subflow dev eth0 
2001:6a8:308f:9:0:82ff:fe68:e55c id 2 signal subflow dev eth1 
2001:6a8:308f:10:0:83ff:fe00:2 id 3 signal subflow dev eth2 
fe80::82ff:fe68:e55c id 4 dev eth1
>$ ip mptcp limit show
add_addr_accepted 8 subflows 8
>$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:00:82:68:e5:19 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 130.104.229.25/25 brd 130.104.229.127 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2001:6a8:308f:9:0:82ff:fe68:e519/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::82ff:fe68:e519/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:00:82:68:e5:5c brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet6 2001:6a8:308f:9:0:82ff:fe68:e55c/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::82ff:fe68:e55c/64 scope link 
       valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 02:00:83:00:00:02 brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    altname ens5
    inet6 2001:6a8:308f:10:0:83ff:fe00:2/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::83ff:fe00:2/64 scope link 
       valid_lft forever preferred_lft forever
>$ hostnamectl 
   Static hostname: tfe-ingenzi-mbogne
         Icon name: computer-vm
           Chassis: vm
    Virtualization: kvm
  Operating System: Debian GNU/Linux 11 (bullseye)
            Kernel: Linux 6.0.0-2-amd64
      Architecture: x86-64

Client configuration

>$ ip mptcp endpoint
2001:6a8:308f:7:56e1:adff:fe69:1e34 id 1 signal subflow dev enp0s31f6 
2001:6a8:3081:6f1a:102e:16e2:ea46:bcff id 2 signal subflow dev wlp58s0
>$ ip mptcp limits show
add_addr_accepted 8 subflows 8
>$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 54:e1:ad:69:1e:34 brd ff:ff:ff:ff:ff:ff
    inet 130.104.78.197/27 brd 130.104.78.223 scope global dynamic noprefixroute enp0s31f6
       valid_lft 312sec preferred_lft 312sec
    inet6 2001:6a8:308f:7:56e1:adff:fe69:1e34/128 scope global dynamic noprefixroute 
       valid_lft 20712sec preferred_lft 17112sec
    inet6 fe80::753a:f216:c8fe:d0d/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: wlp58s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 34:f6:4b:03:b0:47 brd ff:ff:ff:ff:ff:ff
    inet 192.168.206.120/22 brd 192.168.207.255 scope global dynamic noprefixroute wlp58s0
       valid_lft 462sec preferred_lft 462sec
    inet6 2001:6a8:3081:6f1a:2fc8:d5fa:7d8:e730/64 scope global temporary dynamic 
       valid_lft 603916sec preferred_lft 84927sec
    inet6 2001:6a8:3081:6f1a:102e:16e2:ea46:bcff/64 scope global dynamic mngtmpaddr noprefixroute 
       valid_lft 2591921sec preferred_lft 604721sec
    inet6 fe80::fc91:6a14:e9c:d11a/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:35:49:f0:b4 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
>$ hostnamectl 
 Static hostname: vany-ThinkPad-T470s
       Icon name: computer-laptop
         Chassis: laptop
Operating System: Ubuntu 22.04.1 LTS              
          Kernel: Linux 6.0.9-060009-generic
    Architecture: x86-64
 Hardware Vendor: Lenovo
  Hardware Model: ThinkPad T470s

I'm happy to share more information about the issue if needed.
Thank you in advance.

@matttbe (Member) commented Dec 20, 2022

Hello,

Thank you for the bug report!

The server doesn't announce all its endpoints, it only announces one extra interface.

Is it always the IP that is not in the same subnet that is being announced? (eth0 and eth1 share the same subnet.) Still, it is strange: I didn't check the code, but I don't see why we would add this restriction.

When the client receives an announced address from the server, it doesn't create a fullmesh, even though the limits set with ip mptcp allow the kernel path manager to do so.

Sorry, I didn't get that. If you want a fullmesh topology, you need to add the fullmesh flag when creating the endpoints (make sure you have a recent version of iproute2; it is easy to compile if not).
For the second bit of your sentence, do you mean the limits are not working as expected?

@vanyingenzi (Author) commented Dec 20, 2022

Hello @matttbe, thank you for your time.

Is it always the IP that is not in the same subnet that is being announced? (eth0 and eth1 share the same subnet.) Still, it is strange: I didn't check the code, but I don't see why we would add this restriction.

Indeed, it is always the same IP that is announced, unless I remove it from the endpoints; then the other IP is announced.

Sorry, I didn't get that. If you want to have a fullmesh topology, you need to add the fullmesh flag when creating the endpoints (make sure you have a recent version of iproute2, easy to compile if not).
For the second bit of your sentence, do you mean the limits are not working as expected?

If I'm not mistaken, in order to configure the path manager, the user can set the manager limits using ip mptcp limits set [ subflow SUBFLOW_NR ] [ add_addr_accepted ADD_ADDR_ACCEPTED_NR ]. Therefore, whenever the current number of subflows is below SUBFLOW_NR, the path manager can create an additional subflow. This is what I understood from the Red Hat documentation and the man page.
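
For example, the 8/8 limits shown in the server and client configuration dumps in this issue would typically be set with a command like the following (a sketch, not necessarily the exact invocation used here):

>$ sudo ip mptcp limits set subflow 8 add_addr_accepted 8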

@matttbe (Member) commented Dec 20, 2022

Indeed, it is always the same IP that is announced, unless I remove it from the endpoints; then the other IP is announced.

Mmh, strange. So only one is announced?
Just to be sure, are you using any net namespaces (containers, Docker, etc.) on the server side? I want to make sure no other limits are being applied.

If I'm not mistaken, in order to configure the path manager, the user can set the manager limits using ip mptcp limits set [ subflow SUBFLOW_NR ] [ add_addr_accepted ADD_ADDR_ACCEPTED_NR ]. Therefore, whenever the current number of subflows is below SUBFLOW_NR, the path manager can create an additional subflow.

That's correct, but I'm not sure I understand your issue. Could you explain what you expect and what you get?
Note that by default the PM tries to avoid using the same link twice; it does not create a fullmesh unless you add the fullmesh flag in addition to the subflow one.
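
For illustration, a fullmesh endpoint on the client side would look something like the following (a sketch using the address from the client configuration in this report, replacing the existing id 2 endpoint; the fullmesh flag needs a reasonably recent kernel and iproute2):

>$ sudo ip mptcp endpoint delete id 2
>$ sudo ip mptcp endpoint add 2001:6a8:3081:6f1a:102e:16e2:ea46:bcff dev wlp58s0 subflow fullmesh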

@vanyingenzi (Author)

Mmh, strange. So only one is announced?
Just to be sure, are you using any net namespaces (containers, docker, etc.) for the server side? Just to be sure no other limits are used.

No, I didn't use namespaces.

That's correct but I'm not sure to understand your issue. May you explain what you expect and what you get?
Note that by default, the PM tries to avoid using the same link twice, it is not doing a fullmesh except if you add the fullmesh flag in addition to the subflow one.

Sorry for the confusion. I want the path manager to create a fullmesh; however, I only observe two connections from the same interface. I'm going to try adding the fullmesh flag with iproute2, and I'll comment with the results.

@matttbe (Member) commented Dec 20, 2022

No I didn't use namespaces.

OK, to be investigated then.

From what I see in the code, it is likely that we announce only one IP and that's it. We should certainly loop there:

/* check first for announce */
if (msk->pm.add_addr_signaled < add_addr_signal_max) {
    local = select_signal_address(pernet, msk);

    /* due to racing events on both ends we can reach here while
     * previous add address is still running: if we invoke now
     * mptcp_pm_announce_addr(), that will fail and the
     * corresponding id will be marked as used.
     * Instead let the PM machinery reschedule us when the
     * current address announce will be completed.
     */
    if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
        return;

    if (local) {
        if (mptcp_pm_alloc_anno_list(msk, local)) {
            __clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
            msk->pm.add_addr_signaled++;
            mptcp_pm_announce_addr(msk, &local->addr, false);
            mptcp_pm_nl_addr_send_ack(msk);
        }
    }
}

Could you build a kernel if we send a patch? Or do you want to try modifying the function above into a while-loop yourself (see the code just below this chunk)?

Sorry for the confusion. I want the path manager to create a fullmesh; however, I only observe two connections from the same interface. I'm going to try adding the fullmesh flag with iproute2, and I'll comment with the results.

I suggest keeping one problem per ticket: if you still have an issue with the fullmesh flag, please open a new one ;)

matttbe changed the title from "MPTCP in multihoming doesn't announce all endpoints and doesn't create fullmesh" to "MPTCP in multihoming doesn't announce all endpoints" on Dec 20, 2022
@vanyingenzi (Author)

Could you build a kernel if we send a patch? Or do you want to try modifying the function above into a while-loop yourself (see the code just below this chunk)?

I prefer modifying the function. Please provide step-by-step guidelines ;) I don't want to end up messing up the kernel; I have learned enough from previous mistakes.

@matttbe (Member) commented Dec 21, 2022

We discussed this issue at our weekly meeting yesterday, and there is a technical limitation that doesn't let us loop over all the ADD_ADDRs.

Currently, the behaviour is to send an ADD_ADDR after each subflow establishment: when the connection is established, a first ADD_ADDR is sent, then a new one is sent when a new subflow is established, etc.
Would this behaviour be OK for you?

The current behaviour is probably not robust enough (e.g. if it is not possible to reach the first announced address), and the PM should probably try to send more ADD_ADDRs later, e.g. when the ADD_ADDR echo has been received.

@vanyingenzi (Author)

Currently, the behaviour is to send an ADD_ADDR after each subflow establishment: when the connection is established, a first ADD_ADDR is sent, then a new one is sent when a new subflow is established, etc.
Would this behaviour be OK for you?

Is this the current behavior in the kernel, or the patch that you were going to send me? With this behavior, I should be able to receive all endpoints (eth1, eth2), so it's fine for me.

For the sake of the study, could I please get a brief description of the technical limitation? Thank you for the support once again.

@pabeni commented Dec 21, 2022

>$ ip mptcp endpoint
2001:6a8:308f:9:0:82ff:fe68:e519 id 1 signal subflow dev eth0 
2001:6a8:308f:9:0:82ff:fe68:e55c id 2 signal subflow dev eth1 
2001:6a8:308f:10:0:83ff:fe00:2 id 3 signal subflow dev 

The above configuration is incorrect: each endpoint should be either 'signal' or 'subflow', not both. For the intended scenario, it must be 'signal'. The kernel PM is still misbehaving, though, since it's supposed to announce both addresses regardless.

I can reproduce the issue with a simplified setup, and dropping the bogus 'subflow' flag resolves it, i.e. the server announces all the configured addresses.

There is still a later issue: sometimes the HMAC check fails on additional subflow creation; that is still to be investigated.

@pabeni commented Dec 21, 2022

The above configuration is incorrect: each endpoint should be either 'signal' or 'subflow', not both. For the intended scenario, it must be 'signal'. The kernel PM is still misbehaving, though, since it's supposed to announce both addresses regardless.

I can reproduce the issue with a simplified setup, and dropping the bogus 'subflow' flag resolves it, i.e. the server announces all the configured addresses.

There is still a later issue: sometimes the HMAC check fails on additional subflow creation; that is still to be investigated.

Both kernel-side problems (missing announcements with the bogus config, sporadic subflow creation failures with the correct config) have the same root cause: HMAC check failure.

@matttbe (Member) commented Dec 21, 2022

The above configuration is incorrect. The endpoint should be either 'signal' or 'subflow'.

Good catch, I hadn't even noticed! The client is configured like that as well.

The kernel PM is still misbehaving, since it's supposed to still announce both addresses.

Indeed. I suppose we don't want to change iproute2 to disallow configuring both :)

Both kernel-side problems (missing announcements with the bogus config, sporadic subflow creation failures with the correct config) have the same root cause: HMAC check failure.

Thank you for checking!

matttbe added the bug label and removed the question label on Dec 21, 2022
@vanyingenzi (Author)

Hello @pabeni,

Indeed, by removing the 'subflow' flag at the server I get all the endpoints, thank you very much. So at the client the endpoints should be set as subflow (fullmesh), and at the server the endpoints should be flagged as signal?

@matttbe (Member) commented Dec 21, 2022

So at the client the endpoints should be set as subflow (fullmesh), and at the server the endpoints should be flagged as signal?

Yes, that is how it was designed and how it is tested: the client only sets subflow (with or without fullmesh), while the server only sets signal.
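
For illustration only, with the addresses from this report, such a split configuration could look like the following sketch (assuming the endpoints are re-created from scratch; adapt addresses, IDs and devices to your setup):

On the server:
>$ sudo ip mptcp endpoint flush
>$ sudo ip mptcp endpoint add 2001:6a8:308f:9:0:82ff:fe68:e55c dev eth1 signal
>$ sudo ip mptcp endpoint add 2001:6a8:308f:10:0:83ff:fe00:2 dev eth2 signal

On the client:
>$ sudo ip mptcp endpoint flush
>$ sudo ip mptcp endpoint add 2001:6a8:3081:6f1a:102e:16e2:ea46:bcff dev wlp58s0 subflow fullmesh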

@pabeni commented Dec 21, 2022

The endpoint should be either 'signal' or 'subflow'. For the intended scenario it must be 'signal'. The kernel PM is still misbehaving, since it's supposed to still announce both addresses.

I have to take back the last part of the above sentence. The kernel PM is actually working as expected: it signals one local address and then tries to create additional subflows using the available local 'subflow' endpoint as the source. Such additional subflow creation tries to connect to the peer (client) address and port [as the DENY_JOIN_ID0 flag is cleared at MPC handshake time] and is not successful, lacking a TCP listener on the (client) end. Still, such an attempt marks the relevant endpoint id as used (the attempt is started; MPTCP can't easily diagnose the failure), making it not available for a later 'signal'.

We could make the scenario easier to understand by adding 'subflow creation attempt' MIBs and/or by setting the DENY_JOIN_ID0 flag on the client side by default.

TL;DR: not a bug.

There is still a later issue, as sometimes HMAC checking fails on the additional subflow creation, still to be investigated.

The above was due to a setup issue on my side: I used 'nc' in the background. The process closed one end of the MPTCP socket, moving it out of the established state. The HMAC failure counter is increased both on HMAC failures and on MPTCP-state-based failures.
We could make the scenario easier to track by adding more specific MIB counters.

TL;DR: not a bug even there.
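
(For context, a host can already request this behaviour for its own connections via a sysctl; on the client, something like the line below should set the deny_join_id0 / "C" flag on new MPTCP connections. This is a sketch based on the documented net.mptcp.allow_join_initial_addr_port knob; please double-check it on your kernel version.)

>$ sudo sysctl -w net.mptcp.allow_join_initial_addr_port=0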

@matttbe (Member) commented Dec 21, 2022

@pabeni Thank you for this clarification!

We could make the scenario easier to understand by adding 'subflow creation attempt' MIBs and/or by setting the DENY_JOIN_ID0 flag on the client side by default.

Indeed, good idea. We could also say that the in-kernel PM should not let the listener socket create new subflows. For such a particular need, people can use the userspace PM, no?

We could make the scenario easier to track by adding more specific MIB counters.

That reminds me of #203 :-)
Will this MIB be in the TCP stack?

@pabeni commented Dec 21, 2022

We could make the scenario easier to understand by adding 'subflow creation attempt' MIBs and/or by setting the DENY_JOIN_ID0 flag on the client side by default.

Indeed, good idea. We could also say that the in-kernel PM should not let the listener socket create new subflows. For such a particular need, people can use the userspace PM, no?

Copying here from IRC for future reference: in the general case we may want the server socket to be able to create subflows towards the client, to cope with some protocol weirdness. And we should expose a more consistent deny_join_id0 to avoid possible interoperability issues.

We could make the scenario easier to track by adding more specific MIB counters.

That reminds me of #203 :-) Will this MIB be in the TCP stack?

All the new MIBs will be accounted for in the MPTCP code. This specific one should land here:

https://elixir.bootlin.com/linux/latest/source/net/mptcp/subflow.c#L701

We should increment [MPTCP_MIB_JOINACKMAC] only on HMAC failures and increment different MIBs on other test failures.
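
(For reference, the current aggregate counters can already be watched from userspace while reproducing, e.g. with the sketch below; the exact counter name to look for, likely something like MPTcpExtMPJoinAckHMacFailure, depends on the kernel version.)

>$ nstat -az | grep -i mptcp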

@matttbe (Member) commented Jan 11, 2023

@vanyingenzi I suggest closing this ticket now that 3 new ones have been created: #333, #334, #335. If I'm not mistaken, everything has been covered. If not, feel free to re-open this ticket or create a new one.

A quick summary of the situation:

matttbe closed this as completed on Jan 11, 2023