libteam: resynchronize ifinfo after lost RTNLGRP_LINK notifications
When there is a large number of interfaces (e.g. VLANs), teamd loses link notifications because it cannot read them as fast as the kernel broadcasts them. This often prevents teamd from starting properly if it is started while other links are being set up concurrently. It can also fail while it is up and running, especially when the team device itself has many VLANs on top of it.

This can easily be reproduced with a simple example (on an SMP system) by manually adding a team device with a bunch of VLANs, bringing it up, and starting teamd with the --take-over option:

    root@debian:~# ip link add name team0 type team
    root@debian:~# for i in `seq 100 150` ; do
    > ip link add link team0 name team0.$i type vlan id $i ; done
    root@debian:~# ip link set team0 up
    root@debian:~# cat teamd.conf
    {
        "device": "team0",
        "runner": {
            "name": "activebackup"
        },
        "ports": {
            "eth1": {},
            "eth2": {}
        }
    }
    root@debian:~# teamd -o -N -f teamd.conf

At this point, teamd gives no error message or other indication that something is wrong, but the state does not look healthy:

    root@debian:~# teamdctl team0 state
    setup:
      runner: activebackup
    ports:
      eth1
        link watches:
          link summary: up
          instance[link_watch_0]:
            name: ethtool
            link: up
            down count: 0
    Failed to parse JSON port dump.
    command call failed (Invalid argument)

The state dump shows that port eth2 is missing its info. Running teamd under strace reveals one recvmsg() call that returned -1 with errno ENOBUFS.

What happened in this example is that when teamd started, all the VLANs got carrier up, and the kernel flooded notifications faster than teamd could read them. teamd then lost the events about port eth2 being enslaved and brought up.

The socket that joins the RTNLGRP_LINK notifications uses libnl's default 32k buffer size. Netlink messages are large (over 1k each), so this buffer easily fills up. The kernel neither knows nor cares whether notification broadcasts were delivered.
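A minimal sketch of the event-socket side, assuming a plain non-blocking socket instead of libnl (the commit itself goes through nl_recvmsgs_default()). It shows that an overflowed receive queue surfaces as ENOBUFS from recv() on the listening socket, and that the sensible reaction is to flag a full resync rather than carry on with stale state. The helper names (event_rcvbuf_size, set_event_rcvbuf, drain_events, EVENT_RCVBUF_DEFAULT) are hypothetical; the 96k default comes from this commit, and the override string stands in for the environment variable, whose name is not given here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* 96k, the event socket size used by this commit (hypothetical macro name). */
#define EVENT_RCVBUF_DEFAULT (96 * 1024)

/* Parse an optional override (e.g. an environment variable's value); fall
 * back to the default when it is missing or not a positive integer. */
static int event_rcvbuf_size(const char *override)
{
	char *end;
	long val;

	if (!override || !*override)
		return EVENT_RCVBUF_DEFAULT;
	errno = 0;
	val = strtol(override, &end, 10);
	if (errno || *end != '\0' || val <= 0 || val > INT_MAX)
		return EVENT_RCVBUF_DEFAULT;
	return (int) val;
}

/* Apply the size with SO_RCVBUF. Linux caps the request at
 * net.core.rmem_max and doubles it for bookkeeping overhead. */
static int set_event_rcvbuf(int fd, const char *override)
{
	int size = event_rcvbuf_size(override);

	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
}

/* Read every queued datagram without blocking. Returns the number of
 * messages read; *resync is set if recv() reported ENOBUFS, meaning the
 * kernel dropped notifications and any cached link state is stale. */
static int drain_events(int fd, bool *resync)
{
	char buf[8192];
	int count = 0;

	assert(resync != NULL);
	*resync = false;
	for (;;) {
		ssize_t n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n >= 0) {
			count++;	/* real code would parse rtnetlink messages here */
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno == ENOBUFS) {
			/* Overrun is reported once; keep reading what is still queued. */
			*resync = true;
			continue;
		}
		break;	/* EAGAIN/EWOULDBLOCK: queue empty; other errors also stop */
	}
	return count;
}
```

In real use, fd would be an AF_NETLINK/NETLINK_ROUTE socket bound with RTMGRP_LINK in nl_groups, and a true *resync would trigger a fresh RTM_GETLINK dump. The rmem_max cap noted in the comment is also why raising the buffer alone cannot rule out overruns.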
This cannot be fixed simply by increasing the buffer size: no size is guaranteed to work in every use case, and with hundreds of VLANs several megabytes of buffer may be needed (way over the normal rmem_max limit). The only way to recover is to refresh the whole ifinfo list, since it is invalid at this point. This cannot easily be worked around by refreshing only the team device and its ports, because the library side might not have the ports linked due to the missed events, and it does not know about the teamd configuration.

The return value of nl_recvmsgs_default() is now checked for the event socket. In case of ENOBUFS (which libnl conveniently changes to ENOMEM), the whole ifinfo list is refreshed. get_ifinfo_list() now also checks for removed interfaces in case a dellink event was missed.

Previously, all TEAM_IFINFO_CHANGE handlers processed events one at a time, so the handling had to be changed to support multiple ifinfo changes at once. For this, the ifinfo changed flags are cleared and removed entries are destroyed only after all handlers have been called.

Also, the nl_cli.sock_event receive buffer is increased to 96k like all other sockets, and it can now be changed via an environment variable.

Signed-off-by: Antti Tiainen <atiainen@forcepoint.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
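The recovery path can be sketched as a mark-and-sweep over a cached interface list, followed by a dispatch that runs every change handler before any cleanup. This is a simplified illustration, not libteam's code: struct ifinfo, ifinfo_resync(), ifinfo_dispatch() and count_changed() are hypothetical, and an array of ifindexes stands in for a full RTM_GETLINK dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a cached ifinfo entry. */
struct ifinfo {
	int ifindex;
	bool seen;	/* found in the current full dump */
	bool changed;	/* handlers should look at this entry */
	bool removed;	/* gone from the kernel; freed after dispatch */
	struct ifinfo *next;
};

struct ifinfo_list {
	struct ifinfo *head;
};

typedef void (*change_handler)(struct ifinfo_list *list, void *priv);

static struct ifinfo *ifinfo_find(struct ifinfo_list *list, int ifindex)
{
	struct ifinfo *it;

	for (it = list->head; it; it = it->next)
		if (it->ifindex == ifindex)
			return it;
	return NULL;
}

/* Rebuild the cache from a full dump. Entries missing from the dump are
 * marked removed, which covers a lost RTM_DELLINK notification. */
static int ifinfo_resync(struct ifinfo_list *list, const int *dump, size_t n)
{
	struct ifinfo *it;
	size_t i;

	for (it = list->head; it; it = it->next)
		it->seen = false;
	for (i = 0; i < n; i++) {
		it = ifinfo_find(list, dump[i]);
		if (!it) {
			it = calloc(1, sizeof(*it));
			if (!it)
				return -1;
			it->ifindex = dump[i];
			it->next = list->head;
			list->head = it;
		}
		it->seen = true;
		it->removed = false;
		it->changed = true;	/* state may have changed while events were lost */
	}
	for (it = list->head; it; it = it->next) {
		if (!it->seen) {
			it->removed = true;
			it->changed = true;
		}
	}
	return 0;
}

/* Example handler: counts changed entries, including removed ones. */
static void count_changed(struct ifinfo_list *list, void *priv)
{
	int *count = priv;
	struct ifinfo *it;

	for (it = list->head; it; it = it->next)
		if (it->changed)
			(*count)++;
}

/* Call every handler on the whole batch first; only then clear the changed
 * flags and free removed entries, so no handler sees a half-cleaned list. */
static void ifinfo_dispatch(struct ifinfo_list *list, change_handler *handlers,
			    size_t nhandlers, void *priv)
{
	struct ifinfo **pp = &list->head;
	size_t i;

	assert(handlers != NULL || nhandlers == 0);
	for (i = 0; i < nhandlers; i++)
		handlers[i](list, priv);

	while (*pp) {
		struct ifinfo *it = *pp;

		if (it->removed) {
			*pp = it->next;
			free(it);
		} else {
			it->changed = false;
			pp = &it->next;
		}
	}
}
```

Deferring the flag clearing and the freeing of removed entries until all handlers have run is what lets one resync report many changes at once: each handler sees the complete batch, including entries that are about to disappear.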