Fix potential blackholing/looping traffic when link-local was used and refresh ipv6 neighbor to avoid CPU hit #1904
Just checking for information: did you look into using neighbor discovery (ndisc6?) or any configuration that sends NS to achieve this functionality?
files/scripts/arp_update
ping6cmd="ping6 -I $intf -n -q -i 0 -c 1 -W 0 ff02::1 >/dev/null"
intf_up=$(ip link show $intf | grep "state UP")
if [[ -n "$intf_up" ]]; then
    eval $ping6cmd
This does not look space aligned. Can you check?
Fixed.
while /bin/true; do
  # find L3 interfaces which are UP, send ipv6 multicast pings
  echo "{% for (name, prefix) in INTERFACE %} {{name}} {% endfor %}" > /tmp/intf_tmp.j2
We would need to handle PORTCHANNEL_INTERFACE as well
Done.
Thanks Zhen. Can we concatenate INTERFACE and PC_INTERFACE and use a single for-loop? That would avoid the duplicate code.
done
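The single-loop idea from this thread can be sketched in shell, outside the Jinja2 template itself: build one de-duplicated interface list and iterate over it once. The helper name `unique_intfs` and the sample interface names are hypothetical and are not taken from the PR.

```shell
#!/bin/bash
# Hypothetical helper: merge several whitespace-separated interface lists
# into one list, dropping duplicates but keeping first-seen order.
unique_intfs() {
    # Word splitting of $* is intentional: each name becomes one line.
    printf '%s\n' $* | awk 'NF && !seen[$0]++' | paste -sd' ' -
}

# Example inputs (assumed values; the real script renders them from the
# INTERFACE and PORTCHANNEL_INTERFACE tables).
l3_intfs="Ethernet0 Ethernet4 Ethernet0"
pc_intfs="PortChannel0001"

# One loop covers both kinds of interface.
for intf in $(unique_intfs "$l3_intfs" "$pc_intfs"); do
    echo "would refresh neighbors on $intf"
done
```

Because an interface can carry several prefixes and so appear more than once in the table, de-duplicating first keeps the loop from pinging the same interface repeatedly.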
Add PORTCHANNEL_INTERFACE interface for ipv6 multicast ping
format issue
The ndisc6 tool cannot discover link-local addresses that were never learned in the first place (for example, because of a timing or network issue).
# the neighbors state.
# arp_update:
# Send ipv6 multicast pings to all "UP" L3 interfaces including vlan interfaces to
# refresh link-local addresses from neighbors.
Is there a concern if the VLAN has many neighbors? Will one multicast ping get a reply from all neighbors?
Yes, by the standard all neighbors should reply to ff02::1, so one ping gets replies from all of them.
I also verified this, and it worked as expected.
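A minimal sketch of the per-interface multicast ping discussed in this thread, split into small functions so the pieces can be checked without touching the network. The ping6 flags and the `state UP` check are the ones quoted in the review; the function names and the `refresh_link_local` wrapper are hypothetical.

```shell
#!/bin/bash
# Succeeds when the `ip link show <intf>` output read from stdin reports state UP.
link_is_up() {
    grep -q "state UP"
}

# Print the one-shot multicast ping command for an interface.
# ff02::1 is the link-local all-nodes multicast address.
build_ping6_cmd() {
    local intf="$1"
    printf 'ping6 -I %s -n -q -i 0 -c 1 -W 0 ff02::1\n' "$intf"
}

# Hypothetical wrapper: ping only when the interface is UP.
refresh_link_local() {
    local intf="$1"
    if ip link show "$intf" 2>/dev/null | link_is_up; then
        # The ping result is ignored; the goal is the neighbor-table update.
        $(build_ping6_cmd "$intf") >/dev/null 2>&1 || true
    fi
}
```

Calling `refresh_link_local Vlan100` would send a single echo request on Vlan100 and return immediately, without waiting for the replies.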
@@ -10,6 +10,7 @@ RUN apt-get update

RUN apt-get install -f -y ifupdown arping libdbus-1-3 libdaemon0 libjansson4

RUN apt-get install -f -y ndisc6
Can you add this to the vs docker as well? We need to replicate the same environment in the vs.
done.
Does it make sense to eventually move this function into neighbor mgrd, which is our own version of a neighbor manager? It looks like we also have issues with the Linux kernel neighbor gc function, so we cannot really trust it. At some point we need to develop our own neighbor manager.
@lguohan, we would eventually need a neighbor mgr. At least I can think of the following cases that may need some user-space handling. We could prioritize based on use case.
[Open] |
It would be good to have a community session to discuss the topics around "neighbor mgrd". The source of truth should be the Linux kernel, IMO. Anyway, it probably makes sense to defer that implementation to a later PR.
retest this please
```
ca728b8 [config] fix interface IPv6 address removal. (sonic-net#1819)
0665d6f VxLAN Tunnel Counters and Rates implementation (sonic-net#1748)
80a10dc Fix log_ssd_health hang issue (sonic-net#1904)
ea4a730 [config][cbf] Added config commands for CBF (sonic-net#1799)
02ce8d6 [sonic-package-manager] update FEATURE entries on upgrade (sonic-net#1803)
9f123c0 [generate_dump] remove secrets from dump files (sonic-net#1886)
3a8ab73 [fwutil] Add `fwutil update all` to support the automatic platform component fw updates (sonic-net#1242)
776fddf [sonic-package-manager] code style fixes and enhancements (sonic-net#1802)
f53baac [watermarkstat] Fix for error in processing empty array from couters db (sonic-net#1810)
0b2536b Generic_upater: Apply JSON change (sonic-net#1856)
```
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Submodule update for sonic-utilities
```
48035d75 [202012] [techsupport] Techsupport Error Reporting pending fixes (sonic-net#1854)
8b2ec09a Fix log_ssd_health hang issue (sonic-net#1904)
ac9c4254 Fix the option missing in kernel config issue (sonic-net#1888)
5cc9417a disk_check: Script updated to run good in 201811 & 201911 (sonic-net#1747)
```
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
- What I did
Fix sonic-net#9114
The log_ssd_health command hangs because timeout is used with docker exec -i, which also affects the warmboot flow.
- How I did it
Added the foreground option for timeout. This is recommended when timeout is not run directly from a shell prompt: https://man7.org/linux/man-pages/man1/timeout.1.html
- How to verify it
Run log_ssd_health and verify it does not hang.
Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com
Fix potential blackholing/looping traffic and refresh ipv6 neighbor to avoid CPU hit
When IPv6 global addresses are configured on L3 interfaces and used for peering,
and the routing protocol uses link-local addresses on the same interfaces as preferred nexthops,
the link-local addresses can age out after a while because no traffic is sent to the link-local
addresses themselves. When new routes with link-local nexthops then arrive, SONiC does not insert
them into the HW, which causes looping or blackholing of traffic.
Global IPv6 addresses on L3 interfaces between switches are refreshed by BGP keepalive and other messages.
On the server-facing side, traffic may hit the forwarding plane only, so the IPv6 neighbor entries are not refreshed regularly.
This can age out the Linux kernel IPv6 neighbor entries and remove the HW neighbor table entries,
so traffic going to those neighbors hits the CPU, causing traffic drops and temporarily high CPU load.
Also, if link-local addresses were never learned, we may not get them at all later.
This change is intended to fix all of the above issues.
Changes:
Add the ndisc6 package to the swss docker and use it for IPv6 NDP ping to update the neighbors' state on Vlan interfaces
Change the default IPv6 neighbor reachable timer to 30 minutes
Add a periodic IPv6 multicast ping to ff02::1 to get/refresh link-local neighbor info.
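The first change, refreshing neighbors on Vlan interfaces with ndisc6, could look roughly like the sketch below. It turns `ip -6 neigh show` output into one ndisc6 command per neighbor. The function name is hypothetical, and both the `-q -w 0 -1` flag combination and the choice to skip fe80:: entries are assumptions about how the tool might be invoked, not details confirmed by this description.

```shell
#!/bin/bash
# Read `ip -6 neigh show` output on stdin and print one ndisc6 command per
# global (non fe80::) neighbor that lives on a Vlan interface.
# Assumed ndisc6 flags: -q quiet, -w 0 do not wait for the answer,
# -1 stop after the first reply.
vlan_ndisc6_cmds() {
    awk '$2 == "dev" && $3 ~ /^Vlan/ && $1 !~ /^fe80:/ {
        print "ndisc6 -q -w 0 -1", $1, $3
    }'
}
```

Piping the result to a shell, for example `ip -6 neigh show | vlan_ndisc6_cmds | bash`, would send one neighbor solicitation per listed entry.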
- What I did
Fix potential blackholing/looping traffic when ipv6 link-local was used, and refresh ipv6 neighbor to avoid CPU hit
- How I did it
Add the ndisc6 package to the swss docker and use it for IPv6 NDP ping to update the neighbors' state on Vlan interfaces
Change the default IPv6 neighbor reachable timer to 30 minutes
Add a periodic IPv6 multicast ping to ff02::1 to get/refresh link-local neighbor info.
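For the timer change, a small sketch of the arithmetic, assuming the value is applied through the kernel's `net.ipv6.neigh.default.base_reachable_time_ms` sysctl (the description does not say which knob the PR actually uses). The helper name is hypothetical.

```shell
#!/bin/bash
# Print a sysctl assignment that sets the default IPv6 neighbor base
# reachable time to the given number of minutes (the kernel value is in ms).
reachable_time_sysctl() {
    local minutes="$1"
    echo "net.ipv6.neigh.default.base_reachable_time_ms=$(( minutes * 60 * 1000 ))"
}

# 30 minutes, as described in the change list.
reachable_time_sysctl 30
```

The printed line could be passed to `sysctl -w` or placed in a sysctl configuration file. The kernel randomizes the effective reachable time around this base value, so individual entries will not all expire at exactly 30 minutes.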
- How to verify it
-- IPv6 neighbors now stay in the REACHABLE state for 30 minutes instead of 30 seconds.
admin@lnos-x1-a-asw03:~$ clear arp
fe80::a83c:309a:5ca3:6593 dev Vlan100 lladdr 04:62:73:c4:eb:59 ref 1 used 2470/2470/39 probes 1 REACHABLE
2100::7 dev Vlan100 lladdr 04:62:73:8c:fd:6a ref 1 used 95/9/56 probes 4 REACHABLE
fe80::2a6f:7fff:feba:1cff dev eth0 lladdr 28:6f:7f:ba:1c:ff router used 30606/27426/22 probes 1 STALE
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a ref 1 used 85/85/5 probes 1 REACHABLE
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router ref 1 used 651/651/52 probes 1 REACHABLE
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff ref 1 used 857/0/852 probes 1 REACHABLE
172.18.1.7 dev Vlan100 lladdr 04:62:73:8c:fd:6a ref 1 used 39523/56/56 probes 6 REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a ref 1 used 2475/2472/20 probes 1 REACHABLE
Round 1, deleting 8 entries
Flush is complete after 1 round
admin@lnos-x1-a-asw03:~$ ip neighbor show | grep -v FAILED
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a REACHABLE
admin@lnos-x1-a-asw03:~$ docker exec -it swss bash -c "/usr/bin/arp_update"
admin@lnos-x1-a-asw03:~$ ip neighbor show | grep -v FAILED
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a DELAY
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router DELAY
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff REACHABLE
172.18.1.7 dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a REACHABLE
admin@lnos-x1-a-asw03:~$ ip neighbor show | grep -v FAILED
fe80::662:73ff:fe8c:fd6a dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
fe80::2e0:ecff:fe3b:d6ac dev Ethernet122 lladdr 00:e0:ec:3b:d6:ac router REACHABLE
172.25.11.1 dev eth0 lladdr 28:6f:7f:ba:1c:ff REACHABLE
172.18.1.7 dev Vlan100 lladdr 04:62:73:8c:fd:6a REACHABLE
172.25.11.46 dev eth0 lladdr 00:e0:ec:3c:09:9a REACHABLE
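The manual check above can also be scripted. This sketch counts how many entries on a given interface are in a given state, reading `ip neighbor show` output from stdin. The function name `count_state` is hypothetical.

```shell
#!/bin/bash
# Read `ip neighbor show` output on stdin and count entries on one
# interface that are in the given state (the state is the last field).
count_state() {
    local intf="$1" state="$2"
    awk -v intf="$intf" -v state="$state" \
        '$2 == "dev" && $3 == intf && $NF == state { n++ } END { print n + 0 }'
}
```

For example, `ip neighbor show | count_state Vlan100 REACHABLE` can be polled after running arp_update to confirm that the Vlan100 entries moved from DELAY to REACHABLE.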