[action] [PR:15583] [arp_update]: Fix IPv6 neighbor race condition #15693
Why I did it
A race condition exists which makes it possible for the kernel to resolve a trapped packet's destination IP at the same time that arp_update is running the ip neigh replace command for that neighbor IP. When this occurs, the kernel's neighbor entry for this IP is in the INCOMPLETE state, and the ip neigh replace command sets it to permanently INCOMPLETE. This means no netlink message will be generated for this neighbor, since the kernel doesn't generate netlink messages for INCOMPLETE neighbors (it would only generate a message once the neighbor transitions to FAILED, which doesn't happen due to the ip neigh replace command). As a result, no APPL_DB neighbor table entry is ever created and no tunnel route for the IP is ever installed, leading to dropped traffic.
Work item tracking
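To make the failure mode described under "Why I did it" more concrete, here is a minimal Python sketch of how a script could pick out the neighbor entries that are currently INCOMPLETE or FAILED. It is illustrative only and is not taken from arp_update: the helper names and the sample addresses are hypothetical, and it assumes each line of `ip -6 neigh show` output has the form `<ip> dev <device> ... <STATE>` with the state as the last token.

```python
# Illustrative sketch; names and sample data are hypothetical, not from arp_update.
UNRESOLVED_STATES = {"INCOMPLETE", "FAILED"}


def parse_neigh_line(line):
    """Return (ip, dev, state) for one neighbor line, or None if it does not match.

    Assumes the layout "<ip> dev <device> [lladdr <mac>] <STATE>".
    """
    tokens = line.split()
    if len(tokens) < 4 or tokens[1] != "dev":
        return None
    return tokens[0], tokens[2], tokens[-1]


def unresolved_neighbors(output):
    """List (ip, dev, state) for entries that are INCOMPLETE or FAILED."""
    parsed = (parse_neigh_line(line) for line in output.splitlines())
    return [entry for entry in parsed if entry and entry[2] in UNRESOLVED_STATES]


# Made-up sample output for demonstration.
sample = """\
fc02:1000::2 dev Vlan1000 lladdr 00:11:22:33:44:55 REACHABLE
fc02:1000::3 dev Vlan1000  INCOMPLETE
fc02:1000::4 dev Vlan1000  FAILED
"""

for ip, dev, state in unresolved_neighbors(sample):
    print(ip, dev, state)
```

An entry caught in the transient INCOMPLETE state at this point is the one exposed to the race: replacing it immediately would pin it to INCOMPLETE before the kernel ever reports FAILED.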
How I did it
- After pinging the neighbor IPs, wait for any neighbor entries which might be transiently INCOMPLETE to transition to FAILED (so that the subsequent ip neigh replace command can set them to permanently INCOMPLETE)
- [...] ip neigh replace command in case they are new neighbors for which no netlink message has been generated yet
How to verify it
- Run arp_update with no FAILED or INCOMPLETE neighbor entries; verify no changes are made to the kernel neighbor table
- Run arp_update with a FAILED neighbor entry with corresponding APPL_DB entry; verify the neighbor IP is pinged and set to INCOMPLETE permanently
- Run arp_update with an INCOMPLETE neighbor entry with corresponding APPL_DB entry; verify the neighbor IP is pinged and set to INCOMPLETE permanently
- Run arp_update with a FAILED neighbor entry without corresponding APPL_DB entry; verify the neighbor is flushed, pinged, and set to INCOMPLETE permanently
- Run arp_update with an INCOMPLETE neighbor entry without corresponding APPL_DB entry; verify the neighbor is flushed, pinged, and set to INCOMPLETE permanently
- Run arp_update with various combinations of the above scenarios; verify that only neighbors missing APPL_DB entries are flushed from the kernel, and that all FAILED/INCOMPLETE neighbors are pinged and set to permanently INCOMPLETE
Which release branch to backport (provide reason below if selected)
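The verification scenarios listed under "How to verify it" reduce to a small decision rule, sketched below in Python as a reading aid. This is not the arp_update implementation: `plan_actions` and the action labels are hypothetical names, and the sketch only encodes the expected outcomes stated in the scenarios (it leaves out the wait for INCOMPLETE entries to become FAILED).

```python
# Illustrative sketch of the expected behavior; names are hypothetical.
UNRESOLVED_STATES = {"INCOMPLETE", "FAILED"}


def plan_actions(state, in_appl_db):
    """Return the ordered actions expected for one kernel neighbor entry.

    state:      kernel neighbor state, e.g. "REACHABLE", "FAILED", "INCOMPLETE"
    in_appl_db: True if a corresponding APPL_DB neighbor entry exists
    """
    if state not in UNRESOLVED_STATES:
        # Resolved neighbors are left untouched.
        return []
    actions = ["ping", "replace"]
    if not in_appl_db:
        # Only neighbors missing an APPL_DB entry are flushed first.
        actions.insert(0, "flush")
    return actions


for state in ("REACHABLE", "FAILED", "INCOMPLETE"):
    for in_appl_db in (True, False):
        print(state, in_appl_db, plan_actions(state, in_appl_db))
```

Here "replace" stands for the ip neigh replace step that sets the entry to permanently INCOMPLETE.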
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)