Optimize ipv4/ipv6 neighbor timers #2803

Closed

zhenggen-xu
Collaborator

Optimize ipv4/ipv6 neighbor timers

SONiC currently uses a 30-minute REACHABLE timer and a 60-second STALE
timer for both IPv4 and IPv6.
If a peer changes its MAC address because of a system migration or a
device replacement, the neighbor entry can stay in the REACHABLE state
for a long time (~30 minutes), and traffic is blackholed in the
meantime. This matters most on L3 interfaces, where we rely on
protocols such as BGP to check peers: the protocol does not probe the
MAC while the neighbor is in the REACHABLE state with the old MAC.

The fix is to change the REACHABLE timer to 60 seconds and set the STALE
timer to 30 minutes, so that a running protocol refreshes the MAC once
the neighbor is in the STALE state.
For VLAN interfaces, we rely on arping/ndisc6 to update the neighbor
MAC, since there might not be a protocol running.
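
The effect of the REACHABLE timer on the blackhole window can be illustrated with a deliberately simplified model. This is a sketch, not kernel code: the `Neighbor` class, `blackhole_seconds`, and the one-packet-per-second traffic pattern are all hypothetical, and the DELAY/PROBE states of the real state machine are collapsed into a single re-resolution step.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    """Toy model of one neighbor entry (hypothetical, not kernel code)."""
    mac: str
    confirmed_at: float    # time of the last reachability confirmation (s)
    reachable_time: float  # REACHABLE timer (s)

    def state(self, now: float) -> str:
        # The entry is REACHABLE until the timer expires, then STALE.
        if now - self.confirmed_at < self.reachable_time:
            return "REACHABLE"
        return "STALE"

    def send(self, now: float, peer_mac: str) -> str:
        """Return the MAC used for a packet sent at `now`.

        REACHABLE: the cached MAC is used without probing.
        STALE: traffic triggers re-resolution (simplified to one step),
        so the entry learns the peer's current MAC.
        """
        if self.state(now) == "STALE":
            self.mac = peer_mac
            self.confirmed_at = now
        return self.mac


def blackhole_seconds(reachable_time, mac_change_at, step=1.0, horizon=7200.0):
    """Seconds of lost traffic after the peer switches to a new MAC.

    Assumes the entry was last confirmed at t=0 and that one packet is
    sent every `step` seconds starting at `mac_change_at`.
    """
    entry = Neighbor(mac="old", confirmed_at=0.0, reachable_time=reachable_time)
    t = float(mac_change_at)
    while t <= horizon:
        if entry.send(t, peer_mac="new") == "new":
            return t - mac_change_at
        t += step
    return None


if __name__ == "__main__":
    print(blackhole_seconds(30 * 60, mac_change_at=10))  # 30-minute timer
    print(blackhole_seconds(60, mac_change_at=10))       # 60-second timer
```

In this model the outage lasts until the REACHABLE timer expires, which is why shortening that timer shortens the blackhole window.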

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

- What I did
Optimize the ipv4/ipv6 REACHABLE timer and STALE timer.

- How I did it
Change the default settings of ipv4/ipv6 REACHABLE timer and STALE timer.

- How to verify it
Build an image and confirm that the new image has the updated timer values.
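
One way to check the values on a running image is to read the kernel's neighbor defaults from procfs. This is a minimal sketch that assumes the REACHABLE and STALE timers correspond to the Linux `base_reachable_time_ms` and `gc_stale_time` settings; the PR text does not name the keys, and `read_neigh_defaults` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping: REACHABLE timer -> base_reachable_time_ms (milliseconds),
# STALE timer -> gc_stale_time (seconds).
KEYS = ("base_reachable_time_ms", "gc_stale_time")


def read_neigh_defaults(root="/proc/sys/net"):
    """Return the default neighbor timer values for IPv4 and IPv6.

    A value is None when the file is missing or unreadable.
    """
    values = {}
    for family in ("ipv4", "ipv6"):
        for key in KEYS:
            path = Path(root) / family / "neigh" / "default" / key
            try:
                values[f"{family}.{key}"] = int(path.read_text().strip())
            except (OSError, ValueError):
                values[f"{family}.{key}"] = None
    return values


if __name__ == "__main__":
    for name, value in read_neigh_defaults().items():
        print(name, value)
```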

Before the fix, if the peer changed its MAC address, IPv6 pings to the peer were blocked and the MAC was not updated for ~30 minutes (possibly up to 45 minutes). IPv4 seemed to be OK, since the peer usually sends a GARP when it comes up.

After the fix, if the peer changes its MAC address, IPv6 ping traffic to the peer address resumes in ~1 minute.
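
The "up to 45 minutes" observation is consistent with the randomization that neighbor discovery applies to the reachable time: RFC 4861 picks the effective value uniformly between 0.5 and 1.5 times the base value. The sketch below only works through that arithmetic; `reachable_time_range` is a hypothetical helper, and it assumes the REACHABLE timer discussed here is the base reachable time.

```python
MIN_RANDOM_FACTOR = 0.5  # RFC 4861 protocol constant
MAX_RANDOM_FACTOR = 1.5  # RFC 4861 protocol constant


def reachable_time_range(base_seconds):
    """Return (shortest, longest) effective reachable time in seconds."""
    return MIN_RANDOM_FACTOR * base_seconds, MAX_RANDOM_FACTOR * base_seconds


if __name__ == "__main__":
    for label, base in (("30 min base", 30 * 60), ("60 s base", 60)):
        low, high = reachable_time_range(base)
        print(f"{label}: {low:.0f}s to {high:.0f}s")
```

With a 30-minute base the entry can stay REACHABLE for 15 to 45 minutes; with a 60-second base the range is 30 to 90 seconds.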

@pavel-shirshov
Contributor

@zhenggen-xu The second line in your current PR was created by another PR of yours, #1904.
So which change should we prefer?

@zhenggen-xu
Collaborator Author

> @zhenggen-xu The second line in your current PR was created by another PR of yours, #1904.
> So which change should we prefer?

The old PR #1904 used the same value for IPv6 as for IPv4, based on the old values. This PR changes both to the new values, which should be preferred over those from the old PR.

@prsunny
Contributor

prsunny commented Oct 31, 2019

IMO, a 60s refresh interval seems too small. I think the traditional value is 4 hours on non-SONiC OSes. Maybe we should consider 5 minutes instead of 60s.

@zhenggen-xu
Collaborator Author

> IMO, a 60s refresh interval seems too small. I think the traditional value is 4 hours on non-SONiC OSes. Maybe we should consider 5 minutes instead of 60s.

Could you be more specific about the use cases that require a REACHABLE timer longer than 60s? Please note that the STALE timer is 30 minutes, so entries won't be aged out before 30 minutes. Also, if you have any links about the traditional value settings, please share them. Thanks!

@zhenggen-xu zhenggen-xu requested a review from lguohan as a code owner February 6, 2021 20:29
yxieca pushed a commit that referenced this pull request Jun 3, 2023
…lly (#15319)

src/sonic-swss

* c781521 - (HEAD -> 202205, origin/202205) [swss][orchagent] fix srt-bfd ut (#2803) (6 hours ago) [Baorong Liu]
mihirpat1 pushed a commit to mihirpat1/sonic-buildimage that referenced this pull request Jun 14, 2023
* fix srt-bfd ut error introduced by PR 2769
@zhenggen-xu zhenggen-xu deleted the neigh-timer branch May 24, 2024 19:09