BGP route recursive resolution not happening #13682
Labels
triage
Needs further investigation
Comments
This looks like a timing/race-condition issue; it does not always happen. Is there any dump we can collect in this state that would help in debugging?
yxieca
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this issue
Jun 7, 2023
What I did:
Workaround for the issue seen here: FRRouting/frr#13682
There appears to be a timing issue: when multiple recursive lookups are needed to resolve the nexthop of a route, the resolution may not complete correctly, leaving the route in the inactive state.
The issue is seen on chassis-packet, where two levels of recursive lookup are needed for a given e-BGP learnt route:
- Level 1: resolve the e-BGP peer (connected route via BGP) over Loopback4096 (i-BGP peering)
- Level 2: resolve Loopback4096 over the backend port-channel next-hops
On VOQ chassis there is no e-BGP peer (connected route via BGP) resolution, because the route is added as a static route by orchagent over Ethernet-IB.
Also as part of this change, remove the route-map policy from instance.conf.j2, as the same policy is defined in peer-group.j2.
Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507
How I verified:
- Functional verification, done manually
- Updated UT
We will be adding a sanity check in sonic-mgmt to make sure no routes are in the inactive state.
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
mssonicbld
pushed a commit
to mssonicbld/sonic-buildimage
that referenced
this issue
Jun 7, 2023
mssonicbld
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this issue
Jun 10, 2023
sonic-otn
pushed a commit
to sonic-otn/sonic-buildimage
that referenced
this issue
Sep 20, 2023
This issue is stale because it has been open 180 days with no activity. Comment or remove the
This issue will be automatically closed in the specified period unless there is further activity.
Issue Description:
FRR Version: 8.2.2 being used in SONiC: https://github.com/sonic-net/sonic-buildimage/tree/202205/src/sonic-frr
https://github.com/sonic-net/sonic-frr/tree/79188bf710e92acf42fb5b9b0a2e9593a5e
Topology Instance:
The BGP0 (3.3.3.12) and BGP2 (3.3.3.13) instances (running on the same Line-Card, in different Linux namespaces) have iBGP sessions with the BGP1 (3.3.3.6) instance (running on a different Line-Card).
The BGP1 instance has e-BGP peer 10.0.0.77. The route learned via the e-BGP peer is 100.1.0.39.
On the BGP0 instance, 100.1.0.39 is learnt correctly: it resolves recursively over 10.0.0.77, and then recursively again over 3.3.3.6.
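To make the expected two-level resolution concrete, here is a minimal Python sketch of recursive nexthop lookup over a toy routing table. It is only a model of the behaviour described above, not how zebra implements it. The /32 prefix lengths and the interface name `PortChannel-backend` are assumptions made for illustration; only the three addresses come from the topology.

```python
import ipaddress

# Toy RIB: prefix -> next hop. ("via", ip) is a gateway that itself needs
# resolving; ("dev", ifname) is directly usable.
# Prefix lengths and the interface name are hypothetical.
RIB = {
    "100.1.0.39/32": ("via", "10.0.0.77"),        # e-BGP learnt route
    "10.0.0.77/32": ("via", "3.3.3.6"),           # e-BGP peer, reached via iBGP
    "3.3.3.6/32": ("dev", "PortChannel-backend"),  # BGP1 loopback
}

def longest_match(rib, ip):
    """Return the most specific prefix in rib covering ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix in rib:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return str(best) if best else None

def resolve(rib, prefix, max_depth=8):
    """Follow gateways until an interface is reached.

    Returns (chain_of_gateways, interface); interface is None when the
    chain cannot be resolved, which models an inactive route.
    """
    chain = []
    kind, target = rib[prefix]
    for _ in range(max_depth):
        if kind == "dev":
            return chain, target
        chain.append(target)
        match = longest_match(rib, target)
        if match is None:
            return chain, None
        kind, target = rib[match]
    return chain, None

print(resolve(RIB, "100.1.0.39/32"))
```

If the entry for 3.3.3.6 is missing at the moment the lookup runs, `resolve` returns `None` for the interface. In this model, that is the state a route would be stuck in if it were never re-evaluated once the covering route appears.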
Working Case: BGP0 instance
Non-Working Case: BGP2 instance
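A check for the non-working state could be scripted roughly as follows. This is a sketch under assumptions: it expects the routing table as a parsed JSON object mapping each prefix to a list of route entries, in the style of `vtysh -c "show ip route json"`, and it assumes each entry carries an `installed` flag. Verify the field names against the output of the FRR version in use. The sample data is made up for illustration.

```python
import json

def inactive_prefixes(routes):
    """Return prefixes for which no route entry is marked installed.

    `routes` maps prefix -> list of route entries; the "installed" key
    is an assumed field name, to be checked against real output.
    """
    bad = []
    for prefix, entries in routes.items():
        if not any(entry.get("installed", False) for entry in entries):
            bad.append(prefix)
    return sorted(bad)

# Hypothetical sample shaped like the assumed JSON output.
sample = json.loads("""
{
  "100.1.0.39/32": [{"protocol": "bgp"}],
  "3.3.3.6/32": [{"protocol": "bgp", "installed": true}]
}
""")

print(inactive_prefixes(sample))
```

Running this in each BGP namespace and comparing the results would show whether the BGP2 instance holds routes that BGP0 has installed.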