Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIB Suppress Announcements of Routes Not Installed in HW #1103

Merged

Conversation

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
@stepanblyschak
Copy link
Contributor Author

FRR only(no SONiC involved) with a simple FPM application that sends reply right after route msg received.
The performance plot comparison:
image

@liat-grozovik
Copy link
Collaborator

@StormLiangMS could you please help to review and approve?

```abnf
; DEVICE_METADATA table
key = DEVICE_METADATA|localhost ; Device metadata configuration table
suppress-fib-pending = "enabled"/"disabled" ; Globally enable/disable routes announcement suppression feature,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this controls only for BGP routes, please be explicit in the field name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is implemented in fpmsyncd/zebra which does not care if it receives BGP or other protocol routes, so, if SONiC supports, the same feature could be applied to other routing protocols.

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Dec 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we talk only about BGP routes in the HLD, I thought it is better to have a name like "suppress-bgp-routes-fib-pending" and handle only BGP routes case. If you think, that's not the case and you'll enable support for all route types (BGP, Static, OSPF..etc), we can have the generic field name (suppress-fib-pending)


Example data:
```
127.0.0.1:6379[14]> hgetall "ROUTE_TABLE:193.5.96.0/25"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dont we have 'vrf' as the key in the ROUTE_TABLE? we could be supporting only 'default' VRF now, but it's important to have the VRF as the key for future enhancements with data VRF.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any plan for bulking the routes in the APPL STATE_DB ROUTE_TABLE?

Copy link
Contributor Author

@stepanblyschak stepanblyschak Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dont we have 'vrf' as the key in the ROUTE_TABLE? we could be supporting only 'default' VRF now, but it's important to have the VRF as the key for future enhancements with data VRF.

This example is for default VRF. The key in APPL_STATE_DB is the same as in APPL_DB, with VRF included (if not default), so VRF is supported.

Any plan for bulking the routes in the APPL STATE_DB ROUTE_TABLE?

The bulking is done by using redis pipeline for ResponsePublisher.

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Dec 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ROUTE_TABLE in APPL_DB is legacy and we did not have VRF support originally and when we added the VRF support, we did not want to disrupt the existing schema and added the VRF key only for user VRF cases but APPL STATE_DB ROUTE_TABLE is a new schema, not sure why we still need to follow ROUTE_TABLE (APPL_DB) approach rather than having the VRF as the key by default even for 'default' VRF case as well. IMO. having VRF as one of the key by default is cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ROUTE_TABLE in APPL_DB is legacy? Is there a new table with a new schema?

Here's the latest master and ROUTE_TABLE entry in APPL_DB:

127.0.0.1:6379> keys "ROUTE_TABLE:193.8.224.0/25"
1) "ROUTE_TABLE:193.8.224.0/25"

If it were a user created VRF route the key would be in a format :.
The idea of APPL_STATE_DB is to reflect the same schema that APPL_DB has.

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of APPL_STATE_DB is to reflect the same schema that APPL_DB has.

APPL_DB:
------------
Default VRF:   ROUTE_TABLE:<prefix>
Non-default VRF: ROUTE_TABLE:<vrf>:<prefix>

IMO, having the below schema helps to cleanly get the VRF (e.g default) for every route instead of assuming if the VRF is not there in the key, the route is meant for default route, we don't need to follow APPL_DB ROUTE_TABLE schema for new tables but I leave it to you to decide on the final schema.
APPL_STATE_DB:
---------------------
All VRFs including default VRF: ROUTE_TABLE:<vrf>:<prefix>


### 3. Overview

As of today, SONiC BGP advertises learnt prefixes regardless of whether these prefixes were successfully programmed into ASIC. While route programming failure is followed by orchagent crash and all services restart, even for successfully created routes there is a short period of time when the peer will be black holing traffic. Also, in the following scenario, a credit loop occurs:
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there are multiple updates for the given route (there are NH changes) from fpmsyncd to route-orch, route-orch will send the response to every route changes after ASIC programming? or it will send the response to the latest route-update to fpmsyncd? please clarify.
Also, what happens in the case where we get the route-del from zebra before updating the RTM_NEWROUTE message with RTM_F_OFFLOAD flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Orchagent will send a response for every route update it gets from APPL_DB. As far as I know there is a logic inside ProducerStateTable/ConsumerStateTable that will consolidate multiple SET/DEL operations for a single key which makes orchagent get the most up to date requested field values for a route entry. FRR, on the other hand, currently is only interested in the first route creation status. It does not care about the installation status after the route next hop or other attributes have changed.

If zebra sends RTM_DELROUTE before fpmsyncd sends RTM_NEWROUTE with RTM_F_OFFLOAD, zebra will ignore it, considering it as a stale update.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks for the clarification.

2. Withdraw 10 prefix to DUT through *exabgp* from T0 Arista VM
3. Once reached the required number of cycles the loop breaks after first step
4. Consistency check is applied, there are no withdrawn routes and announced routes were successfully installed and correctly marked as offloaded in zebra (In case a race condition happens or a notification is missed for some reason this test will try to catch it)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have any test case to simulate orchagent crash and delay the response to fpmsyncd from route-orch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case orchagent crash there will be no response from routeorch. Which tests case do you propose?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if fpmsyncd doesnot notify RTM_F_OFFLOAD to Zebra due to orchagent crash? Is there any timeout in Zebra to declare FIB programming failed? what would happen for those routes in Zebra that are waiting for the FIB programming after OrchAgent back up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no logic in zebra that considers a route beeing failed in HW if it did not receive RTM_F_OFFLOAD flag within some time interval. These routes will stay in queue-ed state.
Orchagent crash itself triggers BGP container(zebra + bgpd) restart. So in that case, zebra starts fresh with no routes.


Routing is a crucial part of a network switch. This feature adds additional flows in existing route processing pipeline and so along the way there might be unexpected failures leading to routes being not marked as offloaded in zebra - missed notification, race conditions leading to in-consistency, etc. It is required to monitor the consistency of routes state periodically and mitigate problems.

At the moment *route_check.py* verifies routes between APPL_DB & ASIC_DB and makes sure they are in sync. In addition to that it is required to extend this script to check for consistency APPL_DB, APPL_STATE_DB and zebra by calling ```show ip route json``` and verify every route installed in ASIC_DB is marked as offloaded. The script will retry the check for 3 times every 15 sec and if zebra FIB is not in sync with ASIC_DB a mitigation action is performed. The mitigation action publishes required notifications to trigger fpmsyncd flows to send RTM_NEWROUTE message to zebra and log alerts in the syslog.
Copy link
Collaborator

@venkatmahalingam venkatmahalingam Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be Static/OSPF/ISIS route updates apart from BGP routes as well, hope you'll allow offloaded/not-offloaded routes to be present in the DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please clarify your comment? What do you mean by allowing offloaded/not-offloaded routes to be present in the DB?

Copy link
Collaborator

@venkatmahalingam venkatmahalingam Dec 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this HLD talks about adding "suppress-fib-pending" support only for BGP routes, what happens if we have other protocol routes (e.g OSPF) in the switch, what would be the behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would assume OSPF to implement a similar logic. We dropped the "bgp" word in configuration flag because from our point of view there's nothing related to BGP in these new flows. It is on the level of zebra/fpmsyncd/orchagent and they do not care which protocol the route they are working with is originated.


During fast reboot BGP session is closed by SONiC device without the notification. BGP session is preserved in graceful restart mode. BGP routes on the peer are still active, because nexthop interfaces are up. Once interfaces go down, received BGP routes on the peer are removed from the routing table. Nothing is sent to SONiC device since then. After interfaces go up and BGP sessions re-establish the peer's BGP re-learns advertised routes.

Due to additional response publishing in orchagent there might be a slight delay in fast reboot reconfiguration.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the fast boot time with this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Must be less than 30 sec down time

@liat-grozovik
Copy link
Collaborator

@StormLiangMS @prsunny please provide your inputs. If no additional feedback please approve.

@liat-grozovik
Copy link
Collaborator

The feature and its current content is expected to be in 202305 and under discussion if to take into 202211 as originally aimed.

@prsunny
Copy link
Contributor

prsunny commented Feb 9, 2023

@venkatmahalingam , could you please sign off?

@prsunny
Copy link
Contributor

prsunny commented Feb 16, 2023

@stepanblyschak , can you please update the Sonic-mgmt PR to the description?

@prsunny
Copy link
Contributor

prsunny commented Feb 17, 2023

@abdosi , @arlakshm , @judyjoseph for viz

@liat-grozovik liat-grozovik merged commit ab996b4 into sonic-net:master Feb 23, 2023
@zhangyanzhao
Copy link
Collaborator

Two test PRs are still open. Need be merged @echuawu

@echuawu
Copy link

echuawu commented May 9, 2023

@zhangyanzhao , we are handling the comments from @StormLiangMS, and investigating the possiblity of implementing performance test in sonic-mgmt.

@echuawu
Copy link

echuawu commented May 24, 2023

For stress and performance test implementation, the implementation plan is under discussion, add a PR to track this requirement: sonic-net/sonic-mgmt#8409

@StormLiangMS
Copy link
Contributor

For stress and performance test implementation, the implementation plan is under discussion, add a PR to track this requirement: sonic-net/sonic-mgmt#8409

@echuawu could you put this to the description of this PR.
image

@echuawu
Copy link

echuawu commented May 25, 2023

@stepanblyschak , please help do that as @StormLiangMS suggested.

@stepanblyschak
Copy link
Contributor Author

@echuawu @StormLiangMS Added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Github Project Adoption Guideline and Sample Feature - PLEASE READ
9 participants