From a14e3af8d9cd2fcb220b6d019471cdffdb2ea2ed Mon Sep 17 00:00:00 2001
From: Kanji Nakano
Date: Thu, 13 Jul 2023 15:43:49 +0000
Subject: [PATCH] init fpmsyncd HLD

---
 doc/fpmsyncd/diagram/fig.drawio  | 313 +++++++++++++++++++++++++++
 doc/fpmsyncd/diagram/fig1.svg    |   1 +
 doc/fpmsyncd/diagram/fig2.svg    |   1 +
 doc/fpmsyncd/diagram/fig3.svg    |   4 +
 doc/fpmsyncd/hld_fpmsyncd-NTT.md | 348 +++++++++++++++++++++++++++++++
 5 files changed, 667 insertions(+)
 create mode 100644 doc/fpmsyncd/diagram/fig.drawio
 create mode 100644 doc/fpmsyncd/diagram/fig1.svg
 create mode 100644 doc/fpmsyncd/diagram/fig2.svg
 create mode 100644 doc/fpmsyncd/diagram/fig3.svg
 create mode 100644 doc/fpmsyncd/hld_fpmsyncd-NTT.md

diff --git a/doc/fpmsyncd/diagram/fig.drawio b/doc/fpmsyncd/diagram/fig.drawio
new file mode 100644
index 00000000000..b7393d204ad
--- /dev/null
+++ b/doc/fpmsyncd/diagram/fig.drawio
\ No newline at end of file
diff --git a/doc/fpmsyncd/diagram/fig1.svg b/doc/fpmsyncd/diagram/fig1.svg
new file mode 100644
index 00000000000..19074d2f940
--- /dev/null
+++ b/doc/fpmsyncd/diagram/fig1.svg
@@ -0,0 +1 @@
+[fig1.svg: sequence diagram between Zebra, fpmsyncd and redis-server. Labels in source order: add route, RTM_NEWROUTE, HSET ROUTE_TABLE:PREFIX; del route, RTM_DELROUTE, HDEL ROUTE_TABLE:PREFIX]
\ No newline at end of file
diff --git a/doc/fpmsyncd/diagram/fig2.svg b/doc/fpmsyncd/diagram/fig2.svg
new file mode 100644
index 00000000000..e25ce00e93f
--- /dev/null
+++ b/doc/fpmsyncd/diagram/fig2.svg
@@ -0,0 +1 @@
+[fig2.svg: sequence diagram between Zebra, fpmsyncd and redis-server. Labels in source order: add route, RTM_NEWNEXTHOP, HSET NEXT_HOP_GROUP_TABLE:PREFIX, RTM_NEWNEXTHOP, RTM_NEWNEXTHOP, HSET ROUTE_TABLE:PREFIX, RTM_NEWNEXTHOP]
\ No newline at end of file
diff --git a/doc/fpmsyncd/diagram/fig3.svg b/doc/fpmsyncd/diagram/fig3.svg
new file mode 100644
index 00000000000..e40664e4a4b
--- /dev/null
+++ b/doc/fpmsyncd/diagram/fig3.svg
@@ -0,0 +1,4 @@
+[fig3.svg: ASIC_DB example. ROUTE_ENTRY {dest: 8.8.8.0/24} with NEXT_HOP_ID 5; NEXT_HOP_GROUP oid:5; NEXT_HOP_GROUP_MEMBER oid:6 {NEXT_HOP_GROUP_ID oid:5, NEXT_HOP_ID 3} and oid:7 {NEXT_HOP_GROUP_ID oid:5, NEXT_HOP_ID 4}; NEXT_HOP oid:3 and oid:4, each {NEXT_HOP_ATTR_IP 10.0.1.5, NEXT_HOP_ATTR_ROUTER_INTERFACE_ID 1}]
\ No newline at end of file
diff --git a/doc/fpmsyncd/hld_fpmsyncd-NTT.md b/doc/fpmsyncd/hld_fpmsyncd-NTT.md
new file mode 100644
index 00000000000..14666ebe826
--- /dev/null
+++ b/doc/fpmsyncd/hld_fpmsyncd-NTT.md
@@ -0,0 +1,348 @@
+# `fpmsyncd` NextHop Group Enhancement High Level Design Document
+
+## Table of Content
+- [Revision](#revision)
+- [Scope](#scope)
+- [Overview](#overview)
+- [Requirements](#requirements)
+- [Architecture Design](#architecture-design)
+  - [Source Code change (TODO: move to proper place)](#source-code-change-todo-move-to-proper-place)
+- [High-Level Design](#high-level-design)
+  - [Current fpmsyncd processing flow (for reference)](#current-fpmsyncd-processing-flow-for-reference)
+  - [Proposed fpmsyncd processing flow using NextHop Group](#proposed-fpmsyncd-processing-flow-using-nexthop-group)
+  - [Value SET/DEL to APPL\_DB](#value-setdel-to-appl_db)
+  - [Example of entries in APPL\_DB](#example-of-entries-in-appl_db)
+  - [Example of entries in ASIC\_DB](#example-of-entries-in-asic_db)
+- [SAI API](#sai-api)
+- [Configuration and management](#configuration-and-management)
+  - [Manifest (if the feature is an Application Extension)](#manifest-if-the-feature-is-an-application-extension)
+  - [CLI/YANG model Enhancements](#cliyang-model-enhancements)
+  - [Config DB Enhancements](#config-db-enhancements)
+- [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact)
+- [Restrictions/Limitations](#restrictionslimitations)
+- [Testing Requirements/Design](#testing-requirementsdesign)
+  - [Unit Test cases](#unit-test-cases)
+  - [System Test cases](#system-test-cases)
+- [Open/Action items - if any](#openaction-items---if-any)
+  - [Backward compatibility with Fine-grain NHG, Ordered NHG/ECMP](#backward-compatibility-with-fine-grain-nhg-ordered-nhgecmp)
+
+### Revision
+
+|  Rev  | Date  |               Author                | Change Description |
+| :---: | :---: | :---------------------------------: | ------------------ |
+|  0.1  |  TBD  | Kanji Nakano, Kentaro Ebisawa (NTT) | Initial version    |
+
+### Scope
+
+This document details the design and implementation of the "fpmsyncd extension" related to NextHop Group behavior in SONiC.
+The goal of this extension is to integrate NextHop Group functionality into SONiC by having `fpmsyncd` write NextHop Group entries to `APPL_DB`.
+
+### Overview
+
+SONiC already supports programming routes with the NextHop Group feature through the NextHop Group table in `APPL_DB`.
+The idea is to make the system more efficient by managing the NextHop Groups used by the route table separately, so that each route simply references the NextHop Group it uses.
+Since at scale many routes share the same NextHop Groups, this greatly reduces the information carried per route, which makes building, transmitting and parsing per-route information more efficient.
+
+The current version of `fpmsyncd` cannot handle the NextHop Group netlink messages that the zebra process sends when it uses the new `dplane_fpm_nl` module.
+This implementation modifies `fpmsyncd` to handle `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` events and write them to the database.
+In addition, `fpmsyncd` is modified to use the NextHop Group ID (`nexthop_group`) when programming a route into `ROUTE_TABLE`.
+
+With these changes:
+- `fpmsyncd` is responsible for SET/DEL of `NEXTHOP_GROUP_TABLE` entries in `APPL_DB` (Redis DB).
+
+### Requirements
+
+`Fpmsyncd extension` requires:
+- `fpmsyncd` to handle `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` events from zebra via `dplane_fpm_nl`
+- `fpmsyncd` to SET/DEL routes to `APPL_DB: ROUTE_TABLE` using `nexthop_group`
+- `fpmsyncd` to SET/DEL NextHop Group entry to `APPL_DB: NEXTHOP_GROUP_TABLE`
+
+### Architecture Design
+
+This design directly modifies `fpmsyncd` to use the new `APPL_DB` tables.
+
+The current `fpmsyncd` handles only `RTM_NEWROUTE` and `RTM_DELROUTE`, writing all route information for each route prefix to `ROUTE_TABLE` in Redis DB (`redis-server`).
+When the zebra process is initialized with the old `fpm` module, `RTM_NEWROUTE` is sent with at least the destination address, gateway, and interface ID attributes.
+For a multipath route, `RTM_NEWROUTE` is sent with a list of gateways and interface IDs.
+
+This `Fpmsyncd extension` will modify `fpmsyncd` to handle `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` as below.
+
+> TODO: Add diagram with flow described in Overview and Requirements
+
+To implement this, the following SONiC subsystems will be changed.
+
+- sonic-buildimage
+  - modify `sonic-cfggen` to set `fpm use-next-hop-groups`
+  - patch `src/libnl3` to support `nh_id` (TODO: still required in latest master?)
+  - modify `/zebra/rt_netlink.c` to use `vrf_id` for vrf, not `table_id`
+- fpmsyncd (swss)
+  - add default VRF in `/cfgmgr/vrfmgr.cpp`
+  - add `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` support in `/fpmsyncd/fpmlink.cpp`
+  - add NextHop Group support in `/fpmsyncd/routesync.cpp` and `/fpmsyncd/routesync.h`
+- frr
+  - change plugin from `fpm` to `dplane_fpm_nl` in `/dockers/docker-fpm-frr/frr/supervisord/supervisord.conf.j2`
+    - this is already done in the latest master branch with [PR#12852](https://github.com/sonic-net/sonic-buildimage/pull/12852)
+
+#### Source Code change (TODO: move to proper place)
+
+sonic-buildimage.patch
+
+- /src/libnl3/patch/0003-Adding-support-for-RTA_NH_ID-attribute.patch
+  - /dockers/docker-fpm-frr/frr/supervisord/supervisord.conf.j2
+  - patch to `rt_nh_id`, `rtnl_route_set_nh_id` etc.
+  - TODO: check if this is still required in the latest master
+- add change to use next hop groups
+  - change
+    - `+fpm use-next-hop-groups`
+    - `+fpm address 127.0.0.1 port 2620`
+  - /dockers/docker-fpm-frr/frr/common/daemons.common.conf.j2
+  - /src/sonic-bgpcfgd/tests/data/sonic-cfggen/
+    - bgpd.conf.j2/all.conf
+    - common/daemons.common.conf
+    - frr.conf.j2/all.conf
+    - staticd/staticd.conf
+    - zebra/zebra.conf
+  - /src/sonic-config-engine/tests/sample_output/py2/
+    - bgpd_frr.conf
+    - bgpd_frr_backend_asic.conf
+    - bgpd_frr_frontend_asic.conf
+    - frr.conf
+    - staticd_frr.conf
+    - t2-chassis-fe-bgpd.conf
+    - etc.
+- /src/sonic-frr/patch/dplane_fpm_nl-Use-vrf_id-for-vrf-not-tabled_id.patch
+  - /zebra/rt_netlink.c
+
+sonic-swss.patch
+
+- add default VRF
+  - /cfgmgr/vrfmgr.cpp
+- Add `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP` support
+  - /fpmsyncd/fpmlink.cpp
+- many changes to support NextHop Group
+  - /fpmsyncd/routesync.cpp
+  - /fpmsyncd/routesync.h
+
+### High-Level Design
+
+#### Current fpmsyncd processing flow (for reference)
+
+For example, configuring the following routes:
+
+```
+S>* 8.8.8.0/24 [1/0] via 10.0.1.5, Ethernet4, weight 1, 00:00:05
+  *                  via 10.0.1.6, Ethernet4, weight 1, 00:00:05
+S>* 9.9.9.0/24 [1/0] via 10.0.1.5, Ethernet4, weight 1, 00:00:19
+  *                  via 10.0.1.6, Ethernet4, weight 1, 00:00:19
+```
+
+generates the following `APPL_DB` entries:
+
+```
+admin@sonic:~$ sonic-db-cli APPL_DB hgetall "ROUTE_TABLE:8.8.8.0/24"
+{'nexthop': '10.0.1.5,10.0.1.6', 'ifname': 'Ethernet4,Ethernet4', 'weight': '1,1'}
+admin@sonic:~$ sonic-db-cli APPL_DB hgetall "ROUTE_TABLE:9.9.9.0/24"
+{'nexthop': '10.0.1.5,10.0.1.6', 'ifname': 'Ethernet4,Ethernet4', 'weight': '1,1'}
+```
+
+The flow below shows how `zebra`, `fpmsyncd` and `redis-server` interact when using the `fpm` plugin without NextHop Group:
+
+##### Figure: Flow diagram without NextHop Group
+![fig1](diagram/fig1.svg)
+
+#### Proposed fpmsyncd processing flow using NextHop Group
+
+To support NextHop Groups, `fpmsyncd` was modified to handle the new events `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP`.
+`fpmsyncd` now has new logic to associate routes with NextHop Groups.
+
+The flow for the new NextHop Group feature is shown below:
+
+##### Figure: Flow diagram new nexthop group feature
+![fig2](diagram/fig2.svg)
+
+#### Value SET/DEL to APPL_DB
+
+After `use-next-hop-groups` is enabled in the `dplane_fpm_nl` plugin, zebra sends `RTM_NEWNEXTHOP` to `fpmsyncd` when a new route is added.
+
+`RTM_NEWNEXTHOP` is sent with two different attribute groups, as shown in the table below:
+
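+| Event          | Attributes  | Description                                               |
+| -------------- | ----------- | --------------------------------------------------------- |
+| RTM_NEWNEXTHOP | NHA_ID      | NextHop Group ID                                          |
+|                | NHA_GATEWAY | gateway address                                           |
+|                | NHA_OIF     | The interface ID                                          |
+| RTM_NEWNEXTHOP | NHA_ID      | NextHop Group ID                                          |
+|                | NHA_GROUP   | A list of nexthop group IDs with their respective weights |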
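As a rough illustration of the table above, the two attribute groups can be told apart by whether `NHA_GROUP` is present. The sketch below is written in Python for readability and is not the `fpmsyncd` implementation; the dict-based message layout and the `classify_nexthop` helper are hypothetical and exist only for this example.

```python
def classify_nexthop(attrs):
    """Classify a decoded RTM_NEWNEXTHOP message as 'single' or 'group'.

    `attrs` is a hypothetical dict of already-decoded netlink attributes,
    keyed by the attribute names used in the table above.
    """
    if "NHA_ID" not in attrs:
        raise ValueError("RTM_NEWNEXTHOP without NHA_ID")
    if "NHA_GROUP" in attrs:
        # Second attribute group: a list of member IDs with weights.
        return "group"
    if "NHA_GATEWAY" in attrs or "NHA_OIF" in attrs:
        # First attribute group: one gateway and/or outgoing interface.
        return "single"
    raise ValueError("RTM_NEWNEXTHOP with neither NHA_GROUP nor gateway/OIF")


single = {"NHA_ID": 116, "NHA_GATEWAY": "10.0.1.5", "NHA_OIF": 22}
group = {"NHA_ID": 118, "NHA_GROUP": [(116, 1), (117, 1)]}
print(classify_nexthop(single), classify_nexthop(group))
```

Treating a message with only `NHA_OIF` as a single nexthop is an assumption of this sketch; the table only lists the case where both the gateway and the interface are present.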
+
+After sending the `RTM_NEWNEXTHOP` events, zebra sends `RTM_NEWROUTE` to `fpmsyncd` with the NextHop Group ID, as shown in the table below:
+
+| Event        | Attributes | Description          |
+| ------------ | ---------- | -------------------- |
+| RTM_NEWROUTE | RTA_DST    | route prefix address |
+|              | RTA_NH_ID  | NextHop Group ID     |
+
+#### Example of entries in APPL_DB
+
+For example, the following route configuration generates the events shown in the table below:
+
+```
+S>* 8.8.8.0/24 [1/0] via 10.0.1.5, Ethernet4, weight 1, 00:01:09
+  *                  via 10.0.2.6, Ethernet8, weight 1, 00:01:09
+S>* 9.9.9.0/24 [1/0] via 10.0.1.5, Ethernet4, weight 1, 00:00:04
+  *                  via 10.0.2.6, Ethernet8, weight 1, 00:00:04
+```
+
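+| Seq | Event          | Attributes  | Value             |
+| --- | -------------- | ----------- | ----------------- |
+| 1   | RTM_NEWNEXTHOP | NHA_ID      | 116               |
+|     |                | NHA_GATEWAY | 10.0.1.5          |
+|     |                | NHA_OIF     | 22                |
+| 2   | RTM_NEWNEXTHOP | NHA_ID      | 117               |
+|     |                | NHA_GATEWAY | 10.0.2.6          |
+|     |                | NHA_OIF     | 23                |
+| 3   | RTM_NEWNEXTHOP | NHA_ID      | 118               |
+|     |                | NHA_GROUP   | [{116,1},{117,1}] |
+| 4   | RTM_NEWROUTE   | RTA_DST     | 8.8.8.0/24        |
+|     |                | RTA_NH_ID   | 118               |
+| 5   | RTM_NEWROUTE   | RTA_DST     | 9.9.9.0/24        |
+|     |                | RTA_NH_ID   | 118               |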
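As a rough model of how the five events above could end up in `APPL_DB`, the sketch below replays them in Python. It is not the `fpmsyncd` code (which is C++); the `RouteSyncModel` class, the ifindex-to-name map, the `NEXTHOP_GROUP_TABLE:ID118` key format and the `nexthop`/`ifname`/`weight` field names for the group entry are all assumptions, the last one mirroring the `ROUTE_TABLE` fields shown earlier in this document.

```python
# Hypothetical ifindex-to-name map (assumption): the source lists NHA_OIF 22
# and 23 next to routes via Ethernet4 and Ethernet8.
IFINDEX_TO_NAME = {22: "Ethernet4", 23: "Ethernet8"}


class RouteSyncModel:
    """Toy model of the NextHop Group flow; not the real fpmsyncd code."""

    def __init__(self):
        self.nexthops = {}            # NHA_ID -> RTM_NEWNEXTHOP attributes
        self.written_groups = set()   # group IDs already written
        self.appl_db = {}             # stand-in for APPL_DB: key -> fields

    def on_new_nexthop(self, attrs):
        # Sequences 1-3: only cache the event, nothing is written yet.
        self.nexthops[attrs["NHA_ID"]] = attrs

    def on_new_route(self, attrs):
        nh_id = attrs["RTA_NH_ID"]
        if nh_id not in self.written_groups:
            self._write_group(nh_id)
        # The route entry only references the group by ID.
        key = "ROUTE_TABLE:" + attrs["RTA_DST"]
        self.appl_db[key] = {"nexthop_group": "ID%d" % nh_id}

    def _write_group(self, nh_id):
        members = self.nexthops[nh_id]["NHA_GROUP"]  # [(member_id, weight)]
        hops = [self.nexthops[member_id] for member_id, _ in members]
        # Field names mirror the ROUTE_TABLE fields (assumption).
        self.appl_db["NEXTHOP_GROUP_TABLE:ID%d" % nh_id] = {
            "nexthop": ",".join(h["NHA_GATEWAY"] for h in hops),
            "ifname": ",".join(IFINDEX_TO_NAME[h["NHA_OIF"]] for h in hops),
            "weight": ",".join(str(weight) for _, weight in members),
        }
        self.written_groups.add(nh_id)


events = [
    ("RTM_NEWNEXTHOP", {"NHA_ID": 116, "NHA_GATEWAY": "10.0.1.5", "NHA_OIF": 22}),
    ("RTM_NEWNEXTHOP", {"NHA_ID": 117, "NHA_GATEWAY": "10.0.2.6", "NHA_OIF": 23}),
    ("RTM_NEWNEXTHOP", {"NHA_ID": 118, "NHA_GROUP": [(116, 1), (117, 1)]}),
    ("RTM_NEWROUTE", {"RTA_DST": "8.8.8.0/24", "RTA_NH_ID": 118}),
    ("RTM_NEWROUTE", {"RTA_DST": "9.9.9.0/24", "RTA_NH_ID": 118}),
]

model = RouteSyncModel()
for event, attrs in events:
    if event == "RTM_NEWNEXTHOP":
        model.on_new_nexthop(attrs)
    else:
        model.on_new_route(attrs)

for key, fields in model.appl_db.items():
    print(key, fields)
```

In this model the group entry is written once, on the first route that references ID 118, and the second route adds only its own `ROUTE_TABLE` entry.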
+
+A short description of the `fpmsyncd` logic flow:
+
+- When receiving the `RTM_NEWNEXTHOP` events in sequences 1, 2 and 3, `fpmsyncd` saves the information in an internal list to be used when necessary.
+- When `fpmsyncd` receives the `RTM_NEWROUTE` in sequence 4, it writes the NextHop Group with ID 118 to `NEXTHOP_GROUP_TABLE`, using the gateway and interface information from the NextHop Group events with IDs 116 and 117.
+- Then `fpmsyncd` creates a new route entry in `ROUTE_TABLE` with a `nexthop_group` field whose value is `ID118`.
+- When `fpmsyncd` receives the last `RTM_NEWROUTE` in sequence 5, it creates a new route entry (but no NextHop Group entry) in `ROUTE_TABLE` with the `nexthop_group` field set to `ID118`. (Note: this NextHop Group entry was already created when `fpmsyncd` received the event in sequence 4.)
+
+#### Example of entries in ASIC_DB
+
+The `ASIC_DB` entries are not changed by this enhancement.
+Therefore, even after this enhancement, table entries will be created for `ROUTE_ENTRY`, `NEXT_HOP_GROUP`, `NEXT_HOP_GROUP_MEMBER`, and `NEXT_HOP` respectively, as shown in the example below.
+
+##### Figure: Example of ASIC_DB entry
+![fig3](diagram/fig3.svg)
+
+### SAI API
+
+No changes are being made in SAI. The end result of what gets programmed via SAI will be the same as in the current implementation.
+
+### Configuration and management
+
+The output of 'show ip route' and 'show ipv6 route' will remain unchanged: the CLI code will resolve the NextHop Group ID referenced in the `ROUTE_TABLE` to display the next hops for the routes.
+
+#### Manifest (if the feature is an Application Extension)
+
+#### CLI/YANG model Enhancements
+
+No change.
+
+#### Config DB Enhancements
+
+No change.
+
+### Warmboot and Fastboot Design Impact
+
+TBD (if applicable)
+
+### Restrictions/Limitations
+
+TBD (if applicable)
+
+### Testing Requirements/Design
+
+TBD
+
+#### Unit Test cases
+
+TBD
+
+#### System Test cases
+
+TBD
+
+### Open/Action items - if any
+
+#### Backward compatibility with Fine-grain NHG, Ordered NHG/ECMP
+
+Eddy Kevetny (Nvidia) provided feedback about `net.ipv4.nexthop_compat_mode` and a backward compatibility issue.
+
+> From: eddyk=nvidia.com@lists.sonicfoundation.dev on Date: Thu, 29 Jun 2023 14:29:56 +0000
+>
+> You might want to set “net.ipv4.nexthop_compat_mode” with 0 to enable the Linux kernel to handle NHG and send to FRR.
+> Then you will need also to set “fpm use-next-hop-groups” in FRR Vtysh. Please check which logic of NHG creation is preferrable: by kernel (supporting it from 5.3) or by FRR/Zebra
+>
+> Today the logic of creation of NHGs is located in SWSS (Route/NextHopGroup Orch Agent) and the community defined different types of NHG/ECMP with configuration via Redis – e.g. Fine-grain NHG, Ordered NHG/ECMP. If some apps are using these NHG types (I know for sure that Microsoft uses some of them) then it might be a problematic to have a logic of NHG/ECMP creation (particularly enforcing the specific order of NHG members) out of SWSS. Then you might need to consider the support of backward-compatibility for this feature
+
+We have already set `fpm use-next-hop-groups` in FRR.
+
+We can disable `net.ipv4.nexthop_compat_mode` (set it to 0) if doing so does not cause a backward compatibility issue, e.g. if we make the use of NextHop Group by `fpmsyncd` an optional feature.
+
+TODO: study the NHG creation logic in SWSS (Route/NextHopGroup Orch Agent) to identify:
+1. if we should make this feature a runtime option.
+2. if this has a backward compatibility issue
\ No newline at end of file