[fpmsyncd] Fpmsyncd Next Hop Table Enhancement #2919
base: master
Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
fpmsyncd/routesync.cpp
Outdated
FieldValueTuple nhg("nexthop_group", nhg_id_key.c_str());
fvVector.push_back(nhg);
updateNextHopGroup(nhg_id);
use_nhg = false;
Shouldn't this be use_nhg = true?
This code was removed when we changed routes with a single nexthop to no longer use an NHG.
fpmsyncd/routesync.h
Outdated
/* nexthop group table */
ProducerStateTable m_nexthop_groupTable;
map<uint32_t,NextHopGroup> m_nh_groups;
map<string,NextHopGroupRoute> m_nh_routes;
fpmsyncd should NOT cache the routes; doing so will consume too much memory in large-scale route scenarios.
Each route entry already exists in orchagent, sai meta, and syncd (sai/sdk). We cannot afford another copy :(
Removed map<string,NextHopGroupRoute> m_nh_routes; based on the discussion in the Routing WG.
Please check whether this resolves your concern.
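For illustration, the resolution above (keep the nexthop-group map, drop the per-route map) could look roughly like the sketch below. This is not the PR's code: NhgEntry and NhgCache are hypothetical names, and the entry's fields are simplified assumptions. The point is only that memory grows with the number of groups, not the number of routes.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified nexthop-group entry (field names are assumptions).
struct NhgEntry {
    std::string nexthop;
    std::string ifname;
};

// Keeps one entry per nexthop-group ID. Routes are not stored here;
// a route only carries the ID of the group it uses.
class NhgCache {
public:
    void update(std::uint32_t id, NhgEntry entry) { m_groups[id] = std::move(entry); }
    bool remove(std::uint32_t id) { return m_groups.erase(id) > 0; }

    // Returns nullptr when the ID is unknown.
    const NhgEntry* find(std::uint32_t id) const {
        auto it = m_groups.find(id);
        return it == m_groups.end() ? nullptr : &it->second;
    }
    std::size_t size() const { return m_groups.size(); }

private:
    std::map<std::uint32_t, NhgEntry> m_groups;
};
```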
Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
/AzurePipelines run Azure.sonic-swss
Commenter does not have sufficient privileges for PR 2919 in repo sonic-net/sonic-swss
@zice312963205 @shuaishang
@nakano-omw your branch needs to be updated, and you need to re-push your code first. @prsunny Hi Prince, can you please help merge this PR? Thanks.
@ridahanif96 I have re-pushed. Thanks.
Looks good
Reviewers, can you please help review and approve this PR? Thanks.
Please add UT to the changes. There is a requirement for 80% coverage.
fpmsyncd/routesync.h
Outdated

#include <netlink/route/route.h>

#if (LINUX_VERSION_CODE > KERNEL_VERSION(5,3,0))
#define HAVE_NEXTHOP_GROUP
Why do we need the HAVE_NEXTHOP_GROUP macro? Since we are submitting the code to master, the Linux kernel version is expected to be above 5.3.0. Please remove the unnecessary ifdefs.
Thank you. I have removed the code.
else
#endif
{
onEvpnRouteMsg(h, len);
Have you tested whether nexthop groups work with EVPN in the case of an overlay nexthop?
fpmsyncd/routesync.cpp
Outdated
char ifname_unknown[IFNAMSIZ] = "unknown";

SWSS_LOG_INFO("type %d len %d", nlmsg_type, len);
if ((nlmsg_type != RTM_NEWNEXTHOP)
This is already checked in the calling function and is therefore redundant. Please remove it here.
https://github.com/sonic-net/sonic-swss/pull/2919/files#diff-0555c0a4f1e207c410ac8ab7d4a44f48a0925da2ed14c57499a4e9175223be57R625
Thank you. I have removed the code.
fpmsyncd/routesync.cpp
Outdated
@@ -45,6 +49,8 @@ using namespace swss;

#define ETHER_ADDR_STRLEN (3*ETH_ALEN)

#define MULTIPATH_NUM 256 //Same value used for FRR in SONiC
Can we rename it to MAX_MULTIPATH_NUM for better readability?
I have modified it to MAX_MULTIPATH_NUM.
fpmsyncd/routesync.cpp
Outdated
auto itr = m_nh_groups.find(id);
if(itr == m_nh_groups.end())
{
SWSS_LOG_INFO("NextHop group is incomplete: %d", nhg.id);
Shouldn't this be a warn or error log?
I have corrected it to SWSS_LOG_ERROR.
fpmsyncd/routesync.cpp
Outdated
auto git = m_nh_groups.find(nh_id);
if(git == m_nh_groups.end())
{
SWSS_LOG_INFO("Nexthop not found: %d", nh_id);
Shouldn't this be a warn or error message?
I have corrected it to SWSS_LOG_ERROR.
Cherry-picked into the Phoenix Wing fork and validated it.
fpmsyncd/fpmlink.cpp
Outdated
@@ -276,6 +276,13 @@ void FpmLink::processFpmMessage(fpm_msg_hdr_t* hdr)
/* EVPN Type5 Add route processing */
processRawMsg(nl_hdr);
}
#ifdef HAVE_NEXTHOP_GROUP
Why do we need this macro check? Can we do a dynamic check instead, such as DEVICE_METADATA['localhost']['nexthop_group']?
This is done as part of PR https://github.com/sonic-net/sonic-buildimage/pull/16762/files
Could you please check this?
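A minimal sketch of what such a runtime check might look like, in place of the compile-time guard. It deliberately avoids the real swss database API: Fields is a plain map standing in for the DEVICE_METADATA|localhost entry, both function names are hypothetical, and the value "enabled" is an assumption about how the field would be encoded.

```cpp
#include <map>
#include <string>

// Stand-in for the field/value pairs of CONFIG_DB's DEVICE_METADATA|localhost.
using Fields = std::map<std::string, std::string>;

// Assumption: the feature is on when the "nexthop_group" field equals "enabled".
bool nexthopGroupEnabled(const Fields& localhost)
{
    auto it = localhost.find("nexthop_group");
    return it != localhost.end() && it->second == "enabled";
}

// Runtime replacement for an #ifdef HAVE_NEXTHOP_GROUP guard: nexthop
// messages are processed only when the flag read at startup is set.
bool shouldProcessNextHopMsg(bool isNextHopMsg, bool nhgEnabled)
{
    return isNextHopMsg && nhgEnabled;
}
```

Reading the flag once at startup and passing it down keeps the message-handling path free of database lookups.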
fpmsyncd/routesync.cpp
Outdated

vector<string> alsv = tokenize(intf_list, NHG_DELIMITER);
for (auto alias : alsv)
#ifdef HAVE_NEXTHOP_GROUP
Please see the previous comment; we could use the DEVICE_METADATA dynamic check here as well.
* up/down events. Skipping routes to eth0 or docker0 to avoid such behavior
*/
if (alias == "eth0" || alias == "docker0")
const auto itg = m_nh_groups.find(nhg_id);
Could you add a couple of sonic-swss tests with NHGs?
@ntt-omw Please check the compiler failure: routesync.cpp:800:23: error: 'rtnl_route_get_nh_id' was not declared in this scope; did you mean 'rtnl_route_get_iif'?
@ntt-omw can you rebase your branch and trigger a recompile? You need the changes from #3105 to fix the compile issue @kperumalbfn pointed out.
fpmsyncd/routesync.cpp
Outdated
// In this case since we do not want the route with next hop on eth0/docker0, we return.
// But still we need to clear the route from the APPL_DB. Otherwise the APPL_DB and data
// path will be left with stale route entry
if(alsv.size() == 1)
Can't this test be moved outside the loop? If the list has a single entry, there is no reason for the loop itself.
The purpose of this loop is not to show skipped routes, but to skip routes to specific interfaces (eth0 or docker0) and do the associated processing.
fpmsyncd/routesync.cpp
Outdated
string weights = getNextHopWt(route_obj);

vector<string> alsv = tokenize(intf_list, NHG_DELIMITER);
for (auto alias : alsv)
What is the purpose of this loop? Is it to print the skipped routes? The only logic in it applies when the list is of size 1.
Thanks, I moved if(alsv.size() == 1) outside the loop.
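To make the two threads above concrete, here is one way the hoisted check can be sketched. It is an illustration under assumptions, not the PR's code: RouteAction and classifyRoute are hypothetical names, and the actual APPL_DB deletion is reduced to a return value. The loop still has a job (finding eth0 or docker0 among the interfaces), while the size test no longer depends on the loop variable.

```cpp
#include <string>
#include <vector>

// Hypothetical outcome of inspecting a route's interface list.
enum class RouteAction { Publish, Skip, SkipAndDelete };

RouteAction classifyRoute(const std::vector<std::string>& aliases)
{
    // Hoisted out of the loop: it depends only on the list size.
    const bool single = (aliases.size() == 1);

    for (const auto& alias : aliases) {
        if (alias == "eth0" || alias == "docker0") {
            // A route whose only nexthop is on eth0/docker0 must also be
            // cleared, otherwise a stale entry would remain.
            return single ? RouteAction::SkipAndDelete : RouteAction::Skip;
        }
    }
    return RouteAction::Publish;
}
```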
Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
@dgsudharsan @kperumalbfn
@ntt-omw can you help to get the swss sanity check passed? Failed: 3 (0.35%) test_rebind_eni_route_group Might be related to your changes.
@nakano-omw libnl 3.10 will have support for getting/setting the nexthop ID attribute, but the API is a little bit different. See thom311/libnl@3e08063 for details. It looks like in the version of code that has been committed, it's For ease of upgrades, it would be good if the same API syntax is used. Would you be able to rework this PR to use that new API instead?
@ntt-omw @nakano-omw can you rebase your branch to latest master? You have "This branch is out-of-date with the base branch"
We are fixing this issue.
We are rebasing our branch now.
The following lines are missing test coverage. The coverage threshold is 80%.
{
if(nhg.group.size() == 0)
{
if(!nhg.nexthop.empty())
This can be replaced with the following:
nexthops = nhg.nexthop.empty() ? (af == AF_INET ? "0.0.0.0" : "::") : nhg.nexthop;
Similar logic is implemented in the non-empty nhg.
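The suggested one-liner can be tried in isolation as below. This is a small self-contained sketch rather than the PR's code: nextHopOrDefault is a hypothetical helper name; only the AF_INET test and the "0.0.0.0" / "::" defaults come from the comment above.

```cpp
#include <string>
#include <sys/socket.h>  // AF_INET, AF_INET6

// Hypothetical helper: returns the nexthop string to publish for a
// single-nexthop entry, falling back to the all-zero address of the
// route's address family when no gateway is set.
std::string nextHopOrDefault(const std::string& nexthop, int af)
{
    return nexthop.empty() ? (af == AF_INET ? "0.0.0.0" : "::") : nexthop;
}
```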
//Using route-table only for single next-hop
string nexthops, ifnames, weights;

getNextHopGroupFields(nhg, nexthops, ifnames, weights, rtnl_route_get_family(route_obj));
Do you want to handle the case where the NHG is not based on the ID?
If there is a failure in getNextHopGroupFields(), you would be pushing empty strings.
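One possible way to address the empty-strings concern is to have the lookup report failure and let the caller bail out before building any fields. The sketch below is an assumption about how that could be structured, not the signature used in the PR: NhgEntry, lookupFields, buildRouteFields and the "nexthop" / "ifname" field names are all illustrative.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified nexthop-group entry.
struct NhgEntry {
    std::string nexthop;
    std::string ifname;
};

using FieldValues = std::vector<std::pair<std::string, std::string>>;

// Assumed variant of the lookup: returns false for an unknown ID instead
// of silently leaving the output strings empty.
bool lookupFields(const std::map<std::uint32_t, NhgEntry>& groups, std::uint32_t id,
                  std::string& nexthops, std::string& ifnames)
{
    auto it = groups.find(id);
    if (it == groups.end())
        return false;
    nexthops = it->second.nexthop;
    ifnames = it->second.ifname;
    return true;
}

// Builds the route fields only when the lookup succeeded, so empty
// strings are never pushed.
bool buildRouteFields(const std::map<std::uint32_t, NhgEntry>& groups, std::uint32_t id,
                      FieldValues& out)
{
    std::string nexthops, ifnames;
    if (!lookupFields(groups, id, nexthops, ifnames))
        return false;
    out.emplace_back("nexthop", nexthops);
    out.emplace_back("ifname", ifnames);
    return true;
}
```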
What I did
Implemented code changes for sonic-net/SONiC#1425.
Why I did it
To add the nexthop group feature to fpmsyncd.
How I verified it
Enable/disable the nexthop group feature:
Run feature next-hop-group enable. FEATURE|nexthop_group will be created in CONFIG_DB.
zebra.conf.j2 will generate zebra.conf with fpm use-next-hop-groups if FEATURE|nexthop_group exists in CONFIG_DB. Else, it will generate zebra.conf with no fpm use-next-hop-groups (default behavior).
Run the config save command to write to /etc/sonic/config_db.json.
Run virsh reboot sonic-nhg.
Check that /etc/frr/zebra.conf has fpm use-next-hop-groups instead of no fpm use-next-hop-groups.
Klish CLI for feature nexthop_group
Enable: sonic(config)# feature next-hop-group enable
Disable: sonic(config)# no feature next-hop-group
Details if related
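The template rule described under "How I verified it" reduces to a single decision. The real logic lives in the zebra.conf.j2 Jinja template; the C++ below is only a stand-in to spell that rule out, with fpmNextHopGroupLine as a hypothetical name and a std::set standing in for the keys present in CONFIG_DB.

```cpp
#include <set>
#include <string>

// Returns the zebra.conf line selected by the presence of the
// FEATURE|nexthop_group key (the set is a stand-in for CONFIG_DB keys).
std::string fpmNextHopGroupLine(const std::set<std::string>& configDbKeys)
{
    return configDbKeys.count("FEATURE|nexthop_group")
               ? "fpm use-next-hop-groups"
               : "no fpm use-next-hop-groups";  // default behavior
}
```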