Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fpmsyncd] Fpmsyncd Next Hop Table Enhancement #2919

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

ntt-omw
Copy link

@ntt-omw ntt-omw commented Oct 2, 2023

What I did
Implementing code changes for sonic-net/SONiC#1425

Why I did it
add nexthop group feature to fpmsyncd.

How I verified it

enable/disable nexthop group feature

  • Klish will call REST API to configure feature next-hop-group enable.
  • FEATURE|nexthop_group will be created in CONFIG_DB
    • template zebra.conf.j2 will generate zebra.conf with fpm use-next-hop-groups if FEATURE|nexthop_group exists in CONFIG_DB. Else, it will generate zebra.conf with no fpm use-next-hop-groups (default behavior)
  • Do config save comman and write to /etc/sonic/config_db.json
  • restart SONiC: virsh reboot sonic-nhg
  • /etc/frr/zebra.conf has fpm use-next-hop-groups instead of no fpm use-next-hop-groups

Klish CLI for feature nexthop_group

  • Enable: sonic(config)# feature next-hop-group enable
  • Disable: sonic(config)# no feature next-hop-group

Enable

admin@sonic:~$ sonic-cli

sonic# configure terminal

sonic(config)#
  end        Exit to EXEC mode
  exit       Exit from current mode
  feature    Configure additional feature
  interface  Select an interface
  ip         Global IP configuration subcommands
  mclag      domain
  no         To delete / disable commands in config mode

sonic(config)# feature
  next-hop-group  Next-hop Groups feature

sonic(config)# feature next-hop-group
  enable  Enable Next-hop Groups feature

sonic(config)# feature next-hop-group enable
  <cr>

Disable

sonic(config)# no
  feature  Disable additional feature
  ip       Global IP configuration subcommands
  mclag    domain

sonic(config)# no feature
  next-hop-group  Disable Next-hop Groups feature

sonic(config)# no feature next-hop-group
  <cr>

Details if related

Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
FieldValueTuple nhg("nexthop_group", nhg_id_key.c_str());
fvVector.push_back(nhg);
updateNextHopGroup(nhg_id);
use_nhg = false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use_nhg=true?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was removed after the change to not use NHG for route with single nexthop.

/* nexthop group table */
ProducerStateTable m_nexthop_groupTable;
map<uint32_t,NextHopGroup> m_nh_groups;
map<string,NextHopGroupRoute> m_nh_routes;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fpmsyncd should NOT cache the routes, it will consume too much memory in large scale routes scenario.
For one route entry, it already exists in orchagent, sai meta, syncd(sai/sdk). We cannot afford another copying :(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed map<string,NextHopGroupRoute> m_nh_routes; based on the discussion in the Routing WG.
Please kindly check if this has resolved your concern.

@nakano-omw
Copy link

/AzurePipelines run Azure.sonic-swss

Copy link

Commenter does not have sufficient privileges for PR 2919 in repo sonic-net/sonic-swss

@nakano-omw
Copy link

@zice312963205 @shuaishang
Please help in restarting the /AzurePipelines run Azure.sonic-swss. Would you be able to grant me the privilege?

@ridahanif96
Copy link

@nakano-omw your branch needs to be updated and you need to repush your code first. @prsunny Hi Prince, can you please help with this PR to merge. Thanks.

@nakano-omw
Copy link

@ridahanif96 I have repush. Thanks.

@zice312963205
Copy link

Looks good

@zhangyanzhao
Copy link
Collaborator

reviewers, can you please help to review and approve this PR? Thanks.

Copy link
Collaborator

@dgsudharsan dgsudharsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add UT to the changes. There is a requirement for 80% coverage.


#include <netlink/route/route.h>

#if (LINUX_VERSION_CODE > KERNEL_VERSION(5,3,0))
#define HAVE_NEXTHOP_GROUP
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this macro HAVE_NEXTHOP_GROUP? Since we are submitting the code to master, the linux version is expected to be above 5,3,0. Please remove unnecessary ifdefs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have removed the code.

else
#endif
{
onEvpnRouteMsg(h, len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tested if nexthop group works with EVPN in case of overlay nexthop?

char ifname_unknown[IFNAMSIZ] = "unknown";

SWSS_LOG_INFO("type %d len %d", nlmsg_type, len);
if ((nlmsg_type != RTM_NEWNEXTHOP)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I have removed the code.

@@ -45,6 +49,8 @@ using namespace swss;

#define ETHER_ADDR_STRLEN (3*ETH_ALEN)

#define MULTIPATH_NUM 256 //Same value used for FRR in SONiC
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename as MAX_MULTIPATH_NUM for better readability?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified it to MAX_MULTIPATH_NUM.

auto itr = m_nh_groups.find(id);
if(itr == m_nh_groups.end())
{
SWSS_LOG_INFO("NextHop group is incomplete: %d", nhg.id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a warn or error log?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have corrected it to SWSS_LOG_ERROR.

auto git = m_nh_groups.find(nh_id);
if(git == m_nh_groups.end())
{
SWSS_LOG_INFO("Nexthop not found: %d", nh_id);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a warn or error message?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have corrected it to SWSS_LOG_ERROR.

Copy link

@eddieruan-alibaba eddieruan-alibaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cherry picked into Phoenix Wing folk and validate it.

http://phoenixwing.com.cn/codes

@@ -276,6 +276,13 @@ void FpmLink::processFpmMessage(fpm_msg_hdr_t* hdr)
/* EVPN Type5 Add route processing */
processRawMsg(nl_hdr);
}
#ifdef HAVE_NEXTHOP_GROUP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this macro check? Can we do a dynamic check like

DEVICE_METADATA['localhost']['nexthop_group']

This is done as part of PR - https://github.com/sonic-net/sonic-buildimage/pull/16762/files

Could you pls check this?


vector<string> alsv = tokenize(intf_list, NHG_DELIMITER);
for (auto alias : alsv)
#ifdef HAVE_NEXTHOP_GROUP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the previous comment and we could use device_metadata dynamic check.

* up/down events. Skipping routes to eth0 or docker0 to avoid such behavior
*/
if (alias == "eth0" || alias == "docker0")
const auto itg = m_nh_groups.find(nhg_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add couple of sonic-swss tests with NHGs?

@kperumalbfn
Copy link
Contributor

@ntt-omw Please check the compiler failures

routesync.cpp:800:23: error: 'rtnl_route_get_nh_id' was not declared in this scope; did you mean 'rtnl_route_get_iif'?
800 | uint32_t nhg_id = rtnl_route_get_nh_id(route_obj);
| ^~~~~~~~~~~~~~~~~~~~
| rtnl_route_get_iif

@eddieruan-alibaba
Copy link

@ntt-omw can you rebase your branch and trigger recompile? You need #3105 's changes to fix the compile issue @kperumalbfn pointed out.

// In this case since we do not want the route with next hop on eth0/docker0, we return.
// But still we need to clear the route from the APPL_DB. Otherwise the APPL_DB and data
// path will be left with stale route entry
if(alsv.size() == 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this test be moved outside the loop ? If the list is single entry then there is no reason for the loop itself.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this loop is not to show skipped routes, but to skip routes to specific interfaces (eth0 or docker0) and do the associated processing.

string weights = getNextHopWt(route_obj);

vector<string> alsv = tokenize(intf_list, NHG_DELIMITER);
for (auto alias : alsv)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this loop? Is it to print the skipped routes ? Because the only logic there is when the list is of size 1.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I moved if(alsv.size() == 1)) outside the loop.

Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
Signed-off-by: Kanji Nakano <kanji.nakano@ntt.com>
@nakano-omw
Copy link

@dgsudharsan @kperumalbfn
The test will be implemented in Phoenix Wing. @eddieruan-alibaba , let me know if you have any additional information.

@eddieruan-alibaba
Copy link

@dgsudharsan @kperumalbfn The test will be implemented in Phoenix Wing. @eddieruan-alibaba , let me know if you have any additional information.

@ntt-omw can you help to get swss sanity check passed?

Failed: 3 (0.35%)

test_rebind_eni_route_group
AssertionError: Field SAI_ENI_ATTR_OUTBOUND_ROUTING_GROUP_ID in ASIC_DB table ASIC_STATE:SAI_OBJECT_TYPE_ENI not equal to oid:0x14008000000617
https://github.com/sonic-net/sonic-swss/pull/2919/checks?check_run_id=31324424340

Might be related to your changes.

@saiarcot895
Copy link
Contributor

@nakano-omw libnl 3.10 will have support for getting/setting the nexthop ID attribute, but the API is a little bit different. See thom311/libnl@3e08063 for details. It looks like in the version of code that has been committed, it's nhid instead of nh_id.

For ease of upgrades, it would be good if the same API syntax is used. Would you be able to rework this PR to use that new API instead?

@eddieruan-alibaba
Copy link

@ntt-omw @nakano-omw can you rebase your branch to latest master? You have "This branch is out-of-date with the base branch"

@sugisono-omw
Copy link

@eddieruan-alibaba

We are fixing this issue.

@dgsudharsan @kperumalbfn The test will be implemented in Phoenix Wing. @eddieruan-alibaba , let me know if you have any additional information.

@ntt-omw can you help to get swss sanity check passed?

Failed: 3 (0.35%)

test_rebind_eni_route_group AssertionError: Field SAI_ENI_ATTR_OUTBOUND_ROUTING_GROUP_ID in ASIC_DB table ASIC_STATE:SAI_OBJECT_TYPE_ENI not equal to oid:0x14008000000617 https://github.com/sonic-net/sonic-swss/pull/2919/checks?check_run_id=31324424340

Might be related to your changes.

@sugisono-omw
Copy link

@eddieruan-alibaba

We are rebasing our branch now.

@ntt-omw @nakano-omw can you rebase your branch to latest master? You have "This branch is out-of-date with the base branch"

@eddieruan-alibaba
Copy link

Following lines are missing test coverage.. Coverage Threshold is 80%.

{
if(nhg.group.size() == 0)
{
if(!nhg.nexthop.empty())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced with the following:
nexthops = nhg.nexthop.empty() ? (af == AF_INET ? "0.0.0.0" : "::") : nhg.nexthop;

Similar logic is implemented in the non-empty nhg.

//Using route-table only for single next-hop
string nexthops, ifnames, weights;

getNextHopGroupFields(nhg, nexthops, ifnames, weights, rtnl_route_get_family(route_obj));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to handle a case where the nhg is not based on the ID?
If there is a failure in getNextHopGroupFields(), you would be pushing empty strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.