Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[staticroutebfd]fix an issue on deleting a non-bfd static route #15269

Merged
merged 5 commits into from
Jun 2, 2023
Merged

[staticroutebfd]fix an issue on deleting a non-bfd static route #15269

merged 5 commits into from
Jun 2, 2023

Conversation

baorliu
Copy link
Contributor

@baorliu baorliu commented May 31, 2023

What I did it

Fix an issue for deleting a non-bfd static route
also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.

fixes #15267

Why I did it

Current design access "None" object (nh_vrf_list) when delete a non-bfd static route

For supervisord, because staticroutebfd support restart and recovery. so it is not necessary to restart bgp container if there is any crash inside staticroutebfd. supervisord will restart it after crash.

Work item tracking

Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/17793093

How I did it

check if the route is BFD protected when delete this static route, and check if the nh_vrf_list is None.

How to verify it

add a UT deleting a non-bfd static route to reproduce the issue:

self = <staticroutebfd.main.StaticRouteBfd object at 0x7f2f1add8b80>, key = '3.3.3.0/24', redis_del = {'ifname': 'if1, if2', 'nexthop': '192.168.1.2 , 192.168.2.2'}

    def static_route_del_handler(self, key, redis_del):
        valid, vrf, ip_prefix = static_route_split_key(key)
        if not valid:
            return True
        route_cfg_key = vrf + "|" + ip_prefix
    
        valid, is_ipv4, ip = check_ip(ip_prefix)
        if not valid:
            return True
    
        data = self.get_local_db(LOCAL_CONFIG_TABLE, route_cfg_key)
        if len(data) == 0:
            # this route is not handled by StaticRouteBfd, skip
            return True
    
        arg_list    = lambda v: [x.strip() for x in v.split(',')] if len(v.strip()) != 0 else None
        nh_list     = arg_list(data['nexthop']) if 'nexthop' in data else None
        nh_vrf_list = arg_list(data['nexthop-vrf']) if 'nexthop-vrf' in data else None
        for index in range(len(nh_list)):
            nh_ip = nh_list[index]
>           nh_vrf = nh_vrf_list[index]
E           TypeError: 'NoneType' object is not subscriptable

staticroutebfd/main.py:507: TypeError

fix the issue and run the UT:

staticroutebfd/__init__.py              0      0   100%
staticroutebfd/main.py                577    204    65%
staticroutebfd/vars.py                  5      0   100%
-------------------------------------------------------
TOTAL                                2367   1689    29%

================================================================================================ 32 passed in 0.74s =================================================================================================

For supervisord, after changed the critical_processes.j2 and supervisord.conf.j2, restart bgp (sudo systemctl restart bgp),
manually kill the staticroutebfd process inside bgp container, it can be restarted without restart the whole bgp container.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

self.remove_from_local_db(LOCAL_BFD_TABLE, bfd_key)
self.del_bfd_session_from_appl_db(bfd_key)
# do not delete it from appl_db if the route is not bfd enabled
if bfd_enabled:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why aren't we removing the static route if bfd not enabled? What was the previous behavior? Does this mean if the user removes static route via CLI, won't it get removed from APP_DB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If BFD is not enabled, staticroutebfd does not write static route entry to appl_db. so it should not delete it from appl_db. In general, prefix in config_db should not in appl_db. but to be safe, it is better that staticroutebfd does not touch that route in appl_db if it is not created by staticroutebfd.

dgsudharsan
dgsudharsan previously approved these changes May 31, 2023
@baorliu baorliu marked this pull request as ready for review May 31, 2023 16:14
set_del_test(dut, "srt",
"DEL",
("3.3.3.0/24", {
"nexthop": "192.168.1.2 , 192.168.2.2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fv-pairs for del notification will be empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove that to prevent DUT using these values in case

@@ -169,6 +168,28 @@ def test_set_del():
{'del_default:2.2.2.0/24': {}}
)

# test add a non-bfd static route
set_del_test(dut, "srt",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest adding a testcase which has nexthop-vrf parameter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new testcases are added for nexthop-vrf list. thanks

@@ -9,5 +9,4 @@ program:pimd
program:frrcfgd
{%- else %}
program:bgpcfgd
program:staticroutebfd
Copy link
Contributor

@vivekrnv vivekrnv May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Isn't this a critical process? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

staticroutebfd itself supports recovery from restart/crash. So it is not necessary to set it to critical process. When a critical process crashes, the bgp container itself will restart, which has more impact. to reduce impact to the bgp container, removed it from critical process, and let supervisord restart staticroutebfd only (see change in supervisord.conf.j2 below).

@dgsudharsan
Copy link
Collaborator

@prsunny Can we merge this?

@abdosi
Copy link
Contributor

abdosi commented Jun 7, 2023

@gechiang needed for msft repo 202205 branch

@abdosi
Copy link
Contributor

abdosi commented Jun 7, 2023

@baorliu can you create direct PR for 202205 branch on msft repo ?

That way we need not have to cherry-pick.

cc @anamehra @gechiang

abdosi added a commit to Azure/sonic-buildimage-msft that referenced this pull request Jun 12, 2023
Double commit the PRs from sonic-net/sonic-buildimage.
Original PR for staticroutebfd:
sonic-net/sonic-buildimage#13789
And a fix for staticroutebfd:
sonic-net/sonic-buildimage#15269
@gechiang gechiang added the Included in Chassis for 202205 Branch Indicate PR is already in MSFT repo 202205 branch label Jun 12, 2023
sonic-otn pushed a commit to sonic-otn/sonic-buildimage that referenced this pull request Sep 20, 2023
…c-net#15269)

* [static_route][staticroutebfd]fix an issue on deleting a non-bfd static route

Fix an issue for deleting a non-bfd static route also remove the staticroutebfd from critical_processes list and make it auto restart in the case of crash.
mlok-nokia pushed a commit to mlok-nokia/sonic-buildimage that referenced this pull request Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

staticroutebfd fails to handle static route del notification
7 participants