Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG… #5426

gechiang · 2020-09-22T08:01:47Z

This is to port the same change from Master branch "https://github.com/Azure/sonic-buildimage/pull/5329/files" to 201911 branch as the way to start up a daemon in 201911 branch differs from master branch.

This PR is to auto start the bgpmon (BGP Monitor) Daemon under the BGP docker.

The purpose of bgpmon (BGP Monitor daemon) is to assist gathering BGP related information periodically based on BGP activity detection to store BGP Neighbor information into State DB. For components such as SNMP agent that uses this new BGP related DB in state DB to support the MIB operation.
This PR is to auto start the bgpmon daemon under the BGP docker. There is another PR which I originally used to implements the bgpmon processing code (sonic-net/sonic-swss#1429) that was later decided to be moved into this PR instead of under the sonic-swss repo. Many valuable comments are still in that PR so reader should refer to that PR for review comments history purpose only.

Note: This new daemon is also added to be monitored by Monit so that in case it stops running it will be reported.

The HLD doc PR can be found here:

[Azure-SONiC] (CiscoBgp4MIB implementation changes SONiC#667)

The following PR (SNMP Agent Changes to use State DB for BGP MIB )has a dependency on this PR:

[Azure-sonic-snmpagent] (Implementation changes for CiscoBgp4MIB sonic-snmpagent#158)

The following PR contains the Pytest code that validate this feature functionality whenever VS Image test is run as part of all PRs as well as used in DUT pytest:
-[Azure-sonic-mgmt] (sonic-net/sonic-mgmt#2241)

Signed-off-by: Gen-Hwa Chiang gechiang@microsoft.com

- What I did

Added a new daemon to be run under BGP docker periodically to gather the BGP peer state information and update them in
the state DB. In order not to continuously running even when there are no BGP peer activities, this new daemon will use the BGP frr.log file timestamp as an indicator whether there are any activities in BGP that warrants it to inspect the peer state info for possible updates.

- How I did it

Start a periodic timer to get triggered every 15 second.
Check if there is a change in the timestamp of the frr.log file against a cached timestamp from the previous update.
Request the back-end handler (FRR/Zebra) to dump out the "show bgp summary json"
For each peer state that it just received, check it against the previous state that was cached and if there is a change or this is new peer, update/add the corresponding state DB table entry.
At the end of processing all the peer info, check if there are any stale peer entry present and delete the corresponding state DB entry accordingly.

- How to verify it

once the BGP docker is up and running, you can perform "show services" to see that this new daemon "bgpmon.py" is running under the BGP docker service similar to the following output:

admin@SONic:~$ show services
...
bgp     docker
---------------------------
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.2  0.2  28576 22632 pts/0    Ss+  19:10   0:08 /usr/bin/python2 /usr/bin/supervisord
root        21  0.0  0.2  22664 16716 pts/0    S    19:11   0:00 python /usr/bin/supervisor-proc-exit-listener --container-name bgp
root        25  0.0  0.0 225856  3512 pts/0    Sl   19:11   0:00 /usr/sbin/rsyslogd -n -iNONE
frr         29  1.0  0.4 525176 36580 pts/0    Sl   19:11   0:40 /usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M fpm -M snmp
frr         30  0.0  0.0  43304  6080 pts/0    S    19:11   0:00 /usr/lib/frr/staticd -A 127.0.0.1
frr         31  2.5  1.0 356196 84108 pts/0    Sl   19:11   1:36 /usr/lib/frr/bgpd -A 127.0.0.1 -M snmp
root        40  0.1  0.6  69172 55876 pts/0    S    19:11   0:05 /usr/bin/python /usr/local/bin/bgpcfgd
root        41  0.0  0.1  20988 15508 pts/0    S    19:11   0:00 /usr/bin/python /usr/local/bin/bgpmon
root        42  0.2  0.0  82108  6136 pts/0    Sl   19:11   0:10 fpmsyncd
...

For IPV4 You can perform "show ip bgp summary" and use that output to check against the state DB for each peer.
The state DB key would be something like "NEIGH_STATE_TABLE|10.0.0.33" where 10.0.0.33 is the peer ip address.
you can then read out the state information by issuing the redis cmd hgetall "NEIGH_STATE_TABLE|10.0.0.33". Compare that with the corresponding output of "show ip bgp summary".
Here is a sample output from the State DB:
admin@str-s6000-acs-8:~$ redis-cli -n 6
127.0.0.1:6379[6]> keys "NEIGH*"

"NEIGH_STATE_TABLE|fc00::1a"
"NEIGH_STATE_TABLE|fc00::32"
"NEIGH_STATE_TABLE|10.0.0.33"
"NEIGH_STATE_TABLE|fc00::66"
"NEIGH_STATE_TABLE|fc00::62"
"NEIGH_STATE_TABLE|fc00::46"
"NEIGH_STATE_TABLE|10.0.0.35"
"NEIGH_STATE_TABLE|10.0.0.53"
"NEIGH_STATE_TABLE|10.0.0.47"
"NEIGH_STATE_TABLE|10.0.0.9"
"NEIGH_STATE_TABLE|fc00::6e"
"NEIGH_STATE_TABLE|10.0.0.41"
"NEIGH_STATE_TABLE|10.0.0.63"
"NEIGH_STATE_TABLE|10.0.0.17"
"NEIGH_STATE_TABLE|fc00::4a"
"NEIGH_STATE_TABLE|fc00::52"
"NEIGH_STATE_TABLE|fc00::5e"
"NEIGH_STATE_TABLE|10.0.0.29"
"NEIGH_STATE_TABLE|10.0.0.45"
"NEIGH_STATE_TABLE|fc00::2a"
"NEIGH_STATE_TABLE|fc00::42"
"NEIGH_STATE_TABLE|10.0.0.1"
"NEIGH_STATE_TABLE|10.0.0.59"
127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|10.0.0.1"
"state"
"Established"
127.0.0.1:6379[6]>

you can do the same for ipv6 by performing "show ipv6 bgp summary" and follow the same steps above to validate with the corresponding peer states stored in state DB.

…P docker

qiluo-msft · 2020-09-22T16:44:39Z

I see #5329 lack unit test or vs test. So for the backporting, it is also lack of test.

rlhui · 2020-09-22T20:33:35Z

@abdosi - FYI

abdosi · 2020-09-22T21:30:18Z

src/sonic-bgpcfgd/bgpmon_proj/bgpmon.py

+        # Check for stale state entries to be cleaned up
+        while len(self.peer_l) > 0:
+            # remove this from the stateDB and the current nighbor state entry
+            peer = self.peer_l.pop(0)


@gechiang Can't we juse pop here ? if we want to pop from front then queue is better data structure than using list from performance perspective

@abdosi Agreed. There is no need to pop from the front at all. The purpose is to pop one item from this list and handle it for the peer deleted case. We should address this next time we have a need to touch this file to enhance the performance. In most of our "normal" usage the neighbors are not removed so this code segment is not expected to be exercised regularly.

@gechiang Thanks but why can't this be addressed now ? It should be small change and quick verification

@abdosi Sure. I will pull another PR for this shortly. This PR was already merged before your comments arrived.

@gechiang Thanks, I meant in new PR only.

abdosi · 2020-09-22T21:31:17Z

src/sonic-bgpcfgd/bgpmon_proj/bgpmon.py

+            del_key = "NEIGH_STATE_TABLE|%s" % peer
+            data[del_key] = None
+            del self.peer_state[peer]
+            if len(data) > PIPE_BATCH_MAX_COUNT:


@gechiang do we need this if check in while loop ? even if this check is not there below if condition should take care of flushing data ?

@abdosi This check is required. The purpose of using pipeline is to be able to batch to reduce transaction overhead. But at the same time the server will also need to be batching the responses that depends on this batch depth and this batching will effectively increase the use/hold of memory from the server side.

@gechiang ok, so want to keep flushing in batches of PIPE_BATCH_MAX_COUNT for good server performance.

abdosi · 2020-09-22T21:36:04Z

@gechiang Please update https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/critical_processes
with new process

gechiang · 2020-09-22T22:41:21Z

@abdosi @lguohan When bgpmon fail to run it has no impact to the BGP functionality. The only impact it will have is to other potential components that consume the state DB that bgpmon produces. So it depends on the definition of what constitute to be a critical process for the BGP docker. If all agree I can certainly add it to be part of the critical processes.

abdosi · 2020-09-23T19:56:51Z

@abdosi @lguohan When bgpmon fail to run it has no impact to the BGP functionality. The only impact it will have is to other potential components that consume the state DB that bgpmon produces. So it depends on the definition of what constitute to be a critical process for the BGP docker. If all agree I can certainly add it to be part of the critical processes.

@gechiang In my opinion this is critical process as SNMP will use it for polling Device BGP health.

lguohan · 2020-09-23T22:09:50Z

i am thinking it is not critical, we can do autorestart in the supervisord configuration, but not restart bgp process.

abdosi · 2020-09-23T22:36:41Z

i am thinking it is not critical, we can do autorestart in the supervisord configuration, but not restart bgp process.

ok, that is better than container auto-restart. Supervisor auto-restart should be good for this process

gechiang · 2020-09-23T22:48:25Z

@lguohan @abdosi Thanks for the comment. I will work on another PR to make this change on master branch and hopefully cherry-pick-able for 201911 branch.

rlhui · 2020-10-21T19:54:56Z

@gechiang, should this PR be closed or stay? Thanks.

Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG…

26dbbed

…P docker

gechiang requested review from qiluo-msft, lguohan, rlhui and SuvarnaMeenakshi September 22, 2020 08:02

lguohan approved these changes Sep 22, 2020

View reviewed changes

rlhui approved these changes Sep 22, 2020

View reviewed changes

gechiang merged commit 7168fc8 into sonic-net:201911 Sep 22, 2020

abdosi reviewed Sep 22, 2020

View reviewed changes

This was referenced Sep 24, 2020

Renamed sonic-bgpcfgd/bgpmon_proj and make bgpmon autorestart=true #5453

Merged

make bgpmon autorestart enabled by supervisord #5460

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG… #5426

Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG… #5426

gechiang commented Sep 22, 2020

qiluo-msft commented Sep 22, 2020

rlhui commented Sep 22, 2020

abdosi Sep 22, 2020 •

edited

Loading

gechiang Sep 22, 2020

abdosi Sep 22, 2020

gechiang Sep 22, 2020

abdosi Sep 23, 2020

abdosi Sep 22, 2020

gechiang Sep 22, 2020

abdosi Sep 23, 2020

abdosi commented Sep 22, 2020

gechiang commented Sep 22, 2020

abdosi commented Sep 23, 2020 •

edited

Loading

lguohan commented Sep 23, 2020

abdosi commented Sep 23, 2020

gechiang commented Sep 23, 2020

rlhui commented Oct 21, 2020

Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG… #5426

Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG… #5426

Conversation

gechiang commented Sep 22, 2020

qiluo-msft commented Sep 22, 2020

rlhui commented Sep 22, 2020

abdosi Sep 22, 2020 • edited Loading

Choose a reason for hiding this comment

gechiang Sep 22, 2020

Choose a reason for hiding this comment

abdosi Sep 22, 2020

Choose a reason for hiding this comment

gechiang Sep 22, 2020

Choose a reason for hiding this comment

abdosi Sep 23, 2020

Choose a reason for hiding this comment

abdosi Sep 22, 2020

Choose a reason for hiding this comment

gechiang Sep 22, 2020

Choose a reason for hiding this comment

abdosi Sep 23, 2020

Choose a reason for hiding this comment

abdosi commented Sep 22, 2020

gechiang commented Sep 22, 2020

abdosi commented Sep 23, 2020 • edited Loading

lguohan commented Sep 23, 2020

abdosi commented Sep 23, 2020

gechiang commented Sep 23, 2020

rlhui commented Oct 21, 2020

abdosi Sep 22, 2020 •

edited

Loading

abdosi commented Sep 23, 2020 •

edited

Loading