-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bgpmon under sonic-bgpcfgd to be started as a new daemon under BG… #5426
Conversation
I see #5329 lack unit test or vs test. So for the backporting, it is also lack of test. |
@abdosi - FYI |
# Check for stale state entries to be cleaned up | ||
while len(self.peer_l) > 0: | ||
# remove this from the stateDB and the current nighbor state entry | ||
peer = self.peer_l.pop(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gechiang Can't we juse pop here ? if we want to pop from front then queue is better data structure than using list from performance perspective
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abdosi Agreed. There is no need to pop from the front at all. The purpose is to pop one item from this list and handle it for the peer deleted case. We should address this next time we have a need to touch this file to enhance the performance. In most of our "normal" usage the neighbors are not removed so this code segment is not expected to be exercised regularly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gechiang Thanks but why can't this be addressed now ? It should be small change and quick verification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abdosi Sure. I will pull another PR for this shortly. This PR was already merged before your comments arrived.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gechiang Thanks, I meant in new PR only.
del_key = "NEIGH_STATE_TABLE|%s" % peer | ||
data[del_key] = None | ||
del self.peer_state[peer] | ||
if len(data) > PIPE_BATCH_MAX_COUNT: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gechiang do we need this if check in while loop ? even if this check is not there below if condition should take care of flushing data ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@abdosi This check is required. The purpose of using pipeline is to be able to batch to reduce transaction overhead. But at the same time the server will also need to be batching the responses that depends on this batch depth and this batching will effectively increase the use/hold of memory from the server side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gechiang ok, so want to keep flushing in batches of PIPE_BATCH_MAX_COUNT for good server performance.
@gechiang Please update https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-fpm-frr/critical_processes |
@abdosi @lguohan When bgpmon fail to run it has no impact to the BGP functionality. The only impact it will have is to other potential components that consume the state DB that bgpmon produces. So it depends on the definition of what constitute to be a critical process for the BGP docker. If all agree I can certainly add it to be part of the critical processes. |
@gechiang In my opinion this is critical process as SNMP will use it for polling Device BGP health. |
i am thinking it is not critical, we can do autorestart in the supervisord configuration, but not restart bgp process. |
ok, that is better than container auto-restart. Supervisor auto-restart should be good for this process |
@gechiang, should this PR be closed or stay? Thanks. |
This is to port the same change from Master branch "https://github.com/Azure/sonic-buildimage/pull/5329/files" to 201911 branch as the way to start up a daemon in 201911 branch differs from master branch.
This PR is to auto start the bgpmon (BGP Monitor) Daemon under the BGP docker.
The purpose of bgpmon (BGP Monitor daemon) is to assist gathering BGP related information periodically based on BGP activity detection to store BGP Neighbor information into State DB. For components such as SNMP agent that uses this new BGP related DB in state DB to support the MIB operation.
This PR is to auto start the bgpmon daemon under the BGP docker. There is another PR which I originally used to implements the bgpmon processing code (sonic-net/sonic-swss#1429) that was later decided to be moved into this PR instead of under the sonic-swss repo. Many valuable comments are still in that PR so reader should refer to that PR for review comments history purpose only.
Note: This new daemon is also added to be monitored by Monit so that in case it stops running it will be reported.
The HLD doc PR can be found here:
The following PR (SNMP Agent Changes to use State DB for BGP MIB )has a dependency on this PR:
The following PR contains the Pytest code that validate this feature functionality whenever VS Image test is run as part of all PRs as well as used in DUT pytest:
-[Azure-sonic-mgmt] (sonic-net/sonic-mgmt#2241)
Signed-off-by: Gen-Hwa Chiang gechiang@microsoft.com
- What I did
Added a new daemon to be run under BGP docker periodically to gather the BGP peer state information and update them in
the state DB. In order not to continuously running even when there are no BGP peer activities, this new daemon will use the BGP frr.log file timestamp as an indicator whether there are any activities in BGP that warrants it to inspect the peer state info for possible updates.
- How I did it
- How to verify it
once the BGP docker is up and running, you can perform "show services" to see that this new daemon "bgpmon.py" is running under the BGP docker service similar to the following output:
For IPV4 You can perform "show ip bgp summary" and use that output to check against the state DB for each peer.
The state DB key would be something like "NEIGH_STATE_TABLE|10.0.0.33" where 10.0.0.33 is the peer ip address.
you can then read out the state information by issuing the redis cmd hgetall "NEIGH_STATE_TABLE|10.0.0.33". Compare that with the corresponding output of "show ip bgp summary".
Here is a sample output from the State DB:
admin@str-s6000-acs-8:~$ redis-cli -n 6
127.0.0.1:6379[6]> keys "NEIGH*"
127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|10.0.0.1"
127.0.0.1:6379[6]>
you can do the same for ipv6 by performing "show ipv6 bgp summary" and follow the same steps above to validate with the corresponding peer states stored in state DB.