Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new BGP Monitor Daemon to pull BGP peer state and store in State DB for MIB consumption #1429

Closed
wants to merge 8 commits into from

Conversation

gechiang
Copy link
Contributor

@gechiang gechiang commented Sep 5, 2020

Added bgpmon (BGP Monitor Daemon) support to assist gathering BGP related information periodically based on BGP activity detection to store those information into State DB. For components such as SNMP agent that uses this new BGP related DB in state DB to support the MIB operation.
This PR has the implementation of the bgpmon processing code. There is another PR (sonic-net/sonic-buildimage#5329) needed in the dockers/docker-fpm-frr to start up this bgpmon inside of the BGP docker.
Both of these PRs are needed to achieve the full functionality of bgpmon running inside of BGP docker.

The HLD doc PR can be found here:

The following PR (SNMP Agent Changes to use State DB for BGP MIB )has a dependency on this PR:

Signed-off-by: Gen-Hwa Chiang gechiang@microsoft.com

- What I did

Added a new daemon to be run under BGP docker periodically to gather the BGP peer state information and update them in
the state DB. In order not to continuously running even when there are no BGP peer activities, this new daemon will use the BGP frr.log file timestamp as an indicator whether there are any activities in BGP that warrants it to inspect the peer state info for possible updates.

- How I did it

  • Start a periodic timer to get triggered every 15 second.
  • Check if there is a change in the timestamp of the frr.log file against a cached timestamp from the previous update.
  • Request the back-end handler (FRR/Zebra) to dump out the "show bgp summary json"
  • For each peer state that it just received, check it against the previous state that was cached and if there is a change or this is new peer, update/add the corresponding state DB table entry.
  • At the end of processing all the peer info, check if there are any stale peer entry present and delete the corresponding state DB entry accordingly.

- How to verify it
For IPV4 You can perform "show ip bgp summary" and use that output to check against the state DB for each peer.
The state DB key would be something like "NEIGH_STATE_TABLE|10.0.0.33" where 10.0.0.33 is the peer ip address.
you can then read out the state information by issuing the redis cmd hgetall "NEIGH_STATE_TABLE|10.0.0.33". Compare that with the corresponding output of "show ip bgp summary".
Here is a sample output from the State DB:
admin@str-s6000-acs-8:~$ redis-cli -n 6
127.0.0.1:6379[6]> keys "NEIGH*"

  1. "NEIGH_STATE_TABLE|fc00::1a"
  2. "NEIGH_STATE_TABLE|fc00::32"
  3. "NEIGH_STATE_TABLE|10.0.0.33"
  4. "NEIGH_STATE_TABLE|fc00::66"
  5. "NEIGH_STATE_TABLE|fc00::62"
  6. "NEIGH_STATE_TABLE|fc00::46"
  7. "NEIGH_STATE_TABLE|10.0.0.35"
  8. "NEIGH_STATE_TABLE|10.0.0.53"
  9. "NEIGH_STATE_TABLE|10.0.0.47"
  10. "NEIGH_STATE_TABLE|10.0.0.9"
  11. "NEIGH_STATE_TABLE|fc00::6e"
  12. "NEIGH_STATE_TABLE|10.0.0.41"
  13. "NEIGH_STATE_TABLE|10.0.0.63"
  14. "NEIGH_STATE_TABLE|10.0.0.17"
  15. "NEIGH_STATE_TABLE|fc00::4a"
  16. "NEIGH_STATE_TABLE|fc00::52"
  17. "NEIGH_STATE_TABLE|fc00::5e"
  18. "NEIGH_STATE_TABLE|10.0.0.29"
  19. "NEIGH_STATE_TABLE|10.0.0.45"
  20. "NEIGH_STATE_TABLE|fc00::2a"
  21. "NEIGH_STATE_TABLE|fc00::42"
  22. "NEIGH_STATE_TABLE|10.0.0.1"
  23. "NEIGH_STATE_TABLE|10.0.0.59"
    127.0.0.1:6379[6]> hgetall "NEIGH_STATE_TABLE|10.0.0.1"
  24. "state"
  25. "Established"
    127.0.0.1:6379[6]>

you can do the same for ipv6 by performing "show ipv6 bgp summary" and follow the same steps above to validate with the corresponding peer states stored in state DB.

…ker to assist storing some BGP DB info into State DB for others to consume
…ker to assist storing some BGP DB info into State DB for others to consume
@lgtm-com
Copy link

lgtm-com bot commented Sep 5, 2020

This pull request introduces 5 alerts when merging f805107 into 65f63c1 - view on LGTM.com

new alerts:

  • 5 for Unused import

…ker to assist storing some BGP DB info into State DB for others to consume
@gechiang
Copy link
Contributor Author

gechiang commented Sep 6, 2020

retest lgtm please

fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
if self.ipv4_n_state[neighb] != self.new_ipv4_n_state[neighb]:
# state changed. Update state DB for this entry
state = self.new_ipv4_n_state[neighb]
self.db.set(self.db.STATE_DB, key, 'state', state)
Copy link
Contributor

@qiluo-msft qiluo-msft Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set [](start = 28, length = 3)

Suggest to use https://redis.io/commands/expire in case this process crashes or always fails.

If this is the right direction, "only update the entry if sate changed" may be not that important since we use pipeline and there is only one transaction. #WontFix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Since this StateDB is required by SNMP proxy agent and potential future consumers, if this process always fails, it will not matter what the BGP StateDB entries gets properly deleted or not... those agents will not operate correctly until bgpmon daemon is healthy again. If my understanding is correct after reading through the "expire" documentation it will require bgpmon to periodically update the expire to ensure those BGP key states does not get expired/deleted when there is no changes needed. Since we want to favor the design for the steady state condition (no more peer state changes in steady state), we should not use "expire" or else bgpmon in steady state will end up having to periodically update the state DB even there are no state changes. In this desing, bgpmon when restart will force the state DB to clean up and provide the most updated state from the new snapshot it gathered from BGP/Zebra. So StateDB clean up is taken cared of when necessary to prevent stale entries from staying in the state DB. Also, using "expire" means someone (Redis) will need to keep track of the time for each key. This operation is not free no matter how efficient Redis implements it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume there is a bug inside this daemon and it is stuck somewhere. It is better to expire and propagate this error to monitors. In that situation, it's dangerous to keep as false stable.


In reply to: 484678284 [](ancestors = 484678284)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Per our phone discussion I think we agreed that this is ok for now.

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

…host ip when connect stateDB, moved cleaning snapshot DS only when no exception, removed unused log and debug counter
fpmsyncd/bgpmon.py Outdated Show resolved Hide resolved
return True

# Get a new snapshot of BGP neighbors and store them in the "new" location
def get_all_neigh_states(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_all_neigh_states [](start = 8, length = 20)

Could you add some unit test or vs test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Will add unit test in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer adding in the this PR if you can. It even benefit yourself in iterations.

… error code, use Redis pipeline for stateDB operations
@lgtm-com
Copy link

lgtm-com bot commented Sep 16, 2020

This pull request introduces 1 alert when merging 85dce8e into f041c84 - view on LGTM.com

new alerts:

  • 1 for Unused import

self.peer_state[peer] = state
if pipe_count > PIPE_BATCH_MAX_COUNT:
self.flush_pipe(data)
data = {}
Copy link
Contributor

@qiluo-msft qiluo-msft Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move data = {} into flush_pipe() function body #Closed

Copy link
Contributor Author

@gechiang gechiang Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Agreed. unlike "c", python pass the object instead of copy the value so it make sense to clear the data{} inside this flush_pipe() method.

data = {}
pipe_count = 0
# If anything in the pipeline not yet flushed, flush them now
if pipe_count > 0:
Copy link
Contributor

@qiluo-msft qiluo-msft Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if pipe_count > 0:
if len(data) > 0:
``` #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiluo-msft Agreed. There is no need to introduce the extra counter "pipe_count" since python dict implementation maintains an item count that can be accessed with O(1) time so it make sense to use the len(data) instead.

return True

# Get a new snapshot of BGP neighbors and store them in the "new" location
def get_all_neigh_states(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer adding in the this PR if you can. It even benefit yourself in iterations.

Copy link
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except missing test

@gechiang
Copy link
Contributor Author

Based on offline comment from Guohan, the bgpmon code should not reside in this repo. Based on its functionality it should be moved to under sonic-buildimage/src/sonic-bgpcfgd .
I am closing this PR and have moved this same code to the following PR:
sonic-net/sonic-buildimage#5329
Please use the above for further review.
There will be no more activities on this PR.

@gechiang gechiang closed this Sep 18, 2020
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Warn user while deleting VLAN if it has IP addresses.
Signed-off-by: d-dashkov <Dmytro_Dashkov@Jabil.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants