-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bgpd: Send "Peer De-configured" notification before closing the socket #5273
bgpd: Send "Peer De-configured" notification before closing the socket #5273
Conversation
🛑 Basic BGPD CI results: FAILUREResults table
For details, please contact louberger |
Here's the crash: |
Continuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotest tests on Ubuntu 16.04 i386: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-9587/test Topology Tests failed for Topotest tests on Ubuntu 16.04 i386:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9587/artifact/TOPOI386/ErrorLog/log_topotests.txt Topology tests on Ubuntu 16.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1604-9587/test Topology Tests failed for Topology tests on Ubuntu 16.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9587/artifact/TOPOU1604/ErrorLog/log_topotests.txt Topology tests on Ubuntu 18.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-9587/test Topology Tests failed for Topology tests on Ubuntu 18.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9587/artifact/TOPOU1804/ErrorLog/log_topotests.txt Successful on other platforms
Topology Tests memory analysis: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9587/artifact/TOPOI386/MemoryLeaks/
|
Instead of postponing the send notification at the end of bgp socket closure. Is it not fine to trigger bgp_notify_send(peer, BGP_NOTIFY_CEASE,BGP_NOTIFY_CEASE_PEER_UNCONFIG); in the config path where we handle the "no neighbour x.x.x.x" ? |
How are you sure it’s the latest “passenger” before closing the doors? In other words I explained this behavior in the commit. We must be sure it’s happening after timers are turned off. |
From the comments in code, bgp_stop() can get called multiple times for the same peer. When this happens, the peer ibuf and obuf gets freed. And while sending bgp_notify_send() reference to obuf happens which could be NULL and will cause seg-fault. This where the crash also indicated ? |
🛑 Basic BGPD CI results: FAILUREResults table
For details, please contact louberger |
Continuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopology tests on Ubuntu 18.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-9595/test Topology Tests failed for Topology tests on Ubuntu 18.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9595/artifact/TOPOU1804/ErrorLog/log_topotests.txt Topology tests on Ubuntu 16.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1604-9595/test Topology Tests failed for Topology tests on Ubuntu 16.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9595/artifact/TOPOU1604/ErrorLog/log_topotests.txt Topotest tests on Ubuntu 16.04 i386: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-9595/test Topology Tests failed for Topotest tests on Ubuntu 16.04 i386:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9595/artifact/TOPOI386/ErrorLog/log_topotests.txt Successful on other platforms
Topology Tests memory analysis: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9595/artifact/TOPOU1804/MemoryLeaks/
|
rfc4486 states: If a BGP speaker decides to de-configure a peer, then the speaker SHOULD send a NOTIFICATION message with the Error Code Cease and the Error Subcode "Peer De-configured". Before the patch: ~# vtysh -c 'show ip bgp neighbors 192.168.0.2 json' | \ jq '."192.168.0.2".lastNotificationReason' null ~# vtysh -c 'show ip bgp neighbors 192.168.0.2 json' | \ jq '."192.168.0.2".lastResetDueTo' "Peer closed the session" After the patch: ~# vtysh -c 'show ip bgp neighbors 192.168.0.2 json' | \ jq '."192.168.0.2".lastNotificationReason' "Cease/Peer Unconfigured" Closes #5262 Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
🛑 Basic BGPD CI results: FAILUREResults table
For details, please contact louberger |
figured out some issues, I will open an new PR with a fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to handle these cases while sending notification.
* when you send a notification, FSM Connect fires before | ||
* closing the socket and you still see `Peer closed the session`. | ||
*/ | ||
if (BGP_IS_VALID_STATE_FOR_NOTIF(peer->status)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sending notification inside bgp_stop() would lead to generating a "BGP_Stop" event inside the function bgp_write_notify(), there by calling bgp_stop() again ?
BGP_XYZ_State --> Idle ----> Idle
This can be solved by moving the bgp stop event generation from bgp_write_notify() (--> BGP_EVENT_ADD(peer, BGP_Stop) )to the respective calling function.
@@ -1181,6 +1181,19 @@ int bgp_stop(struct peer *peer) | |||
} | |||
} | |||
|
|||
/* If a BGP speaker decides to de-configure a peer, then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bgp_stop() would be called when "BGP_Stop" even is raised. And "BGP_Stop" event can be raised at different states of bgp and for various reasons, not only for "Peer De-configured".
Eg clear bgp cmd
so we would end up
- sending "Peer De-configured" notification wrongly even when the peer is not delete.
- We might end up sending more than one notification message (pls see the below comment for the reason)
Continuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopology tests on Ubuntu 18.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1804-9598/test Topology Tests failed for Topology tests on Ubuntu 18.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9598/artifact/TOPOU1804/ErrorLog/log_topotests.txt Topotest tests on Ubuntu 16.04 i386: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOI386-9598/test Topology Tests failed for Topotest tests on Ubuntu 16.04 i386:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9598/artifact/TOPOI386/ErrorLog/log_topotests.txt Topology tests on Ubuntu 16.04 amd64: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-TOPOU1604-9598/test Topology Tests failed for Topology tests on Ubuntu 16.04 amd64:
see full log at https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9598/artifact/TOPOU1604/ErrorLog/log_topotests.txt Successful on other platforms
Topology Tests memory analysis: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9598/artifact/TOPOU1804/MemoryLeaks/
|
rfc4486 states:
If a BGP speaker decides to de-configure a peer, then the speaker
SHOULD send a NOTIFICATION message with the Error Code Cease and
the Error Subcode "Peer De-configured".
Before the patch:
~# vtysh -c 'show ip bgp neighbors 192.168.0.2 json' |
jq '."192.168.0.2".lastNotificationReason'
null
~# vtysh -c 'show ip bgp neighbors 192.168.0.2 json' |
jq '."192.168.0.2".lastResetDueTo'
"Peer closed the session"
After the patch:
~# vtysh -c 'show ip bgp neighbors 192.168.0.2 json' |
jq '."192.168.0.2".lastNotificationReason'
"Cease/Peer Unconfigured"
Closes #5262
Signed-off-by: Donatas Abraitis donatas.abraitis@gmail.com