zebra crashes when removing vrf #5369
Comments
The FPM part was fixed here: #5361. We probably need to backport it to 7.2.
@rzalamena Is that all? It seems my problem wasn't fixed by that PR. In my test, zebra is not shutting down; I just remove the VRF device in the kernel while FPM is connected. If that VRF has routes, zebra can crash.
@tylerlinp you are right. Sorry, I recognized the last lines of the stack and assumed that was happening during I managed to reproduce the problem myself and here is what the address sanitizer complains about:
To reproduce, I created a VRF, put an interface in it, installed some routes via BGP, and opened an FPM socket that only discards data. After that I deleted the VRF and the crash happened.
@rzalamena Yes, you reproduced it. Thank you.
@tylerlinp it seems that this PR fixes the problem more precisely: #5553.
@rzalamena That is not enough. The RTM_DELROUTE message for ff00::/8 arrives after the RTM_NEWLINK vrf-change message for Ethernet8. The vrf_id is derived from the ifindex in the RTM_DELROUTE message, so it changes along with the vrf-change. Therefore all route nexthops in the RIB need to be updated when the vrf-change message is handled. Worse, that handling is asynchronous: the nexthop VRF may not have been updated yet when the route is deleted, in which case the comparison does not match.
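The ordering problem described in this comment can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: every `toy_*` type and function below is hypothetical and is not zebra's actual code. It assumes a stored nexthop remembers the VRF it was installed with, while a delete message resolves its VRF from the interface at the time the message is handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface record: the VRF binding can change at runtime. */
struct toy_if {
	int ifindex;
	uint32_t vrf_id;
};

/* Hypothetical stored nexthop: remembers the VRF seen at install time. */
struct toy_nexthop {
	int ifindex;
	uint32_t vrf_id;
};

/* Models handling of an RTM_NEWLINK that moves the interface to a new VRF. */
static void toy_handle_vrf_change(struct toy_if *ifp, uint32_t new_vrf_id)
{
	ifp->vrf_id = new_vrf_id;
}

/*
 * Models handling of an RTM_DELROUTE: the VRF is looked up from the
 * interface *now*, then compared with what the stored nexthop recorded.
 */
static bool toy_delroute_matches(const struct toy_nexthop *stored,
				 const struct toy_if *ifp)
{
	return stored->ifindex == ifp->ifindex
	       && stored->vrf_id == ifp->vrf_id;
}

/*
 * Models the suggested remedy: when the VRF of an interface changes,
 * rewrite the VRF of every stored nexthop that uses that interface.
 */
static void toy_update_nexthops(struct toy_nexthop *nhs, size_t count,
				const struct toy_if *ifp)
{
	for (size_t i = 0; i < count; i++) {
		if (nhs[i].ifindex == ifp->ifindex)
			nhs[i].vrf_id = ifp->vrf_id;
	}
}
```

In this model, a delete that is processed after `toy_handle_vrf_change` but before `toy_update_nexthops` fails to match the stored nexthop, which mirrors the asynchronous window described above.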
@tylerlinp thanks for the feedback. While we don't have a definitive fix for this, would you be willing to try my new FPM module that uses the data plane framework? This new module doesn't have the same race problem as the old one (it uses dplane_ctx that only gets PR: #5510 .
After updating FRR to 7.2 (514f508) in SONiC, zebra crashes when removing a VRF. I think there are two bugs. The implementation in SONiC ensures that routes are removed before the VRF, so the first bug could be avoided in FRR 7.1. But because of the second bug, the VRF is not empty when it is deleted, so the first bug appears.
I created PR #5368 to fix bug 2.
The first bug: a dest can carry two flags, RIB_DEST_UPDATE_FPM and RIB_DEST_SENT_TO_FPM. In the function rib_gc_dest, a dest with these flags set cannot be deleted. But when a VRF is removed, its dests are freed regardless of these flags. The difficulty is that the VRF itself is freed at that point; if we keep the dests for FPM, the VRF removal has to be delayed until all of those dests have been freed after FPM is done with them.
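The idea of delaying the VRF removal until FPM has released every dest could look roughly like the sketch below. It is a simplified model, not zebra's implementation: the `toy_*` types and functions are hypothetical, the flag bit values are placeholders (the real definitions live in zebra's headers), and clearing both flags when FPM finishes is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit values for illustration only. */
#define RIB_DEST_SENT_TO_FPM (1u << 0)
#define RIB_DEST_UPDATE_FPM  (1u << 1)
#define TOY_FPM_FLAGS (RIB_DEST_SENT_TO_FPM | RIB_DEST_UPDATE_FPM)

/* Hypothetical, simplified route destination. */
struct toy_dest {
	unsigned int flags;
	bool freed;
};

/* Hypothetical VRF that owns an array of dests. */
struct toy_vrf {
	struct toy_dest *dests;
	size_t ndests;
	bool delete_pending;
	bool freed;
};

/* A dest may be freed only when FPM holds no reference to it. */
static bool toy_dest_can_free(const struct toy_dest *dest)
{
	return (dest->flags & TOY_FPM_FLAGS) == 0;
}

/*
 * Try to delete the VRF: free every dest that FPM does not hold and
 * free the VRF itself only when nothing is left. Returns the number
 * of dests still held by FPM.
 */
static size_t toy_vrf_try_delete(struct toy_vrf *vrf)
{
	size_t held = 0;

	assert(vrf != NULL);
	vrf->delete_pending = true;

	for (size_t i = 0; i < vrf->ndests; i++) {
		struct toy_dest *dest = &vrf->dests[i];

		if (dest->freed)
			continue;
		if (toy_dest_can_free(dest))
			dest->freed = true;
		else
			held++;
	}

	if (held == 0)
		vrf->freed = true;
	return held;
}

/* FPM is done with a dest: drop the flags and retry a pending delete. */
static void toy_fpm_done(struct toy_vrf *vrf, struct toy_dest *dest)
{
	dest->flags &= ~TOY_FPM_FLAGS;
	if (vrf->delete_pending)
		toy_vrf_try_delete(vrf);
}
```

With this shape, the first delete attempt leaves the VRF alive while any dest is still flagged, and the last `toy_fpm_done` call completes the removal.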