-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: nexthops compare vrf only if ip type #5368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution to FRR!
- One of your commits is missing a
Signed-off-by
line; we can't accept your contribution until all of your commits have one - One of your commits does not have a blank line between the summary and body; this will break
git log --oneline
If you are a new contributor to FRR, please see our contributing guidelines.
@@ -120,6 +114,12 @@ static int _nexthop_cmp_no_labels(const struct nexthop *next1, | |||
switch (next1->type) { | |||
case NEXTHOP_TYPE_IPV4: | |||
case NEXTHOP_TYPE_IPV6: | |||
if (next1->vrf_id < next2->vrf_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if vrf is used not only with IPv4/IPv6 nexthop, but nexthop as interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ifindex given, vrf is a redundant term, actually it is get from ifindex, further more interface may change vrf, then RTM_DELROUTE handling failed, netlink message doesn't contain nexthop vrf either.
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
To compare nexthops, if given ifindex, it is enough to compare ifindex, the vrf is get from ifindex, and ifindex is more reliable. For blackhole, I think it is a special interface, vrf may be useless. So only type ip need to compare vrf. Signed-off-by: Tyler Li <tyler.li@mediatek.com>
💚 Basic BGPD CI results: SUCCESS, 0 tests failedResults table
For details, please contact louberger |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9768/ This is a comment from an automated CI system. clang_check |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-FRRPULLREQ-9769/ This is a comment from an automated CI system. clang_check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems, to me, that moving this code would allow processes to delete routes they shouldn't be deleting... I think this is the wrong place to fix this problem.
Okay, I took a couple hours to look at this one. So this is only an issue with kernel routes. And it seems this specific case is fixed with current master with all cases but It was most likely fixed due to #5184 since we reprocess the kernel routes on the interface event.
However, with nexthops of type Therefore, the fix might actually be to do this (untested):
BUT we do have a second bug actually: We don't handle deletes from the kernel when the nexthop vrf is specified, regardless of whether it changed.
and its still present in the rib:
and this seems to be because we are not passing the nexthop's vrf_id into
which is not being set in the netlink code. The
|
So two commits are probably needed to fix this one.
That should resolve the issue without having to change the nexthop comparison function. |
The root problem is why we need vrf_id if nexthop ifindex given? We know if only nexthop ip given, we need vrf_id for lookup the outer interface, and then send nexthop ip and outer interface in netlink. (We can see the function nexthop_same_firsthop, what we really need is nexthop ip and outer interface)But if ifindex given, it is enough for us, the vrf_id is redundant. Keep a vrf_id in this type only makes us do more works useless or harmfully(if we doing something wrong like this problem). So I think Tyler‘s modifies is better. |
@ryan44guo I'm not confident enough to answer whether the We no longer receive explicit route deletes from the kernel on interface events that cause routes to be removed (they are silently deleted and we are expected to know). ex) This is with this patch added to current master on a 4.15 kernel:
The route with nexthop of type |
@sworleys Yes, you are right. My patch can not fix the issue in your example. Then kernel doesn't send RTM_DELROUTE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I too think it is not the correct place to fix this issue. It might be fixing one issue but opening ends for other issues to come.
If the kernel doesn't send DELROUTE, this is the kernel's bug, it should send DELROUTE to netlink if it had sent ADDROUTE to. That is 2 separated problem, although this modifier can make this route inactive. But make route inactive is not equal to remove route, so that is not a correct workaround. We can assume if the interface bind back to the origin vrf, in frr logic, the route can active again, but does kernel do the same thing? To the root reason, it is a vrf+route thing, it can not be repaired by nexthop vrf stuff. |
Can you give an example? We only have an example that processes not delete routes which should be deleted now. |
Actually, no it was intentionally done in the kernel to limit route notifications on interface events from what I have been told. |
If so, we can assume that is their design, although I doubt that is a good design. You know what they did is removing static routes which nexthop point to down interface, so when interface up, it can not been added back. I think that is not what we hope to do. When change upper(vrf), if seems a down and an up, so the static route is lost. |
@ryan44guo I added in #5184 to handle the lack of explicit RTM_DELROUTE for kernel routes. I shared my suggestions earlier to add to this patch so that we can handle the vrf changing case. I tested it and it seems to work on my 4.19 kernel.
If we add (2) I think we will no longer need tyler's change with the vrf_id though. |
It might be worth discussing this in our slack instead. If you aren't a member, please join by clicking the slack icon and self-inviting yourself https://frrouting.org/#participate We can discuss this further in the #development channel or a private group one with everyone |
|
partially fixed by #5553 |
Using frr7.2, I encounterd a problem that zebra crashes when removing vrf(I would create an issue). And found when changing an interface vrf, route ff00::/8 deletes failed because of nexthop vrf changed.
To compare nexthops, if given ifindex, it is enough to compare ifindex, the vrf is get from ifindex, and ifindex is more reliable. For blackhole, I think it is a special interface, vrf may be useless. So only type ip need to compare vrf.
logs: