-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: fix wrong nexthop check for kernel routes #13808
zebra: fix wrong nexthop check for kernel routes #13808
Conversation
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 3: Failed (click for details)Topotests Ubuntu 18.04 amd64 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12377/artifact/TP3U1804AMD64/TopotestDetails/Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TP3U1804AMD64-12377/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 3 Successful on other platforms/tests
|
ci:rerun |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-12381/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 Successful on other platforms/tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When changing one interface's vrf, the kernel routes are wrongly kept
in old vrf. Finally, the forwarding table in that old vrf can't forward
traffic correctly for those residual entries.
This seems should be easy to reproduce in topotests. Can we do this?
ci:rerun |
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12389/ This is a comment from an automated CI system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - and I agree with Donatas, that the vrf-change issue would be good to capture in a test?
@Mergifyio backport dev/9.0 |
✅ Backports have been created
|
Happy to try it but maybe need more time to finish the test case for me. |
r1.run("ip link set dev r1-eth0 master {}".format(vrf)) | ||
test_func = partial( | ||
topotest.router_json_cmp, r1, "show ip route 3.5.1.0/24 json", expected) | ||
_, result = topotest.run_and_expect(test_func, None, count=5, wait=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here should be not None
I think, otherwise we don't wait 5 seconds and assert immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have always tested this test before pushed, it should fail with old code and succeed with new code fixed by this PR.
You mentioned two issues:
- Yes, it should be
(count=1, wait=5)
for that. - No, it really means
result
. After interface is moved,show ip route <Net> json
should be different with origin. So,result
should be not None.
Anyway, it is not good. So i re-modified it to be clear, and re-pushed, thanks.
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-12544/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 Topotests debian 10 amd64 part 6: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO6DEB10AMD64-12544/test Topology Tests failed for Topotests debian 10 amd64 part 6 Successful on other platforms/tests
|
0ce1528
to
5c3e07a
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-12561/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 Topotests Ubuntu 18.04 amd64 part 6: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO6U18AMD64-12561/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 6 Topotests debian 10 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9DEB10AMD64-12561/test Topology Tests failed for Topotests debian 10 amd64 part 9 Successful on other platforms/tests
|
5c3e07a
to
02d1359
Compare
Just rebased it. |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-12574/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 Successful on other platforms/tests
|
02d1359
to
7454d55
Compare
7454d55
to
7b8fb6c
Compare
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12626/ This is a comment from an automated CI system. |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedAddresssanitizer topotests part 4: Failed (click for details)
Addresssanitizer topotests part 4: Unknown Log Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-ASANP4-12636/test Topology Tests failed for Addresssanitizer topotests part 4 Successful on other platforms/tests
|
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12647/ This is a comment from an automated CI system. |
r1.run("ip link add {} type vrf table 1".format(vrf)) | ||
r1.run("ip link set {} up".format(vrf)) | ||
r1.run("ip link set dev r1-eth0 master {}".format(vrf)) | ||
sleep(4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sleep() is denied in topotests, Use run_and_expect patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thanks!
7b8fb6c
to
52dbe33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, apply frrbot formatting.
test_func = partial( | ||
topotest.router_output_cmp, r1, "show ip route 3.5.1.0/24 json", expected | ||
) | ||
ok, diff = topotest.run_and_expect(test_func, "", count=5, wait=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the same pattern _, result = topotest.run_and_expect(test_func, None, count=5, wait=1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked the related code. Here the return value of topotest.router_output_cmp
should be result, diff
like other places. Done, thanks.
Continuous Integration Result: SUCCESSFULContinuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12671/ This is a comment from an automated CI system. |
52dbe33
to
8163306
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 arm8 part 8: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 8: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12679/artifact/TOPO8U18AMD64/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 8: No useful log foundSuccessful on other platforms/tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with the zebra change - I know Donatas may have had a question about the topotest?
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 3: Failed (click for details)Topotests Ubuntu 18.04 i386 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12689/artifact/TOPO3U18I386/TopotestDetails/Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3U18I386-12689/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 3 Successful on other platforms/tests
|
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 arm8 part 8: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 8: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12690/artifact/TOPO8U18AMD64/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 8: No useful log foundSuccessful on other platforms/tests
|
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-12691/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 Successful on other platforms/tests
|
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-12693/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 Topotests Ubuntu 18.04 i386 part 1: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO1U18I386-12693/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 1 Successful on other platforms/tests
|
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDTest incomplete. See below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: IncompleteTopotests debian 10 amd64 part 1: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO1DEB10AMD64-12694/test Topology Tests failed for Topotests debian 10 amd64 part 1 Addresssanitizer topotests part 4: Incomplete(check logs for details)Topotests Ubuntu 18.04 i386 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-12694/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 Successful on other platforms/tests
|
Continuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-12703/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 Successful on other platforms/tests
|
There are relaxed nexthop requirements for kernel routes because we trust kernel routes. Two minor changes for kernel routes: 1. `if_is_up()` is one of the necessary conditions for `if_is_operative()`. Here, we can remove this unnecessary check for clarity. 2. Since `nexthop_active()` doesn't distinguish whether it is kernel route, modified the corresponding comment in it. Signed-off-by: anlan_cs <vic.lan@pica8.com>
When changing one interface's vrf, the kernel routes are wrongly kept in old vrf. Finally, the forwarding table in that old vrf can't forward traffic correctly for those residual entries. Follow these steps to make this problem happen: ( Firstly, "x1" interface of default vrf is with address of "6.6.6.6/24". ) ``` anlan# ip route add 4.4.4.0/24 via 6.6.6.8 dev x1 anlan# ip link add vrf1 type vrf table 1 anlan# ip link set vrf1 up anlan# ip link set x1 master vrf1 ``` Then check `show ip route`, the route of "4.4.4.0/24" is still selected in default vrf. If the interface goes down, the kernel routes will be reevaluated. Those kernel routes with active interface of nexthop can be kept no change, it is a fast path. Otherwise, it enters into slow path to do careful examination on this nexthop. After the interface's vrf had been changed into new vrf, the down message of this interface came. It means the interface is not in old vrf although it still exists during that checking, so the kernel routes should be dropped after this nexthop matching against a default route in slow path. But, in current code they are wrongly kept in fast path for not checking vrf. So, modified the checking active nexthop with vrf comparision for the interface during reevaluation. Signed-off-by: anlan_cs <vic.lan@pica8.com>
Check `show ip route` for specific kernel routes after the interface as their nexthop changes vrf. After moving interface's vrf, there should be no kernel route in old vrf. Signed-off-by: anlan_cs <vic.lan@pica8.com>
8163306
to
019ac03
Compare
CI always failed for unrelated cases, so just re-based it. |
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12704/ This is a comment from an automated CI system. |
zebra: fix wrong nexthop check for kernel routes (backport #13808)
Two commits:
zebra: Remove unnecessary condition check for kernel routes
There are relaxed nexthop requirements for kernel routes because we
trust kernel routes.
Two minor changes for kernel routes:
if_is_up()
is one of the necessary conditions forif_is_operative()
.Here, we can remove this unnecessary check for clarity.
Since
nexthop_active()
doesn't distinguish whether it is kernel route,modified the corresponding comment in it.
zebra: fix wrong nexthop check for kernel routes
When changing one interface's vrf, the kernel routes are wrongly kept
in old vrf. Finally, the forwarding table in that old vrf can't forward
traffic correctly for those residual entries.
Follow these steps to make this problem happen:
( Firstly, "x1" interface of default vrf is with address of "6.6.6.6/24". )
Then check
show ip route
, the route of "4.4.4.0/24" is still selectedin default vrf.
If the interface goes down, the kernel routes will be reevaluated. Those
kernel routes with active interface of nexthop can be kept no change, it
is a fast path. Otherwise, it enters into slow path to do careful examination
on this nexthop.
After the interface's vrf had been changed into new vrf, the down message of
this interface came. It means the interface is not in old vrf although it
still exists during that checking, so the kernel routes should be dropped
after this nexthop matching against a default route in slow path. But, in
current code they are wrongly kept in fast path for not checking vrf.
So, modified the checking active nexthop with vrf comparision for the interface
during reevaluation.