Commit
tests: Attempt to fix bgp_l3vpn_to_direct timing issues
The bgp_l3vpn_to_direct test is failing sometimes because the 2.2.2.2 route is disappearing. What is happening? The log file for the failed test run shows us this:

build 15-Oct-2021 07:26:12 scripts/adjacencies.py:8 WAIT:r4:ping 2.2.2.2 -c 1: 0. packet loss:wait:PE->P2 (loopback) ping:60:0.5:
build 15-Oct-2021 07:26:12 Fri Oct 15 14:26:12 2021 (#9) scripts/adjacencies.py:8 COMMAND:r4:ping 2.2.2.2 -c 1: 0. packet loss:wait:PE->P2 (loopback) ping:
build 15-Oct-2021 07:26:12 COMMAND OUTPUT:PING 2.2.2.2 (2.2.2.2) 56(84) bytes of data.
build 15-Oct-2021 07:26:12 64 bytes from 2.2.2.2: icmp_seq=1 ttl=64 time=0.143 ms
build 15-Oct-2021 07:26:12
build 15-Oct-2021 07:26:12 --- 2.2.2.2 ping statistics ---
build 15-Oct-2021 07:26:12 1 packets transmitted, 1 received, 0% packet loss, time 0ms
build 15-Oct-2021 07:26:12 rtt min/avg/max/mdev = 0.143/0.143/0.143/0.000 ms:
build 15-Oct-2021 07:26:12 Done after 1 loops, time=0.024507761001586914, Found= 0% packet loss
build 15-Oct-2021 07:26:12 Fri Oct 15 14:26:12 2021 (#9) scripts/adjacencies.py:9 COMMAND:r4:ping 2.2.2.2 -c 1: 0. packet loss:pass:PE->P2 (loopback) ping +0.02 secs:
build 15-Oct-2021 07:26:12 2021-10-15 14:26:12,446 WARNING: topolog.r4: LinuxNamespace(r4): proc failed: rc 2 pid 28826
build 15-Oct-2021 07:26:12 args: /usr/bin/nsenter -a -t 27444 -F --wd=/tmp/topotests/bgp_l3vpn_to_bgp_direct.test_bgp_l3vpn_to_bgp_direct/r4 /bin/bash -c ping 2.2.2.2 -c 1
build 15-Oct-2021 07:26:12 stdout: connect: Network is unreachable:
build 15-Oct-2021 07:26:17 COMMAND OUTPUT:connect: Network is unreachable:
build 15-Oct-2021 07:26:17 R:9 r4 PE->P2 (loopback) ping +0.02 secs 0 1

So the 2.2.2.2 route is coming and going, and the test is failing on these lines:

    luCommand(
        "r1", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
    )
    luCommand(
        "r3", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
    )
    luCommand(
        "r4", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
    )

The 2.2.2.2 routes on r1, r3 and r4 are received via ospf, but are modified by some other process to add labels (probably ldp, since it is running too). The 2nd ping to 2.2.2.2 is failing because the 2.2.2.2 route on r4 is being replaced.

As an example, here is `ip monitor all` on r4 during boot up. Please note that the timestamps are not necessarily representative of what we will see on the loaded CI system.

[2021-10-15T15:46:52.261456] [NEXTHOP]id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:52.261490] [ROUTE]2.2.2.2 nhid 27 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
<snip>
[2021-10-15T15:46:53.556405] [NEXTHOP]Deleted id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
<snip>
[2021-10-15T15:46:53.566575] [NEXTHOP]id 32 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:53.566585] [ROUTE]2.2.2.2 nhid 32 via 10.0.2.2 dev r4-eth0 proto ospf metric 20

For a small amount of time the route was *gone*. I believe the upstream CI system hits that window sometimes, causing the test to fail.

This patch attempts to ensure that the 2.2.2.2 route has been learned appropriately (thus slowing the test down) before the test moves on to the ping. I suspect the long term answer might be to add a test to the scripts/adjacencies.py script to ensure that the test does not continue until the appropriate label is in place, but I want to make the test run a bit more prescriptive in what it is looking for here.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
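
To put a rough number on the window described above, here is a minimal sketch that subtracts the two `ip monitor all` timestamps quoted in the message. It assumes the route went away at about the time nexthop 27 was deleted, which the snipped excerpt does not show directly, so treat the result as an estimate for that one unloaded run. The helper name `gap_ms` is hypothetical and is not part of the test suite.

```python
from datetime import datetime

# Timestamps copied from the `ip monitor all` excerpt in the commit message.
# Assumption: the route was unusable from the nexthop deletion until the
# route was installed again with the new nexthop id.
NEXTHOP_DELETED = "2021-10-15T15:46:53.556405"
ROUTE_REINSTALLED = "2021-10-15T15:46:53.566585"


def gap_ms(start: str, end: str) -> float:
    """Return the time between two ISO-8601 timestamps in milliseconds."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() * 1000.0


window = gap_ms(NEXTHOP_DELETED, ROUTE_REINSTALLED)
print(f"route absent for roughly {window:.2f} ms")
```

On this run the gap works out to about 10 ms, which is short enough that a single ping only occasionally lands inside it.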