-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ip route in the kernel does not match routes in bgp #5026
Comments
fixes #5026 Explanation: In the log from the issue I found: ``` I see following in the log Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed ``` Analyzing source code I found that the error message could be issues only when `zclient_send_rnh()` return less than 0. ``` ret = zclient_send_rnh(zclient, command, p, exact_match, bnc->bgp->vrf_id); /* TBD: handle the failure */ if (ret < 0) flog_warn(EC_BGP_ZEBRA_SEND, "sendmsg_nexthop: zclient_send_message() failed"); ``` I checked [zclient_send_rnh()](https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L654) and found that this function will return the exit code which the function gets from [zclient_send_message()](https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L266) But the latter function could return not 0 in two cases: 1. bgpd didn’t connect to the zclient socket yet [code](https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L269) 2. The socket was closed. But in this case we would receive the error message in the log. (And I can find the message in the log when we reboot sonic) [code](https://github.com/Azure/sonic-frr/blob/88351c8f6df5450e9098f773813738f62abb2f5e/lib/zclient.c#L277) Also I see from the logs that client connection was set later we had the issue in bgpd. Bgpd.log ``` Jul 22 21:13:06.574831 vlab-01 WARNING bgp#bgpd[49]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed ``` Vs Zebra.log ``` Jul 22 21:13:12.713249 vlab-01 NOTICE bgp#zebra[48]: client 25 says hello and bids fair to announce only static routes vrf=0 Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 30 says hello and bids fair to announce only bgp routes vrf=0 Jul 22 21:13:12.820352 vlab-01 NOTICE bgp#zebra[48]: client 33 says hello and bids fair to announce only vnc routes vrf=0 ``` So in our case we should start zebra first. Wait until it is started and then start bgpd and other daemons. **- How I did it** I changed a graph to start daemons in the following order: 1. First start zebra 2. Then starts staticd and bgpd 3. Then starts vtysh -b and bgpeoi after bgpd is started.
@lguohan 27/08/2020 05:37:21 DEBUG devices.py:_run:85: /var/johnar/sonic-mgmt/tests/common/devices.py::get_ip_route_info#617: [str-msn2700-01] AnsibleModule::command Result => {"stderr_lines": [], "cmd": ["ip", "route", "list", "exact", "0.0.0.0/0"], "end": "2020-08-27 05:37:21.527655", "_ansible_no_log": false, "stdout": "default dev Loopback0 scope link src 10.1.0.32 ", "changed": true, "rc": 0, "start": "2020-08-27 05:37:21.522416", "stderr": "", "delta": "0:00:00.005239", "invocation": {"module_args": {"warn": true, "executable": null, "_uses_shell": false, "strip_empty_ends": true, "_raw_params": "ip route list exact 0.0.0.0/0", "removes": null, "argv": null, "creates": null, "chdir": null, "stdin_add_newline": true, "stdin": null}}, "stdout_lines": ["default dev Loopback0 scope link src 10.1.0.32 "], "failed": false} |
@stephengzh Thank you for your update. Do you have bgp and zebra logs for the moment, when default route test stopped working? |
@stephengzh You don't see parsed info, because your raw info doesn't have both src or nexthops information. |
@pavel-shirshov These two files are empty files. |
But in this case there were no bgp sessions established. I'd say frr even didn't start working. |
It seems that there is a race condition between zebra server accepts connections and bgpd tries to connect. Bgpd has a chance to try to connect before zebra is ready. In this scenario, bgpd will try again after 10 seconds and operate as normal within this 10 seconds. As a consequence, whatever bgpd tries to sent to zebra will be missing in the 10 seconds. |
Fix #5026 There is a race condition between zebra server accepts connections and bgpd tries to connect. Bgpd has a chance to try to connect before zebra is ready. In this scenario, bgpd will try again after 10 seconds and operate as normal within these 10 seconds. As a consequence, whatever bgpd tries to sent to zebra will be missing in the 10 seconds. To avoid such a scenario, bgpd should start after zebra is ready to accept connections.
Fix #5026 There is a race condition between zebra server accepts connections and bgpd tries to connect. Bgpd has a chance to try to connect before zebra is ready. In this scenario, bgpd will try again after 10 seconds and operate as normal within these 10 seconds. As a consequence, whatever bgpd tries to sent to zebra will be missing in the 10 seconds. To avoid such a scenario, bgpd should start after zebra is ready to accept connections.
Description
Steps to reproduce the issue:
Describe the results you received:
check ip route.
check ip bgp sum
show ip bgp will give you lots of routes.
something is wrong with frr
Describe the results you expected:
ip route in the kernel should around 32000
Additional information you deem important:
The text was updated successfully, but these errors were encountered: