Commit d680ce2

[neighsyncd] increase neighbor syncd restore timeout to 110 seconds (#745)
* [neighsyncd] increase neighbor syncd restore timeout to 120 seconds

  Neighbor syncd is restoring important information for teamd and BGP. Our timeout should not be shorter than the downstream service.

  Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* [restore_neighbor] improve restore neighbor timeouts

  Try to get the bgp timeout and use it for restoring neighbor timeout. When unavailable, use default 110 seconds.

  Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Set default values according to group discussion result

  - restore_neighbors.py timeout at 110 seconds due to observed requirement of greater than 70 seconds.
  - neighbor syncd timeout at 120 seconds (longer than 110 seconds).

  Signed-off-by: Ying Xie <ying.xie@microsoft.com>
1 parent b78cc8d commit d680ce2
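
The second bullet of the commit message describes a fallback: use the bgp timeout when it can be read, otherwise use the 110-second default. The lookup itself is not part of the diff shown on this page, so the sketch below takes the value as an argument. The function name `resolve_restore_timeout` and its `bgp_timeout` parameter are hypothetical and only illustrate the fallback logic.

```python
DEF_TIME_OUT = 110  # seconds; default named in the commit message


def resolve_restore_timeout(bgp_timeout, default=DEF_TIME_OUT):
    """Return a usable timeout in seconds.

    `bgp_timeout` is whatever the caller managed to read (hypothetical input).
    It may be None, an empty string, or non-numeric when the value is
    unavailable; in all of those cases the default is returned.
    """
    try:
        value = int(bgp_timeout)
    except (TypeError, ValueError):
        return default
    # Treat zero or negative values as "unavailable" as well (an assumption).
    return value if value > 0 else default
```

A caller would pass the result on as the `timeout` argument that this commit adds to `restore_update_kernel_neighbors`.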

2 files changed: +9 -8 lines changed

neighsyncd/neighsync.h (+2 -2)

@@ -11,10 +11,10 @@
 
 /*
  * This is the timer value (in seconds) that the neighsyncd waits for restore_neighbors
- * service to finish, should be longer than the restore_neighbors timeout value (60)
+ * service to finish, should be longer than the restore_neighbors timeout value (110)
  * This should not happen, if happens, system is in a unknown state, we should exit.
  */
-#define RESTORE_NEIGH_WAIT_TIME_OUT 70
+#define RESTORE_NEIGH_WAIT_TIME_OUT 120
 
 namespace swss {
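
The header comment states an ordering requirement: the neighsyncd wait must be longer than the restore_neighbors timeout. A minimal sketch of that check, assuming both values are available in one place; the helper name `waits_long_enough` is hypothetical and does not exist in the repository.

```python
RESTORE_NEIGH_WAIT_TIME_OUT = 120  # seconds, neighsyncd side (neighsync.h)
DEF_TIME_OUT = 110                 # seconds, restore_neighbors.py default


def waits_long_enough(restore_timeout,
                      neighsyncd_wait=RESTORE_NEIGH_WAIT_TIME_OUT):
    """True when neighsyncd waits strictly longer than restore_neighbors runs."""
    return neighsyncd_wait > restore_timeout
```

With the new defaults, `waits_long_enough(DEF_TIME_OUT)` holds. The check is worth repeating if the restore timeout is taken from another source at runtime, since a value of 120 or more would no longer satisfy it.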

neighsyncd/restore_neighbors.py (+7 -6)

@@ -30,11 +30,12 @@
 logger.setLevel(logging.WARNING)
 logger.addHandler(logging.NullHandler())
 
-# timeout the restore process in 1 min if not finished
+# timeout the restore process in 110 seconds if not finished
 # This is mostly to wait for interfaces to be created and up after system warm-reboot
 # and this process is started by supervisord in swss docker.
-# It would be good to keep that time below routing reconciliation time-out.
-TIME_OUT = 60
+# There had been devices taking close to 70 seconds to complete restoration, setting
+# default timeout to 110 seconds.
+DEF_TIME_OUT = 110
 
 # every 5 seconds to check interfaces states
 CHECK_INTERVAL = 5
@@ -189,13 +190,13 @@ def set_statedb_neigh_restore_done():
 # Once all the entries are restored, this function is returned.
 # The interfaces' states were checked in a loop with an interval (CHECK_INTERVAL)
 # The function will timeout in case interfaces' states never meet the condition
-# after some time (TIME_OUT).
-def restore_update_kernel_neighbors(intf_neigh_map):
+# after some time (DEF_TIME_OUT).
+def restore_update_kernel_neighbors(intf_neigh_map, timeout=DEF_TIME_OUT):
     # create object for netlink calls to kernel
     ipclass = IPRoute()
     mtime = monotonic.time.time
     start_time = mtime()
-    while (mtime() - start_time) < TIME_OUT:
+    while (mtime() - start_time) < timeout:
         for intf, family_neigh_map in intf_neigh_map.items():
             # only try to restore to kernel when link is up
             if is_intf_oper_state_up(intf):
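
The loop above follows a common pattern: poll a condition at a fixed interval until it holds or a monotonic-clock deadline passes. The sketch below isolates that pattern. It is not the repository's code: `poll_until`, and its `clock` and `sleep` parameters (added so the loop can be exercised without real waiting), are assumptions for illustration, and it uses the standard library's `time.monotonic` rather than the `monotonic` package the script imports.

```python
import time

DEF_TIME_OUT = 110  # seconds, default from the diff
CHECK_INTERVAL = 5  # seconds, from the diff


def poll_until(condition, timeout=DEF_TIME_OUT, interval=CHECK_INTERVAL,
               clock=time.monotonic, sleep=time.sleep):
    """Call condition() until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    """
    start = clock()
    while (clock() - start) < timeout:
        if condition():
            return True
        sleep(interval)
    return False
```

Passing the timeout as a parameter with a default, as the commit does, lets a caller supply a value obtained at runtime while keeping existing call sites unchanged.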
