Increased NSE expiration time might cause traffic disturbance #438
Labels
area/networking
component/proxy
component/stateless-lb
kind/bug
priority/medium
Describe the bug
The NSE expiration time calculation has changed in NSM:
With these changes, it is essentially NSM's MaxTokenLifetime that determines the lifetime of an NSE.
The ExpirationTime parameter in the NetworkServiceEndpoint structure used during registration is ignored right after the first refresh (NSM falls back to the token lifetime). This might cause traffic disturbances if, for example, a node hosting an LB-FE gets rebooted.
That's because the related NSE Custom Resource could remain in etcd much longer: NSM_MAX_TOKEN_LIFETIME defaults to 10 minutes, whereas the default expiration time used to be 1 minute.
Until the NSE lifetime of an unavailable LB expires, proxies consider it a valid egress next hop (assuming datapath monitoring is off).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
We should be able to control the NSE expiration time independently of MaxTokenLifetime.
A possible way forward could be to introduce a custom chain element that sets the expiration time to 1 minute, thus ensuring backward compatibility.
Check whether datapath monitoring between Proxy and LB should be enabled (e.g. through an environment variable).
Context
Logs
NA