Proper resource cleanup for expired connections #1445

Open
BenAgai opened this issue Apr 2, 2023 · 6 comments

@BenAgai commented Apr 2, 2023

Hi,
I wanted to ask a question regarding the cleanup of expired connections made from an NSC to an NSE.

Assume an NSC has successfully established a connection to an NSE.
If the pod the NSC runs on is terminated without the NSC initiating the close flow, the data-path resources for that connection will leak.

Is there any built-in mechanism to recover from such a scenario? (For example, an expiration on the connection that is renewed every time a refresh occurs, so that once the connection expires a close flow is initiated for it.)

I came across the following links that look relevant, but I wasn't sure:
networkservicemesh/deployments-k8s#8882
#1439
#1440

I would be glad to know whether this issue is resolved and how, or whether I should implement such a mechanism myself.
Thanks!

@glazychev-art (Contributor)

Hi @BenAgai,
That's right, we have such functionality.
There is a timeout chain element that will call Close for inactive connections. This happens after the token expires (10 minutes by default).
If the connection is alive, refreshes occur periodically and keep it from expiring.
If the NSC didn't initiate Close when its pod ended, Close will be called once the timeout fires.
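
To illustrate the behavior (this is only a sketch, not the actual sdk code — the `timeout` element name comes from the sdk, but the types below are simplified stand-ins): each Request or refresh re-arms a per-connection timer for the token lifetime, and if the timer fires before the next refresh, Close is invoked.

```go
// Illustrative sketch of the expire-on-timeout behavior described above.
// Hypothetical types; the real chain element lives in the NSM sdk.
package main

import (
	"fmt"
	"sync"
	"time"
)

type expireManager struct {
	mu     sync.Mutex
	timers map[string]*time.Timer // connection ID -> expiry timer
}

func newExpireManager() *expireManager {
	return &expireManager{timers: make(map[string]*time.Timer)}
}

// OnRequest runs on the initial Request and on every refresh:
// it re-arms the timer for the full token lifetime.
func (m *expireManager) OnRequest(connID string, tokenLifetime time.Duration, closeFn func()) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if t, ok := m.timers[connID]; ok {
		t.Stop() // a refresh arrived in time, so the old deadline is discarded
	}
	m.timers[connID] = time.AfterFunc(tokenLifetime, func() {
		m.mu.Lock()
		delete(m.timers, connID)
		m.mu.Unlock()
		closeFn() // token expired with no refresh: clean up the connection
	})
}

// OnClose runs when the client closes the connection itself.
func (m *expireManager) OnClose(connID string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if t, ok := m.timers[connID]; ok {
		t.Stop()
		delete(m.timers, connID)
	}
}

func main() {
	m := newExpireManager()
	m.OnRequest("conn-1", 100*time.Millisecond, func() {
		fmt.Println("timeout fired: closing conn-1")
	})
	time.Sleep(200 * time.Millisecond) // no refresh arrives, so Close fires
}
```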

@BenAgai (Author) commented Jun 8, 2023

Hi @glazychev-art,
Thank you very much for the update!

@or-adar commented Jul 16, 2023

@glazychev-art I'm reusing this thread since I have some questions about the timeout chain element that are unclear to me:
If I don't want to use the default 10 minutes and want a lower value, like 1 minute, what should be considered?
I mean, if the timeout element only considers the expiration time of its previous path segment, should I align the NSM_MAX_TOKEN_LIFETIME env var for all the components in the path (NSC, NSMgr, forwarders, passthrough, NSEs) so the new value is the same for all,
or are there particular components for which setting the value would be enough (while keeping the default for the others)?

@glazychev-art (Contributor)

@or-adar
You are right: currently the timeout element only considers its previous path segment.
If the timeout calls Close, this call goes through all components.

So, for example, if you set NSM_MAX_TOKEN_LIFETIME=1m only for the NSC, then the NSMgr calls Close after 1m (if refreshes do not take place regularly). The other components (forwarders, passthrough, NSEs) will also receive this Close from the NSMgr, but not from their own timeout elements (because they still use NSM_MAX_TOKEN_LIFETIME=10m).

For the normal case this is sufficient, but some questions may arise in the case of healing. For example, suppose the NSC and its NSMgr die. In that case the other components will clean up their resources only on their own timeouts (i.e. after 10m), because the only component (the NSMgr) that was responsible for clearing resources after 1 minute has died.
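
A small sketch of that timeline (the durations are the ones from this discussion; the component list is illustrative): each hop's timeout watches the token signed by its previous neighbor, so once the NSC and NSMgr die, the remaining hops only time out on the 10m tokens they last saw.

```go
// Illustrative timeline for the healing case above: the NSC is overridden
// to NSM_MAX_TOKEN_LIFETIME=1m while everything else keeps the 10m default.
package main

import (
	"fmt"
	"time"
)

func main() {
	// Token lifetime each component uses when signing its own path segment.
	lifetimes := map[string]time.Duration{
		"NSC":       1 * time.Minute, // the only component overridden to 1m
		"NSMgr":     10 * time.Minute,
		"forwarder": 10 * time.Minute,
		"NSE":       10 * time.Minute,
	}
	// Each hop's timeout element watches the *previous* segment's token:
	// the NSMgr watches the NSC's token, the forwarder watches the NSMgr's,
	// and the NSE watches the forwarder's.
	path := []string{"NSC", "NSMgr", "forwarder", "NSE"}
	for i := 1; i < len(path); i++ {
		prev := path[i-1]
		fmt.Printf("%s times out %v after the last refresh (token signed by %s)\n",
			path[i], lifetimes[prev], prev)
	}
}
```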

@or-adar commented Jul 17, 2023

@glazychev-art appreciate the response!
So, if I got you right: if I want NSEs and forwarders to clean up their resources after 1m, it is recommended to set NSM_MAX_TOKEN_LIFETIME to 1m for all components that might be included in the path (NSC, NSMgr, forwarders, passthrough, NSEs), to accommodate cases where components die?

@glazychev-art (Contributor)

Yes, that's right.
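
For reference, the variable is a plain duration string read from each component's environment. A minimal sketch of the idea (the parsing code below is hypothetical — the real NSM cmds have their own config loading, so only the variable name and the 10m default are taken from this thread):

```go
// Hypothetical sketch of reading the token lifetime from the environment.
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	v := os.Getenv("NSM_MAX_TOKEN_LIFETIME") // e.g. "1m", set on every component
	if v == "" {
		v = "10m" // the default lifetime mentioned above
	}
	lifetime, err := time.ParseDuration(v)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad NSM_MAX_TOKEN_LIFETIME:", err)
		os.Exit(1)
	}
	fmt.Println("token lifetime:", lifetime)
}
```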
