Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup interfaces before the forwarder termination #516

Closed
glazychev-art opened this issue Mar 11, 2022 · 10 comments
Closed

Cleanup interfaces before the forwarder termination #516

glazychev-art opened this issue Mar 11, 2022 · 10 comments
Assignees

Comments

@glazychev-art
Copy link
Contributor

glazychev-art commented Mar 11, 2022

Description

This problem is relevant if we use shared VPP (Calico)

We know that when users update the forwarder, the pod restarts without saving the name. In this case, the correct solution is to remove the interfaces before terminating.

We have 2 options for this case:

1. Add cleanup chain element that works with metadata

It will store the delete functions in the metadata. As soon as the chain context.Done, these functions will be called.
We also need to wait for all interfaces to be removed before terminating the application. To do this, we can use a channel in this chain element and wait for event here - https://github.com/networkservicemesh/cmd-forwarder-vpp/blob/main/main.go#L282

Disadvantages:
A. We need to add the delete function into each element of interest to us (for example, into mechanisms, cross-connects ...). It looks like:

func createTapInterface(...) {
....
cleanup.Store(ctx, isClient, "tap", func() error {
	_, err := tapv2.NewServiceClient(vppConn).TapDeleteV2(context.Background(), &tapv2.TapDeleteV2{
		SwIfIndex: rsp.SwIfIndex,
	})
	return err
})
...
}

B. Removing elements will be random. This affects us because for example the L3XC must be removed before the interfaces. I created a bug in JIRA - https://jira.fd.io/browse/VPP-2019

2. Monitor own connections and Close() before termination

We can do MonitorConnections() before termination and call Close() on the endpoint in the main func -
https://github.com/networkservicemesh/cmd-forwarder-vpp/blob/main/main.go#L195

To prevent calling a Request on other elements (manager, another forwarder) - we can add a chainCtx check in the connect.Client https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/connect/client.go

Possible problem:
Since the chain context is already Done at the time of the MonitorConnections() and Close call - there may be problems

3. Add cleanup chain element that calls Close()

This chain element will call Close() as soon as the chainCtx.Done, using eventFactory := begin.FromContext(ctx).

To prevent calling a Request on other elements (manager, another forwarder) - we can add a chainCtx check in the connect.Client https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/connect/client.go

We also need to wait for all interfaces to be removed before terminating the application. To do this, we can use a channel in this chain element and wait for event here - https://github.com/networkservicemesh/cmd-forwarder-vpp/blob/main/main.go#L282

Possible problem:
Since the chain context is already Done at the time of the Close call - there may be problems

@edwarnicke
Copy link
Member

Option 3 " Add cleanup chain element that calls Close()" looks good. We may have a small number of places we need to clean up wrt the chainCtx being closed when we call Close... but overall its a nice simple approach.

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Mar 13, 2022

@edwarnicke Should this chain element close remote side? In my mind for forwarder - no, for SDK use - yes.

@edwarnicke
Copy link
Member

@denis-tingaikin Could you say more about what you man by 'close the remote side' ?

@denis-tingaikin
Copy link
Member

@edwarnicke means do not close other datapath parts. Because the forwarder can return back and heal connection. Otherwise other datapath parts could be closed by timeout.

@edwarnicke
Copy link
Member

@denis-tingaikin How would we accomplish that here?

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Mar 13, 2022

@edwarnicke

I think we can do something like this.

package cleanup

import (
	"context"

	"github.com/golang/protobuf/ptypes/empty"
	"github.com/networkservicemesh/api/pkg/api/networkservice"
	"github.com/networkservicemesh/sdk/pkg/networkservice/common/begin"
	"github.com/networkservicemesh/sdk/pkg/networkservice/common/clientconn"
	"github.com/networkservicemesh/sdk/pkg/networkservice/core/next"
	"github.com/networkservicemesh/sdk/pkg/networkservice/utils/metadata"
	"google.golang.org/grpc"
)

type cleanupKey struct{}
type cancelKey struct{}

type cleanupClient struct {
	chainCtx context.Context
}

func NewClient(chainCtx context.Context) networkservice.NetworkServiceClient {
	return &cleanupClient{
		chainCtx: chainCtx,
	}
}

func (c *cleanupClient) Request(ctx context.Context, request *networkservice.NetworkServiceRequest, opts ...grpc.CallOption) (*networkservice.Connection, error) {
	resp, err := next.Client(ctx).Request(ctx, request, opts...)
	if err == nil {
		factory := begin.FromContext(ctx)
		go func() {
			<-c.chainCtx.Done()
			metadata.Map(ctx, true).Store(cleanupKey{}, struct{}{})
			factory.Close()
			ctx, cancel := context.WithCancel(context.Background())
			metadata.Map(ctx, true).Store(cancelKey{}, cancel)
			go func() {
				defer cancel()
				select {
				case <-c.chainCtx.Done():
					metadata.Map(ctx, true).Store(cleanupKey{}, struct{}{})
					factory.Close(begin.CancelContext(ctx))
				case <-ctx.Done():
					return
				}

			}()

		}()
	}
	return resp, err
}

func (c *cleanupClient) Close(ctx context.Context, conn *networkservice.Connection, opts ...grpc.CallOption) (*empty.Empty, error) {
	if _, ok := metadata.Map(ctx, true).Load(cleanupKey{}); ok {
		if cc, ok := clientconn.Load(ctx); ok {
			if closable, ok := cc.(interface{ Close() error }); ok {
				_ = closable.Close()
			}
			clientconn.Delete(ctx)
		}
	} else if v, ok := metadata.Map(ctx, true).Load(cancelKey{}); ok {
		if cancel, ok := v.(context.CancelFunc); ok {
			cancel()
		}
	}

	return next.Client(ctx).Close(ctx, conn, opts...)
}

Additional functionally is located before connect client in the client chain https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/chains/client/client.go#L68

So then we can just close our cc related to remote side.

@edwarnicke
Copy link
Member

This is good for the forwarder case... but what about the originating NSC case?

If I have:

NSC -> ... -> NSE

and the NSC itself is being terminated, we probably want to close out the connection including the remote pieces.

My guess is we could handle this easily with a With Option (and we would also need a With Option for the channel @glazychev-art mentioned for letting the main block till the Close is finished.

@denis-tingaikin
Copy link
Member

Sounds good.

@denis-tingaikin
Copy link
Member

denis-tingaikin commented Mar 13, 2022

Finally our solution is:

  1. Implement cleanup client chain element Cleanup interfaces before the forwarder termination #516 (comment) @glazychev-art Note: begin factory is passing from begin.Server chain element. So when we'll call close from the client we'll close servers and clients.
  2. Add an option for closing cc from clientconn and use the option for forwarders.
  3. Add an option for waiting cleanup done and use the option for forwarders.
  4. Add cleanup by default for the NSCs.

@glazychev-art
Copy link
Contributor Author

Closed with networkservicemesh/sdk#1246

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants