NSMgr should try to re-use previously selected forwarder in case of restart #1230

Closed
denis-tingaikin opened this issue Feb 16, 2022 · 11 comments
Labels: bug (Something isn't working)

Comments

@denis-tingaikin
Member

denis-tingaikin commented Feb 16, 2022

Steps to Reproduce

  1. Deploy NSMgr, forwarder1, registry
  2. Deploy nsc and nse for service "my-ns"
  3. Wait for nsc request to complete
  4. Deploy forwarder2, forwarder3
  5. Restart NSMgr

Expected Behavior

After the restart, NSMgr selects the previously selected forwarder (forwarder1) for nsc again. The datapath stays alive.

Current Behavior

After the restart, NSMgr selects the first forwarder in the registry response (it could be forwarder1, forwarder2, or forwarder3) for nsc. The previous datapath, which is still alive, may be closed and a new one created(!).
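
The hazard can be illustrated with a small sketch. This is a simplified illustration, not NSMgr's actual code: `pickFirst` and the plain string slices are hypothetical stand-ins for the selection logic and the registry response.

```go
package main

import "fmt"

// pickFirst is a hypothetical stand-in for the current selection logic:
// it takes whichever forwarder the registry response lists first.
func pickFirst(candidates []string) string {
	if len(candidates) == 0 {
		return ""
	}
	return candidates[0]
}

func main() {
	// Before the restart only forwarder1 is registered.
	before := []string{"forwarder1"}
	// After the restart the response may list the forwarders in any order.
	after := []string{"forwarder3", "forwarder1", "forwarder2"}

	fmt.Println(pickFirst(before)) // forwarder1
	fmt.Println(pickFirst(after))  // forwarder3, even though forwarder1 still carries the live datapath
}
```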

@denis-tingaikin
Member Author

@edwarnicke Can we consider this issue?

@denis-tingaikin denis-tingaikin added the bug Something isn't working label Feb 16, 2022
@edwarnicke
Member

Yes. Question though, how does this relate to the discussion about forwarder loadbalancing?

@denis-tingaikin
Member Author

denis-tingaikin commented Feb 16, 2022

> how does this relate to the discussion about forwarder loadbalancing?

This problem could be considered part of forwarder load balancing. But since we don't want to add forwarder load balancing right now, we can address it separately.

@denis-tingaikin
Member Author

@edwarnicke Can we add this to the 1.3.0 project board?

@edwarnicke
Member

@denis-tingaikin Please do add it.

@edwarnicke
Member

@denis-tingaikin The really interesting question here is going to be how to balance the benefit of doing something about this against the cost of doing it... so whether we do it will depend a lot on the space of available solutions.

@denis-tingaikin
Member Author

@edwarnicke, @NikitaSkrynnik

I've prepared a unit test that demonstrates the problem:

func Test_ForwarderShouldBeSelectedCorrectlyOnNSMgrRestart(t *testing.T) {
	t.Cleanup(func() { goleak.VerifyNone(t) })

	ctx, cancel := context.WithTimeout(context.Background(), time.Second*5)
	defer cancel()

	domain := sandbox.NewBuilder(ctx, t).
		SetNodesCount(1).
		SetRegistryProxySupplier(nil).
		SetNSMgrProxySupplier(nil).
		Build()

	// The sandbox starts with exactly one forwarder; remember its name.
	var expectedForwarderName string

	require.Len(t, domain.Nodes[0].Forwarders, 1)
	for k := range domain.Nodes[0].Forwarders {
		expectedForwarderName = k
	}

	nsRegistryClient := domain.NewNSRegistryClient(ctx, sandbox.GenerateTestToken)

	_, err := nsRegistryClient.Register(ctx, &registry.NetworkService{
		Name: "my-ns",
	})
	require.NoError(t, err)

	nseReg := &registry.NetworkServiceEndpoint{
		Name:                "my-nse-1",
		NetworkServiceNames: []string{"my-ns"},
	}

	domain.Nodes[0].NewEndpoint(ctx, nseReg, sandbox.GenerateTestToken)

	nsc := domain.Nodes[0].NewClient(ctx, sandbox.GenerateTestToken)

	request := defaultRequest("my-ns")

	// On every iteration: request a connection, check that it still goes through
	// the original forwarder, then register one more forwarder and restart NSMgr.
	for i := 0; i < 10; i++ {
		conn, err := nsc.Request(ctx, request.Clone())
		require.NoError(t, err)
		require.NotNil(t, conn)
		require.Len(t, conn.GetPath().GetPathSegments(), 4)
		// Path segment 2 is the forwarder.
		require.Equal(t, expectedForwarderName, conn.GetPath().GetPathSegments()[2].Name)

		domain.Nodes[0].NewForwarder(ctx, &registryapi.NetworkServiceEndpoint{
			Name:                sandbox.UniqueName(fmt.Sprintf("%v-forwarder", i)),
			NetworkServiceNames: []string{"forwarder"},
			NetworkServiceLabels: map[string]*registryapi.NetworkServiceLabels{
				"forwarder": {
					Labels: map[string]string{
						"p2p": "true",
					},
				},
			},
		}, sandbox.GenerateTestToken)

		domain.Nodes[0].NSMgr.Restart()
	}
}

@edwarnicke
Member

@denis-tingaikin What ideas do we have on how to deal with the situation?

@denis-tingaikin
Member Author

denis-tingaikin commented Feb 16, 2022

@edwarnicke I think we can simply try to match the forwarders in the registry response against the next path segment name. If there is a match, we select the matched forwarder. If there is no match, we do what we do currently.
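
A minimal sketch of that idea, under stated assumptions: the `Forwarder` type and the `selectForwarder` function are hypothetical and simplified for illustration. They are not the actual NSMgr code or the change made in the fix.

```go
package main

import "fmt"

// Forwarder is a hypothetical, simplified stand-in for a forwarder entry
// returned by the registry.
type Forwarder struct {
	Name string
}

// selectForwarder prefers the candidate whose name matches the next path
// segment recorded in the connection, i.e. the forwarder chosen before the
// restart. With no match it falls back to the first candidate, which is
// the current behavior described in this issue.
func selectForwarder(candidates []Forwarder, nextSegmentName string) (Forwarder, bool) {
	if len(candidates) == 0 {
		return Forwarder{}, false
	}
	if nextSegmentName != "" {
		for _, c := range candidates {
			if c.Name == nextSegmentName {
				return c, true
			}
		}
	}
	return candidates[0], true
}

func main() {
	candidates := []Forwarder{{Name: "forwarder3"}, {Name: "forwarder1"}, {Name: "forwarder2"}}

	// Request after a restart: the path still names forwarder1.
	f, _ := selectForwarder(candidates, "forwarder1")
	fmt.Println(f.Name) // forwarder1

	// Fresh request: there is no next segment yet, so fall back to the first candidate.
	f, _ = selectForwarder(candidates, "")
	fmt.Println(f.Name) // forwarder3
}
```

The fallback keeps brand-new requests working exactly as before; only requests that already carry a forwarder segment in their path are affected.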

@edwarnicke
Member

Oooh... I like that. It's simple :)

@denis-tingaikin
Member Author

Done by #1232

Repository owner moved this from Todo to Done in Release 1.3.0 Mar 22, 2022
Projects
Status: Done
3 participants