add service registration polling to connect-init #452

kschoche · 2021-03-12T20:51:24Z

Changes proposed in this PR:

Extends the connect-init command to wait for the service to be registered. When the endpoints controller is added, this will allow the init container to wait for the service to be registered before bootstrapping Envoy.
Adds pod-name and pod-namespace flags to connect-init, as well as a flag to disable the new service registration polling feature, to preserve backwards compatibility until the endpoints controller is merged.

How I've tested this PR:
Unit tests added and old ones passing.
Manually built a consul-k8s dev image, deployed it with ACLs enabled, and connect-injected an application.

How I expect reviewers to test this PR:
Code review, plus a manual test: deploy Consul and an injected app and confirm that the app gets injected and starts.

Checklist:

Tests added
CHANGELOG entry added (HashiCorp engineers only, community PRs should not add a changelog entry)

subcommand/connect-init/command.go

connect-inject/container_init.go

subcommand/connect-init/command_test.go

ishustava

Looking good so far! I've left some comments, questions, and suggestions. Most of them are minor code or cleanup suggestions and improvements, as well as some testing suggestions.

connect-inject/container_init.go

subcommand/connect-init/command.go

subcommand/connect-init/command_test.go

ndhanushkodi

Looks great! Nice RetryServicePolling test! Just had an idea for a comment to be added.

subcommand/connect-init/command.go

ishustava · 2021-03-17T21:01:01Z

subcommand/connect-init/command_test.go

+		require.Equal(t, 0, exitCode)
+	case <-time.After(time.Second * 10):
+		// Fail if the stopCh was not caught.
+		require.Fail(t, "timeout waiting for command to exit")


Although, thinking about it and the way the test is set up, I don't think we will ever hit this case with the code as currently written. If the command times out after 10 seconds, it will send 1 to the exitCode channel, and that will still fall into the first case.

The only way to reach the second case is if the command is implemented so that it doesn't exit after 10 seconds (or at all). We would then hit this case, but the command would still be running after the test has finished. To be good citizens, when you start something in a goroutine in a test, you should always make sure the goroutine finishes when the test terminates, and we probably shouldn't assume that the command will always be implemented in a certain way.

Like you said though, the command doesn't really catch signals. But there are a few ways we could work with that:

One idea is to set a context on the backoff and allow the tests to set that context. I think you can do that by wrapping it with the backoff.WithContext function. Then you could always terminate the command by canceling the context.

You could instead start the test agent in a goroutine after a delay. Then because the agent is not available, it'll force polling to retry. You can always terminate the agent by calling agent.

You could register services in a goroutine after a delay.

subcommand/connect-init/command_test.go

Co-authored-by: Iryna Shustava <ishustava@users.noreply.github.com>

* Add service registration polling to connect-init Co-authored-by: Iryna Shustava <iryna@hashicorp.com>

kschoche added 6 commits March 9, 2021 17:52

Add basic tests and service registration polling

ccaf818

fix merge mess

35b837e

add some more unit tests

50a168b

plumb through to init_container

afca1af

fix tests

5598cb1

fix typo

2af63d9

kschoche added area/connect Related to Connect service mesh, e.g. injection theme/tproxy Items related to transparent proxy labels Mar 12, 2021