refactor dash.Runner startup #1676

jamieklassen · 2020-11-20T17:15:19Z

What this PR does / why we need it: Adds two simple integration tests for the two main startup flows (with and without a kubeconfig)

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer: These tests require the test runner process to be able to create a directory, create/delete temporary files, and open TCP listeners (high-numbered random ports via 127.0.0.1:0) on the host.

Release note:

release-note

jamieklassen · 2020-11-23T23:20:08Z

Pushed some low-hanging refactors to clean up Runner.Start especially.

I've toyed with ways to break the filesystem dependency (even though an in-memory afero.Fs be used as a double for testing ValidateKubeConfig, cluster.FromKubeConfig still calls out to the real filesystem) or the use of a 'real' httptest.Server for the fake k8s API, but both would require really intrusive refactorings and result in duplicating some of the code from k8s.io/client-go so I think it's worth paying this performance/stability price in the tests.

However after discussing with @wwitzel3, tomorrow I will add a flag and a field on dash.Options to allow specifying the host:port for octant. This seems to be a sensible feature that nobody has yet gotten around to. Ideally this will also pave the way to use a a fake, in-memory net.Listener (not backed by a real network interface) in tests.

EDIT: I'd also like to ease the burden on the caller to use ocontext.WithKubeConfigCh and instead put this into Start.

GuessWhoSamFoo · 2020-11-30T17:55:24Z

I'll take a pass at this later today

GuessWhoSamFoo

This looks good. It greatly improves the readability of dash.go

GuessWhoSamFoo · 2020-12-01T16:37:26Z

pkg/dash/dash.go

-	} else {
-		r.startAPIService(ctx, logger, options)
+		logger.Infof("waiting for kube config ...")
+		options.KubeConfig = <-ocontext.KubeConfigChFrom(r.ctx)


This blocks which is fine because it needs to wait if the user adds a kubeconfig, however it will hang on sigint/sigterm without shutting down plugins. In other words, octant will hang if the users tries to exit in the loading api state via ctrl + c without a way to recover

oh that makes sense, thank you! I'll fix this up.

GuessWhoSamFoo

lgtm, can you squash some of these commits?

jamieklassen · 2020-12-04T16:29:54Z

@GuessWhoSamFoo I rearranged the changes to remove some of the re-work -- for example, in the old commit log I removed the "wait for kubeconfig in a goroutine" logic and then re-added it based on your feedback, whereas in this version I add that integration test in the first commit and ensure that all commits pass the tests -- but I kept the individual commits for each stage of the refactor.

I thought it would be valuable to keep each change separate, along with its justification. This is mainly because the tests I added are not exhaustive, and I may have made another mistake besides the one you already observed. If that turns out to be the case, then somebody may need to look back at my changes. I imagine it would be helpful to have a log of each refactoring decision as it was made.

I'm OK to squash more aggressively if you prefer. Let me know! 😄

Also externalize NewRunner's dependence on net.Listener and inject an in-memory listener for the integration tests -- this should make them a bit faster, stabler (no dependence on the operating system to provision a TCP socket) and safe to run in parallel. Really just covering the happy paths here. Ideally some function signatures can get cleaned up and potentially an isolate-able 'startup component' could be extracted to get more focused unit-test feedback. The 'startup without a kubeconfig' flow was a bit more delicate -- after sending the kubeconfig over the streaming API, there is a brief period of downtime while the API reloads and the test had to wait for this. There is one possible edge case where the "clean shutdown before receiving kubeconfig" scenario will erroneously pass -- this is in the case that the machine running the tests contains a valid kubeconfig at the default location, and the test takes over a second to run. In this case the LoadingManager's behaviour of loading a kubeconfig from the default location (1 second after starting) would kick in, and the API would successfully be initialized -- therefore we would not be exercising the "shutdown before loading kubeconfig" scenario we claim to exercise. We tried refactoring various packages to the point that we could inject a fake filesystem into the LoadingManager (to force it to never find a valid kubeconfig at the default location) but it was fairly intrusive, and it seems exceedingly rare that our new test should take anywhere near 1 second to run. Signed-off-by: Jamie Klassen <jklassen@vmware.com> Co-authored-by: Dimitry Besson <dbesson@vmware.com> Co-authored-by: James Thomson <jamesth@vmware.com> Co-authored-by: Vikram Yadav <yvikram@vmware.com>

apiErr is always nil in this case, so the error-handling branch will never be executed. Signed-off-by: Jamie Klassen <jklassen@vmware.com>

Also the nil check in Start seems to be redundant. Honestly that "unexpected empty kubeconfig" line could maybe be removed as I'm pretty sure it's dead code. Signed-off-by: Jamie Klassen <jklassen@vmware.com>

I actually find it easier to understand (particularly the `r.dash.server.Handler` side effect) if it's just in the main body of the function. Signed-off-by: Jamie Klassen <jklassen@vmware.com>

allowing users to simply call `NewRunner` without having to remember to populate the passed context with a kubeconfig channel. We achieve this by having the Runner keep track of the context it was created with. While we were at it, we noticed that there was no need to have a Logger parameter for Start -- in fact, in the long run we may be able to keep removing parameters from Start until only the startup/shutdown channels remain. Signed-off-by: Jamie Klassen <jklassen@vmware.com> Co-authored-by: James Thomson <jamesth@vmware.com>

Signed-off-by: Jamie Klassen <jklassen@vmware.com> Co-authored-by: James Thomson <jamesth@vmware.com>

jamieklassen force-pushed the runner-startup-tests branch 2 times, most recently from bdc2e15 to e44951b Compare November 20, 2020 17:43

wwitzel3 requested a review from a team November 21, 2020 00:43

jamieklassen marked this pull request as draft November 23, 2020 23:12

jamieklassen marked this pull request as ready for review November 30, 2020 14:07

GuessWhoSamFoo suggested changes Dec 1, 2020

View reviewed changes

jamieklassen requested a review from GuessWhoSamFoo December 2, 2020 16:28

GuessWhoSamFoo reviewed Dec 3, 2020

View reviewed changes

GuessWhoSamFoo approved these changes Dec 3, 2020

View reviewed changes

jamieklassen changed the title ~~runner startup tests~~ refactor dash.Runner startup Dec 3, 2020

jamieklassen force-pushed the runner-startup-tests branch 2 times, most recently from 585a78c to 49d9e8d Compare December 4, 2020 16:25

Jamie Klassen and others added 4 commits December 4, 2020 11:30

inline initLoadingAPI

9cb197a

apiErr is always nil in this case, so the error-handling branch will never be executed. Signed-off-by: Jamie Klassen <jklassen@vmware.com>

inline the "loading" branch of startAPIService

0ffc463

Also the nil check in Start seems to be redundant. Honestly that "unexpected empty kubeconfig" line could maybe be removed as I'm pretty sure it's dead code. Signed-off-by: Jamie Klassen <jklassen@vmware.com>

inline startAPIService

eb4dfb5

I actually find it easier to understand (particularly the `r.dash.server.Handler` side effect) if it's just in the main body of the function. Signed-off-by: Jamie Klassen <jklassen@vmware.com>

jamieklassen force-pushed the runner-startup-tests branch from 49d9e8d to e29b6c1 Compare December 4, 2020 16:31

Jamie Klassen and others added 2 commits December 4, 2020 13:35

add changelog entry

e5b1d5b

Signed-off-by: Jamie Klassen <jklassen@vmware.com> Co-authored-by: James Thomson <jamesth@vmware.com>

jamieklassen force-pushed the runner-startup-tests branch from e29b6c1 to e5b1d5b Compare December 4, 2020 18:38

GuessWhoSamFoo merged commit 0e2e4e2 into vmware-archive:master Dec 4, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor dash.Runner startup #1676

refactor dash.Runner startup #1676

jamieklassen commented Nov 20, 2020

jamieklassen commented Nov 23, 2020 •

edited

Loading

GuessWhoSamFoo commented Nov 30, 2020

GuessWhoSamFoo left a comment

GuessWhoSamFoo Dec 1, 2020

jamieklassen Dec 1, 2020

GuessWhoSamFoo left a comment

jamieklassen commented Dec 4, 2020

refactor dash.Runner startup #1676

refactor dash.Runner startup #1676

Conversation

jamieklassen commented Nov 20, 2020

jamieklassen commented Nov 23, 2020 • edited Loading

GuessWhoSamFoo commented Nov 30, 2020

GuessWhoSamFoo left a comment

Choose a reason for hiding this comment

GuessWhoSamFoo Dec 1, 2020

Choose a reason for hiding this comment

jamieklassen Dec 1, 2020

Choose a reason for hiding this comment

GuessWhoSamFoo left a comment

Choose a reason for hiding this comment

jamieklassen commented Dec 4, 2020

jamieklassen commented Nov 23, 2020 •

edited

Loading