Minimal implementation #1

tommyknows · 2023-02-22T16:56:00Z

This PR includes the first minimal implementation of the kubernetes-scanner.
It currently takes a config which includes a list of GVKs and (potentially) namespaces to scan, and logs the result.

dockerfile switches to use boringcrypto instead of the regular crypto module (I think that would make it FIPS-compliant).

craigfurman · 2023-02-23T14:45:51Z

internal/config/config.go

+}
+
+type Config struct {
+	Scanning       Scan            `json:"scanning"`


For the scan config, as well as the group,version,kind, I was imaging we'd need potentially 2 more fields:

Namespaces, as an array (or * as a special case to scan all namespaces. No need for glob logic)

It's possible, but outside the scope of this PR, that we'll want to default to scanning the namespace into which we're deployed, if not overridden with config

IsNamespaced, or potentially IsNotNamespaced since being namespaced is the norm, defaulting to "is namespaced", if we need to tell the k8s libraries which namespaces to scan. Certainly, for the dynamic client, you do need to know what's namespaced and what's not. If that's not the case with event listeners, maybe this point disappears after implementing the above one. Users would just write config to scan an arbitrary namespace for non-namespaced resources.

Re IsNamespaced: I don't think that's required, at least not for controller-runtime.
The manager.Options type contains a Namespace field that would do what we want (have a look at the docs), however, that's scoping the entire manager to a specific namespace.

What that means is that our cache will always contain the objects from all namespaces, and we'll need to write our own logic to "skip" resources from certain namespaces.

I assume that's fine in most installations, and the clusters that are big enough for this to be a problem should probably just run completely namespace-scoped controllers instead? 🤔

Ah, I understand now. Sorry, I should have re-familiarised myself with controller-runtime before commenting. I see what you mean about that namespace option.

If we implement my namespaces note, it'll still be possible to partition multiple instances by {GVK,namespace}, and un-namespaced resources can then just be configured use an arbitrary namespace, no?

it'll still be possible to partition multiple instances by {GVK,namespace}, and un-namespaced resources can then just be configured use an arbitrary namespace, no?

I'm not quite sure what you're asking, but iiuc you just wouldn't set the Namespaces option for cluster-scoped resources.

Currently, if you'd do something like this:

Group: "", Version: "v1", Kind: "Namespace", Namespaces: ["foo", "bar"]

You'd never reconcile a Namespace resource because they're not in the given namespaces.

With namespaced resources, that would work as expected.

hmm let me try with an example, it's possible we're misunderstanding each other. With a config that looks something like this:

scan: - gvk: apps/v1.Deployment namespaces: ['default', 'plantuml'] - gvk: v1.Service namespaces: ['*'] - gvk: rbac.authorization.k8s.io/v1.ClusterRole

Namespaced resources can be scanned from multiple namespaces, or all with *. Non-namespaced resources like the ClusterRole do not specify a namespace. We indeed don't need IsNamespaced.

In the future we could default namespaces to the currently-deployed namespace, perhaps. It's not clear to me how useful that'd be.

I also cheekily proposed a friendly string syntax for GVK, we don't have to do this / can do it later. It's orthogonal to the point and was quicker to type.

Namespaced resources can be scanned from multiple namespaces, or all with *. Non-namespaced resources like the ClusterRole do not specify a namespace.

Yes. Two things that are "different" in my current implementation:

"all with *" is instead "all with an empty list". Minor UX difference; I have no strong opinion on what's better. Omission meaning "all namespaces" is less to type though :) Also, "no namespaces" doesn't really make sense.

"Non-namespaced resources do not specify a namespace". Can't stop customers from defining one in their config. Currently, this would mean the reconciliation loop for that resource "breaks" (no resources would be reconciled). We can also change it to ignore it. Also no strong opinion here :)

proposed a friendly string syntax for GVK

Yeah we have to see what's possible. Thinking about the ClusterRole that we do want to generate, I think they use the plural naming of things (e.g. pods instead of "Pod"), so maybe we instead need to support that naming. We'll see once we get to writing the Helm chart? :)

All of that sounds fine, I think the core concept is sound. As a first pass, it's fine to not prevent user misconfiguration. In a future pass, we could detect such things and print warnings (just an example), and also translate between singular and plural names by asking the cluster for its api resources. Let's put anything that might have value like that (if you agree) in follow-on issues in the backlog. They might sit there forever, or they might be useful in the future!

I think especially the translation-feature will become necessary once we start developing the Helm chart. I'll add it to the backlog.

, I think they use the plural naming of things (e.g. pods instead of "Pod")

Yeah, for the role, we specify the things in plural format

https://github.com/snyk/kubernetes-monitor/blob/99215c8a3f06e4dbda48b90eda6190b19f27b326/snyk-monitor/templates/clusterrole.yaml

- apiGroups: - "" resources: - replicationcontrollers verbs: - get - list - watch - apiGroups: - batch resources: - cronjobs - jobs verbs: - get - list - watch

This kubectl api-resources -o wide prints supported resources with GVK / namespaced (true|false), I guess API equivalent is here: https://github.com/kubernetes/client-go/blob/master/discovery/discovery_client.go

But yeah, all of these should be part of the backlog, nothing to tackle at T=0

internal/config/config.go

craigfurman · 2023-02-23T14:47:08Z

internal/config/config.go

+	flag.StringVar(&configFile, "config", "/etc/kubernetes-scanner/config.yaml", "defines the location of the config file")
+	flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
+	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
+	flag.BoolVar(&enableLeaderElection, "leader-elect", false,


This appears to be unused. Just to check, this is a controller-runtime concept, right? We're not about to implement a consensus algorithm? I don't doubt you could do it of course, but still 😁

Yes that whole leaderelection thing is already built into controller-runtime. I've removed everything else from it, but forgot about that one.

I also won't implement a consensus algorithm ;)

craigfurman

Great start! I'm amazed how short it is. We can add observability etc in a future pass, it's good to have a minimally-viable base to iterate on.

A few notes, and FWIW I haven't run it yet, still waiting on an example config file, and the hope of out-of-cluster auth 😁

craigfurman · 2023-02-24T10:28:28Z

internal/config/config.go

+func init() {
+	flag.StringVar(&configFile, "config", "/etc/kubernetes-scanner/config.yaml", "defines the location of the config file")
+	flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
+	flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")


A few notes on config that I'll put arbitrarily on this line 😁

Why do we have flags that duplicate 2 config file fields (metrics and probe addrs)?

Can we provide an example local config in the repo for development? I know I already wrote a note about this, but starting to write one in json was a bit of a drag

Can we make the config file yaml, to make it a little easier to hammer out by hand for development? JSON is still valid yaml, so we can have our cake and eat it here

Why do we have flags that duplicate 2 config file fields (metrics and probe addrs)?

They were flags originally (kubebuilder scaffolding), I thought we should also allow this to be configured through the config file. But one place should be enough, so I'll just keep it in the config file.

Can we provide an example local config in the repo for development? I know I already wrote a note about this, but starting to write one in json was a bit of a drag

Sure, I'll add a more complete test that marshals a complete config somewhere.

Can we make the config file yaml, to make it a little easier to hammer out by hand for development? JSON is still valid yaml, so we can have our cake and eat it here

Yes, definitely 😅

Pushed some changes, we now have a test that writes a sample config to a temp file. You can start from there.

Kubeconfig is specified either from the -kubeconfig flag or through $KUBECONFIG or ~/.kube/config. See the ctrl.GetConfig() docs on this to learn more. The test also shows this by setting the flag.

craigfurman · 2023-02-24T10:35:29Z

main_test.go

+/*
+Copyright 2023.
+
+Licensed under the Apache License, Version 2.0 (the "License");


Was this generated with kubebuilder, out of interest? That's fine, but so much was removed if so that I can't tell, which is also fine. This license header makes me suspect that it was. I think we should remove these, if we're allowed (and off the top of my head, the apache license isn't "viral", but I'd recommend checking), because we haven't decided on an open source license for this project yet.

Yeah the initial files, which I pretty much deleted completely in this process, was generated by kubebuilder. But kubebuilder's testing strategy is quite different - they use BDD-style testing with a library called Ginkgo - which I don't really like 😄

I'll remove the header.

Hah, I used to work with the ginkgo developer. I used to quite buy into that rspec-style testing DSL, these days I've come back down to earth (go test, junit)

craigfurman · 2023-02-24T10:43:11Z

main.go

+	for _, scanType := range cfg.Scanning.Types {
+		if err := (&reconciler{
+			Reader:       mgr.GetClient(),
+			requeueAfter: cfg.Scanning.RequeueAfter.Duration,
+			reconcile:    reconcile,
+			newObject:    newObjectFn(scanType.GroupVersionKind),
+			namespaces:   scanType.Namespaces,
+		}).SetupWithManager(mgr); err != nil {
+			return nil, fmt.Errorf("unable to create controller for GVK %v: %w", scanType.String(), err)
+		}
+	}


If I understand correctly, this is the setup of the controller-runtime loops, like any kubernetes controller, and we're doing 1 per GVK. I have a few possibly unhelpful, low-context questions, feel free to push back on the premise:

How does this compare to using a DynamicInformer (e.g. https://blog.dsb.dev/posts/creating-dynamic-informers/).

In fact, that guy's now-archived project, Kollect, looks pretty similar to this in concept. I suspect there are a lot of those on the internet, which makes me wonder if we should use/adapt one, but on the other hand, if the only similarity is the control loop, there's not much value.

Is there a way to register multiple resource types on one manager, and if so, what are the trade-offs of doing that?

How does this compare to using a DynamicInformer

I think that's what's being used under the hood in kubebuilder / controller-runtime. I wouldn't know how they do it otherwise 😄

if we should use/adapt one

There's a lot of those, mainly because all Kubernetes control loops look the same :)
The dynamic ones - kollect, port - are even more similar for obvious reasons.

I would argue that the "core logic" that we currently have is not worth using a dependency. In contrast, having this in our control means we get a bit more freedom for the features we want to build on top. I think it's easier than trying to depend on something that's already built.

Additionally, it's just not that much code thanks to controller-runtime as well 😄

Is there a way to register multiple resource types on one manager, and if so, what are the trade-offs of doing that?

That's what we're doing here; but I assume that's not what you asked for.
We create one reconciler per GVK, and attach them all to the same manager.
The two other ways:

One reconciler for multiple GVKs at once: I've tried that in the beginning, but that doesn't work. The reconcile-loops are tied to a single resource type.

One reconciler per manager, having multiple managers: I'm not sure why you'd do this.
Having a single manager means you can share a cache, share the setup, share the logic etc etc...

Does that answer your question?

It does, I was mistaken about multiple managers, now that I look again. I really meant reconcilers, and you've answered that! Thanks

craigfurman · 2023-02-24T10:44:46Z

main.go

+
+	// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
+	// to ensure that exec-entrypoint and run can make use of them.
+	// TODO: not sure what's needed here if we only support in-cluster auth.


I would like the ability to do out-of-cluster auth, e.g. https://gist.github.com/craigfurman/7160ce317e5a4d868e78c558593601fd#file-kube_list_all_resources-go-L20-L31 (itself cribbed from a kubernetes example somewhere). Development loops will be much faster just running this controller on our laptops, pointed at some kube cluster, e.g. docker-desktop.

See other comment, you can either set -kubeconfig, $KUBECONFIG or it will automatically use ~/.kube/config. I guess this comment can be removed.

ivanstanev · 2023-02-27T10:34:50Z

Dockerfile

@@ -7,7 +7,7 @@ ARG COMMIT_SHA
 ARG GIT_TAG

 RUN go mod download
-RUN CGO_ENABLED=0 go build \
+RUN CGO_ENABLED=0 GOEXPERIMENT=boringcrypto go build \


Nice find! I wish this question https://stackoverflow.com/questions/71215891/does-google-go-dev-boringcrypto-branch-contains-fips-certified-boringcrypto-libr had an answer... 😄

This does though:

Is the plan to inherit the existing CMVP validation of BoringCrpto until 7/12/2025 or submit for a new FIPS validation?

@pfox1969 The BoringCrypto module being used did not change as part of this work.

And because the BoringCrypto module is FIPS-certified, I'd say yes it is FIPS-certified as well?

Also, this seems to be an interesting discussion, which links to this

// Package fipsonly restricts all TLS configuration to FIPS-approved settings. // // The effect is triggered by importing the package anywhere in a program, as in: // // import _ "crypto/tls/fipsonly" // // This package only exists when using Go compiled with GOEXPERIMENT=boringcrypto.

Looks pretty good to me then? 😄 (we do not have imported this package currently though)

ivanstanev · 2023-02-27T10:47:50Z

internal/config/config_test.go

+	}
+}
+
+func TestConfigRealAPIServer(t *testing.T) {


What API server is being used here? From setup-envtest?

Yes, exactly. The restCfg (which indicates how everything should connect to the API server) is created from test.SetupEnv, which in turn uses the envtest package.

envtest uses a local kube-apiserver and etcd binary, which is "installed" by setup-envtest.

main.go

ivanstanev · 2023-02-27T10:52:51Z

main.go

+func newObjectFn(forGVK schema.GroupVersionKind) func() client.Object {
+	return func() client.Object {
+		u := &unstructured.Unstructured{}
+		u.SetGroupVersionKind(forGVK)
+		return u
+	}
+}


Does it mean we only collect the GVK from a watched resource and nothing else?

Sorry I'm not sure what you're asking here.

We don't "collect" GVKs, we only collect the watched resources, which are identified based on the GVK.

I don't understand the code, it creates an empty unstructured object and only sets GVK... Just not sure what is happening here 😄

The controller-runtime client.Client functions have signatures like this:

Get(ctx context.Context, key client.ObjectKey, obj client.Object, opts ...client.GetOption) error

We're talking about this client.Object which is an interface for a "generic" object.
Usually, the client is used like this:

pod := &corev1.Pod{} if err := client.Get(ctx, <namespaced-name>, pod); err != nil { ... }

This will populate the pod object.

However, we don't want to do this because that would require us to add all the types to the code (e.g. make it not dynamic), and we'd also need to register all the CRD's schemas - meaning depending on some external Go code as well - in the scheme for encoding & decoding objects.

The unstructured.Unstructured type also implements client.Object, however this only works when you set the GroupVersionKind so that the client.Get actually knows which type it should return.

In that case, you don't need type-specific encoding- and decoding-logic, as it will just be put into this "generic" representation of an object within unstructured.Unstructured{}.

This allows us to do a "dynamic" scanning simply by knowing the GVK.

ivanstanev · 2023-02-27T10:56:26Z

main.go

+	}
+
+	obj := r.newObject()
+	if err := r.Get(ctx, req.NamespacedName, obj); err != nil {


Does this call the Kubernetes API, and does it have a chance of failing due to some network error or by triggering rate limiting?

Does this call the Kubernetes API

Technically yes, practically afaik not.
The standard client has a cache built-in, and that cache is populated from things like the Watch events. Meaning if we get the object here, it should very likely be cached :)

(At least that's the assumption I've always made. Given that I've never ran into rate-limiting with that client in my career, I would expect this assumption to be correct)

We do Get the object from the API in response to events, or does that just pull a cached document out of the event or something along those lines?

I'm not sure what happens "internally" (within controller-runtime), whether they get the actual document with the event or not. For us, we just get the NamespacedName based on an event, and the actual object is already cached within the client (transparently to us). So:

We do Get the object from the API in response to events

We don't, but I assume that the controller-runtime does that before invoking the Reconcile method (or gets it directly with the event).

or does that just pull a cached document out of the event or something along those lines?

We just pull a cached document out of it, but again that happens transparently :)

From our (code's) PoV, whether this thing is cached or not doesn't matter, and we don't really need to know.

controller-runtime handles this, and I assume that when you Watch and reconcile specific resources, they will already be cached before it starts the Reconcile call :)

We just pull a cached document out of it, but again that happens transparently :)

You gotta admit, that was an impressive total guess of mine 😂

All SGTM

craigfurman

@tommyknows and I just paired a bit to help me fully understand what's going on. Great work!

I'll approve this now, feel free to trim loose ends, squash and merge whenever you like. Let's be biased towards merging and iterating 💪

Dangling threads can be spun out into issues (and there may already be existing relevant issues, e.g. ensure eventual consistency of deletions)

This commit adds a very rough first implementation of the kubernetes-scanner, plus tests. It also contains some documentation about what and how we are building this controller.

TODO: maybe instead cache / save them in the image...

snyksec · 2023-03-17T10:05:33Z

🎉 This PR is included in version 1.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

tommyknows force-pushed the feat/implementation branch 4 times, most recently from cd44898 to 090b9fe Compare February 23, 2023 10:16

chore: miscellaneous repo updates

7352b75

dockerfile switches to use boringcrypto instead of the regular crypto module (I think that would make it FIPS-compliant).

tommyknows force-pushed the feat/implementation branch from 090b9fe to d4e41d2 Compare February 23, 2023 14:25

craigfurman reviewed Feb 23, 2023

View reviewed changes

tommyknows force-pushed the feat/implementation branch 2 times, most recently from a3b1a0c to e89ddb1 Compare February 24, 2023 08:53

tommyknows changed the title ~~Feat/implementation~~ Minimal implementation Feb 24, 2023

tommyknows marked this pull request as ready for review February 24, 2023 09:11

tommyknows requested a review from a team as a code owner February 24, 2023 09:11

tommyknows requested a review from craigfurman February 24, 2023 09:11

craigfurman reviewed Feb 24, 2023

View reviewed changes

tommyknows force-pushed the feat/implementation branch 3 times, most recently from fb67740 to 6d18896 Compare February 24, 2023 15:04

ivanstanev reviewed Feb 27, 2023

View reviewed changes

main.go Show resolved Hide resolved

ivanstanev reviewed Feb 27, 2023

View reviewed changes

craigfurman approved these changes Feb 27, 2023

View reviewed changes

tommyknows force-pushed the feat/implementation branch from 2b6358e to 3b5e536 Compare February 27, 2023 14:13

tommyknows added 3 commits February 27, 2023 16:05

feat: initial implementation of the controller

8f7e1cd

This commit adds a very rough first implementation of the kubernetes-scanner, plus tests. It also contains some documentation about what and how we are building this controller.

ci: move module download to separate step

b21d661

ci: install controller tools

5b683ef

TODO: maybe instead cache / save them in the image...

tommyknows force-pushed the feat/implementation branch from 3b5e536 to 5b683ef Compare February 27, 2023 15:05

tommyknows merged commit 339edda into main Feb 27, 2023

tommyknows deleted the feat/implementation branch February 27, 2023 15:22

snyksec added the released label Mar 17, 2023

Minimal implementation #1

Minimal implementation #1

Conversation

tommyknows commented Feb 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tommyknows Feb 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

craigfurman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ivanstanev Feb 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tommyknows Feb 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

craigfurman left a comment

Choose a reason for hiding this comment

snyksec commented Mar 17, 2023

tommyknows commented Feb 22, 2023 •

edited

Loading

tommyknows Feb 24, 2023 •

edited

Loading

ivanstanev Feb 27, 2023 •

edited

Loading

tommyknows Feb 27, 2023 •

edited

Loading