Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal implementation #1

Merged
merged 4 commits into from
Feb 27, 2023
Merged

Minimal implementation #1

merged 4 commits into from
Feb 27, 2023

Conversation

tommyknows
Copy link
Contributor

@tommyknows tommyknows commented Feb 22, 2023

This PR includes the first minimal implementation of the kubernetes-scanner.
It currently takes a config which includes a list of GVKs and (potentially) namespaces to scan, and logs the result.

@tommyknows tommyknows force-pushed the feat/implementation branch 4 times, most recently from cd44898 to 090b9fe Compare February 23, 2023 10:16
dockerfile switches to use boringcrypto instead of the regular crypto
module (I think that would make it FIPS-compliant).
}

type Config struct {
Scanning Scan `json:"scanning"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the scan config, as well as the group,version,kind, I was imaging we'd need potentially 2 more fields:

  • Namespaces, as an array (or * as a special case to scan all namespaces. No need for glob logic)
    • It's possible, but outside the scope of this PR, that we'll want to default to scanning the namespace into which we're deployed, if not overridden with config
  • IsNamespaced, or potentially IsNotNamespaced since being namespaced is the norm, defaulting to "is namespaced", if we need to tell the k8s libraries which namespaces to scan. Certainly, for the dynamic client, you do need to know what's namespaced and what's not. If that's not the case with event listeners, maybe this point disappears after implementing the above one. Users would just write config to scan an arbitrary namespace for non-namespaced resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re IsNamespaced: I don't think that's required, at least not for controller-runtime.
The manager.Options type contains a Namespace field that would do what we want (have a look at the docs), however, that's scoping the entire manager to a specific namespace.

What that means is that our cache will always contain the objects from all namespaces, and we'll need to write our own logic to "skip" resources from certain namespaces.

I assume that's fine in most installations, and the clusters that are big enough for this to be a problem should probably just run completely namespace-scoped controllers instead? 🤔

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I understand now. Sorry, I should have re-familiarised myself with controller-runtime before commenting. I see what you mean about that namespace option.

If we implement my namespaces note, it'll still be possible to partition multiple instances by {GVK,namespace}, and un-namespaced resources can then just be configured use an arbitrary namespace, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'll still be possible to partition multiple instances by {GVK,namespace}, and un-namespaced resources can then just be configured use an arbitrary namespace, no?

I'm not quite sure what you're asking, but iiuc you just wouldn't set the Namespaces option for cluster-scoped resources.

Currently, if you'd do something like this:

Group: "",
Version: "v1",
Kind: "Namespace",
Namespaces: ["foo", "bar"]

You'd never reconcile a Namespace resource because they're not in the given namespaces.

With namespaced resources, that would work as expected.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm let me try with an example, it's possible we're misunderstanding each other. With a config that looks something like this:

scan:
  - gvk: apps/v1.Deployment
    namespaces: ['default', 'plantuml']
  - gvk: v1.Service
    namespaces: ['*']
  - gvk: rbac.authorization.k8s.io/v1.ClusterRole

Namespaced resources can be scanned from multiple namespaces, or all with *. Non-namespaced resources like the ClusterRole do not specify a namespace. We indeed don't need IsNamespaced.

In the future we could default namespaces to the currently-deployed namespace, perhaps. It's not clear to me how useful that'd be.

I also cheekily proposed a friendly string syntax for GVK, we don't have to do this / can do it later. It's orthogonal to the point and was quicker to type.

Copy link
Contributor Author

@tommyknows tommyknows Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespaced resources can be scanned from multiple namespaces, or all with *. Non-namespaced resources like the ClusterRole do not specify a namespace.

Yes. Two things that are "different" in my current implementation:

  1. "all with *" is instead "all with an empty list". Minor UX difference; I have no strong opinion on what's better. Omission meaning "all namespaces" is less to type though :) Also, "no namespaces" doesn't really make sense.
  2. "Non-namespaced resources do not specify a namespace". Can't stop customers from defining one in their config. Currently, this would mean the reconciliation loop for that resource "breaks" (no resources would be reconciled). We can also change it to ignore it. Also no strong opinion here :)

proposed a friendly string syntax for GVK

Yeah we have to see what's possible. Thinking about the ClusterRole that we do want to generate, I think they use the plural naming of things (e.g. pods instead of "Pod"), so maybe we instead need to support that naming. We'll see once we get to writing the Helm chart? :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of that sounds fine, I think the core concept is sound. As a first pass, it's fine to not prevent user misconfiguration. In a future pass, we could detect such things and print warnings (just an example), and also translate between singular and plural names by asking the cluster for its api resources. Let's put anything that might have value like that (if you agree) in follow-on issues in the backlog. They might sit there forever, or they might be useful in the future!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think especially the translation-feature will become necessary once we start developing the Helm chart. I'll add it to the backlog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, I think they use the plural naming of things (e.g. pods instead of "Pod")

Yeah, for the role, we specify the things in plural format

https://github.com/snyk/kubernetes-monitor/blob/99215c8a3f06e4dbda48b90eda6190b19f27b326/snyk-monitor/templates/clusterrole.yaml

- apiGroups:
  - ""
  resources:
  - replicationcontrollers
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - get
  - list
  - watch

This kubectl api-resources -o wide prints supported resources with GVK / namespaced (true|false), I guess API equivalent is here: https://github.com/kubernetes/client-go/blob/master/discovery/discovery_client.go

But yeah, all of these should be part of the backlog, nothing to tackle at T=0

internal/config/config.go Show resolved Hide resolved
flag.StringVar(&configFile, "config", "/etc/kubernetes-scanner/config.yaml", "defines the location of the config file")
flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")
flag.BoolVar(&enableLeaderElection, "leader-elect", false,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be unused. Just to check, this is a controller-runtime concept, right? We're not about to implement a consensus algorithm? I don't doubt you could do it of course, but still 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that whole leaderelection thing is already built into controller-runtime. I've removed everything else from it, but forgot about that one.

I also won't implement a consensus algorithm ;)

@tommyknows tommyknows force-pushed the feat/implementation branch 2 times, most recently from a3b1a0c to e89ddb1 Compare February 24, 2023 08:53
@tommyknows tommyknows changed the title Feat/implementation Minimal implementation Feb 24, 2023
@tommyknows tommyknows marked this pull request as ready for review February 24, 2023 09:11
@tommyknows tommyknows requested a review from a team as a code owner February 24, 2023 09:11
Copy link

@craigfurman craigfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! I'm amazed how short it is. We can add observability etc in a future pass, it's good to have a minimally-viable base to iterate on.

A few notes, and FWIW I haven't run it yet, still waiting on an example config file, and the hope of out-of-cluster auth 😁

func init() {
flag.StringVar(&configFile, "config", "/etc/kubernetes-scanner/config.yaml", "defines the location of the config file")
flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&probeAddr, "health-probe-bind-address", ":8081", "The address the probe endpoint binds to.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes on config that I'll put arbitrarily on this line 😁

  • Why do we have flags that duplicate 2 config file fields (metrics and probe addrs)?
  • Can we provide an example local config in the repo for development? I know I already wrote a note about this, but starting to write one in json was a bit of a drag
  • Can we make the config file yaml, to make it a little easier to hammer out by hand for development? JSON is still valid yaml, so we can have our cake and eat it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have flags that duplicate 2 config file fields (metrics and probe addrs)?

They were flags originally (kubebuilder scaffolding), I thought we should also allow this to be configured through the config file. But one place should be enough, so I'll just keep it in the config file.

Can we provide an example local config in the repo for development? I know I already wrote a note about this, but starting to write one in json was a bit of a drag

Sure, I'll add a more complete test that marshals a complete config somewhere.

Can we make the config file yaml, to make it a little easier to hammer out by hand for development? JSON is still valid yaml, so we can have our cake and eat it here

Yes, definitely 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed some changes, we now have a test that writes a sample config to a temp file. You can start from there.

Kubeconfig is specified either from the -kubeconfig flag or through $KUBECONFIG or ~/.kube/config. See the ctrl.GetConfig() docs on this to learn more. The test also shows this by setting the flag.

main_test.go Outdated
/*
Copyright 2023.

Licensed under the Apache License, Version 2.0 (the "License");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this generated with kubebuilder, out of interest? That's fine, but so much was removed if so that I can't tell, which is also fine. This license header makes me suspect that it was. I think we should remove these, if we're allowed (and off the top of my head, the apache license isn't "viral", but I'd recommend checking), because we haven't decided on an open source license for this project yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the initial files, which I pretty much deleted completely in this process, was generated by kubebuilder. But kubebuilder's testing strategy is quite different - they use BDD-style testing with a library called Ginkgo - which I don't really like 😄

I'll remove the header.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, I used to work with the ginkgo developer. I used to quite buy into that rspec-style testing DSL, these days I've come back down to earth (go test, junit)

main.go Outdated
Comment on lines 79 to 88
for _, scanType := range cfg.Scanning.Types {
if err := (&reconciler{
Reader: mgr.GetClient(),
requeueAfter: cfg.Scanning.RequeueAfter.Duration,
reconcile: reconcile,
newObject: newObjectFn(scanType.GroupVersionKind),
namespaces: scanType.Namespaces,
}).SetupWithManager(mgr); err != nil {
return nil, fmt.Errorf("unable to create controller for GVK %v: %w", scanType.String(), err)
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this is the setup of the controller-runtime loops, like any kubernetes controller, and we're doing 1 per GVK. I have a few possibly unhelpful, low-context questions, feel free to push back on the premise:

  • How does this compare to using a DynamicInformer (e.g. https://blog.dsb.dev/posts/creating-dynamic-informers/).
  • In fact, that guy's now-archived project, Kollect, looks pretty similar to this in concept. I suspect there are a lot of those on the internet, which makes me wonder if we should use/adapt one, but on the other hand, if the only similarity is the control loop, there's not much value.
  • Is there a way to register multiple resource types on one manager, and if so, what are the trade-offs of doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this compare to using a DynamicInformer

I think that's what's being used under the hood in kubebuilder / controller-runtime. I wouldn't know how they do it otherwise 😄

if we should use/adapt one

There's a lot of those, mainly because all Kubernetes control loops look the same :)
The dynamic ones - kollect, port - are even more similar for obvious reasons.

I would argue that the "core logic" that we currently have is not worth using a dependency. In contrast, having this in our control means we get a bit more freedom for the features we want to build on top. I think it's easier than trying to depend on something that's already built.

Additionally, it's just not that much code thanks to controller-runtime as well 😄

Is there a way to register multiple resource types on one manager, and if so, what are the trade-offs of doing that?

That's what we're doing here; but I assume that's not what you asked for.
We create one reconciler per GVK, and attach them all to the same manager.
The two other ways:

  • One reconciler for multiple GVKs at once: I've tried that in the beginning, but that doesn't work. The reconcile-loops are tied to a single resource type.
  • One reconciler per manager, having multiple managers: I'm not sure why you'd do this.
    Having a single manager means you can share a cache, share the setup, share the logic etc etc...

Does that answer your question?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, I was mistaken about multiple managers, now that I look again. I really meant reconcilers, and you've answered that! Thanks

main.go Outdated

// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
// TODO: not sure what's needed here if we only support in-cluster auth.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like the ability to do out-of-cluster auth, e.g. https://gist.github.com/craigfurman/7160ce317e5a4d868e78c558593601fd#file-kube_list_all_resources-go-L20-L31 (itself cribbed from a kubernetes example somewhere). Development loops will be much faster just running this controller on our laptops, pointed at some kube cluster, e.g. docker-desktop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment, you can either set -kubeconfig, $KUBECONFIG or it will automatically use ~/.kube/config. I guess this comment can be removed.

@tommyknows tommyknows force-pushed the feat/implementation branch 3 times, most recently from fb67740 to 6d18896 Compare February 24, 2023 15:04
@@ -7,7 +7,7 @@ ARG COMMIT_SHA
ARG GIT_TAG

RUN go mod download
RUN CGO_ENABLED=0 go build \
RUN CGO_ENABLED=0 GOEXPERIMENT=boringcrypto go build \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does though:

Is the plan to inherit the existing CMVP validation of BoringCrpto until 7/12/2025 or submit for a new FIPS validation?

@pfox1969 The BoringCrypto module being used did not change as part of this work.

And because the BoringCrypto module is FIPS-certified, I'd say yes it is FIPS-certified as well?

Also, this seems to be an interesting discussion, which links to this

// Package fipsonly restricts all TLS configuration to FIPS-approved settings.
//
// The effect is triggered by importing the package anywhere in a program, as in:
//
//	import _ "crypto/tls/fipsonly"
//
// This package only exists when using Go compiled with GOEXPERIMENT=boringcrypto.

Looks pretty good to me then? 😄 (we do not have imported this package currently though)

}
}

func TestConfigRealAPIServer(t *testing.T) {
Copy link

@ivanstanev ivanstanev Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What API server is being used here? From setup-envtest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly. The restCfg (which indicates how everything should connect to the API server) is created from test.SetupEnv, which in turn uses the envtest package.

envtest uses a local kube-apiserver and etcd binary, which is "installed" by setup-envtest.

main.go Show resolved Hide resolved
Comment on lines +122 to +128
func newObjectFn(forGVK schema.GroupVersionKind) func() client.Object {
return func() client.Object {
u := &unstructured.Unstructured{}
u.SetGroupVersionKind(forGVK)
return u
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we only collect the GVK from a watched resource and nothing else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not sure what you're asking here.

We don't "collect" GVKs, we only collect the watched resources, which are identified based on the GVK.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the code, it creates an empty unstructured object and only sets GVK... Just not sure what is happening here 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The controller-runtime client.Client functions have signatures like this:

Get(ctx context.Context, key client.ObjectKey, obj client.Object, opts ...client.GetOption) error

We're talking about this client.Object which is an interface for a "generic" object.
Usually, the client is used like this:

pod := &corev1.Pod{}
if err := client.Get(ctx, <namespaced-name>, pod); err != nil { ... }

This will populate the pod object.

However, we don't want to do this because that would require us to add all the types to the code (e.g. make it not dynamic), and we'd also need to register all the CRD's schemas - meaning depending on some external Go code as well - in the scheme for encoding & decoding objects.

The unstructured.Unstructured type also implements client.Object, however this only works when you set the GroupVersionKind so that the client.Get actually knows which type it should return.

In that case, you don't need type-specific encoding- and decoding-logic, as it will just be put into this "generic" representation of an object within unstructured.Unstructured{}.

This allows us to do a "dynamic" scanning simply by knowing the GVK.

}

obj := r.newObject()
if err := r.Get(ctx, req.NamespacedName, obj); err != nil {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this call the Kubernetes API, and does it have a chance of failing due to some network error or by triggering rate limiting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this call the Kubernetes API

Technically yes, practically afaik not.
The standard client has a cache built-in, and that cache is populated from things like the Watch events. Meaning if we get the object here, it should very likely be cached :)

(At least that's the assumption I've always made. Given that I've never ran into rate-limiting with that client in my career, I would expect this assumption to be correct)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do Get the object from the API in response to events, or does that just pull a cached document out of the event or something along those lines?

Copy link
Contributor Author

@tommyknows tommyknows Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what happens "internally" (within controller-runtime), whether they get the actual document with the event or not. For us, we just get the NamespacedName based on an event, and the actual object is already cached within the client (transparently to us). So:

We do Get the object from the API in response to events

We don't, but I assume that the controller-runtime does that before invoking the Reconcile method (or gets it directly with the event).

or does that just pull a cached document out of the event or something along those lines?

We just pull a cached document out of it, but again that happens transparently :)


From our (code's) PoV, whether this thing is cached or not doesn't matter, and we don't really need to know.

controller-runtime handles this, and I assume that when you Watch and reconcile specific resources, they will already be cached before it starts the Reconcile call :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just pull a cached document out of it, but again that happens transparently :)

You gotta admit, that was an impressive total guess of mine 😂

All SGTM

Copy link

@craigfurman craigfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tommyknows and I just paired a bit to help me fully understand what's going on. Great work!

I'll approve this now, feel free to trim loose ends, squash and merge whenever you like. Let's be biased towards merging and iterating 💪

Dangling threads can be spun out into issues (and there may already be existing relevant issues, e.g. ensure eventual consistency of deletions)

This commit adds a very rough first implementation of the
kubernetes-scanner, plus tests.

It also contains some documentation about what and how we are building
this controller.
TODO: maybe instead cache / save them in the image...
@tommyknows tommyknows merged commit 339edda into main Feb 27, 2023
@tommyknows tommyknows deleted the feat/implementation branch February 27, 2023 15:22
@snyksec
Copy link

snyksec commented Mar 17, 2023

🎉 This PR is included in version 1.0.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants