Add support for the Kubernetes Resource Model #46

micahhausler · 2021-08-27T19:02:12Z

Description

Added proposal 0026 Support for the Kubernetes Resource Model.

rawkode · 2021-08-27T19:30:41Z

Rather than needing a fully fledged Kubernetes cluster, it would be cool to see this work with kcp

https://github.com/kcp-dev/kcp

micahhausler · 2021-08-27T19:43:16Z

Rather than needing a fully fledged Kubernetes cluster, it would be cool to see this work with kcp

https://github.com/kcp-dev/kcp

That would be super cool. As long as KCP supports CRDs, admission webhooks, finalizers, watches, leases, and RBAC (which I think it does support all of those), it should be portable.

proposals/0025/README.md

andy-v-h · 2021-09-01T21:58:07Z

I like that idea of creating in tree k8s support. Specifically making it easier to do a hitless reload on hegel by updating a ConfigMap, for example. However, I'm not sure a CRD based approach is the direction we would be best. My primary concern is the additional load on etcd and resiliency there. A scenario I find concerning is where you have relatively small amounts of compute available for running tinkerbell, and requiring it to rely on etcd vs being able to use an external DB service. And in a DR scenario, how would a recovered tinkerbell not reprov all of my hosts?

micahhausler · 2021-09-02T15:55:23Z

I like that idea of creating in tree k8s support. Specifically making it easier to do a hitless reload on hegel by updating a ConfigMap, for example. However, I'm not sure a CRD based approach is the direction we would be best. My primary concern is the additional load on etcd and resiliency there.

ConfigMaps in Kubernetes are unstructured and don't allow for strong typing. This causes lots of practical validation problems. CRDs are roughly equivalent in terms of efficiency and load, and they allow for much easier client development. etcd specifically can be operated in many different modes, from single node (lower resiliency) more typically 3 nodes for increased availability. Neither etcd nor Kubernetes need to be operated on the same compute that Tinkerbell is operated on. This proposal is agnostic to where Kubernetes is run, and operators can connect Tinkerbell to Kubernetes in a managed cloud environment where compute resource is less of a concern.

A scenario I find concerning is where you have relatively small amounts of compute available for running tinkerbell, and requiring it to rely on etcd vs being able to use an external DB service.

Neither Kubernetes nor etcd would be required to run on the same compute as Tinkerbell. This would be equivalent to connecting to an external DB service.

And in a DR scenario, how would a recovered tinkerbell not reprov all of my hosts?

Check out the Migrating between Kubernetes clusters, Backups, and Restoration section under Administrative Tasks but the short version is that the .status of Workflows would be included in backups and restoration and machines would not be re-provisioned.

Does that address your concerns?

splaspood · 2021-09-10T17:21:32Z

I would love to see tinkerbell support this as one of many possible data backends, forcing us to create a well defined interface between tink server and n number of possible underlying data stores.

displague · 2021-09-15T15:37:28Z

Like @splaspood, I wonder what the tinkerbell services would look like if the backend could be toggled between a local filesystem (manifest.d/ style directories), postgres, and Kubernetes resources.

In order to support these three different storage models, a pluggable data interface would benefit from a Tinkerbell data event system.

Rather than having tinkerbell poll a database, or watch a directory, or watch a set of kubernetes CRs and configmaps, tinkerbell could await data events while the data interface modules take on the polling and watching.

Does this mean that tinkerbell takes on another grpc service? Perhaps.
Does tinkerbell need caching or relationship mapping between resources? Maybe? (I don't think so.)

What events would tinkerbell need to listen for from pluggable data modules?
What features does tinkerbell need to update data stored in generic data modules?

displague · 2021-09-15T16:18:07Z

Reconsidering my last comment, rather than invent a new format of event drivers and data passing (which are some of the problems we are trying to solve with this proposal), by taking the K8s API as the data and eventing backend (not necessarily with containers involved) we inherit a standard and well-evolved set of tools and opinions that bring stability (to the API and data backend).

We solve the pluggable data backend because we can use services like https://github.com/k3s-io/kine to provide a postgresql, MySQL, and yet-to-be-added backends.

micahhausler · 2021-09-16T14:04:46Z

Reconsidering my last comment, rather than invent a new format of event drivers and data passing (which are some of the problems we are trying to solve with this proposal), by taking the K8s API as the data and eventing backend (not necessarily with containers involved) we inherit a standard and well-evolved set of tools and opinions that bring stability (to the API and data backend).

We solve the pluggable data backend because we can use services like https://github.com/k3s-io/kine to provide a postgresql, MySQL, and yet-to-be-added backends.

YES! This is exactly it!

jacobweinstock · 2021-09-21T15:13:36Z

This is a great proposal and the WIP demo at the community meeting was really great. I do like the idea of putting this behind a flag.

micahhausler · 2021-09-28T15:33:02Z

Here's the current code I demoed in the community meeting today:

Signed-off-by: Micah Hausler <mhausler@amazon.com> ## Description This includes 4 changes that are preparatory for future work to migrate to the Kubernetes data model. * Propagate stream context in streaming API. Previously `context.Background()` was used, but the stream provides a context object * Refactored shadowed "context" package name. By naming a method variable "context", no `context` package calls could be made in the method * Added `context.Context` to `GetWorkflowsForWorker()` database API. This plumbs down the context from the API call into the `d.instance.QueryContext()` call. * Refactor the database interface. I added a new interface `WorkerWorkflow` with the methods that get used by APIs the Tink Worker invokes. This is essentially a no-op for now. ## Why is this needed See tinkerbell/proposals#46 ## How Has This Been Tested? Locally ran tests. ## How are existing users impacted? What migration steps/scripts do we need? No impact to existing users ## Checklist: I have: - [ ] updated the documentation and/or roadmap (if required) - [ ] added unit or e2e tests - [ ] provided instructions on how to upgrade

displague

Thanks for this excellent and well-thought-out proposal, @micahhausler.

I've left some comments as I read through the document, many of the questions that I raise are answered elsewhere. This suggests to me that some information could be reordered for clarity. There were a few details that I felt were missing and I've said as much (this may be hard to gather as I left follow-up comments).

I'd recommend going through this feedback and the previous feedback and make sure that any details that were given in feedback comments and responses have made their way into the proposal.

This proposal, to me, is one of the most significant to be raised. Well done!

proposals/0026/README.md

displague · 2021-10-07T20:22:31Z

proposals/0026/README.md

+    At the time of writing, any Tink worker is authorized to list all hardware data, including possibly sensitive metadata for other hardware.
+    Today, the Tinkerbell Cluster API (CAPI) provider stores machine CA private key data in Tinkerbell hardware metadata.
+    Ideally, some authorizer that could make decisions based on authentication identity would be used (The Kubernetes analog would be the [Node Authorizer][node-authorizer]).
+    The authenticatoin method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.


Suggested change

The authenticatoin method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.

The authentication method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.

displague · 2021-10-08T11:45:32Z

proposals/0026/README.md

+
+[![current-architecture](./tink-arch-1.png)](.tink-arch-2.png)
+
+In this proposal, all non-worker clients of Tinkerbell would become Kubernetes clients.


Suggested change

In this proposal, all non-worker clients of Tinkerbell would become Kubernetes clients.

In this proposal, all non-worker clients of Tinkerbell may become Kubernetes clients.

This really comes down to wether or not PostgreSQL support is preserved at all and then whether the user chooses it.

displague · 2021-10-08T11:53:20Z

proposals/0026/README.md

+1. User creates a Workflow CRD object in Kubernetes.
+   All the user would need to specify in the object `.spec` would be a reference to the Template object, and mappings for devices.
+   This is analogous to the current `tink workflow create -t $TEMPLATE_ID -r "{\"device_1\":\"$MAC_ADDR\"}` command.
+1. The Tinkerbell API would include a Kubernetes workflow controller that would watch for created workflows, and fill out the `.status` with most of the logic that currently exists in the [`CreateWorkflow()`][createwfrpc] RPC.


The format of status is not discussed here, but something to consider and then call out if utilized is conditioned statuses, https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties

crossplane/crossplane-runtime#198 includes some relatable examples, benefits, and counter-point.

Do you want to see an API schema in this proposal? I was thinking it would be better to discuss and iterate on specific API structures in tinkerbell/tink#542

Speaking of Crossplane, it looks like @displague started on https://github.com/crossplane-contrib/provider-tinkerbell a while back, which would at least provide a KRM compliant shim to Tinkerbell. Perhaps an alternative that wouldn't require rearchitecting the project?

(Going a bit off-topic to consider what this KRM implementation means for a Crossplane Runtime Model (XRM) implementation, and perhaps how they could be the same)

Speaking of Crossplane, it looks like @displague started on a Tinkerbell Crossplane provider a while back, which would provide a KRM compliant shim to Tinkerbell

Typically the XRM spec mirrors an external cloud provider API spec as the aforementioned Tinkerell Crossplane Provider does. But if the Tinkerbell spec is implemented as KRM, but does not implement XRM, then a Crossplane Provider would need to interact with Tinkerbell through a KRM mirroring spec. The cluster could have two nearly identical resources within the same cluster to represent a composable resource.

We could overlook this, as Pods and CronJobs or ReplicaSets have a thin wrapping relationship.

However, this suggests to me that it should be possible for a controller to implement an XRM compatible interface while adopting the KRM. The benefits of doing so would be that Crossplane compositions (XR) can be applied, managed by Crossplane. The underlying service, Tinkerbell bare metal orchestration, would transparently serve as a Crossplane Provider. The Tinkerbell service would not be dependent on Crossplane's installation to add this layer of compatibility.

One complication that comes to mind is the expectation that Crossplane Resources have a Provider. In this approach, resources would be the provider. Perhaps, like the Crossplane Helm provider, the provider configuration would refer to the host cluster? Would the default be to report with the local cluster? Would alternatives provider configuration offer connection details for a non-KRM Tinkerbell backplane? Another KRM Tinkerbell backplane?

Adopting Crossplane's packaging mechanisms could provide additional benefits especially for service updates. Although, in this approach, it's not clear to me what benefits this holds over Helm or Kustomize manifests.

What does this mean in practice? The Tinkerbell KRM implementation (starting with tinkerbell/tink#542) would adopt XRM types like Referencers, ResourceStatus, ResourceSpec, and ProviderConfig.

https://crossplane.io/docs/v1.4/contributing/provider_development_guide.html#provider-development-guide
https://blog.crossplane.io/crossplane-vs-cloud-infrastructure-addons/ (by @negz) provides the relevant Crossplane context.
https://danielmangum.com/posts/crossplane-infrastructure-llvm/ (by @hasheddan discusses the versatility of the Crossplane resource model)

This is likely too speculative to be actionable. This would be the first controller to act in this XRM compatible way without being first-and-foremost a Crossplane provider.

proposals/0026/README.md

displague · 2021-10-08T14:00:25Z

proposals/0026/README.md

+
+## APIs
+
+The following [Tinkerbell Workflow APIs][wf-apis] are used by the Tink worker, and would remain.


Why can these services not be initially replaced by the Kubernetes model? Do you see them eventually be removed?

Do status conditions, events, logs, and additional CRDs offer the means to move these entities to the Kubernetes API?

.. I think I see.

These are the APIs needed for the provisioning machine to get actions and update state. That could be more clearly defined.

Why would Tink workers not become Kubernetes clients, for example? Each machine could use a unique service account which would open up more possibilities (potentially reducing the machine threat model).

Why can these services not be initially replaced by the Kubernetes model? Do you see them eventually be removed?

The main reason to only include support for worker APIs is to shorten the migration period and amount of effort required. The ultimate goal is to make all non-worker Tink clients talk to Kubernetes, so instead of a two-step migration where boots/hegel still talk to Tink API, just go directly to the new model. So yes, I do see them eventually removed.

These are the APIs needed for the provisioning machine to get actions and update state. That could be more clearly defined.

Will do.

Why would Tink workers not become Kubernetes clients, for example? Each machine could use a unique service account which would open up more possibilities (potentially reducing the machine threat model)

I am not inclined to make Tink workers K8s clients at this time. The main reason is I want to limit what an individual (unauthenticated/untrusted) worker can do or see in the API. K8s authz lacks the ability to easily do something like "This worker should only be able to view a specific CRD that matches the worker's name." We basically want attribute based authz, and K8s is all role-based. Yes, you could create a specific RBAC role for each worker, but that gets messy. It's much easier to have a trusted deputy like the Tink API that can correctly and securely propagate the state changes in the storage layer.

proposals/0026/README.md

jacobweinstock

I feel that this has been vetted quite a bit and has been open for comment for a while now. I believe @micahhausler has addressed all comments.

thebsdbox

Awesome, I'd love to see this moved forward.

micahhausler · 2021-11-12T21:59:30Z

I've updated the proposal and addressed comments

proposals/0026/README.md

micahhausler · 2021-11-16T17:46:18Z

@displague Fixed cutoffs and did a once-over of the whole thing

displague

Thanks for the details and revisiting the document for follow-ups, @micahhausler!

proposals/0026/README.md

Signed-off-by: Micah Hausler <mhausler@amazon.com>

micahhausler · 2021-11-17T19:04:34Z

Updated the status to Published

jacobweinstock · 2021-11-18T17:34:54Z

Hey @tstromberg, when/if you have some cycle would you mind checking this out again, please?

tstromberg

Let it be merged! Thank you for addressing the bootstrapping concerns.

Signed-off-by: Micah Hausler <mhausler@amazon.com> ## Description The new backend uses the Kubernetes CRD types in Tinkerbell core. Boots could use a bit of a refactor/organization in how the backends work, but for now I've tried to fit in to the existing structure. ## Why is this needed This is the boots side of work for tinkerbell/proposals#46 Fixes: # ## How Has This Been Tested? Tested with real bare metal workers using an image built from this branch. I will add E2E tests in a follow up PR, or can add to this branch ## How are existing users impacted? What migration steps/scripts do we need? No impact to existing users ## Checklist: I have: - [ ] updated the documentation and/or roadmap (if required) - [ ] added unit or e2e tests - [ ] provided instructions on how to upgrade

micahhausler force-pushed the 0025 branch from d02c914 to 56c0834 Compare August 27, 2021 19:03

tstromberg suggested changes Aug 30, 2021

View reviewed changes

proposals/0025/README.md Outdated Show resolved Hide resolved

proposals/0025/README.md Outdated Show resolved Hide resolved

proposals/0025/README.md Outdated Show resolved Hide resolved

micahhausler force-pushed the 0025 branch 3 times, most recently from e059283 to bf1b5b4 Compare August 30, 2021 19:48

micahhausler changed the title ~~Added Tinkerbell on Kubernetes Proposal~~ Add support for the Kubernetes Resource Model Aug 30, 2021

micahhausler mentioned this pull request Sep 6, 2021

Abstract Packet client tinkerbell/smee#205

Merged

3 tasks

mmlb requested a review from tstromberg September 21, 2021 15:17

tstromberg added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Sep 21, 2021

This was referenced Sep 29, 2021

Added Kubernetes API types tinkerbell/tink#542

Merged

Database interface refactor tinkerbell/tink#544

Merged

micahhausler force-pushed the 0025 branch from d99920e to 60b52f3 Compare October 5, 2021 14:54

displague reviewed Oct 8, 2021

View reviewed changes

jacobweinstock previously approved these changes Nov 2, 2021

View reviewed changes

thebsdbox previously approved these changes Nov 2, 2021

View reviewed changes

micahhausler dismissed stale reviews from thebsdbox and jacobweinstock via 957b75c November 12, 2021 21:39

micahhausler force-pushed the 0025 branch 2 times, most recently from 957b75c to 2f74f34 Compare November 12, 2021 21:51

displague reviewed Nov 15, 2021

View reviewed changes

proposals/0026/README.md Outdated Show resolved Hide resolved

displague reviewed Nov 15, 2021

View reviewed changes

proposals/0026/README.md Outdated Show resolved Hide resolved

micahhausler force-pushed the 0025 branch from 2f74f34 to 382e5ee Compare November 16, 2021 17:45

displague previously approved these changes Nov 16, 2021

View reviewed changes

proposals/0026/README.md Outdated Show resolved Hide resolved

displague added the ready-to-merge Signal to Mergify to merge the PR. label Nov 16, 2021

micahhausler dismissed displague’s stale review via 3a56f47 November 17, 2021 18:52

micahhausler force-pushed the 0025 branch from 382e5ee to 3a56f47 Compare November 17, 2021 18:52

[0025] Add Kubernetes Resource Model Proposal

2f2c1ec

Signed-off-by: Micah Hausler <mhausler@amazon.com>

micahhausler force-pushed the 0025 branch from 3a56f47 to 2f2c1ec Compare November 17, 2021 18:58

jacobweinstock approved these changes Nov 18, 2021

View reviewed changes

displague approved these changes Nov 23, 2021

View reviewed changes

tstromberg approved these changes Nov 24, 2021

View reviewed changes

mergify bot merged commit 8390df9 into tinkerbell:main Nov 24, 2021

pokearu mentioned this pull request Mar 15, 2022

Integrate PBNJ into Tinkerbell k8s model tinkerbell/pbnj#121

Closed

micahhausler mentioned this pull request Apr 6, 2022

Add kubernetes-based backend tinkerbell/smee#250

Merged

3 tasks

micahhausler mentioned this pull request Apr 13, 2022

Kubernetes Resource Model - docker-compose extension tinkerbell/playground#131

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for the Kubernetes Resource Model #46

Add support for the Kubernetes Resource Model #46

micahhausler commented Aug 27, 2021 •

edited

Loading

rawkode commented Aug 27, 2021

micahhausler commented Aug 27, 2021

andy-v-h commented Sep 1, 2021

micahhausler commented Sep 2, 2021

splaspood commented Sep 10, 2021

displague commented Sep 15, 2021 •

edited

Loading

displague commented Sep 15, 2021 •

edited

Loading

micahhausler commented Sep 16, 2021

jacobweinstock commented Sep 21, 2021

micahhausler commented Sep 28, 2021

displague left a comment

displague Oct 7, 2021

displague Oct 8, 2021

displague Oct 8, 2021

displague Oct 8, 2021

micahhausler Oct 8, 2021

negz Oct 11, 2021

displague Oct 25, 2021 •

edited

Loading

displague Oct 25, 2021

displague Oct 8, 2021

micahhausler Oct 8, 2021

jacobweinstock left a comment

thebsdbox left a comment

micahhausler commented Nov 12, 2021

micahhausler commented Nov 16, 2021

displague left a comment

micahhausler commented Nov 17, 2021

jacobweinstock commented Nov 18, 2021 •

edited

Loading

tstromberg left a comment •

edited

Loading

	The authenticatoin method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.
	The authentication method in this model could be a network identity, a TPM, or some other bootstrapped authentication credential.


		[![current-architecture](./tink-arch-1.png)](.tink-arch-2.png)

		In this proposal, all non-worker clients of Tinkerbell would become Kubernetes clients.


		## APIs

		The following [Tinkerbell Workflow APIs][wf-apis] are used by the Tink worker, and would remain.

Add support for the Kubernetes Resource Model #46

Add support for the Kubernetes Resource Model #46

Conversation

micahhausler commented Aug 27, 2021 • edited Loading

Description

rawkode commented Aug 27, 2021

micahhausler commented Aug 27, 2021

andy-v-h commented Sep 1, 2021

micahhausler commented Sep 2, 2021

splaspood commented Sep 10, 2021

displague commented Sep 15, 2021 • edited Loading

displague commented Sep 15, 2021 • edited Loading

micahhausler commented Sep 16, 2021

jacobweinstock commented Sep 21, 2021

micahhausler commented Sep 28, 2021

displague left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

displague Oct 25, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jacobweinstock left a comment

Choose a reason for hiding this comment

thebsdbox left a comment

Choose a reason for hiding this comment

micahhausler commented Nov 12, 2021

micahhausler commented Nov 16, 2021

displague left a comment

Choose a reason for hiding this comment

micahhausler commented Nov 17, 2021

jacobweinstock commented Nov 18, 2021 • edited Loading

tstromberg left a comment • edited Loading

Choose a reason for hiding this comment

micahhausler commented Aug 27, 2021 •

edited

Loading

displague commented Sep 15, 2021 •

edited

Loading

displague commented Sep 15, 2021 •

edited

Loading

displague Oct 25, 2021 •

edited

Loading

jacobweinstock commented Nov 18, 2021 •

edited

Loading

tstromberg left a comment •

edited

Loading