Provisioning a group of non-Kubernetes hosts using ClusterAPI and Virtual Kubelets #2073
-
Hi!
-
Thank you for the thorough writeup! It is an interesting approach, or perhaps rather a workaround for the missing functionality in Metal3. Hopefully people will find it useful. That said, I think this discussion also shows that the demand for non-Kubernetes hosts is still there. I know there are multiple people who want this and I think we may be able to move forward with it this year. Hopefully we can agree on a design before the summer. 🙂
-
Hello @lentzi90, first of all, thank you for taking the time to reply.
What I described above is a PoC, so it should by no means be used in production as is. Additional design, implementation, and testing effort would be necessary to adopt this approach in production.
That is great, I would like to participate in the conversation if possible. Is there a workgroup or issue on this already?
-
Context:
Administrators of on-premise infrastructure often need to provision bare-metal hosts that are not part of a Kubernetes
cluster with custom data, such as SSH keys, packages, and configuration files.
Goal: Provision a set of Metal3 BareMetalHosts with associated userData.
Challenge: Currently, manipulating BareMetalHosts (BMH) allows for provisioning and deprovisioning of individual hosts,
but there is no support for large-scale orchestration. At the same time, ClusterAPI (CAPI) and the Metal3 ClusterAPI provider (CAPM3) enable the provisioning of multiple hosts but such capabilities are dedicated to the management of Kubernetes nodes.
A previous proposal tackled this problem by introducing a new operator but this requires extensive development and the replication of host provisioning functionalities already implemented by CAPI/CAPM3.
In this post we tackle two main questions:
In the rest of this post I report our findings from trying to tackle the two questions above. I would like to open a discussion on this approach and hear any feedback you may have.
TLDR:
We found that the CAPI Machine and MachineDeployment objects, while suitable in theory,
expect by design a Kubernetes Node in the target cluster for each Machine associated with a MachineDeployment.
Given that no Kubernetes Node object is created when provisioning non-Kubernetes hosts, the CAPI MachineDeployment remains stuck in the ScalingUp phase.
We found a solution to the challenge posed by the absence of a Kubernetes Node object: utilizing Virtual Kubelets.
By employing Virtual Kubelets, we can register hosts as Kubernetes Nodes without enabling Pod execution.
This allows each CAPI Machine to become Ready and the MachineDeployment to progress from the ScalingUp phase to the Running phase.
To streamline this process, a CAPI Bootstrap Provider for Virtual Kubelets automates the configuration
of virtual-kubelets, allowing users to provision a group of BareMetalHosts with ClusterAPI objects.
This approach allows us to reuse existing CAPI/CAPM3 resources and avoids implementing an additional operator to manage a group of non-Kubernetes hosts. Also, virtual kubelets could be used by administrators to implement custom management operations on the provisioned hosts. I have left additional considerations at the end of the post.
What follows is an in-depth account of our investigation and of how we ended up with the idea of a CAPI Bootstrap Provider for Virtual Kubelets.
We have 3 sections:
Environment setup
As a first step, we use the metal3-dev-env project provided by Metal3 to
set up our testing environment.
For testing, we consider two additional virtual machines, supplementing those dedicated to the Kubernetes control plane and worker nodes.
These extra machines will be used to test the management of non-Kubernetes hosts.
The following variables are defined in config_<USER>.sh:
We follow the metal3-dev-env documentation and deploy the target cluster.
The objects created are reported here. (expand for the yaml definition)
The following is a visual representation of the objects created so far.
The CAPI Cluster refers to a KubeadmControlPlane object (not defined yet) and to a provisioning infrastructure for the cluster (Metal3Cluster).
Note: To simplify our explanation, for the rest of the post we do not consider resources such as IPPools and Metal3DataTemplate.
As a next step we install the Kubernetes control plane.
The objects created are reported here. (expand for the yaml definition)
Note: We omit the SSH key and use the tag <KEY> instead.
Here is a visual representation:
The KubeadmControlPlane:
Each CAPI Machine is associated with:
The KubeadmConfig is generated by the Cluster API Bootstrap Provider Kubeadm (CABPK), which creates a K8s Secret containing a cloud-config or ignition file to customize BareMetalHosts during boot.
By design, each KubeadmConfig provides a reference to the secret in status.dataSecretName.
For each Metal3Machine, CAPM3 selects a BareMetalHost for provisioning and sets the secret generated by the KubeadmConfig
as userData.
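The hand-off described above can be pictured with a small sketch. It is not CAPM3 code: it only models, on plain dictionaries, how a secret name published in status.dataSecretName could end up as the userData reference of a BareMetalHost. The function name attach_user_data, the namespace metal3, and the host name node-0 are assumptions made for illustration.

```python
import copy


def attach_user_data(bare_metal_host: dict, bootstrap_config: dict, namespace: str) -> dict:
    """Return a copy of the host whose spec.userData points at the bootstrap secret.

    Simplified model of the hand-off: the bootstrap config must be ready and
    must publish the name of its data secret before the host can use it.
    """
    status = bootstrap_config.get("status", {})
    secret_name = status.get("dataSecretName")
    if not status.get("ready") or not secret_name:
        raise ValueError("bootstrap config is not ready: no data secret to attach")

    host = copy.deepcopy(bare_metal_host)
    host.setdefault("spec", {})["userData"] = {
        "name": secret_name,
        "namespace": namespace,
    }
    return host


# Hypothetical objects, reduced to the fields relevant for this sketch.
kubeadm_config = {
    "kind": "KubeadmConfig",
    "metadata": {"name": "test1-bhcf4"},
    "status": {"ready": True, "dataSecretName": "test1-bhcf4"},
}
host = {"kind": "BareMetalHost", "metadata": {"name": "node-0"}, "spec": {"online": True}}

provisioned = attach_user_data(host, kubeadm_config, namespace="metal3")
print(provisioned["spec"]["userData"])
```

The point of the sketch is that the infrastructure side only needs a ready config and a secret name; it does not care which bootstrap provider produced them.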
As a final step we deploy the workers for the target cluster:
The workers are deployed via a MachineDeployment (expand for the yaml definition)
What follows is a simplified representation of the relationships between the various objects created:
A MachineDeployment oversees a group of CAPI Machines that are part of a specific CAPI cluster. To tailor the installation process, it leverages two templates:
Notably, this approach is consistent with the method used by the KubeadmControlPlane to provision the target cluster control plane.
The Cluster API Bootstrap Provider Kubeadm (CABPK) generates a customized KubeadmConfig for each CAPI Machine, based on the provided KubeadmConfigTemplate. This process also creates a Secret that contains a cloud-config or ignition configuration.
The Secret is then assigned as userData to the corresponding BareMetalHost, enabling the host to be initialized with a tailored configuration.
The current state of the system is as follows (snapshot below):
It's interesting to see that at this stage we have many secrets tagged with the name of the provisioned cluster:
Secrets used as userData while provisioning the BareMetalHosts.
A secret named <CLUSTER_NAME>-kubeconfig that can be used to access the cluster.

user@adcpu28:~/workspace/metal3-dev-env$ kubectl get secrets
NAME                                  TYPE                                     DATA   AGE
secret/node-0-bmc-secret              Opaque                                   2      4h45m
secret/node-1-bmc-secret              Opaque                                   2      4h45m
secret/node-2-bmc-secret              Opaque                                   2      4h45m
secret/node-3-bmc-secret              Opaque                                   2      4h45m
secret/node-4-bmc-secret              Opaque                                   2      4h45m
secret/node-5-bmc-secret              Opaque                                   2      4h45m
secret/node-6-bmc-secret              Opaque                                   2      4h45m
secret/test1-6nvzx-bss44              cluster.x-k8s.io/secret                  2      173m
secret/test1-6nvzx-bss44-metadata     infrastructure.cluster.x-k8s.io/secret   1      173m
secret/test1-6nvzx-bss44-networdata   infrastructure.cluster.x-k8s.io/secret   1      173m
secret/test1-6nvzx-ns9fl              cluster.x-k8s.io/secret                  2      173m
secret/test1-6nvzx-ns9fl-metadata     infrastructure.cluster.x-k8s.io/secret   1      173m
secret/test1-6nvzx-ns9fl-networdata   infrastructure.cluster.x-k8s.io/secret   1      173m
secret/test1-bhcf4                    cluster.x-k8s.io/secret                  2      3h57m
secret/test1-bhcf4-metadata           infrastructure.cluster.x-k8s.io/secret   1      3h57m
secret/test1-bhcf4-networdata         infrastructure.cluster.x-k8s.io/secret   1      3h57m
secret/test1-ca                       cluster.x-k8s.io/secret                  2      3h57m
secret/test1-cw695                    cluster.x-k8s.io/secret                  2      3h46m
secret/test1-cw695-metadata           infrastructure.cluster.x-k8s.io/secret   1      3h46m
secret/test1-cw695-networdata         infrastructure.cluster.x-k8s.io/secret   1      3h46m
secret/test1-djvs9                    cluster.x-k8s.io/secret                  2      3h51m
secret/test1-djvs9-metadata           infrastructure.cluster.x-k8s.io/secret   1      3h51m
secret/test1-djvs9-networdata         infrastructure.cluster.x-k8s.io/secret   1      3h51m
secret/test1-etcd                     cluster.x-k8s.io/secret                  2      3h57m
secret/test1-kubeconfig               cluster.x-k8s.io/secret                  1      3h57m
secret/test1-proxy                    cluster.x-k8s.io/secret                  2      3h57m
secret/test1-sa                       cluster.x-k8s.io/secret
Examining the target cluster, we observe that it comprises 5 nodes (3 for the control plane and 2 worker nodes), all of which are in a Ready state.
Investigating the use of ClusterAPIs for non-Kubernetes nodes
Based on the current state we can draw several conclusions:
Our ultimate goal is to provision a set of hosts with identical configurations. While the MachineDeployment is a promising resource, using KubeadmConfigTemplates would result in hosts joining as worker nodes and allowing pod execution.
To avoid this, we explored an alternative solution using a simple bootstrap provider, dubbed MockBootstrap, which enables users to reference a secret containing the cloud-config or ignition file used for the provisioning of BareMetalHosts.
Note: MockBootstrap was created solely for the purpose of exploring ClusterAPI mechanisms and will be replaced by a solution leveraging virtual-kubelets. This section aims to demonstrate the necessity of virtual-kubelets.
MockBootstrap implements the bootstrap behaviour described by CAPI and has a straightforward implementation.
A bootstrap provider must create config resources that report a Ready status when the resource is prepared for use and specify a userDataSecret that will be injected as userData to the BareMetalHost during provisioning. MockBootstrap introduces two custom resources:
Examples of these resources are provided below.
We can now create a new MachineDeployment named testmock1 that utilizes MockBootstrap by referencing a MockConfigTemplate.
This MockConfigTemplate, in turn, references a secret that will be used as userData during the provisioning of the BareMetalHosts. A visual representation follows.
Expand to see the definition of the cloud-init used for testing
For testing purposes, the secret referenced as dataSecretName contains a simple cloud-config consisting of two main actions: it adds an SSH key to the host, and it creates a file that serves as a quick validation check to confirm that the initialisation process was successful.
Expand for the full definition of the MachineDeployment and related objects.
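To make the test cloud-config described above more concrete, here is a minimal sketch that builds a comparable userData secret as a plain dictionary. It is not the secret used in our test: the marker path /root/FOO.txt, the namespace metal3, the placeholder key, and the use of the value and format data keys are assumptions for illustration.

```python
import base64


def build_user_data_secret(name: str, namespace: str, ssh_public_key: str, marker_path: str) -> dict:
    """Build a Secret manifest (as a dict) wrapping a two-action cloud-config."""
    cloud_config = "\n".join(
        [
            "#cloud-config",
            # Action 1: authorize an SSH key for the default user.
            "ssh_authorized_keys:",
            f"  - {ssh_public_key}",
            # Action 2: write a marker file to validate that cloud-init ran.
            "write_files:",
            f"  - path: {marker_path}",
            "    content: provisioned",
            "",
        ]
    )

    def encode(text: str) -> str:
        # Secret data values are base64-encoded strings.
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        # Assumed data keys: "value" for the payload, "format" for its type.
        "data": {"value": encode(cloud_config), "format": encode("cloud-config")},
    }


secret = build_user_data_secret(
    name="testuserdata",
    namespace="metal3",
    ssh_public_key="<KEY>",
    marker_path="/root/FOO.txt",
)
decoded = base64.b64decode(secret["data"]["value"]).decode("utf-8")
print(decoded)
```

Keeping the payload this small makes validation easy: a successful SSH login and the presence of the marker file are enough to tell that the userData was applied.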
After creating the new MachineDeployment, the system's state has been updated and we see the following:
We can look at the current conditions reported by the Metal3Machine and investigate why we do not have a ProviderID.
The controller responsible for managing the Metal3Machine is attempting to set a ProviderID to a Kubernetes Node
in the target cluster, but is unable to find it. This is expected, as the provisioned host is not a Kubernetes node.
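The behaviour described above can be modelled with a short sketch. It is a deliberate simplification and not the CAPM3 controller: it assumes that the Node belonging to a host is looked up through a metal3.io/uuid label, and that a Machine without a matching Node stays in the Provisioned phase instead of reaching Running. The function names and the sample UUID are hypothetical.

```python
from typing import Optional

UUID_LABEL = "metal3.io/uuid"


def find_node_for_host(nodes: list, host_uuid: str) -> Optional[dict]:
    """Return the Node labelled with the host UUID, or None if there is none."""
    for node in nodes:
        labels = node.get("metadata", {}).get("labels", {})
        if labels.get(UUID_LABEL) == host_uuid:
            return node
    return None


def machine_phase(nodes: list, host_uuid: str) -> str:
    """Simplified phase: Running only when a matching Node exists."""
    node = find_node_for_host(nodes, host_uuid)
    return "Running" if node is not None else "Provisioned"


# A host provisioned without any kubelet: no Node ever appears in the cluster.
print(machine_phase([], "1234-abcd"))

# The same host once something registers a Node carrying the UUID label.
registered = [{"metadata": {"name": "node-1", "labels": {UUID_LABEL: "1234-abcd"}}}]
print(machine_phase(registered, "1234-abcd"))
```

Seen this way, the provisioning itself is not the problem; the only missing piece is an object in the target cluster that the controller can match against the host.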
Expand to see the validation of the provisioned BareMetalHosts
To confirm that the host was provisioned correctly, we test accessing the host.
We successfully connect to the host using the SSH key specified in the testuserdata secret.
Additionally, we verify that the file FOO.txt, as described in the cloud-config, has been created on the host.
This confirms that the provisioning process was successful.
As a double check, we examine the current spec section of the BareMetalHost node-1, one of the two hosts selected for the MachineDeployment test1mock, and we observe the same image set by the Metal3MachineTemplate and the testuserdata secret set in the MockConfig.
This confirms that the configuration defined in the Metal3MachineTemplate and MockConfig has been correctly propagated to the BareMetalHost.
Considerations: While we have successfully provisioned a group of hosts using a MachineDeployment, there are some important caveats to note:
The MachineDeployment never reaches the Running state because the related controller never finds the respective Node object in the target cluster.
Enabling Cluster-APIs management for non-Kubernetes hosts using Virtual Kubelets
To utilise CAPI resources for provisioning a group of hosts, we need to register the hosts as Nodes without allowing any Pod execution.
To achieve this, we investigate the use of virtual-kubelets.
A virtual-kubelet is an open-source implementation of the Kubernetes kubelet that enables us to masquerade platforms, hosts, or components as a kubelet.
The project provides a pluggable interface for developers to implement custom providers (not to be confused with CAPI providers) that define how kubelet actions should be handled.
The project includes a mock provider that joins the cluster as a node with the role "agent" and accepts Pod definitions, but does not run any containers. (A more suitable implementation would return an error at the creation of a Pod, but for the purpose of evaluating this method the mock provider works fine.)
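As a conceptual illustration of the "more suitable implementation" mentioned above, the sketch below shows a provider that refuses every Pod. A real virtual-kubelet provider is written in Go; this Python class only mirrors the shape of its pod lifecycle methods (CreatePod, UpdatePod, DeletePod, GetPod, GetPodStatus, GetPods). The class name NoPodProvider and the exception PodRejectedError are hypothetical.

```python
class PodRejectedError(Exception):
    """Raised when a Pod is scheduled onto a host that must not run Pods."""


class NoPodProvider:
    """Provider sketch: the host is visible as a Node but never runs Pods."""

    def create_pod(self, pod: dict) -> None:
        name = pod.get("metadata", {}).get("name", "<unnamed>")
        raise PodRejectedError(f"pod {name} rejected: this node does not run workloads")

    def update_pod(self, pod: dict) -> None:
        # Nothing was ever created, so there is nothing to update.
        raise PodRejectedError("pod updates are not supported on this node")

    def delete_pod(self, pod: dict) -> None:
        # Deleting is a no-op because no Pod is ever accepted.
        return None

    def get_pod(self, namespace: str, name: str):
        return None

    def get_pod_status(self, namespace: str, name: str):
        return None

    def get_pods(self) -> list:
        return []


provider = NoPodProvider()
try:
    provider.create_pod({"metadata": {"name": "nginx"}})
except PodRejectedError as err:
    print(err)
print(provider.get_pods())
```

Rejecting Pods explicitly, rather than silently accepting them as the mock provider does, would make a misplaced workload visible immediately.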
To leverage virtual-kubelets with CAPI, we prototyped a new bootstrap provider called ClusterAPI Bootstrap Provider Virtual Kubelets (CABPV).
This provider:
CABPV, being a bootstrap provider, introduces a template (VirtualKubeletConfigTemplate) and a config object (VirtualKubeletConfig). These include:
spec.template.spec.virtual-kubelet.url: allows dynamic installation of the virtual-kubelet during provisioning if it is not already available in the image
spec.template.spec.virtual-kubelet.provider: for use cases where users develop their own virtual-kubelet implementing custom operations (more on this in the final considerations)
Here is an example:
The manifest used to deploy a MachineDeployment (named test1virtkube) leveraging CABPV is similar to the previous one,
but in this case it references a VirtualKubeletConfigTemplate instead of a MockConfigTemplate.
Expand to inspect the manifest used to test CABPV.
Below is the state of the system after creating the new MachineDeployment.
Upon inspecting the cluster, we observe that 2 new Node objects have been created, labeled with the role kubernetes.io/role: agent.
These Node objects were dynamically created when the virtual-kubelet successfully joined the cluster.
Expand to see the state of a Node registered via the virtual-kubelet.
The following is a visual representation of the objects created.
Compared to the previous MachineDeployment, we now reference a VirtualKubeletConfigTemplate as template.bootstrap.configRef,
and on each provisioned host there is a virtual-kubelet that registers the host as a Node in the target cluster.
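The reference described above can be sketched as follows. This is not the manifest used in our test: the code only assembles, as a plain dictionary, a MachineDeployment whose bootstrap configRef points at a VirtualKubeletConfigTemplate. The template names, the nodepool label, the namespace metal3, and the API version chosen for the CABPV kind are assumptions, since CABPV is a prototype.

```python
import json


def machine_deployment(name: str, cluster_name: str, replicas: int,
                       bootstrap_template: str, infra_template: str,
                       namespace: str = "metal3") -> dict:
    """Assemble a MachineDeployment manifest that bootstraps hosts via CABPV."""
    labels = {"cluster.x-k8s.io/cluster-name": cluster_name, "nodepool": name}
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "MachineDeployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "clusterName": cluster_name,
            "replicas": replicas,
            # The selector must match the labels of the Machines in the template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "clusterName": cluster_name,
                    "bootstrap": {
                        "configRef": {
                            # Hypothetical API version for the prototype provider.
                            "apiVersion": "bootstrap.cluster.x-k8s.io/v1alpha1",
                            "kind": "VirtualKubeletConfigTemplate",
                            "name": bootstrap_template,
                        }
                    },
                    "infrastructureRef": {
                        "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1",
                        "kind": "Metal3MachineTemplate",
                        "name": infra_template,
                    },
                },
            },
        },
    }


manifest = machine_deployment(
    name="test1virtkube",
    cluster_name="test1",
    replicas=2,
    bootstrap_template="test1virtkube-config",  # hypothetical template name
    infra_template="test1virtkube-hosts",  # hypothetical template name
)
print(json.dumps(manifest["spec"]["template"]["spec"]["bootstrap"], indent=2))
```

Only the bootstrap reference differs from a kubeadm-based MachineDeployment; the infrastructure reference stays a Metal3MachineTemplate, which is what lets the existing CAPM3 machinery be reused unchanged.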
Expand to see more information about the userData secret generated by the bootstrap provider
We can look at one of the secrets used as userData during the provisioning of a BareMetalHost.
We truncate long files for ease of reading.
The bootstrap provider:
[...] "metal3.io/uuid":"{{ ds.meta_data.uuid }}", enabling CAPM3 to tag the Kubernetes Node object correctly
Key Considerations
Considerations:
What to investigate next? Our primary objective was to provision BareMetalHosts, which led us to focus on Metal3. However, CABPK is used by multiple providers and this raises an interesting question: Can we leverage this method to provision hosts across various providers?