<!--
This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode
-->

# DHCP-less network config templating

Discuss options and outline a proposal to enable DHCP-less network config
templating, leveraging the existing CAPM3 IPAM support.

## Status

implementable

## Summary

Metal<sup>3</sup> provides an IPAM controller which can be used to enable
deployment with static IPs instead of DHCP. However, it is currently not
possible to use this functionality in a fully DHCP-less environment, because
the IPAM controller does not support network configuration in the
pre-provisioning phase.

This proposal outlines additional network config templating to enable use of
the existing IPAM solution for the pre-provisioning phase, using an approach
similar to the existing templating of `networkData`.

## Motivation

Infrastructure management via Metal<sup>3</sup> in DHCP-less environments is
common, but today our upstream features only partially solve for this
use-case.

Since there are several groups in the community who require this
functionality, it makes sense to collaborate and ensure we can support this
use-case.

### Goals

Enable e2e integration of the CAPM3 IPAM components such that it is possible
to deploy in a DHCP-less environment using static network configuration
managed via Metal<sup>3</sup> resources.

### Non-Goals

Existing methods used to configure networking via downstream customizations
(such as a custom PreprovisioningImageController) are valid and will still
sometimes be required; this proposal does not aim to replace such methods.
The approach described here may be complementary for those users wishing to
combine CAPM3 IPAM features with a PreprovisioningImageController.

This proposal will focus on the Metal<sup>3</sup> components only. There are
also OS dependencies and potential related areas of work in Ironic; these
will be mentioned in the Dependencies section but not covered in detail here.

This proposal will only consider the Metal<sup>3</sup> IPAM controller -
there are other options, but none are currently integrated via CAPM3.

## Proposal

Implement a new CAPM3 controller to handle setting the BareMetalHost
`preprovisioningNetworkDataName` in an automated way via the existing
Metal<sup>3</sup> IPAM resources.

This will be achieved via an approach similar to the existing templating of
`networkData`, adjusted to account for the lack of any `Machine` at the
pre-provisioning step of the deployment flow.

### User Stories

#### Static network configuration (no IPAM)

As a user, I want to manage my network configuration statically as part of
my BareMetalHost inventory.

In this case the network configuration is provided via a Secret which is
either manually created or templated outside the scope of Metal<sup>3</sup>.

The BareMetalHost API already supports two interfaces for passing network
configuration:

* `networkData` - this data is passed by Ironic to the deployed OS via a
  config-drive partition. It is then typically read on first boot by a tool
  such as `cloud-init`, which supports the OpenStack network data format.
* `preprovisioningNetworkDataName` - this references a Secret containing
  network data to be applied during the pre-provisioning phase, e.g. to
  configure networking for the IPA deploy ramdisk.

The `preprovisioningNetworkDataName` API was added initially to enable
[image building workflows](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#preprovisioningimage),
and a [recent BMO change](https://github.com/metal3-io/baremetal-operator/pull/1380)
landed to enable this flow without any custom PreprovisioningImage
controller.

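For illustration, a BareMetalHost consuming both interfaces might look like
the following sketch (the host name, MAC/BMC addresses, and Secret names are
placeholders, not values from this proposal):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
spec:
  online: true
  bootMACAddress: 00:11:22:33:44:55
  bmc:
    address: redfish://192.168.111.1/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
  # Secret read on first boot by the deployed OS (e.g. cloud-init)
  networkData:
    name: host-0-network-data
  # Secret consumed by the IPA ramdisk during pre-provisioning
  preprovisioningNetworkDataName: host-0-preprov-network-data
```
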
#### IPAM configuration

As a user, I wish to make use of the Metal<sup>3</sup> IPAM solution in a
DHCP-less environment.

Metal<sup>3</sup> provides an [IPAM controller](https://github.com/metal3-io/ip-address-manager)
which can be used to allocate IPs as part of the Metal3Machine lifecycle.

Some gaps exist which prevent realizing this flow in a fully DHCP-less
environment, so the main focus of this proposal will be how to solve for
this use-case.

##### IPAM Scenario 1 - common IPPool

An environment where a common configuration is desired for the
pre-provisioning phase and the provisioned BareMetalHost (e.g. a scenario
where hosts are permanently assigned to specific clusters).

##### IPAM Scenario 2 - decoupled preprovisioning/provisioning IPPool

An environment where a decoupled configuration is desired for the
pre-provisioning phase and the provisioned BareMetalHost (e.g. a BMaaS
scenario where the end-user network configuration differs from the
commissioning phase, and a different configuration is desired for
inspection/cleaning).

## Design Details

`Metal3MachineTemplate` and `Metal3DataTemplate` are used to apply
`networkData` to specific BareMetalHost resources, but they are by design
coupled to the CAPI Machine lifecycle.

This is a problem for the pre-provisioning use-case: at this point we are
preparing the BareMetalHost for use, and there is not yet any Machine.

To resolve this, below we outline a proposal to add two new resources with
similar behavior for the pre-provisioning phase:
`Metal3PreprovisioningTemplate` and `Metal3PreprovisioningDataTemplate`.

### API overview

The current flow in the provisioning phase is as follows (only the most
relevant fields are included for clarity):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: machine-deployment
spec:
  clusterName: cluster
  replicas: 1
  template:
    spec:
      clusterName: cluster
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: Metal3MachineTemplate
        name: machine-template
```

In this flow, when a Metal3Machine is provisioned via the
`MachineDeployment`, BareMetalHost resources labeled
`cluster-role: control-plane` will have `networkData` defined with an IP
derived from the `pool-1` `IPPool`.

In CAPM3 an IPClaim is created to reserve an IP from the IPPool for each
Machine, and an IPAddress resource contains the data used for templating of
the `networkData`.

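The intermediate IPAM resources look roughly like the following sketch (the
claim and address names shown are illustrative; in practice CAPM3 and the
IPAM controller generate them from the Metal3Data and pool names):

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPClaim
metadata:
  name: machine-0-pool-1
spec:
  pool:
    name: pool-1
---
# Created by the IPAM controller in response to the IPClaim
apiVersion: ipam.metal3.io/v1alpha1
kind: IPAddress
metadata:
  name: pool-1-192-168-0-10
spec:
  claim:
    name: machine-0-pool-1
  pool:
    name: pool-1
  address: 192.168.0.10
  prefix: 24
  gateway: 192.168.0.1
```
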
#### Preprovisioning - Common IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo
```

In this flow there is no `MachineDeployment`. BareMetalHost resources with
labels matching the `preprov-template` hostSelector will have
`preprovisioningNetworkDataName` assigned, using the same process outlined
for `networkData` above.

There are a few things to consider:

To avoid the risk of multiple Metal3PreprovisioningTemplate resources
matching the same BareMetalHost (which would be ambiguous), a BMH must match
*exactly one* Metal3PreprovisioningTemplate for the controller to take
action; if more than one matches, the host will be reported as ignored via
the Metal3PreprovisioningTemplate status.

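The exact status schema is not defined by this proposal; one illustrative
shape for surfacing an ambiguous match (all field names below are
hypothetical) could be:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
status:
  # Hypothetical fields: hosts skipped because they matched more than
  # one Metal3PreprovisioningTemplate
  ignoredHosts:
  - name: host-0
    reason: MultipleTemplateMatches
```
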
The baremetal-operator uses the `preprovisioningNetworkDataName` Secret by
default for `networkData` when the latter is not set, so in this
configuration it is not strictly necessary to specify `networkData` via a
Metal3DataTemplate. However, we will want to delete the IPClaim after
pre-provisioning in the decoupled flow below, so it seems likely we will
want to behave consistently and rely on the IP Reuse functionality if a
consistent IP is required between the pre-provisioning and provisioning
phases.

#### Preprovisioning - Decoupled IPPool

```yaml
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: pool-1
spec:
  clusterName: cluster
---
apiVersion: ipam.metal3.io/v1alpha1
kind: IPPool
metadata:
  name: preprovisioning-pool
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningDataTemplate
metadata:
  name: preprov-data-template
spec:
  preprovisioningNetworkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: preprovisioning-pool
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3PreprovisioningTemplate
metadata:
  name: preprov-template
spec:
  template:
    spec:
      dataTemplate:
        name: preprov-data-template
      hostSelector:
        matchLabels:
          pre-provisioning: foo
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3DataTemplate
metadata:
  name: data-template
spec:
  clusterName: cluster
  networkData:
    networks:
      ipv4:
      - id: eth0
        ipAddressFromIPPool: pool-1
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: Metal3MachineTemplate
metadata:
  name: machine-template
spec:
  template:
    spec:
      dataTemplate:
        name: data-template
      hostSelector:
        matchLabels:
          cluster-role: control-plane
```

In this flow we have `preprovisioning-pool`, which is not associated with
any cluster; it is used to provide an IPAddress during the pre-provisioning
phase as described above. To reduce the required size of the pool, the
IPClaim will be deleted after the pre-provisioning phase is completed, i.e.
when the BMH resource becomes available.

In the provisioning phase another pool, associated with a cluster, is used
to template `networkData` as in the existing process.

#### Assumptions and Open Questions

TODO

### Inspection on initial registration

On initial registration of a host, inspection is triggered immediately, but
in a DHCP-less environment this process cannot complete without
pre-provisioning network configuration (because the IPA ramdisk can't
connect back to the Ironic API).

To resolve this we can add a new BareMetalHost API field,
`preprovisioningNetworkDataRequired`, defaulting to false. When set to true
it declares that the host cannot move from Registering to Inspecting until
`preprovisioningNetworkDataName` has been set.

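With the proposed field, a host intended for DHCP-less deployment would be
created along these lines (a sketch of the proposed API, not a released
field; the host name and BMC details are placeholders):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
spec:
  online: true
  bootMACAddress: 00:11:22:33:44:55
  bmc:
    address: redfish://192.168.111.1/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
  # Host stays in Registering until preprovisioningNetworkDataName is set
  preprovisioningNetworkDataRequired: true
```
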
An alternative could be to require that BareMetalHost resources are created
with the existing [paused annotation](https://github.com/metal3-io/baremetal-operator/blob/main/docs/api.md#pausing-reconciliation)
set to a pre-determined value (e.g. `metal3.io/preprovisioning`). The new
controller can then remove the annotation after
`preprovisioningNetworkDataName` has been set, at which point inspection
will be able to succeed.

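Under this alternative, the inventory would be created pre-paused, e.g. (the
annotation key is the existing baremetal-operator paused annotation; the
value is the pre-determined marker suggested above, and the host details are
placeholders):

```yaml
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: host-0
  annotations:
    # Removed by the new controller once preprovisioningNetworkDataName
    # has been set
    baremetalhost.metal3.io/paused: metal3.io/preprovisioning
spec:
  online: true
  bootMACAddress: 00:11:22:33:44:55
  bmc:
    address: redfish://192.168.111.1/redfish/v1/Systems/1
    credentialsName: host-0-bmc-secret
```
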
### Implementation Details/Notes/Constraints

#### IP Reuse

A related issue has previously been addressed via the
[IP Reuse](https://github.com/metal3-io/cluster-api-provider-metal3/blob/main/docs/ip_reuse.md)
functionality. This means we can couple IPClaims to the BareMetalHost
resources, which will enable consistent IP allocations for pre-provisioning
and subsequent provisioning operations (provided the same IPPool is used for
both steps).

### Risks and Mitigations

- TODO

### Work Items

TODO

### Dependencies

#### Firstboot agent support

An agent in the IPA ramdisk image is required to consume the network data
provided via the processes outlined above.

The Ironic DHCP-less documentation describes using glean (a minimal
Python-based cloud-init alternative), but we don't currently have any
community-supported IPA ramdisk image containing this tool.

There are several other options, such as cloud-init or even custom
scripts/tooling which may be coupled to the OS, so we do not define a
specific solution as part of this proposal.

#### Potential config-drive conflict on redeployment

TODO

### Test Plan

TODO

### Upgrade / Downgrade Strategy

TODO

### Version Skew Strategy

N/A

## Drawbacks

TODO

## Alternatives

### Kanod

One possibility is to manage the lifecycle of `preprovisioningNetworkDataName`
outside of the Metal<sup>3</sup> core components. Such an approach has been
successfully demonstrated in the
[Kanod community](https://gitlab.com/Orange-OpenSource/kanod/), which is
related to the [Sylva](https://sylvaproject.org) project.

The design proposal here has been directly inspired by this work, but I think
directly integrating this functionality into CAPM3 has the following
advantages:

* We can close a functional gap which potentially impacts many
  Metal<sup>3</sup> users, not only those involved with Kanod/Sylva
* Directly integrating into CAPM3 means we can use a common approach for
  `networkData` and `preprovisioningNetworkData`

## References

TODO