From 8b96f8910f4e3fe40a86a029108f3121f51ee173 Mon Sep 17 00:00:00 2001
From: Stefan Prodan
Date: Tue, 24 Sep 2024 20:42:16 +0300
Subject: [PATCH] Add ResourceGroup API docs

Signed-off-by: Stefan Prodan
---
 docs/api/v1/resourcegroup.md | 440 +++++++++++++++++++++++++++++++++++
 1 file changed, 440 insertions(+)
 create mode 100644 docs/api/v1/resourcegroup.md

diff --git a/docs/api/v1/resourcegroup.md b/docs/api/v1/resourcegroup.md
new file mode 100644
index 0000000..585e779
--- /dev/null
+++ b/docs/api/v1/resourcegroup.md
@@ -0,0 +1,440 @@
+# Resource Group CRD
+
+**ResourceGroup** is a declarative API for generating a group of Kubernetes objects
+based on a matrix of input values and a set of templated resources.
+
+The ResourceGroup API offers a high-level abstraction for defining and managing
+Flux resources and related Kubernetes objects as a single unit. It is designed
+to reduce the complexity of Kustomize overlays by providing a compact way
+of defining different configurations for a set of workloads per tenant and/or environment.
+
+Use cases:
+
+- Application definition: Bundle a set of Kubernetes resources (Flux HelmRelease, OCIRepository, Alert, Provider, Receiver, ImagePolicy) into a single deployable unit.
+- Multi-instance provisioning: Generate multiple instances of the same application with different configurations.
+- Multi-cluster provisioning: Generate multiple instances of the same application for each target cluster that are deployed by Flux from a management cluster.
+- Multi-tenancy provisioning: Generate a set of resources (Namespace, ServiceAccount, RoleBinding) for each tenant with specific roles and permissions.
+
+## Example
+
+The following example shows a ResourceGroup that generates an application instance consisting of a
+Flux HelmRelease and OCIRepository for each tenant with a specific version and replica count.
+
+```yaml
+apiVersion: fluxcd.controlplane.io/v1
+kind: ResourceGroup
+metadata:
+  name: podinfo
+  namespace: default
+  annotations:
+    fluxcd.controlplane.io/reconcile: "enabled"
+    fluxcd.controlplane.io/reconcileEvery: "30m"
+    fluxcd.controlplane.io/reconcileTimeout: "5m"
+spec:
+  commonMetadata:
+    labels:
+      app.kubernetes.io/name: podinfo
+  inputs:
+    - tenant: "team1"
+      version: "6.7.x"
+      replicas: "2"
+    - tenant: "team2"
+      version: "6.6.x"
+      replicas: "3"
+  resources:
+    - apiVersion: source.toolkit.fluxcd.io/v1beta2
+      kind: OCIRepository
+      metadata:
+        name: podinfo-<< inputs.tenant >>
+        namespace: default
+      spec:
+        interval: 10m
+        url: oci://ghcr.io/stefanprodan/charts/podinfo
+        ref:
+          semver: << inputs.version | quote >>
+    - apiVersion: helm.toolkit.fluxcd.io/v2
+      kind: HelmRelease
+      metadata:
+        name: podinfo-<< inputs.tenant >>
+        namespace: default
+      spec:
+        interval: 1h
+        releaseName: podinfo-<< inputs.tenant >>
+        chartRef:
+          kind: OCIRepository
+          name: podinfo-<< inputs.tenant >>
+        values:
+          replicaCount: << inputs.replicas | int >>
+```
+
+You can run this example by saving the manifest into `podinfo.yaml`.
+
+1. Apply the ResourceGroup on the cluster:
+
+   ```shell
+   kubectl apply -f podinfo.yaml
+   ```
+
+2. Wait for the ResourceGroup to reconcile the generated resources:
+
+   ```shell
+   kubectl wait resourcegroup/podinfo --for=condition=ready --timeout=5m
+   ```
+
+3. Run `kubectl get resourcegroup` to see the status of the resource:
+
+   ```console
+   $ kubectl get resourcegroup
+   NAME      AGE   READY   STATUS
+   podinfo   59s   True    Reconciliation finished in 52s
+   ```
+
+4. Run `kubectl describe resourcegroup` to see the reconciliation status conditions and events:
+
+   ```console
+   $ kubectl describe resourcegroup podinfo
+   Status:
+     Conditions:
+       Last Transition Time:  2024-09-24T09:58:53Z
+       Message:               Reconciliation finished in 52s
+       Observed Generation:   1
+       Reason:                ReconciliationSucceeded
+       Status:                True
+       Type:                  Ready
+   Events:
+     Type    Reason                   Age   From           Message
+     ----    ------                   ----  ----           -------
+     Normal  ApplySucceeded           72s   flux-operator  HelmRelease/default/podinfo-team1 created
+   HelmRelease/default/podinfo-team2 created
+   OCIRepository/default/podinfo-team1 created
+   OCIRepository/default/podinfo-team2 created
+     Normal  ReconciliationSucceeded  72s   flux-operator  Reconciliation finished in 52s
+   ```
+
+5. Run `kubectl events` to see the events generated by the flux-operator:
+
+   ```shell
+   kubectl events --for resourcegroup/podinfo
+   ```
+
+6. Run `kubectl delete` to remove the ResourceGroup and its generated resources:
+
+   ```shell
+   kubectl delete resourcegroup podinfo
+   ```
+
+## Writing a ResourceGroup spec
+
+As with all other Kubernetes config, a ResourceGroup needs `apiVersion`,
+`kind`, and `metadata` fields. The name of a ResourceGroup object must be a
+valid [DNS subdomain name](https://kubernetes.io/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
+
+A ResourceGroup also needs a
+[`.spec` section](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status).
+
+### Inputs configuration
+
+The `.spec.inputs` field is optional and specifies a list of input values
+to be used in the resource templates.
+
+Example inputs:
+
+```yaml
+spec:
+  inputs:
+    - tenant: team1
+      version: "6.7.x"
+      replicas: "2"
+    - tenant: team2
+      version: "6.6.x"
+      replicas: "3"
+```
+
+Each input is a set of key-value pairs of strings, where each key is an input name
+that can be referenced in the resource templates using the `<< inputs.name >>` syntax.
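+
+As a minimal sketch of how an input key maps to a template reference, the fragment below
+declares two keys and uses both in a single resource. The `ConfigMap` and its name are
+hypothetical and only illustrate the `<< inputs.name >>` syntax described above:
+
+```yaml
+spec:
+  inputs:
+    - tenant: team1     # referenced as << inputs.tenant >>
+      version: "6.7.x"  # referenced as << inputs.version >>
+  resources:
+    # Hypothetical resource, used only to show the references.
+    - apiVersion: v1
+      kind: ConfigMap
+      metadata:
+        name: << inputs.tenant >>-info
+        namespace: default
+      data:
+        version: << inputs.version | quote >>
+```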
+
+### Resources configuration
+
+The `.spec.resources` field is optional and specifies the list of Kubernetes resources
+to be generated and reconciled on the cluster.
+
+Example of plain resources without any templating:
+
+```yaml
+spec:
+  resources:
+    - apiVersion: v1
+      kind: Namespace
+      metadata:
+        name: apps
+    - apiVersion: v1
+      kind: ServiceAccount
+      metadata:
+        name: flux
+        namespace: apps
+```
+
+#### Templating resources
+
+The resources can be templated using the `<< inputs.name >>` syntax. The templating engine
+is based on Go's `text/template` package. The `<< >>` delimiters are used instead of `{{ }}` to avoid
+conflicts with Helm templating and to allow ResourceGroups to be included in Helm charts.
+
+Example of templated resources:
+
+```yaml
+spec:
+  inputs:
+    - tenant: team1
+      role: admin
+    - tenant: team2
+      role: cluster-admin
+  resources:
+    - apiVersion: v1
+      kind: Namespace
+      metadata:
+        name: << inputs.tenant >>
+    - apiVersion: v1
+      kind: ServiceAccount
+      metadata:
+        name: flux
+        namespace: << inputs.tenant >>
+    - apiVersion: rbac.authorization.k8s.io/v1
+      kind: RoleBinding
+      metadata:
+        name: flux
+        namespace: << inputs.tenant >>
+      subjects:
+        - kind: ServiceAccount
+          name: flux
+          namespace: << inputs.tenant >>
+      roleRef:
+        kind: ClusterRole
+        name: << inputs.role >>
+        apiGroup: rbac.authorization.k8s.io
+```
+
+The above example will generate a `Namespace`, `ServiceAccount` and `RoleBinding` for each tenant
+with the specified role.
+
+#### Templating functions
+
+The templating engine supports [slim-sprig](https://go-task.github.io/slim-sprig/) functions.
+
+It is recommended to use the `quote` function when templating strings to avoid issues with
+special characters, e.g. `<< inputs.version | quote >>`.
+
+When templating integers, use the `int` function to convert the string to an integer,
+e.g. `<< inputs.replicas | int >>`.
+
+When templating booleans, use the `bool` function to convert the string to a boolean,
+e.g. `<< inputs.enabled | bool >>`.
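+
+As a rough illustration of the conversions above, the fragment below shows what templated
+fields could look like after rendering for an input with `version: "6.7.x"`, `replicas: "2"`
+and `enabled: "true"`. The `enabled` key and the field names are hypothetical, and the output
+is an assumption based on how Go templates and these functions behave in general:
+
+```yaml
+# Assumed rendering, not captured from a cluster.
+semver: "6.7.x"   # << inputs.version | quote >> keeps the value a string
+replicaCount: 2   # << inputs.replicas | int >> yields an integer
+enabled: true     # << inputs.enabled | bool >> yields a boolean
+```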
+
+When using integer or boolean inputs as metadata label values, use the `quote` function to convert
+the value to a string, e.g. `<< inputs.enabled | quote >>`.
+
+When using multi-line strings containing YAML, use the `nindent` function to properly format the string,
+e.g.:
+
+```yaml
+spec:
+  inputs:
+    - tenant: team1
+      layerSelector: |
+        mediaType: "application/vnd.cncf.helm.chart.content.v1.tar+gzip"
+        operation: copy
+  resources:
+    - apiVersion: source.toolkit.fluxcd.io/v1beta2
+      kind: OCIRepository
+      metadata:
+        name: << inputs.tenant >>
+      spec:
+        layerSelector: << inputs.layerSelector | nindent 4 >>
+```
+
+#### Resource deduplication
+
+The flux-operator deduplicates resources based on the
+`apiVersion`, `kind`, `metadata.name` and `metadata.namespace` fields.
+
+This allows defining shared resources that are applied only once, regardless of the number of inputs.
+
+Example of a shared Flux source:
+
+```yaml
+spec:
+  inputs:
+    - tenant: "team1"
+      replicas: "2"
+    - tenant: "team2"
+      replicas: "3"
+  resources:
+    - apiVersion: source.toolkit.fluxcd.io/v1beta2
+      kind: OCIRepository
+      metadata:
+        name: podinfo
+        namespace: default
+      spec:
+        interval: 10m
+        url: oci://ghcr.io/stefanprodan/charts/podinfo
+        ref:
+          semver: '*'
+    - apiVersion: helm.toolkit.fluxcd.io/v2
+      kind: HelmRelease
+      metadata:
+        name: podinfo-<< inputs.tenant >>
+        namespace: default
+      spec:
+        interval: 1h
+        releaseName: podinfo-<< inputs.tenant >>
+        chartRef:
+          kind: OCIRepository
+          name: podinfo
+        values:
+          replicaCount: << inputs.replicas | int >>
+```
+
+In the above example, the `OCIRepository` resource is created only once
+and is referenced by all `HelmRelease` resources.
+
+### Common metadata
+
+The `.spec.commonMetadata` field is optional and specifies common metadata to be applied to all resources.
+
+It has two optional fields:
+
+- `labels`: A map used for setting [labels](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/)
+  on an object. Any existing label will be overridden if it matches a key in
+  this map.
+- `annotations`: A map used for setting [annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/)
+  on an object. Any existing annotation will be overridden if it matches a key
+  in this map.
+
+Example common metadata:
+
+```yaml
+spec:
+  commonMetadata:
+    labels:
+      app.kubernetes.io/name: podinfo
+    annotations:
+      fluxcd.controlplane.io/prune: disabled
+```
+
+In the above example, none of the resources generated by the ResourceGroup
+will be pruned by the garbage collection process, as the `fluxcd.controlplane.io/prune`
+annotation is set to `disabled`.
+
+### Reconciliation configuration
+
+The reconciliation behaviour of a ResourceGroup can be configured using the following annotations:
+
+- `fluxcd.controlplane.io/reconcile`: Enable or disable the reconciliation loop. Default is `enabled`; set to `disabled` to pause the reconciliation.
+- `fluxcd.controlplane.io/reconcileEvery`: Set the reconciliation interval used for drift detection and correction. Default is `1h`.
+- `fluxcd.controlplane.io/reconcileTimeout`: Set the reconciliation timeout including health checks. Default is `5m`.
+
+### Health check configuration
+
+The `.spec.wait` field is optional and instructs the flux-operator to perform
+a health check on all applied resources and wait for them to become ready. The health
+check is enabled by default and can be disabled by setting the `.spec.wait` field to `false`.
+
+The health check is performed for the following resource types:
+
+- Kubernetes built-in kinds: Deployment, DaemonSet, StatefulSet,
+  PersistentVolumeClaim, Service, Ingress, CustomResourceDefinition.
+- Flux kinds: HelmRelease, OCIRepository, Kustomization, GitRepository, etc.
+- Custom resources that are compatible with [kstatus](https://github.com/kubernetes-sigs/cli-utils/tree/master/pkg/kstatus).
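+
+As a minimal sketch, the manifest below disables the health check described above.
+The `apps` name is a placeholder; the `.spec.wait` field is the one documented in this section:
+
+```yaml
+apiVersion: fluxcd.controlplane.io/v1
+kind: ResourceGroup
+metadata:
+  name: apps # placeholder name
+  namespace: default
+spec:
+  wait: false # skip health checks on the applied resources
+  resources:
+    - apiVersion: v1
+      kind: Namespace
+      metadata:
+        name: apps
+```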
+
+By default, the wait timeout is `5m` and can be changed with the
+`fluxcd.controlplane.io/reconcileTimeout` annotation, set on the ResourceGroup object.
+
+## ResourceGroup Status
+
+### Conditions
+
+A ResourceGroup enters various states during its lifecycle, reflected as Kubernetes Conditions.
+It can be [reconciling](#reconciling-resourcegroup) while applying the
+resources on the cluster, it can be [ready](#ready-resourcegroup), or it can [fail during
+reconciliation](#failed-resourcegroup).
+
+The ResourceGroup API is compatible with the **kstatus** specification,
+and reports `Reconciling` and `Stalled` conditions where applicable to
+provide better (timeout) support to solutions polling the ResourceGroup to
+become `Ready`.
+
+#### Reconciling ResourceGroup
+
+The flux-operator marks a ResourceGroup as _reconciling_ when it starts
+reconciling it. The Condition added to the ResourceGroup's
+`.status.conditions` has the following attributes:
+
+- `type: Reconciling`
+- `status: "True"`
+- `reason: Progressing` | `reason: ProgressingWithRetry`
+
+The Condition `message` is updated during the course of the reconciliation to
+report the action being performed at any particular moment such as
+building manifests, detecting drift, etc.
+
+The `Ready` Condition's `status` is also marked as `Unknown`.
+
+#### Ready ResourceGroup
+
+The flux-operator marks a ResourceGroup as _ready_ when the resources were
+built and applied on the cluster and all health checks are observed to be passing.
+
+When the ResourceGroup is "ready", the flux-operator sets a Condition with the
+following attributes in the ResourceGroup's `.status.conditions`:
+
+- `type: Ready`
+- `status: "True"`
+- `reason: ReconciliationSucceeded`
+
+#### Failed ResourceGroup
+
+The flux-operator may get stuck trying to reconcile and apply a
+ResourceGroup without completing. This can occur due to some of the following factors:
+
+- The templating of the resources fails.
+- The resources are invalid and cannot be applied.
+- Garbage collection fails.
+- Running health checks fails.
+
+When this happens, the flux-operator sets the `Ready` Condition status to `False`
+and adds a Condition with the following attributes to the ResourceGroup's
+`.status.conditions`:
+
+- `type: Ready`
+- `status: "False"`
+- `reason: BuildFailed | HealthCheckFailed | ReconciliationFailed`
+
+The `message` field of the Condition will contain more information about why
+the reconciliation failed.
+
+While the ResourceGroup has one or more of these Conditions, the flux-operator
+will continue to attempt a reconciliation with an
+exponential backoff, until it succeeds and the ResourceGroup is marked as [ready](#ready-resourcegroup).
+
+### Inventory status
+
+In order to perform operations such as drift detection, garbage collection, upgrades, etc.,
+the flux-operator needs to keep track of all Kubernetes objects that are
+reconciled as part of a ResourceGroup. To do this, it maintains an inventory
+containing the list of Kubernetes resource object references that have been
+successfully applied and records it in `.status.inventory`. The inventory
+records are in the format `Id: <namespace>_<name>_<group>_<kind>, V: <version>`.
+
+Example:
+
+```text
+Status:
+  Inventory:
+    Entries:
+      Id: default_podinfo__ServiceAccount
+      V: v1
+      Id: default_podinfo__Service
+      V: v1
+      Id: default_podinfo_apps_Deployment
+      V: v1
+```
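+
+To make the ID layout concrete, the sketch below shows how the same inventory could appear
+in the object's YAML (for example from `kubectl get resourcegroup podinfo -o yaml`), with each ID
+broken down in a comment. The lower-case field names are an assumption derived from the
+`kubectl describe` output above:
+
+```yaml
+status:
+  inventory:
+    entries:
+      # namespace=default, name=podinfo, group empty (core API), kind=ServiceAccount
+      - id: default_podinfo__ServiceAccount
+        v: v1
+      # namespace=default, name=podinfo, group empty (core API), kind=Service
+      - id: default_podinfo__Service
+        v: v1
+      # namespace=default, name=podinfo, group=apps, kind=Deployment
+      - id: default_podinfo_apps_Deployment
+        v: v1
+```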