diff --git a/enhancements/machine-api/control-plane.md b/enhancements/machine-api/control-plane.md
new file mode 100644
index 0000000000..3d7dda9e34
--- /dev/null
+++ b/enhancements/machine-api/control-plane.md
@@ -0,0 +1,315 @@
+---
+title: Managing Control Plane machines
+authors:
+  - enxebre
+reviewers:
+  - hexfusion
+  - jeremyeder
+  - abhinavdahiya
+  - joelspeed
+  - smarterclayton
+  - derekwaynecarr
+approvers:
+  - hexfusion
+  - jeremyeder
+  - abhinavdahiya
+  - joelspeed
+  - smarterclayton
+  - derekwaynecarr
+
+creation-date: 2020-04-02
+last-updated: yyyy-mm-dd
+status: provisional
+see-also:
+  - https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20191017-kubeadm-based-control-plane.md
+  - https://github.com/openshift/enhancements/blob/master/enhancements/etcd/cluster-etcd-operator.md
+  - https://github.com/openshift/enhancements/blob/master/enhancements/etcd/disaster-recovery-with-ceo.md
+  - https://github.com/openshift/enhancements/blob/master/enhancements/kube-apiserver/auto-cert-recovery.md
+  - https://github.com/openshift/machine-config-operator/blob/master/docs/etcd-quorum-guard.md
+  - https://github.com/openshift/cluster-kube-scheduler-operator
+  - https://github.com/openshift/cluster-kube-controller-manager-operator
+  - https://github.com/openshift/cluster-openshift-controller-manager-operator
+  - https://github.com/openshift/cluster-kube-apiserver-operator
+  - https://github.com/openshift/cluster-openshift-apiserver-operator
+replaces:
+superseded-by:
+---
+
+# Managing Control Plane machines
+
+## Release Signoff Checklist
+
+- [ ] Enhancement is `provisional`
+- [ ] Design details are appropriately documented from clear requirements
+- [ ] Test plan is defined
+- [ ] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Glossary
+
+Control Plane: The collection of stateless and stateful processes which enable a Kubernetes cluster to meet minimum operational requirements. This includes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet and etcd.
+
+## Summary
+
+This proposal outlines a first step towards providing a single entity that fully manages all aspects of the Control Plane compute:
+ - It ensures that Control Plane Machines are recreated on a deletion request at any time.
+ - It ensures that Control Plane Machines are auto-repaired when a node goes unready (Machine Health Check).
+
+In particular:
+- It introduces a new CRD, `ControlPlane`, which provides a single, simple entity that adopts master machines and backs them with well-known controllers (MachineSet and MHC); a rough sketch of this resource appears at the end of this section:
+  - The ControlPlane controller creates and manages a MachineSet to back each Control Plane Machine that is found at any time.
+  - The ControlPlane controller creates and manages a Machine Health Check resource to monitor the Control Plane Machines.
+
+This proposal assumes that all etcd operational aspects are managed orthogonally and safely by the cluster-etcd-operator while the compute resources are manipulated.
+
+The contract between the etcd operations and the compute resources is given by the PDBs that block machine deletion.
+Depends on https://issues.redhat.com/browse/ETCD-74?jql=project%20%3D%20ETCD.
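+To make this concrete, the following is a rough sketch of what a ControlPlane resource could look like. The schema is not settled by this provisional proposal; the group/version and every field name below are hypothetical:
+
+```yaml
+# Hypothetical sketch only: the ControlPlane schema is not finalised by this
+# provisional proposal; the group/version and all field names are illustrative.
+apiVersion: machine.openshift.io/v1beta1
+kind: ControlPlane
+metadata:
+  name: cluster
+  namespace: openshift-machine-api
+spec:
+  # Number of Control Plane Machines the controller keeps in existence.
+  replicas: 3
+  # Label selector used to adopt the master Machines created by the installer.
+  selector:
+    matchLabels:
+      machine.openshift.io/cluster-api-machine-role: master
+```
+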
+## Motivation
+
+The Control Plane is the most critical and sensitive entity of a running cluster. Today OCP Control Plane instances are "pets" and therefore fragile. There are multiple scenarios where adjusting the compute capacity backing the Control Plane components might be desirable, whether for resizing or for repairing.
+
+Currently there is nothing that automates or eases this task. The steps for the Control Plane to be resized in any manner, or to recover from a tolerable failure (etcd quorum is not lost but a single node goes unready), are completely manual.
+
+Different teams are following different "Standard Operating Procedure" documents, scattered around and full of manual steps, resulting in loss of information, confusion and extra effort for engineers and users.
+
+### Goals
+
+- To have a declarative mechanism to ensure that existing Control Plane Machines are recreated on a deletion request at any time.
+- To auto-repair unhealthy Control Plane Nodes.
+
+### Non-Goals / Future work
+
+- To integrate with any existing etcd topology, e.g. external clusters. Stacked etcd with dynamic member identities, local storage and the Cluster etcd Operator are an assumed invariant.
+- To manage individual Control Plane components. Self-hosted Control Plane components that are self-managed by their operators are an assumed invariant:
+  - Rolling OS upgrades and config changes at the software layer are managed by the Machine Config Operator.
+  - The etcd Quorum Guard, which provides a PDB to honour quorum, is managed by the Machine Config Operator.
+  - Each individual Control Plane component, i.e. the Kube API Server, Scheduler and Controller Manager, is self-hosted and managed by its own operator.
+  - The Cluster etcd Operator manages certificates, reports health and adds etcd members as "Master" nodes join the cluster.
+  - The Kubelet is tied to the OS and is therefore managed by the Machine Config Operator.
+- To integrate with any Control Plane components topology, e.g. a remote, pod-based hosted Control Plane.
+- Automated disaster recovery of a cluster that has lost quorum.
+- To manage OS upgrades. This is managed by the Machine Config Operator.
+- To manage configuration changes at the software layer. This is managed by the Machine Config Operator.
+- To manage the life cycle of Control Plane components.
+- To automate the provisioning and decommissioning of the bootstrapping instance managed by the installer.
+- To provide autoscaling support for the Control Plane.
+  - This proposal is a necessary first step for enabling Control Plane autoscaling.
+  - It focuses on settling on the primitives to abstract away the Control Plane as a single entity.
+  - In a follow-up RFE we will discuss how/when to autoscale the Control Plane Machines based on relevant cluster metrics, e.g. number of workers, number of etcd objects, etc.
+- To support fully automated rolling upgrades for the Control Plane compute resources. Same reason as above.
+
+## Proposal
+
+Currently the installer chooses the failure domains out of what a particular provider makes available, and it creates a Control Plane Machine resource for each of them. This proposal introduces a `ControlPlane` CRD and controller that will adopt and manage the lifecycle of those Machines. On new clusters the installer will instantiate a ControlPlane resource.
+
+This is a first step towards the longer-term goal of providing a single entity to fully manage all aspects of the Control Plane. This iteration proposes:
+- A single, simple entity that adopts master machines and backs them with well-known controllers (MachineSet and MHC); see the illustrative health check sketch below.
+- To keep the user-facing API surface intentionally narrow. See [#api-changes](#api-changes).
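+For illustration, the Machine Health Check that the ControlPlane controller would create could look roughly like the following. The concrete selector, conditions and `maxUnhealthy` value are assumptions of this sketch, not decisions of this proposal:
+
+```yaml
+# Illustrative sketch: the exact values here are assumptions, not part of
+# this proposal.
+apiVersion: machine.openshift.io/v1beta1
+kind: MachineHealthCheck
+metadata:
+  name: control-plane
+  namespace: openshift-machine-api
+spec:
+  selector:
+    matchLabels:
+      machine.openshift.io/cluster-api-machine-role: master
+  unhealthyConditions:
+    - type: Ready
+      status: Unknown
+      timeout: 300s
+    - type: Ready
+      status: "False"
+      timeout: 300s
+  # Remediate at most one Control Plane Machine at a time, so the etcd quorum
+  # guard PDB can always block an unsafe deletion.
+  maxUnhealthy: 1
+```
+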
+Although it is out of scope for the first implementation, to provide a long-term vision and alignment, this sketches how a possible second iteration could look:
+- Abstract the `failureDomain` semantic from providers to the core machine object.
+- Introduce an `InfrastructureTemplate/providerSpec` reference and FailureDomains in the ControlPlane API.
+- This would provide a single provider config to be reused and to be changed across any Control Plane Machine.
+- This would give the `ControlPlane` controller all the semantics it needs to fully automate vertical rolling upgrades across multiple failure domains, while provider config changes would need to happen in one single place.
+
+The lifecycle of the compute resources still remains decoupled from, and orthogonal to, the lifecycle and management of the Control Plane components hosted by those compute resources. All of these components, including etcd, are expected to keep managing themselves as the cluster shrinks and expands the Control Plane compute resources.
+
+### User Stories [optional]
+
+#### Story 1
+- As an operator installing a new OCP cluster, I want the flexibility to run [large or small clusters](https://kubernetes.io/docs/setup/best-practices/cluster-large/#size-of-master-and-master-components), so I need the ability to vertically resize the control plane in a declarative, automated and seamless manner.
+
+This proposal satisfies this by providing a semi-automated process to vertically resize Control Plane Machines by enforcing recreation.
+
+#### Story 2
+- As an operator running an existing OCP cluster, I want to have a seamless path for my Control Plane Machines to be adopted and become self-managed.
+
+This proposal enables this by providing the ControlPlane resource.
+
+#### Story 3
+- As an operator of an OCP Dedicated Managed Platform, I want to give users the flexibility to add as many worker nodes as they want, or to enable autoscaling on worker nodes, so I need the ability to resize the control plane instances in a declarative and seamless manner to react quickly to cluster growth.
+
+This proposal enables this by providing a semi-automated vertical resizing process as described in "Declarative Vertical scaling".
+
+#### Story 4
+- As an SRE, I want to have consumable API primitives in place to resize Control Plane compute resources so I can develop upper-level automation tooling on top, e.g. automate Story 3 to support a severe growth peak in the number of worker nodes.
+
+#### Story 5
+- As an operator, I want faulty nodes to be remediated automatically. This includes having self-healing Control Plane Machines.
+
+#### Story 6
+- As a multi-cluster operator, I want to have a universal user experience for managing the Control Plane in a declarative manner across any cloud provider, bare metal and any flavour of the product that shares the topology assumed in this doc.
+
+### Implementation Details/Notes/Constraints [optional]
+
+To satisfy the goals, motivation and stories above, this proposes letting the installer create a ControlPlane object to adopt and manage the lifecycle of the Control Plane Machines.
+
+The ControlPlane CRD will be exposed by the Machine API Operator (MAO) to the Cluster Version Operator (CVO); a sketch of such a manifest follows below.
+The ControlPlane controller will be managed by the Machine API Operator.
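+As an illustration, the CRD could be shipped as one of the MAO's CVO-applied manifests, roughly as sketched below. The group/version, the annotation set and the permissive placeholder schema are all assumptions of this example:
+
+```yaml
+# Illustrative sketch of a CVO-applied CRD manifest; names and annotations
+# here are assumptions, not settled by this proposal.
+apiVersion: apiextensions.k8s.io/v1
+kind: CustomResourceDefinition
+metadata:
+  name: controlplanes.machine.openshift.io
+  annotations:
+    # Ask the CVO to include this manifest in self-managed HA clusters.
+    include.release.openshift.io/self-managed-high-availability: "true"
+spec:
+  group: machine.openshift.io
+  names:
+    kind: ControlPlane
+    listKind: ControlPlaneList
+    plural: controlplanes
+    singular: controlplane
+  scope: Namespaced
+  versions:
+    - name: v1beta1
+      served: true
+      storage: true
+      schema:
+        openAPIV3Schema:
+          type: object
+          # Placeholder schema; the real one would validate the ControlPlane spec.
+          x-kubernetes-preserve-unknown-fields: true
+```
+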
+#### Bootstrapping
+Currently, during a regular IPI bootstrapping process, the installer uses Terraform to create a bootstrapping instance and 3 master instances. Then it creates Machine resources to "adopt" the existing master instances.
+
+Additionally, it will create a ControlPlane resource to manage the lifecycle of those Machines:
+ - The ControlPlane controller will create MachineSets to adopt those machines by looking up known labels (adopting behaviour already exists in the MachineSet logic).
+ - `machine.openshift.io/cluster-api-machineset": -