From 44e6225776f8b1a4f3d35d26b2d43599a6c1c232 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Wed, 23 Feb 2022 17:44:24 +0100 Subject: [PATCH 01/34] Enhancement for enabling single-node-openshift with workers Reopening #996 since it got accidentally merged without sufficient reviews --- .../single-node-openshift-with-workers.md | 509 ++++++++++++++++++ 1 file changed, 509 insertions(+) create mode 100644 enhancements/single-node/single-node-openshift-with-workers.md diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md new file mode 100644 index 0000000000..607cfa6cbf --- /dev/null +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -0,0 +1,509 @@ +--- +title: single-node-openshift-with-workers +authors: + - "@omertuc" +reviewers: + - "@romfreiman" + - "@eranco74" + - "@tsorya" + - "@dhellmann" + - "@Miciah" + - "@bparees" + - "@JoelSpeed" + - "@staebler" + - "@derekwaynecarr" +approvers: + - TBD +api-approvers: # in case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers) + - TBD +creation-date: 2022-01-06 +last-updated: 2022-02-12 +tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement + - https://issues.redhat.com/browse/MGMT-8414 +see-also: + - "https://github.com/openshift/enhancements/tree/master/enhancements/single-node" + - "https://github.com/openshift/enhancements/blob/master/enhancements/installer/single-node-installation-bootstrap-in-place.md" + - "https://github.com/openshift/enhancements/blob/master/enhancements/external-control-plane-topology.md" +replaces: +superseded-by: +--- + +# Single Node OpenShift With Workers + +## Summary + +This enhancement aims to enable adding workers to a single control-plane node +cluster by attempting to tackle multiple issues that arise when worker nodes +are added to such clusters. + +## Motivation + +Today OCP supports deploying a cluster which only has a single control-plane +node and no additional workers. + +There's been recent demand for OCP to also formally support deployment of +single control-plane node clusters with additional (non control-plane) worker +nodes. + +This is already easily done on (unsupported) cloud IPI-deployed single +control-plane node clusters by increasing the replica count of one of the +worker machinesets (there are multiple machinesets provided by default on cloud +clusters, one for each AZ, and in cloud single control-plane node clusters +they're both simply scaled to 0 by default). Doing this results in a single +control-plane node cluster with additional workers and that cluster works as +expected without any known issues. + +Even on Assisted Installer installed single control-plane node clusters it's +trivial to add more workers by leveraging the Assisted Installer's day-2 worker +installation capabilities (after some minor DNS configuration issues which will +be improved by the Assisted Installer team, separately from this enhancement). + +OCP installations have an `infrastructures.config.openshift.io` CR +automatically deployed by the installer. 
This CR has two topology parameters in +its status called "Control Plane Topology" and "Infrastructure Topology" which, +as of today, may have one of three values: + +- SingleReplica +- HighlyAvailable +- External (this value is unrelated to this enhancement and is only mentioned here for completeness' sake) + +(See "see-also" enhancements links for more information about the topology +parameters and their possible values) + +The "Control Plane Topology" parameter is used by control-plane operators (such +as the Cluster etcd Operator) to determine how many replicas they should give +their various Deployments / StatefulSets. The value of this parameter in single +control-plane node clusters is always simply "SingleReplica". + +The "Infrastructure Topology" parameter is used by infrastructure operators +(such as the Cluster Ingress Operator or Cluster Monitoring Operator) to +determine how many replicas they should give their various Deployments / +StatefulSets. The value of this parameter is a function of how many workers +were present during installation. + +On "none"-platform single control-plane node clusters, when adding +workers, the resulting cluster has an issue with the behavior of the ingress +pod, the following paragraphs explain the background for this issue and the +issue itself. + +One of the benefits of installing single-node clusters is the simplicity of not +having to deal with load-balancing and virtual IPs, as these don't provide much +value when there's only a single node behind them. + +As a result, current common ways of installing single control-plane node +clusters today (mainly the Assisted Installer) avoid the usage of load +balancers or virtual IPs for API and ingress. There has also been some recent +effort to determine how single control-plane node cluster deployments on clouds +may be adjusted in order to reduce their costs, and one of the conclusions is +that getting rid of the load balancer installed by default by the IPI installer +results in major cost savings. + +A user installing a single control-plane node cluster on "none"-platform will +be tempted to simply point their DNS entries directly at the IP address of the +single node that they just installed. + +Similarly, in the Assisted Installer, the user is able to complete the +installation without needing to define any DNS entries. This is currently made +possible by injecting a `MachineConfig` manifest targeting the "master" pool +containing configuration for a dnsmasq server. The dnsmasq server is configured +with DNS entries for `api..`, `api-int..` and +`*.apps..` which all point to the single control-plane node's IP +address. This allows the installation process and the resulting cluster to +conveniently work without the user having to think about and configure DNS for +their cluster (of course external access to the cluster requires the user to +configure DNS, but this can be done separately after the installation has +completed). + +The issue with those approaches is that they assume that +`*.apps..` should always point at the single control-plane +node's IP address. This is of course correct when there's just that single node +in the cluster, but once you start adding worker nodes to the cluster it starts +causing a potential problem - the `router-default` deployment created by the +Cluster Ingress Operator, which is responsible for load balancing ingress +traffic, targets the "worker" pool using a node selector. 
As a result, under +some circumstances, that deployment's pod may eventually find itself running +also on the newly added worker nodes, as those nodes obviously also belong to +the worker pool (the reason the control-plane node was also in the worker pool +is that when during installation there are no worker nodes, the OpenShift +installer sets the Scheduler CR `.spec.mastersSchedulable` to `true` and as a +result the control-plane node is in both the "master" and "worker" pools). + +This pod floating between the nodes is a problem because user ingress traffic +has to be directed at the node currently holding the `router-default` pods, and +since the DNS entries have been "naively" pointed at the original control-plane +node's IP address (which may no longer be running those pods), ingress traffic +may no longer work. This can be temporarily solved if DNS is adjusted to point +at the correct node currently holding the pod or a load-balancer / some virtual +IP solution is put in place and then the DNS entry can be directed at that +load-balancer / virtual IP instead of at the node. This enhancement will try +to prevent this floating altogether so these workarounds will not be needed. + +Finally, performing bootstrap-in-place installation of a "none"-platform single +control-plane node cluster today using the OpenShift Installer does not provide +the user with a worker node ignition file. As a result, the user is not able to +manually add worker nodes to the resulting cluster. All current official OCP +documentation points the user to use "the worker ignition file that get +generated during installation", but that is not currently generated in such +bootstrap-in-place installation. + +### Goals + +- Deal with the "floating ingress" issue encountered in "none"-platform +single control-plane node clusters which have worker nodes added to them. + +- Define how the installer may be modified to generate the worker ignition +manifest, even when doing bootstrap-in-place "none"-platform single +control-plane node cluster installation + +### Non-Goals + +- Deal with the scalability of single control-plane node clusters that have +additional workers added to them. The absence of multiple control-plane nodes +means that the number of workers/pods that can be supported on such a topology is +even more limited than in a regular 3 control-plane node cluster. Relevant +documentation may have to point out that the pod limits (or other control-plane +related limits) that apply to a single control-plane node cluster with no +workers also apply globally across all additional added workers. If a large +amount of workers/pods is desired the user would have to re-install the cluster +as a regular multi control-plane node cluster. + +- Deal with clusters that have less nodes than they had during day-1 +installation. The rationale for this being a non-goal is that it seems that +removing the "initial" nodes of a cluster causes a lot of issues even in a +regular, multi-node installation. So dealing with similar issues in single +control-plane node clusters is also out of scope for this enhancement. For +example, installing a cluster with 3 control-plane nodes and 3 workers results +in unschedulable control-plane nodes. Scaling down the worker machine-sets from +3 to 0 post-installation does not magically make the control-plane nodes +schedulable, and as a result a lot of infrastructure workloads fail to schedule +and operators degrade. User intervention is required to fix this. 
+ +- Deal with expansion of the single-node control-plane by adding more +control-plane nodes + +- Deal with expansion of clusters that have been installed before the +implementation of this enhancement (if possible, their expansion may be +addressed by an upgrade followed with documentation outlining the steps that +the user has to take to enable expansion, e.g. patch the default +`IngressController` to have node selector targetting the master pool rather +than the worker pool). + +- Adjust the baremetal platform to support single control-plane node +installations. The baremetal platform solves the "floating ingress" issue by +using virtual IP addresses/keepalived. + +- Deal with users who want their ingress traffic to not go through the single +control-plane node. + +- Deal with non-cloud, non-"none" platforms such as baremetal, vSphere, etc. + +## Proposal + +- Create a new `IngressPlacement` API parameter. + +- Make sure worker node ignition files are generated even in bootstrap-in-place +single control-plane node "none"-platform installation. + +### User Stories + +- As an OpenShift cluster administrator, I want to install a single control-plane +node cluster with additional worker nodes, so that my cluster will be able to +handle larger computation demands. + +- As an OpenShift cluster administrator, I want to add worker nodes to my +existing single control-plane node cluster, so that it'll be able to meet +growing computation demands. + +### API Extensions + +Introduce a new topology field in the Ingress config CR +(`config.openshift.io/v1/ingresses`) called `IngressPlacement`. + +### Implementation Details/Notes/Constraints + +This new field will have one of these values - `ControlPlane` or `Workers`. + +In addition, continue to allow the `.spec.replicas` and `.spec.nodePlacement` +parameters in `operator.openshift.io/v1/ingresscontrollers` CRs to be omitted, +but change the defaulting behavior for these fields. + +The value of the `IngressPlacement` field will affect the defaulting behavior +of the `IngressController`'s `.spec.replicas` and `.spec.nodePlacement` +parameters. In the absence of an `IngressController` resource created by the +user/installer, or when the user/installer creates an `IngressController` with +these two parameters omitted, the Cluster Ingress Operator will choose the +default values for those parameters based on the value of `IngressPlacement`. + +If the value of `IngressPlacement` itself is omitted by the installer, it +is defaulted to `Workers`. This would be useful to maintain the current behavior +if the API change is merged before the installer change to set this field, or if +we want this newer OCP version to be backwards-compatible with older installers. + +When the value of `IngressPlacement` is `Workers`, the defaulting behavior of +`.spec.replicas` and `.spec.nodePlacement` will be the same as it is today: +`.spec.replicas` will be chosen according to the value of +`InfrastructureTopology`, namely `1` when `SingleReplica` or `2` when +`HighlyAvailable`. `.spec.nodePlacement` will always just be: + +```yaml +nodePlacement: + nodeSelector: + matchLabels: + kubernetes.io/os: linux + node-role.kubernetes.io/worker: '' +``` + +However, if the value of `IngressPlacement` is `ControlPlane`, the defaulting +behavior will be different: `.spec.replicas` will be chosen instead according +to the value of `ControlPlaneTopology`; again, `1` when `SingleReplica` or `2` +when `HighlyAvailable`. 
`.spec.nodePlacement` will be always just be: + +```yaml +nodePlacement: + nodeSelector: + matchLabels: + kubernetes.io/os: linux + node-role.kubernetes.io/master: '' +``` + +(Note that the `kubernetes.io/os: linux` label is mentioned just because it's +the current behavior, it has no importance in this enhancement) + +The installer will detect situations in which it's unlikely the user will want +to set up a load-balancer. For now, those situations include installation of +single control-plane node cluster deployments on "on-prem" platforms such as +"none" or "vSphere" (although today single control-plane node clusters are only +possible on the "none" platform). In those situations, the installer will set +`IngressPlacement` to be `ControlPlane`. Since there's just a single +control-plane node, `ControlPlane` topology would be `SingleReplica` and the +combined effect would be that the `IngressController` will have just a single +replica and be pinned to the single control-plane node. This will then ensure +that the `router-default` deployment created by the Cluster Ingress Operator +will always run on the single control-plane node, and as a result any +`*.apps..` DNS entries which originally pointed at the single +control-plane node will remain correct even in the face of newly added worker +nodes. + +In any other situations, the installer will set `IngressPlacement` to +`Workers`, resulting in the same default behavior as before this enhancement, +namely that `IngressController` pod replicas are scheduled on worker nodes and +determined according to the `InfrastructureTopology`. + +In the future, when IPI-installed single control-plane node clusters in the +cloud no longer provision a load-balancer by default, they would also benefit +from having the installer set the `IngressPlacement` to `ControlPlane`. + +### Risks and Mitigations + +This should make no noticable difference on "regular" single control-plane node +clusters which do not have any day-1 or day-2 worker nodes. The only difference +for those clusters would be the `IngressController` targeting the "master" pool +rather than the "worker" pool, but since the single control-plane node is +already both in the "master" and "worker" pools, that should make no practical +difference. + +I do not believe this enhancement has any security implications. + +## Design Details + +### Open Questions + +None that I can think of at the moment + +### Test Plan + +- Add unit tests to the Cluster Ingress Operator to make sure the +IngressController resource is generated as exepected. + +- Add periodic nightly tests which install single-node in the cloud, add a few +worker nodes to it, then run conformance tests to make sure we don't run into +any problems not described in this enhancement. + +- Add periodic nightly tests which install a single-node "none"-platform cluster, +add worker nodes to it, and check that ingress traffic still works as expected +and recovers even after the `router-default` pod gets deleted and rescheduled. +Make sure this is still true even after upgrades. + +- Add tests on both cloud / "none"-platform that check that a single-node cluster +with additional workers recovers after the single control-plane node reboots by +running conformance tests post-reboot. + +- Add tests on both cloud / "none"-platform that check that a single-node cluster +with additional workers recovers after an upgrade by running conformance tests +post-upgrade. + +TODO: Describe day-1-workers tests? 
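As a reference point for the "none"-platform tests above, the defaulting
described in this enhancement should be equivalent to an `IngressController`
along the following lines. This is only an illustrative sketch: the `default`
name and `openshift-ingress-operator` namespace are the existing conventions
for the default `IngressController`, not something introduced by this
enhancement.

```yaml
# Sketch of the effective defaults expected on a single control-plane node
# "none"-platform cluster (IngressPlacement=ControlPlane,
# ControlPlaneTopology=SingleReplica).
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 1
  nodePlacement:
    nodeSelector:
      matchLabels:
        kubernetes.io/os: linux
        node-role.kubernetes.io/master: ''
```

Tests could assert that the resulting `router-default` pods are scheduled onto
the single control-plane node accordingly.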
+ +### Graduation Criteria + +**Note:** *Section not required until targeted at a release.* + +TBD + +#### Dev Preview -> Tech Preview + +TBD + +#### Tech Preview -> GA + +TBD + +#### Removing a deprecated feature + +N/A + +### Upgrade / Downgrade Strategy + +In the non-goals section it's mentioned that this enhancement does not apply to +clusters which have been installed prior to the enhancement, so their upgrade +is not discussed. + +This enhancement, to the best of my knowledge, should have no problems +persisting across cluster upgrades. The Test Plan section describes how this +will be tested. + +### Version Skew Strategy + +Does not apply, to the best of my understanding. + +### Operational Aspects of API Extensions + +TBD + +Describe the impact of API extensions (mentioned in the proposal section, i.e. CRDs, +admission and conversion webhooks, aggregated API servers, finalizers) here in detail, +especially how they impact the OCP system architecture and operational aspects. + +- For conversion/admission webhooks and aggregated apiservers: what are the SLIs (Service Level + Indicators) an administrator or support can use to determine the health of the API extensions + + Examples (metrics, alerts, operator conditions) + - authentication-operator condition `APIServerDegraded=False` + - authentication-operator condition `APIServerAvailable=True` + - openshift-authentication/oauth-apiserver deployment and pods health + +- What impact do these API extensions have on existing SLIs (e.g. scalability, API throughput, + API availability) + + Examples: + - Adds 1s to every pod update in the system, slowing down pod scheduling by 5s on average. + - Fails creation of ConfigMap in the system when the webhook is not available. + - Adds a dependency on the SDN service network for all resources, risking API availability in case + of SDN issues. + - Expected use-cases require less than 1000 instances of the CRD, not impacting + general API throughput. + +- How is the impact on existing SLIs to be measured and when (e.g. every release by QE, or + automatically in CI) and by whom (e.g. perf team; name the responsible person and let them review + this enhancement) + +#### Failure Modes + +TBD + +- Describe the possible failure modes of the API extensions. +- Describe how a failure or behaviour of the extension will impact the overall cluster health + (e.g. which kube-controller-manager functionality will stop working), especially regarding + stability, availability, performance and security. +- Describe which OCP teams are likely to be called upon in case of escalation with one of the failure modes + and add them as reviewers to this enhancement. + +#### Support Procedures + +TBD + +Describe how to +- detect the failure modes in a support situation, describe possible symptoms (events, metrics, + alerts, which log output in which component) + + Examples: + - If the webhook is not running, kube-apiserver logs will show errors like "failed to call admission webhook xyz". + - Operator X will degrade with message "Failed to launch webhook server" and reason "WehhookServerFailed". + - The metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")` + will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire. + +- disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`, remove APIService `foo`) + + - What consequences does it have on the cluster health? + + Examples: + - Garbage collection in kube-controller-manager will stop working. 
+ - Quota will be wrongly computed. + - Disabling/removing the CRD is not possible without removing the CR instances. Customer will lose data. + Disabling the conversion webhook will break garbage collection. + + - What consequences does it have on existing, running workloads? + + Examples: + - New namespaces won't get the finalizer "xyz" and hence might leak resource X + when deleted. + - SDN pod-to-pod routing will stop updating, potentially breaking pod-to-pod + communication after some minutes. + + - What consequences does it have for newly created workloads? + + Examples: + - New pods in namespace with Istio support will not get sidecars injected, breaking + their networking. + +- Does functionality fail gracefully and will work resume when re-enabled without risking + consistency? + + Examples: + - The mutating admission webhook "xyz" has FailPolicy=Ignore and hence + will not block the creation or updates on objects when it fails. When the + webhook comes back online, there is a controller reconciling all objects, applying + labels that were not applied during admission webhook downtime. + - Namespaces deletion will not delete all objects in etcd, leading to zombie + objects when another namespace with the same name is created. + +## Implementation History + +Not yet applicable + +## Drawbacks + +- The pinning of the `IngressController` to the "master" pool is another change +which would make single-node clusters slightly different from multi-node +clusters, and any such difference is naturally not ideal. + +- The proposed defaulting behavior for the discussed `IngressController` +parameters is complicated and dependent on three different parameters (infra +topology, control-plane topology, and ingress placement) - such complexity +would probably have to be documented in the CRD definitions and may confuse +users. + +## Alternatives + +- Even when users need to add just one extra worker, require them to add yet +another worker so they could just form a compact 3-node cluster where all +nodes are both workers and control-plane nodes. This kind of topology is +already supported by OCP. This will avoid the need for OCP to support yet +another topology. It has the obvious downside of requiring a "useless" node +the user didn't really need. It also means the user now has to run more +control-plane workloads to facilitate HA - for example, 2 extra replicas of +the API server which consume a lot of memory resources. From an engineering +perspective, it would require us to make the "Control-plane Topology" +parameter dynamic and make sure all operators know to react to changes in that +parameter (it will have to change from "SingleReplica" to "HighlyAvailable" +once those 2 new control-plane nodes join the cluster). I am not aware of the +other engineering difficulties we would encounter when attempting to support +the expansion of a single-node control-plane into a three-node control-plane, +but I think they would not be trivial. + +- Adjust the "baremetal" platform to support single control-plane node +installations and make users and the Assisted Installer (the current popular, +supported method to install single control-plane node clusters) use that +platform instead of the "none" platform. The baremetal platform solves the +issue described in this enhancement with virtual IP addresses/keepalived. This +approach was dismissed due to much higher development efforts and additional +processes that would need to run on the already resource constrained single +control-plane node. 
Furthermore, even once the baremetal platform is adjusted +to support single-node clusters, the Assisted Installer would have to go +through a lot of development effort in order to make it use the baremetal +platform rather than the "none" platform currently used for single node +installations. This may happen in the future. + +## Infrastructure Needed [optional] + +N/A From c07cf378ec782bf8f23ba108a9d0699f06171b3d Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Feb 2022 17:43:48 +0100 Subject: [PATCH 02/34] possible -> supported --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 607cfa6cbf..08c7ff5657 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -273,7 +273,7 @@ The installer will detect situations in which it's unlikely the user will want to set up a load-balancer. For now, those situations include installation of single control-plane node cluster deployments on "on-prem" platforms such as "none" or "vSphere" (although today single control-plane node clusters are only -possible on the "none" platform). In those situations, the installer will set +supported on the "none" platform). In those situations, the installer will set `IngressPlacement` to be `ControlPlane`. Since there's just a single control-plane node, `ControlPlane` topology would be `SingleReplica` and the combined effect would be that the `IngressController` will have just a single From 119d9576319a87d418ed252c604dfc130a56889f Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Feb 2022 18:29:21 +0100 Subject: [PATCH 03/34] on-prem -> just none --- .../single-node-openshift-with-workers.md | 24 +++++++++---------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 08c7ff5657..cae05ed630 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -270,19 +270,17 @@ nodePlacement: the current behavior, it has no importance in this enhancement) The installer will detect situations in which it's unlikely the user will want -to set up a load-balancer. For now, those situations include installation of -single control-plane node cluster deployments on "on-prem" platforms such as -"none" or "vSphere" (although today single control-plane node clusters are only -supported on the "none" platform). In those situations, the installer will set -`IngressPlacement` to be `ControlPlane`. Since there's just a single -control-plane node, `ControlPlane` topology would be `SingleReplica` and the -combined effect would be that the `IngressController` will have just a single -replica and be pinned to the single control-plane node. This will then ensure -that the `router-default` deployment created by the Cluster Ingress Operator -will always run on the single control-plane node, and as a result any -`*.apps..` DNS entries which originally pointed at the single -control-plane node will remain correct even in the face of newly added worker -nodes. +to set up a load-balancer. 
For now, those situations only include installation +of single control-plane node cluster deployments on the "none" platform. In +those situations, the installer will set `IngressPlacement` to be +`ControlPlane`. Since there's just a single control-plane node, `ControlPlane` +topology would be `SingleReplica` and the combined effect would be that the +`IngressController` will have just a single replica and be pinned to the single +control-plane node. This will then ensure that the `router-default` deployment +created by the Cluster Ingress Operator will always run on the single +control-plane node, and as a result any `*.apps..` DNS entries +which originally pointed at the single control-plane node will remain correct +even in the face of newly added worker nodes. In any other situations, the installer will set `IngressPlacement` to `Workers`, resulting in the same default behavior as before this enhancement, From 99d4f44a82e6b72ce2dab824fc80665403a7595e Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Feb 2022 19:34:37 +0100 Subject: [PATCH 04/34] Added operational aspects of API extensions --- .../single-node-openshift-with-workers.md | 97 ++++--------------- 1 file changed, 18 insertions(+), 79 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index cae05ed630..cd7c3487b1 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -366,94 +366,33 @@ Does not apply, to the best of my understanding. ### Operational Aspects of API Extensions -TBD - -Describe the impact of API extensions (mentioned in the proposal section, i.e. CRDs, -admission and conversion webhooks, aggregated API servers, finalizers) here in detail, -especially how they impact the OCP system architecture and operational aspects. - -- For conversion/admission webhooks and aggregated apiservers: what are the SLIs (Service Level - Indicators) an administrator or support can use to determine the health of the API extensions +- The "ControlPlane" IngressPlacement parameter value may not be used if the cluster's +ControlPlaneTopology parameter is set to "External". - Examples (metrics, alerts, operator conditions) - - authentication-operator condition `APIServerDegraded=False` - - authentication-operator condition `APIServerAvailable=True` - - openshift-authentication/oauth-apiserver deployment and pods health +- This API change only affects the defaulting behavior of the IngressController CR, +it does not add any new capabilities to OCP, or gives any more flexibility than there +already was. -- What impact do these API extensions have on existing SLIs (e.g. scalability, API throughput, - API availability) +- For administrators and support engineers, the IngressController is still the source +of truth and where you need to look for understanding the router placement in practice. +Nothing has changed in that regard. - Examples: - - Adds 1s to every pod update in the system, slowing down pod scheduling by 5s on average. - - Fails creation of ConfigMap in the system when the webhook is not available. - - Adds a dependency on the SDN service network for all resources, risking API availability in case - of SDN issues. - - Expected use-cases require less than 1000 instances of the CRD, not impacting - general API throughput. - -- How is the impact on existing SLIs to be measured and when (e.g. 
every release by QE, or - automatically in CI) and by whom (e.g. perf team; name the responsible person and let them review - this enhancement) +- New single control-plane node clusters with workers #### Failure Modes -TBD - -- Describe the possible failure modes of the API extensions. -- Describe how a failure or behaviour of the extension will impact the overall cluster health - (e.g. which kube-controller-manager functionality will stop working), especially regarding - stability, availability, performance and security. -- Describe which OCP teams are likely to be called upon in case of escalation with one of the failure modes - and add them as reviewers to this enhancement. +- The "ControlPlane" IngressPlacement parameter value may not be used if the cluster's +ControlPlaneTopology parameter is set to "External". This should ideally be blocked by +admission controllers for the Ingress config CR, if there are any. On top of that, the +Ingress Operator should refuse to create an IngressController CR or reconcile an existing +one while IngressPlacement is set to "ControlPlane" and the ControlPlaneTopology parameter +is set to "External". This will harm cluster health and disable ingress traffic to the cluster. #### Support Procedures -TBD - -Describe how to -- detect the failure modes in a support situation, describe possible symptoms (events, metrics, - alerts, which log output in which component) - - Examples: - - If the webhook is not running, kube-apiserver logs will show errors like "failed to call admission webhook xyz". - - Operator X will degrade with message "Failed to launch webhook server" and reason "WehhookServerFailed". - - The metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")` - will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire. - -- disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`, remove APIService `foo`) - - - What consequences does it have on the cluster health? - - Examples: - - Garbage collection in kube-controller-manager will stop working. - - Quota will be wrongly computed. - - Disabling/removing the CRD is not possible without removing the CR instances. Customer will lose data. - Disabling the conversion webhook will break garbage collection. - - - What consequences does it have on existing, running workloads? - - Examples: - - New namespaces won't get the finalizer "xyz" and hence might leak resource X - when deleted. - - SDN pod-to-pod routing will stop updating, potentially breaking pod-to-pod - communication after some minutes. - - - What consequences does it have for newly created workloads? - - Examples: - - New pods in namespace with Istio support will not get sidecars injected, breaking - their networking. - -- Does functionality fail gracefully and will work resume when re-enabled without risking - consistency? - - Examples: - - The mutating admission webhook "xyz" has FailPolicy=Ignore and hence - will not block the creation or updates on objects when it fails. When the - webhook comes back online, there is a controller reconciling all objects, applying - labels that were not applied during admission webhook downtime. - - Namespaces deletion will not delete all objects in etcd, leading to zombie - objects when another namespace with the same name is created. +- For administrators and support engineers, the IngressController is still the source +of truth and where you need to look for understanding the router placement in practice. 
+Nothing has changed in that regard. ## Implementation History From bcaece063dac7a8930293986c5285a0dca9cddd8 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Feb 2022 19:36:16 +0100 Subject: [PATCH 05/34] typo --- enhancements/single-node/single-node-openshift-with-workers.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index cd7c3487b1..da1bdc1a1a 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -377,8 +377,6 @@ already was. of truth and where you need to look for understanding the router placement in practice. Nothing has changed in that regard. -- New single control-plane node clusters with workers - #### Failure Modes - The "ControlPlane" IngressPlacement parameter value may not be used if the cluster's From 4e257d6e371f25f085d6c0c656670d2375250647 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Tue, 1 Mar 2022 09:33:33 +0100 Subject: [PATCH 06/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Miciah Dashiel Butler Masters --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index da1bdc1a1a..10ca0eae66 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -370,7 +370,7 @@ Does not apply, to the best of my understanding. ControlPlaneTopology parameter is set to "External". - This API change only affects the defaulting behavior of the IngressController CR, -it does not add any new capabilities to OCP, or gives any more flexibility than there +it does not add any new capabilities to OCP, or give any more flexibility than there already was. - For administrators and support engineers, the IngressController is still the source From 6d465f29d22852a8ea6f91cf792e9e2ca229ec93 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Wed, 2 Mar 2022 20:24:28 +0100 Subject: [PATCH 07/34] Ingress operator should not fail to reconcile invalid combination --- .../single-node-openshift-with-workers.md | 36 +++++++++---------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index da1bdc1a1a..6eeb2321d6 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -63,9 +63,9 @@ automatically deployed by the installer. 
This CR has two topology parameters in its status called "Control Plane Topology" and "Infrastructure Topology" which, as of today, may have one of three values: -- SingleReplica -- HighlyAvailable -- External (this value is unrelated to this enhancement and is only mentioned here for completeness' sake) +- `SingleReplica` +- `HighlyAvailable` +- `External` (this value is unrelated to this enhancement and is only mentioned here for completeness' sake) (See "see-also" enhancements links for more information about the topology parameters and their possible values) @@ -73,7 +73,7 @@ parameters and their possible values) The "Control Plane Topology" parameter is used by control-plane operators (such as the Cluster etcd Operator) to determine how many replicas they should give their various Deployments / StatefulSets. The value of this parameter in single -control-plane node clusters is always simply "SingleReplica". +control-plane node clusters is always simply `SingleReplica`. The "Infrastructure Topology" parameter is used by infrastructure operators (such as the Cluster Ingress Operator or Cluster Monitoring Operator) to @@ -366,29 +366,29 @@ Does not apply, to the best of my understanding. ### Operational Aspects of API Extensions -- The "ControlPlane" IngressPlacement parameter value may not be used if the cluster's -ControlPlaneTopology parameter is set to "External". +- The `ControlPlane` value of the `IngressPlacement` parameter value may not be +used when the cluster's `ControlPlaneTopology` parameter is set to `External`. -- This API change only affects the defaulting behavior of the IngressController CR, +- This API change only affects the defaulting behavior of the `IngressController` CR, it does not add any new capabilities to OCP, or gives any more flexibility than there already was. -- For administrators and support engineers, the IngressController is still the source -of truth and where you need to look for understanding the router placement in practice. -Nothing has changed in that regard. +- For administrators and support engineers, the `IngressController` is still the +source of truth and where you need to look if you seek to understand the router +placement in practice. Nothing has changed in that regard. #### Failure Modes -- The "ControlPlane" IngressPlacement parameter value may not be used if the cluster's -ControlPlaneTopology parameter is set to "External". This should ideally be blocked by -admission controllers for the Ingress config CR, if there are any. On top of that, the -Ingress Operator should refuse to create an IngressController CR or reconcile an existing -one while IngressPlacement is set to "ControlPlane" and the ControlPlaneTopology parameter -is set to "External". This will harm cluster health and disable ingress traffic to the cluster. +- The `ControlPlane` value of the `IngressPlacement` parameter value may not be +used if the cluster's `ControlPlaneTopology` parameter is set to `External`. +The Ingress Operator would treat such combination as if the `IngressPlacement` +value is actually `Workers` instead, and log a warning to indicate that this +invalid combination has been detected. The Ingress Operator should not fail +to reconcile `IngressController` CRs due to this invalid combination. 
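To make the fallback concrete, the invalid combination would look roughly as
follows on a live cluster. This is a sketch only: the exact YAML rendering of
`IngressPlacement` (field name, casing, and whether it lives under `spec`) is
an assumption of this example, while `controlPlaneTopology` is the existing
Infrastructure status field.

```yaml
# Hypothetical combination the Ingress Operator must tolerate: it should
# behave as if IngressPlacement were "Workers" and log a warning.
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  ingressPlacement: ControlPlane   # assumed field name/casing
---
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  controlPlaneTopology: External   # makes the combination above invalid
```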
#### Support Procedures -- For administrators and support engineers, the IngressController is still the source +- For administrators and support engineers, the `IngressController` is still the source of truth and where you need to look for understanding the router placement in practice. Nothing has changed in that regard. @@ -420,7 +420,7 @@ control-plane workloads to facilitate HA - for example, 2 extra replicas of the API server which consume a lot of memory resources. From an engineering perspective, it would require us to make the "Control-plane Topology" parameter dynamic and make sure all operators know to react to changes in that -parameter (it will have to change from "SingleReplica" to "HighlyAvailable" +parameter (it will have to change from `SingleReplica` to `HighlyAvailable` once those 2 new control-plane nodes join the cluster). I am not aware of the other engineering difficulties we would encounter when attempting to support the expansion of a single-node control-plane into a three-node control-plane, From 8a0b65ee59fcc6bcbee7905fa295970b872e3326 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Fri, 4 Mar 2022 11:54:54 +0100 Subject: [PATCH 08/34] Remark that ingress on control plane with external topology may be allowed in the future --- enhancements/single-node/single-node-openshift-with-workers.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index da02ac1ba5..f9db0abf3b 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -367,7 +367,8 @@ Does not apply, to the best of my understanding. ### Operational Aspects of API Extensions - The `ControlPlane` value of the `IngressPlacement` parameter value may not be -used when the cluster's `ControlPlaneTopology` parameter is set to `External`. +used when the cluster's `ControlPlaneTopology` parameter is set to `External`, +this combination is currently considered invalid. This may change in the future. - This API change only affects the defaulting behavior of the `IngressController` CR, it does not add any new capabilities to OCP, or give any more flexibility than there From 4b0814a2862ec86c801d20cf4816b34daa7feefc Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Fri, 4 Mar 2022 17:45:58 +0100 Subject: [PATCH 09/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Doug Hellmann --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index f9db0abf3b..7ffe528f07 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -216,7 +216,7 @@ growing computation demands. ### API Extensions -Introduce a new topology field in the Ingress config CR +Introduce a new field in the Ingress config CR (`config.openshift.io/v1/ingresses`) called `IngressPlacement`. 
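For illustration, on a "none"-platform single control-plane node cluster the
installer-set value would surface roughly as follows. This is a sketch only;
the YAML field name and casing used for `IngressPlacement` are assumptions of
this example.

```yaml
# Hypothetical rendering of the proposed field as set by the installer on a
# "none"-platform single control-plane node cluster.
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  ingressPlacement: ControlPlane   # assumed field name; allowed values: ControlPlane, Workers
```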
### Implementation Details/Notes/Constraints From 33e45b0945e835be2290edf62dc74b753d95475f Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 21 Mar 2022 10:34:08 +0100 Subject: [PATCH 10/34] Renamed IngressPlacement to DefaultIngressPlacement and moved it into the Ingress status --- .../single-node-openshift-with-workers.md | 216 ++++++++++++------ 1 file changed, 144 insertions(+), 72 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 7ffe528f07..2c8572c179 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -59,7 +59,7 @@ installation capabilities (after some minor DNS configuration issues which will be improved by the Assisted Installer team, separately from this enhancement). OCP installations have an `infrastructures.config.openshift.io` CR -automatically deployed by the installer. This CR has two topology parameters in +automatically deployed by the installer. This CR has two topology fields in its status called "Control Plane Topology" and "Infrastructure Topology" which, as of today, may have one of three values: @@ -68,17 +68,17 @@ as of today, may have one of three values: - `External` (this value is unrelated to this enhancement and is only mentioned here for completeness' sake) (See "see-also" enhancements links for more information about the topology -parameters and their possible values) +fields and their possible values) -The "Control Plane Topology" parameter is used by control-plane operators (such +The "Control Plane Topology" field is used by control-plane operators (such as the Cluster etcd Operator) to determine how many replicas they should give -their various Deployments / StatefulSets. The value of this parameter in single +their various Deployments / StatefulSets. The value of this field in single control-plane node clusters is always simply `SingleReplica`. -The "Infrastructure Topology" parameter is used by infrastructure operators +The "Infrastructure Topology" field is used by infrastructure operators (such as the Cluster Ingress Operator or Cluster Monitoring Operator) to determine how many replicas they should give their various Deployments / -StatefulSets. The value of this parameter is a function of how many workers +StatefulSets. The value of this field is a function of how many workers were present during installation. On "none"-platform single control-plane node clusters, when adding @@ -159,8 +159,8 @@ control-plane node cluster installation - Deal with the scalability of single control-plane node clusters that have additional workers added to them. The absence of multiple control-plane nodes -means that the number of workers/pods that can be supported on such a topology is -even more limited than in a regular 3 control-plane node cluster. Relevant +means that the number of workers/pods that can be supported on such a topology +is even more limited than in a regular 3 control-plane node cluster. Relevant documentation may have to point out that the pod limits (or other control-plane related limits) that apply to a single control-plane node cluster with no workers also apply globally across all additional added workers. If a large @@ -193,13 +193,17 @@ installations. The baremetal platform solves the "floating ingress" issue by using virtual IP addresses/keepalived. 
- Deal with users who want their ingress traffic to not go through the single -control-plane node. +control-plane node. Such users can easily achieve that goal by modifying the +`IngressController` to target just the worker, non-control-plane node(s) +(perhaps by labeling all of them with a unique label that is not applied to the +control-plane node that is also a worker and then have the `IngressController` +target that label). - Deal with non-cloud, non-"none" platforms such as baremetal, vSphere, etc. ## Proposal -- Create a new `IngressPlacement` API parameter. +- Create a new `DefaultRouterPlacement` API field. - Make sure worker node ignition files are generated even in bootstrap-in-place single control-plane node "none"-platform installation. @@ -216,32 +220,75 @@ growing computation demands. ### API Extensions -Introduce a new field in the Ingress config CR -(`config.openshift.io/v1/ingresses`) called `IngressPlacement`. +Introduce a new status field in the Ingress config CR +(`config.openshift.io/v1/ingresses`) called `DefaultRouterPlacement`. -### Implementation Details/Notes/Constraints +In addition, continue to allow the `.spec.replicas` and `.spec.nodePlacement` +fields in `operator.openshift.io/v1/ingresscontrollers` CRs to be omitted, but +change the defaulting behavior for these fields based on the new status field. + +The sections below go into detail about the field, its possible values, and the +behavior expected from each of them, but in practice following is the proposed +addition to `openshift/api`'s `config/v1/types_ingress.go` file: + +```go +type IngressStatus struct { + // ... existing fields omitted + + // defaultRouterPlacement is set by the installer and lets it control which + // nodes will host the ingress router pods by default, the options being + // control-plane nodes or worker nodes. + // + // This field works by dictating how the Ingress Operator will set the default + // values of future IngressController resources' replicas and nodePlacement + // fields. + // + // The value of replicas is set based on the value of a chosen field in the + // Infrastructure CR. If defaultRouterPlacement was set to ControlPlane, the + // chosen field will be controlPlaneTopology. If it was set to Workers the + // chosen field will be infrastructureTopology. Replicas will then be set to 1 + // or 2 with respect to the chosen field's value (which can be either + // SingleReplica or HighlyAvailable). + // + // The value of nodePlacement is adjusted based on defaultRouterPlacement - If + // defaultRouterPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" + // label will be added. If defaultRouterPlacement is set to Workers the + // "node-role.kubernetes.io/master" label will be added. + // + // +kubebuilder:validation:Enum:="ControlPlane";"Workers" + // +kubebuidler:default:="Workers" + // +optional + DefaultRouterPlacement DefaultRouterPlacement `json:"defaultRouterPlacement"` +} + +// TopologyMode defines the topology mode of the control/infra nodes. +type DefaultRouterPlacement string + +const ( + // "Workers" is for having router pods placed on worker nodes by default + Workers DefaultRouterPlacement = "Workers" + + // "ControlPlane" is for having router pods placed on control-plane nodes by default + ControlPlane DefaultRouterPlacement = "ControlPlane" +) +``` -This new field will have one of these values - `ControlPlane` or `Workers`. 
+### Implementation Details/Notes/Constraints -In addition, continue to allow the `.spec.replicas` and `.spec.nodePlacement` -parameters in `operator.openshift.io/v1/ingresscontrollers` CRs to be omitted, -but change the defaulting behavior for these fields. +This new field will have one of these values - `ControlPlane` or `Workers`. +There are no current plans for any more values, but more may be added in the +future. -The value of the `IngressPlacement` field will affect the defaulting behavior -of the `IngressController`'s `.spec.replicas` and `.spec.nodePlacement` -parameters. In the absence of an `IngressController` resource created by the +The value of the `DefaultRouterPlacement` field will affect the defaulting +behavior of `IngressController`'s `.spec.replicas` and `.spec.nodePlacement` +fields. In the absence of an `IngressController` resource created by the user/installer, or when the user/installer creates an `IngressController` with -these two parameters omitted, the Cluster Ingress Operator will choose the -default values for those parameters based on the value of `IngressPlacement`. +these fields omitted, the Cluster Ingress Operator will choose the default +values for those fields based on the value of `DefaultRouterPlacement`. -If the value of `IngressPlacement` itself is omitted by the installer, it -is defaulted to `Workers`. This would be useful to maintain the current behavior -if the API change is merged before the installer change to set this field, or if -we want this newer OCP version to be backwards-compatible with older installers. - -When the value of `IngressPlacement` is `Workers`, the defaulting behavior of -`.spec.replicas` and `.spec.nodePlacement` will be the same as it is today: -`.spec.replicas` will be chosen according to the value of +When the value of `DefaultRouterPlacement` is `Workers`, the defaulting +behavior of `.spec.replicas` and `.spec.nodePlacement` will be the same as it +is today: `.spec.replicas` will be chosen according to the value of `InfrastructureTopology`, namely `1` when `SingleReplica` or `2` when `HighlyAvailable`. `.spec.nodePlacement` will always just be: @@ -253,10 +300,11 @@ nodePlacement: node-role.kubernetes.io/worker: '' ``` -However, if the value of `IngressPlacement` is `ControlPlane`, the defaulting -behavior will be different: `.spec.replicas` will be chosen instead according -to the value of `ControlPlaneTopology`; again, `1` when `SingleReplica` or `2` -when `HighlyAvailable`. `.spec.nodePlacement` will be always just be: +However, if the value of `DefaultRouterPlacement` is `ControlPlane`, the +defaulting behavior will be different: `.spec.replicas` will be chosen instead +according to the value of `ControlPlaneTopology`; again, `1` when +`SingleReplica` or `2` when `HighlyAvailable`. `.spec.nodePlacement` will be +always just be: ```yaml nodePlacement: @@ -272,7 +320,7 @@ the current behavior, it has no importance in this enhancement) The installer will detect situations in which it's unlikely the user will want to set up a load-balancer. For now, those situations only include installation of single control-plane node cluster deployments on the "none" platform. In -those situations, the installer will set `IngressPlacement` to be +those situations, the installer will set `DefaultRouterPlacement` to be `ControlPlane`. 
Since there's just a single control-plane node, `ControlPlane` topology would be `SingleReplica` and the combined effect would be that the `IngressController` will have just a single replica and be pinned to the single @@ -282,14 +330,21 @@ control-plane node, and as a result any `*.apps..` DNS entries which originally pointed at the single control-plane node will remain correct even in the face of newly added worker nodes. -In any other situations, the installer will set `IngressPlacement` to +In any other situations, the installer will set `DefaultRouterPlacement` to `Workers`, resulting in the same default behavior as before this enhancement, -namely that `IngressController` pod replicas are scheduled on worker nodes and -determined according to the `InfrastructureTopology`. +namely that `IngressController` pods are scheduled on worker nodes and their +number of replicas determined according to the `InfrastructureTopology`. + +If the value of `DefaultRouterPlacement` itself is omitted, it is defaulted to +`Workers`. This would be useful in order to maintain the current behavior if +the API/Ingress PR is merged before the installer PR to set this field. + +The installer may not set `DefaultRouterPlacement` to `ControlPlane` when the +cluster's `ControlPlaneTopology` is set to `External`. In the future, when IPI-installed single control-plane node clusters in the cloud no longer provision a load-balancer by default, they would also benefit -from having the installer set the `IngressPlacement` to `ControlPlane`. +from having the installer set the `DefaultRouterPlacement` to `ControlPlane`. ### Risks and Mitigations @@ -300,6 +355,15 @@ rather than the "worker" pool, but since the single control-plane node is already both in the "master" and "worker" pools, that should make no practical difference. +On multi-node clusters, the installer will never set the +`DefaultRouterPlacement` field to `ControlPlane`, so there are no risks to +discuss for multi-node clusters in this enhancement. However, any future +enhancement that will consider making this field configurable by allowing +the user to set it during installation, should take into account that the +documentation / installation process for the load balancers would have to take +this field into account when choosing which nodes the load balancer should +target. + I do not believe this enhancement has any security implications. ## Design Details @@ -352,13 +416,19 @@ N/A ### Upgrade / Downgrade Strategy -In the non-goals section it's mentioned that this enhancement does not apply to -clusters which have been installed prior to the enhancement, so their upgrade -is not discussed. +The new `DefaultRouterPlacement` API field will have a default value of +`Workers`. The behavior of the Ingress Operator when the +`DefaultRouterPlacement` field has the value `Workers` should be identical +to its current behavior (before this enhancement). This means that clusters +that go through an upgrade from a version before this enhancement to a later +version which includes this enhancement will maintain their current behavior, +this enhancement should not change anything for them. -This enhancement, to the best of my knowledge, should have no problems -persisting across cluster upgrades. The Test Plan section describes how this -will be tested. +If the value of `DefaultRouterPlacement` is empty (TODO: Could this possibly +happen if the the Ingress Operator reads a resource created according to the +old Ingress CRD that didn't have this value? 
Not sure how defaults work in this +scenario, in any case, it's not crucial) the Ingress Operator should make sure +to treat it as if it were to have the value `Workers`. ### Version Skew Strategy @@ -366,13 +436,12 @@ Does not apply, to the best of my understanding. ### Operational Aspects of API Extensions -- The `ControlPlane` value of the `IngressPlacement` parameter value may not be -used when the cluster's `ControlPlaneTopology` parameter is set to `External`, -this combination is currently considered invalid. This may change in the future. +- Since the `DefaultRouterPlacement` field is part of the status of the +Ingress config CR, it is clear that users should not modify this field. -- This API change only affects the defaulting behavior of the `IngressController` CR, -it does not add any new capabilities to OCP, or give any more flexibility than there -already was. +- This API change only affects the defaulting behavior of the `IngressController` +CR, it does not add any new capabilities to OCP, or give any more flexibility +than there already was. - For administrators and support engineers, the `IngressController` is still the source of truth and where you need to look if you seek to understand the router @@ -380,12 +449,15 @@ placement in practice. Nothing has changed in that regard. #### Failure Modes -- The `ControlPlane` value of the `IngressPlacement` parameter value may not be -used if the cluster's `ControlPlaneTopology` parameter is set to `External`. -The Ingress Operator would treat such combination as if the `IngressPlacement` -value is actually `Workers` instead, and log a warning to indicate that this -invalid combination has been detected. The Ingress Operator should not fail -to reconcile `IngressController` CRs due to this invalid combination. +- The `ControlPlane` value of the `DefaultRouterPlacement` field value may +not be used if the cluster's `ControlPlaneTopology` field is set to +`External`. The Ingress Operator would treat such combination as if the +`DefaultRouterPlacement` value is actually `Workers` instead, and log a warning +to indicate that this invalid combination has been detected. The Ingress +Operator should not fail to reconcile `IngressController` CRs due to this +invalid combination. This should not happen under any circumstance since +the installer will never set this combination, but the Ingress Operator +should still handle it gracefully regardless. #### Support Procedures @@ -404,7 +476,7 @@ which would make single-node clusters slightly different from multi-node clusters, and any such difference is naturally not ideal. - The proposed defaulting behavior for the discussed `IngressController` -parameters is complicated and dependent on three different parameters (infra +fields is complicated and dependent on three different fields (infra topology, control-plane topology, and ingress placement) - such complexity would probably have to be documented in the CRD definitions and may confuse users. @@ -412,20 +484,20 @@ users. ## Alternatives - Even when users need to add just one extra worker, require them to add yet -another worker so they could just form a compact 3-node cluster where all -nodes are both workers and control-plane nodes. This kind of topology is -already supported by OCP. This will avoid the need for OCP to support yet -another topology. It has the obvious downside of requiring a "useless" node -the user didn't really need. 
It also means the user now has to run more -control-plane workloads to facilitate HA - for example, 2 extra replicas of -the API server which consume a lot of memory resources. From an engineering -perspective, it would require us to make the "Control-plane Topology" -parameter dynamic and make sure all operators know to react to changes in that -parameter (it will have to change from `SingleReplica` to `HighlyAvailable` -once those 2 new control-plane nodes join the cluster). I am not aware of the -other engineering difficulties we would encounter when attempting to support -the expansion of a single-node control-plane into a three-node control-plane, -but I think they would not be trivial. +another worker so they could just form a compact 3-node cluster where all nodes +are both workers and control-plane nodes. This kind of topology is already +supported by OCP. This will avoid the need for OCP to support yet another +topology. It has the obvious downside of requiring a "useless" node the user +didn't really need. It also means the user now has to run more control-plane +workloads to facilitate HA - for example, 2 extra replicas of the API server +which consume a lot of memory resources. From an engineering perspective, it +would require us to make the "Control-plane Topology" field dynamic and make +sure all operators know to react to changes in that field (it will have to +change from `SingleReplica` to `HighlyAvailable` once those 2 new control-plane +nodes join the cluster). I am not aware of the other engineering difficulties +we would encounter when attempting to support the expansion of a single-node +control-plane into a three-node control-plane, but I think they would not be +trivial. - Adjust the "baremetal" platform to support single control-plane node installations and make users and the Assisted Installer (the current popular, From 93259e8fa7390d04154ec7b0c1bc944b37a6a480 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Tue, 22 Mar 2022 20:09:45 +0100 Subject: [PATCH 11/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Miciah Dashiel Butler Masters --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 2c8572c179..8940e3af28 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -256,7 +256,7 @@ type IngressStatus struct { // "node-role.kubernetes.io/master" label will be added. 
// // +kubebuilder:validation:Enum:="ControlPlane";"Workers" - // +kubebuidler:default:="Workers" + // +kubebuilder:default:="Workers" // +optional DefaultRouterPlacement DefaultRouterPlacement `json:"defaultRouterPlacement"` } From 7ba14c815891af15c6166d45db75e453582791f7 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Tue, 22 Mar 2022 20:26:21 +0100 Subject: [PATCH 12/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Miciah Dashiel Butler Masters --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 8940e3af28..16983e81eb 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -261,7 +261,7 @@ type IngressStatus struct { DefaultRouterPlacement DefaultRouterPlacement `json:"defaultRouterPlacement"` } -// TopologyMode defines the topology mode of the control/infra nodes. +// DefaultRouterPlacement defines the default placement of ingress router pods. type DefaultRouterPlacement string const ( From b2735befa83c30be03b8805327f2461dfe83f140 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Wed, 23 Mar 2022 10:01:08 +0100 Subject: [PATCH 13/34] DefaultRouterPlacement -> DefaultPlacement to avoid using the word Router --- .../single-node-openshift-with-workers.md | 56 +++++++++---------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 16983e81eb..3d2cbdfa1c 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -203,7 +203,7 @@ target that label). ## Proposal -- Create a new `DefaultRouterPlacement` API field. +- Create a new `DefaultPlacement` API field. - Make sure worker node ignition files are generated even in bootstrap-in-place single control-plane node "none"-platform installation. @@ -221,7 +221,7 @@ growing computation demands. ### API Extensions Introduce a new status field in the Ingress config CR -(`config.openshift.io/v1/ingresses`) called `DefaultRouterPlacement`. +(`config.openshift.io/v1/ingresses`) called `DefaultPlacement`. In addition, continue to allow the `.spec.replicas` and `.spec.nodePlacement` fields in `operator.openshift.io/v1/ingresscontrollers` CRs to be omitted, but @@ -235,7 +235,7 @@ addition to `openshift/api`'s `config/v1/types_ingress.go` file: type IngressStatus struct { // ... existing fields omitted - // defaultRouterPlacement is set by the installer and lets it control which + // defaultPlacement is set by the installer and lets it control which // nodes will host the ingress router pods by default, the options being // control-plane nodes or worker nodes. // @@ -244,32 +244,32 @@ type IngressStatus struct { // fields. // // The value of replicas is set based on the value of a chosen field in the - // Infrastructure CR. If defaultRouterPlacement was set to ControlPlane, the + // Infrastructure CR. If defaultPlacement was set to ControlPlane, the // chosen field will be controlPlaneTopology. If it was set to Workers the // chosen field will be infrastructureTopology. 
Replicas will then be set to 1 // or 2 with respect to the chosen field's value (which can be either // SingleReplica or HighlyAvailable). // - // The value of nodePlacement is adjusted based on defaultRouterPlacement - If - // defaultRouterPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" - // label will be added. If defaultRouterPlacement is set to Workers the + // The value of nodePlacement is adjusted based on defaultPlacement - If + // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" + // label will be added. If defaultPlacement is set to Workers the // "node-role.kubernetes.io/master" label will be added. // // +kubebuilder:validation:Enum:="ControlPlane";"Workers" // +kubebuilder:default:="Workers" // +optional - DefaultRouterPlacement DefaultRouterPlacement `json:"defaultRouterPlacement"` + DefaultPlacement DefaultPlacement `json:"defaultPlacement"` } -// DefaultRouterPlacement defines the default placement of ingress router pods. -type DefaultRouterPlacement string +// DefaultPlacement defines the default placement of ingress router pods. +type DefaultPlacement string const ( // "Workers" is for having router pods placed on worker nodes by default - Workers DefaultRouterPlacement = "Workers" + Workers DefaultPlacement = "Workers" // "ControlPlane" is for having router pods placed on control-plane nodes by default - ControlPlane DefaultRouterPlacement = "ControlPlane" + ControlPlane DefaultPlacement = "ControlPlane" ) ``` @@ -279,14 +279,14 @@ This new field will have one of these values - `ControlPlane` or `Workers`. There are no current plans for any more values, but more may be added in the future. -The value of the `DefaultRouterPlacement` field will affect the defaulting +The value of the `DefaultPlacement` field will affect the defaulting behavior of `IngressController`'s `.spec.replicas` and `.spec.nodePlacement` fields. In the absence of an `IngressController` resource created by the user/installer, or when the user/installer creates an `IngressController` with these fields omitted, the Cluster Ingress Operator will choose the default -values for those fields based on the value of `DefaultRouterPlacement`. +values for those fields based on the value of `DefaultPlacement`. -When the value of `DefaultRouterPlacement` is `Workers`, the defaulting +When the value of `DefaultPlacement` is `Workers`, the defaulting behavior of `.spec.replicas` and `.spec.nodePlacement` will be the same as it is today: `.spec.replicas` will be chosen according to the value of `InfrastructureTopology`, namely `1` when `SingleReplica` or `2` when @@ -300,7 +300,7 @@ nodePlacement: node-role.kubernetes.io/worker: '' ``` -However, if the value of `DefaultRouterPlacement` is `ControlPlane`, the +However, if the value of `DefaultPlacement` is `ControlPlane`, the defaulting behavior will be different: `.spec.replicas` will be chosen instead according to the value of `ControlPlaneTopology`; again, `1` when `SingleReplica` or `2` when `HighlyAvailable`. `.spec.nodePlacement` will be @@ -320,7 +320,7 @@ the current behavior, it has no importance in this enhancement) The installer will detect situations in which it's unlikely the user will want to set up a load-balancer. For now, those situations only include installation of single control-plane node cluster deployments on the "none" platform. In -those situations, the installer will set `DefaultRouterPlacement` to be +those situations, the installer will set `DefaultPlacement` to be `ControlPlane`. 
Since there's just a single control-plane node, `ControlPlane` topology would be `SingleReplica` and the combined effect would be that the `IngressController` will have just a single replica and be pinned to the single @@ -330,21 +330,21 @@ control-plane node, and as a result any `*.apps..` DNS entries which originally pointed at the single control-plane node will remain correct even in the face of newly added worker nodes. -In any other situations, the installer will set `DefaultRouterPlacement` to +In any other situations, the installer will set `DefaultPlacement` to `Workers`, resulting in the same default behavior as before this enhancement, namely that `IngressController` pods are scheduled on worker nodes and their number of replicas determined according to the `InfrastructureTopology`. -If the value of `DefaultRouterPlacement` itself is omitted, it is defaulted to +If the value of `DefaultPlacement` itself is omitted, it is defaulted to `Workers`. This would be useful in order to maintain the current behavior if the API/Ingress PR is merged before the installer PR to set this field. -The installer may not set `DefaultRouterPlacement` to `ControlPlane` when the +The installer may not set `DefaultPlacement` to `ControlPlane` when the cluster's `ControlPlaneTopology` is set to `External`. In the future, when IPI-installed single control-plane node clusters in the cloud no longer provision a load-balancer by default, they would also benefit -from having the installer set the `DefaultRouterPlacement` to `ControlPlane`. +from having the installer set the `DefaultPlacement` to `ControlPlane`. ### Risks and Mitigations @@ -356,7 +356,7 @@ already both in the "master" and "worker" pools, that should make no practical difference. On multi-node clusters, the installer will never set the -`DefaultRouterPlacement` field to `ControlPlane`, so there are no risks to +`DefaultPlacement` field to `ControlPlane`, so there are no risks to discuss for multi-node clusters in this enhancement. However, any future enhancement that will consider making this field configurable by allowing the user to set it during installation, should take into account that the @@ -416,15 +416,15 @@ N/A ### Upgrade / Downgrade Strategy -The new `DefaultRouterPlacement` API field will have a default value of +The new `DefaultPlacement` API field will have a default value of `Workers`. The behavior of the Ingress Operator when the -`DefaultRouterPlacement` field has the value `Workers` should be identical +`DefaultPlacement` field has the value `Workers` should be identical to its current behavior (before this enhancement). This means that clusters that go through an upgrade from a version before this enhancement to a later version which includes this enhancement will maintain their current behavior, this enhancement should not change anything for them. -If the value of `DefaultRouterPlacement` is empty (TODO: Could this possibly +If the value of `DefaultPlacement` is empty (TODO: Could this possibly happen if the the Ingress Operator reads a resource created according to the old Ingress CRD that didn't have this value? Not sure how defaults work in this scenario, in any case, it's not crucial) the Ingress Operator should make sure @@ -436,7 +436,7 @@ Does not apply, to the best of my understanding. 
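
For illustration, the defaulting rules described above (replicas derived from
the topology field that matches the placement, and the node selector derived
from the placement itself) could be sketched roughly as follows. This is an
editor's sketch only, not the Cluster Ingress Operator's actual code; the
function and type names are hypothetical:

```go
package main

import "fmt"

// Placement mirrors the proposed DefaultPlacement status values.
type Placement string

// Topology mirrors the existing topology values in the Infrastructure CR.
type Topology string

const (
	PlacementWorkers      Placement = "Workers"
	PlacementControlPlane Placement = "ControlPlane"

	SingleReplica   Topology = "SingleReplica"
	HighlyAvailable Topology = "HighlyAvailable"
)

// defaultReplicas picks controlPlaneTopology when the placement is
// ControlPlane and infrastructureTopology otherwise, then maps
// SingleReplica to 1 and HighlyAvailable to 2.
func defaultReplicas(placement Placement, controlPlaneTopology, infrastructureTopology Topology) int32 {
	topology := infrastructureTopology
	if placement == PlacementControlPlane {
		topology = controlPlaneTopology
	}
	if topology == SingleReplica {
		return 1
	}
	return 2
}

// defaultNodeSelector returns the node-role label that the default
// nodePlacement would target for the given placement, alongside the
// Linux OS label that is always present.
func defaultNodeSelector(placement Placement) map[string]string {
	selector := map[string]string{"kubernetes.io/os": "linux"}
	if placement == PlacementControlPlane {
		selector["node-role.kubernetes.io/master"] = ""
	} else {
		selector["node-role.kubernetes.io/worker"] = ""
	}
	return selector
}

func main() {
	// A "none"-platform single control-plane node cluster as described above.
	fmt.Println(defaultReplicas(PlacementControlPlane, SingleReplica, HighlyAvailable)) // 1
	fmt.Println(defaultNodeSelector(PlacementControlPlane))
}
```
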
### Operational Aspects of API Extensions -- Since the `DefaultRouterPlacement` field is part of the status of the +- Since the `DefaultPlacement` field is part of the status of the Ingress config CR, it is clear that users should not modify this field. - This API change only affects the defaulting behavior of the `IngressController` @@ -449,10 +449,10 @@ placement in practice. Nothing has changed in that regard. #### Failure Modes -- The `ControlPlane` value of the `DefaultRouterPlacement` field value may +- The `ControlPlane` value of the `DefaultPlacement` field value may not be used if the cluster's `ControlPlaneTopology` field is set to `External`. The Ingress Operator would treat such combination as if the -`DefaultRouterPlacement` value is actually `Workers` instead, and log a warning +`DefaultPlacement` value is actually `Workers` instead, and log a warning to indicate that this invalid combination has been detected. The Ingress Operator should not fail to reconcile `IngressController` CRs due to this invalid combination. This should not happen under any circumstance since From 16a0597b81779dcceef6f0c6dd7251d2c3bbe742 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Wed, 23 Mar 2022 15:49:06 +0100 Subject: [PATCH 14/34] Default doc remark / enum namespacing --- .../single-node/single-node-openshift-with-workers.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 3d2cbdfa1c..ddc1abff11 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -254,6 +254,8 @@ type IngressStatus struct { // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" // label will be added. If defaultPlacement is set to Workers the // "node-role.kubernetes.io/master" label will be added. + // + // When omitted, the default value is Workers // // +kubebuilder:validation:Enum:="ControlPlane";"Workers" // +kubebuilder:default:="Workers" @@ -266,10 +268,10 @@ type DefaultPlacement string const ( // "Workers" is for having router pods placed on worker nodes by default - Workers DefaultPlacement = "Workers" + DefaultPlacementWorkers DefaultPlacement = "Workers" // "ControlPlane" is for having router pods placed on control-plane nodes by default - ControlPlane DefaultPlacement = "ControlPlane" + DefaultPlacementControlPlane DefaultPlacement = "ControlPlane" ) ``` From b65b4675e04e763fe94caf5c3c78f2009f01895c Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 10:58:59 +0100 Subject: [PATCH 15/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index ddc1abff11..55d1ddc3d3 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -235,7 +235,7 @@ addition to `openshift/api`'s `config/v1/types_ingress.go` file: type IngressStatus struct { // ... 
existing fields omitted - // defaultPlacement is set by the installer and lets it control which + // defaultPlacement is set at installation time to control which // nodes will host the ingress router pods by default, the options being // control-plane nodes or worker nodes. // From ee31cb47ea62856995d9c7f67f39b38bb913c963 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 10:59:08 +0100 Subject: [PATCH 16/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 55d1ddc3d3..89fe94b7c6 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -236,7 +236,7 @@ type IngressStatus struct { // ... existing fields omitted // defaultPlacement is set at installation time to control which - // nodes will host the ingress router pods by default, the options being + // nodes will host the ingress router pods by default. The options are // control-plane nodes or worker nodes. // // This field works by dictating how the Ingress Operator will set the default From 1092e58c14d774687d86377b7406f1501c1a2bc4 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 10:59:17 +0100 Subject: [PATCH 17/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 89fe94b7c6..0e90988a7c 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -245,7 +245,7 @@ type IngressStatus struct { // // The value of replicas is set based on the value of a chosen field in the // Infrastructure CR. If defaultPlacement was set to ControlPlane, the - // chosen field will be controlPlaneTopology. If it was set to Workers the + // chosen field will be controlPlaneTopology. If it is set to Workers the // chosen field will be infrastructureTopology. Replicas will then be set to 1 // or 2 with respect to the chosen field's value (which can be either // SingleReplica or HighlyAvailable). From e2ae215de06d0bcf284f04d3a1860d5786bc97f8 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 10:59:31 +0100 Subject: [PATCH 18/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- .../single-node/single-node-openshift-with-workers.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 0e90988a7c..c0203275a6 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -247,8 +247,8 @@ type IngressStatus struct { // Infrastructure CR. If defaultPlacement was set to ControlPlane, the // chosen field will be controlPlaneTopology. If it is set to Workers the // chosen field will be infrastructureTopology. 
Replicas will then be set to 1 - // or 2 with respect to the chosen field's value (which can be either - // SingleReplica or HighlyAvailable). + // or 2 based whether the chosen field's value is SingleReplica or + // HighlyAvailable, respectively. // // The value of nodePlacement is adjusted based on defaultPlacement - If // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" From fd8e9511b8d2ccee894d2f39c7c474f1fa968c95 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 10:59:38 +0100 Subject: [PATCH 19/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index c0203275a6..05fff1f13a 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -250,7 +250,7 @@ type IngressStatus struct { // or 2 based whether the chosen field's value is SingleReplica or // HighlyAvailable, respectively. // - // The value of nodePlacement is adjusted based on defaultPlacement - If + // The value of nodePlacement is adjusted based on defaultPlacement. If // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" // label will be added. If defaultPlacement is set to Workers the // "node-role.kubernetes.io/master" label will be added. From 192e0bc172121edbd75b9aee378bcead9fe517ac Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 10:59:48 +0100 Subject: [PATCH 20/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- .../single-node/single-node-openshift-with-workers.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 05fff1f13a..f15ae4a563 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -251,9 +251,9 @@ type IngressStatus struct { // HighlyAvailable, respectively. // // The value of nodePlacement is adjusted based on defaultPlacement. If - // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/worker" + // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/master" // label will be added. If defaultPlacement is set to Workers the - // "node-role.kubernetes.io/master" label will be added. + // "node-role.kubernetes.io/worker" label will be added. 
// // When omitted, the default value is Workers // From d32507f0978d4cb9c302f510ad8f69067f310528 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 11:00:00 +0100 Subject: [PATCH 21/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- .../single-node/single-node-openshift-with-workers.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index f15ae4a563..d824c352f4 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -254,8 +254,8 @@ type IngressStatus struct { // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/master" // label will be added. If defaultPlacement is set to Workers the // "node-role.kubernetes.io/worker" label will be added. - // - // When omitted, the default value is Workers + // + // When omitted, the default value is Workers // // +kubebuilder:validation:Enum:="ControlPlane";"Workers" // +kubebuilder:default:="Workers" From 305632322486d91bfe3ab92a775f1de893efb8be Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 11:00:11 +0100 Subject: [PATCH 22/34] Update enhancements/single-node/single-node-openshift-with-workers.md Co-authored-by: Matthew Staebler --- enhancements/single-node/single-node-openshift-with-workers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index d824c352f4..472977767e 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -244,7 +244,7 @@ type IngressStatus struct { // fields. // // The value of replicas is set based on the value of a chosen field in the - // Infrastructure CR. If defaultPlacement was set to ControlPlane, the + // Infrastructure CR. If defaultPlacement is set to ControlPlane, the // chosen field will be controlPlaneTopology. If it is set to Workers the // chosen field will be infrastructureTopology. Replicas will then be set to 1 // or 2 based whether the chosen field's value is SingleReplica or From 97dabe8784c82f7a0b8b651a4e181a89c6d3b888 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 13:49:13 +0100 Subject: [PATCH 23/34] Improvements to API Extensions section --- .../single-node-openshift-with-workers.md | 134 +++++++++++++----- 1 file changed, 101 insertions(+), 33 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 472977767e..a33683c4db 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -225,7 +225,31 @@ Introduce a new status field in the Ingress config CR In addition, continue to allow the `.spec.replicas` and `.spec.nodePlacement` fields in `operator.openshift.io/v1/ingresscontrollers` CRs to be omitted, but -change the defaulting behavior for these fields based on the new status field. +change the "controller-defaulting" behavior for these fields to be based on +this new status field. 
This means that when the Cluster Ingress Operator +creates router `Deployments` out of `IngressController` resources it will +adjust how it behaves in the absence of these parameters. The exact details of +this behavior are described below. + +The goal of this new `DefaultPlacement` field is to give the installer a way to +affect the general placement of the `IngressController`, regardless of whether +an `IngressController` is created during installation with the `.spec.replicas` +and `.spec.nodePlacement` omitted or whether no `IngressController` was created +during an installation. When no `IngressController` is created during +installation the Cluster Ingress Operator creates a default `IngressController` +- in this case we can go one of two ways - + +- The Cluster Ingress Operator leaves those fields empty in the newly created +`IngressController` + +- The cluster Ingress Operator populates these fields in the newly created +`IngressController` + +Choosing one way or the other is arbitrary and shouldn't matter due to the +immutable nature of this new field - the eventual `Deployment` that gets +created would be identical in any case. I personally prefer the second option +since it makes the fields explicit in the `IngressController` itself and not +just in `Deployments` that get created. The sections below go into detail about the field, its possible values, and the behavior expected from each of them, but in practice following is the proposed @@ -239,21 +263,12 @@ type IngressStatus struct { // nodes will host the ingress router pods by default. The options are // control-plane nodes or worker nodes. // - // This field works by dictating how the Ingress Operator will set the default - // values of future IngressController resources' replicas and nodePlacement - // fields. - // - // The value of replicas is set based on the value of a chosen field in the - // Infrastructure CR. If defaultPlacement is set to ControlPlane, the - // chosen field will be controlPlaneTopology. If it is set to Workers the - // chosen field will be infrastructureTopology. Replicas will then be set to 1 - // or 2 based whether the chosen field's value is SingleReplica or - // HighlyAvailable, respectively. - // - // The value of nodePlacement is adjusted based on defaultPlacement. If - // defaultPlacement is set to ControlPlane the "node-role.kubernetes.io/master" - // label will be added. If defaultPlacement is set to Workers the - // "node-role.kubernetes.io/worker" label will be added. + // This field works by dictating how the Cluster Ingress Operator will + // consider unset replicas and nodePlacement fields in IngressController + // resources when creating the corresponding Deployments. + // + // See the documentation for the IngressController replicas and nodePlacement + // fields for more information. // // When omitted, the default value is Workers // @@ -261,6 +276,8 @@ type IngressStatus struct { // +kubebuilder:default:="Workers" // +optional DefaultPlacement DefaultPlacement `json:"defaultPlacement"` + + // ... existing fields omitted } // DefaultPlacement defines the default placement of ingress router pods. @@ -275,6 +292,57 @@ const ( ) ``` +And the following documentation update to the `openshift/api`'s `operator/v1/types_ingress.go` file: + +```go +type IngressControllerSpec struct { + // ... existing fields omitted + + // replicas is the desired number of ingress controller replicas. 
If unset, + // the default depends on the value of the defaultPlacement field in the + // cluster config.openshift.io/v1/ingresses status. + // + // The value of replicas is set based on the value of a chosen field in the + // Infrastructure CR. If defaultPlacement is set to ControlPlane, the + // chosen field will be controlPlaneTopology. If it is set to Workers the + // chosen field will be infrastructureTopology. Replicas will then be set to 1 + // or 2 based whether the chosen field's value is SingleReplica or + // HighlyAvailable, respectively. + // + // +optional + Replicas *int32 `json:"replicas,omitempty"` + + // ... existing fields omitted +} + +type NodePlacement struct { + // ... existing fields omitted + + // nodeSelector is the node selector applied to ingress controller + // deployments. + // + // If set, the specified selector is used and replaces the default. + // + // If unset, the default depends on the value of the defaultPlacement + // field in the cluster config.openshift.io/v1/ingresses status. + // + // When defaultPlacement is Workers, the default is: + // + // kubernetes.io/os: linux + // node-role.kubernetes.io/worker: '' + // + // When defaultPlacement is ControlPlane, the default is: + // + // kubernetes.io/os: linux + // node-role.kubernetes.io/master: '' + // + // +optional + NodeSelector *metav1.LabelSelector `json:"nodeSelector,omitempty"` + + // ... existing fields omitted +} +``` + ### Implementation Details/Notes/Constraints This new field will have one of these values - `ControlPlane` or `Workers`. @@ -418,19 +486,19 @@ N/A ### Upgrade / Downgrade Strategy -The new `DefaultPlacement` API field will have a default value of -`Workers`. The behavior of the Ingress Operator when the -`DefaultPlacement` field has the value `Workers` should be identical -to its current behavior (before this enhancement). This means that clusters -that go through an upgrade from a version before this enhancement to a later -version which includes this enhancement will maintain their current behavior, -this enhancement should not change anything for them. +The new `DefaultPlacement` API field will have a default value of `Workers`. +The behavior of the Cluster Ingress Operator when the `DefaultPlacement` field +has the value `Workers` should be identical to its current behavior (before +this enhancement). This means that clusters that go through an upgrade from a +version before this enhancement to a later version which includes this +enhancement will maintain their current behavior, this enhancement should not +change anything for them. -If the value of `DefaultPlacement` is empty (TODO: Could this possibly -happen if the the Ingress Operator reads a resource created according to the +If the value of `DefaultPlacement` is empty (TODO: Could this possibly happen +if the the Cluster Ingress Operator reads a resource created according to the old Ingress CRD that didn't have this value? Not sure how defaults work in this -scenario, in any case, it's not crucial) the Ingress Operator should make sure -to treat it as if it were to have the value `Workers`. +scenario, in any case, it's not crucial) the Cluster Ingress Operator should +make sure to treat it as if it were to have the value `Workers`. ### Version Skew Strategy @@ -452,13 +520,13 @@ placement in practice. Nothing has changed in that regard. #### Failure Modes - The `ControlPlane` value of the `DefaultPlacement` field value may -not be used if the cluster's `ControlPlaneTopology` field is set to -`External`. 
The Ingress Operator would treat such combination as if the -`DefaultPlacement` value is actually `Workers` instead, and log a warning -to indicate that this invalid combination has been detected. The Ingress +not be used if the cluster's `ControlPlaneTopology` field is set to `External`. +The Cluster Ingress Operator would treat such combination as if the +`DefaultPlacement` value is actually `Workers` instead, and log a warning to +indicate that this invalid combination has been detected. The Cluster Ingress Operator should not fail to reconcile `IngressController` CRs due to this -invalid combination. This should not happen under any circumstance since -the installer will never set this combination, but the Ingress Operator +invalid combination. This should not happen under any circumstance since the +installer will never set this combination, but the Cluster Ingress Operator should still handle it gracefully regardless. #### Support Procedures From 2075061b12c25da1146562b75b4278dbf15ce030 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 14:17:35 +0100 Subject: [PATCH 24/34] Small remark about admins/support engineers --- .../single-node-openshift-with-workers.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index a33683c4db..b6051f5467 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -232,11 +232,12 @@ adjust how it behaves in the absence of these parameters. The exact details of this behavior are described below. The goal of this new `DefaultPlacement` field is to give the installer a way to -affect the general placement of the `IngressController`, regardless of whether -an `IngressController` is created during installation with the `.spec.replicas` -and `.spec.nodePlacement` omitted or whether no `IngressController` was created -during an installation. When no `IngressController` is created during -installation the Cluster Ingress Operator creates a default `IngressController` +affect the general placement of the `IngressController`. This is regardless of +whether an `IngressController` is created during installation with the +`.spec.replicas` and `.spec.nodePlacement` omitted or whether no +`IngressController` was created during an installation at all. Today, when no +`IngressController` is created during installation, the Cluster Ingress Operator +creates a default `IngressController` - in this case we can go one of two ways - - The Cluster Ingress Operator leaves those fields empty in the newly created @@ -515,7 +516,10 @@ than there already was. - For administrators and support engineers, the `IngressController` is still the source of truth and where you need to look if you seek to understand the router -placement in practice. Nothing has changed in that regard. +placement in practice. However, now the `replicas` and `nodePlacement` on +`IngressController` may be empty, so administrators and support engineers +can refer to the corresponding Deployment that got created if they wish to +know the practical values. 
#### Failure Modes From d6f14cb4879b8cabe1beb50d2afc6b1e3926fb9d Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 14:19:12 +0100 Subject: [PATCH 25/34] Formatting --- enhancements/single-node/single-node-openshift-with-workers.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index b6051f5467..11f9714151 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -237,8 +237,7 @@ whether an `IngressController` is created during installation with the `.spec.replicas` and `.spec.nodePlacement` omitted or whether no `IngressController` was created during an installation at all. Today, when no `IngressController` is created during installation, the Cluster Ingress Operator -creates a default `IngressController` -- in this case we can go one of two ways - +creates a default `IngressController`, in this case we can go one of two ways - - The Cluster Ingress Operator leaves those fields empty in the newly created `IngressController` From 0af91396162d296524841da8655ad909b2210c71 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 14:23:58 +0100 Subject: [PATCH 26/34] Phrasing, removed redundancy --- .../single-node/single-node-openshift-with-workers.md | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 11f9714151..3239e06cc6 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -349,16 +349,9 @@ This new field will have one of these values - `ControlPlane` or `Workers`. There are no current plans for any more values, but more may be added in the future. -The value of the `DefaultPlacement` field will affect the defaulting -behavior of `IngressController`'s `.spec.replicas` and `.spec.nodePlacement` -fields. In the absence of an `IngressController` resource created by the -user/installer, or when the user/installer creates an `IngressController` with -these fields omitted, the Cluster Ingress Operator will choose the default -values for those fields based on the value of `DefaultPlacement`. - When the value of `DefaultPlacement` is `Workers`, the defaulting behavior of `.spec.replicas` and `.spec.nodePlacement` will be the same as it -is today: `.spec.replicas` will be chosen according to the value of +is today: `.spec.replicas` will be treated according to the value of `InfrastructureTopology`, namely `1` when `SingleReplica` or `2` when `HighlyAvailable`. `.spec.nodePlacement` will always just be: @@ -371,7 +364,7 @@ nodePlacement: ``` However, if the value of `DefaultPlacement` is `ControlPlane`, the -defaulting behavior will be different: `.spec.replicas` will be chosen instead +defaulting behavior will be different: `.spec.replicas` will be treated instead according to the value of `ControlPlaneTopology`; again, `1` when `SingleReplica` or `2` when `HighlyAvailable`. 
`.spec.nodePlacement` will be always just be: From bc8b0d208c0636e39b585b7f6e89612338d43ceb Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Thu, 24 Mar 2022 16:47:56 +0100 Subject: [PATCH 27/34] Indentation --- .../single-node-openshift-with-workers.md | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 3239e06cc6..b1a0d7e026 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -257,18 +257,18 @@ addition to `openshift/api`'s `config/v1/types_ingress.go` file: ```go type IngressStatus struct { - // ... existing fields omitted + // ... existing fields omitted // defaultPlacement is set at installation time to control which // nodes will host the ingress router pods by default. The options are // control-plane nodes or worker nodes. // // This field works by dictating how the Cluster Ingress Operator will - // consider unset replicas and nodePlacement fields in IngressController - // resources when creating the corresponding Deployments. - // - // See the documentation for the IngressController replicas and nodePlacement - // fields for more information. + // consider unset replicas and nodePlacement fields in IngressController + // resources when creating the corresponding Deployments. + // + // See the documentation for the IngressController replicas and nodePlacement + // fields for more information. // // When omitted, the default value is Workers // @@ -277,7 +277,7 @@ type IngressStatus struct { // +optional DefaultPlacement DefaultPlacement `json:"defaultPlacement"` - // ... existing fields omitted + // ... existing fields omitted } // DefaultPlacement defines the default placement of ingress router pods. @@ -296,7 +296,7 @@ And the following documentation update to the `openshift/api`'s `operator/v1/typ ```go type IngressControllerSpec struct { - // ... existing fields omitted + // ... existing fields omitted // replicas is the desired number of ingress controller replicas. If unset, // the default depends on the value of the defaultPlacement field in the @@ -312,11 +312,11 @@ type IngressControllerSpec struct { // +optional Replicas *int32 `json:"replicas,omitempty"` - // ... existing fields omitted + // ... existing fields omitted } type NodePlacement struct { - // ... existing fields omitted + // ... existing fields omitted // nodeSelector is the node selector applied to ingress controller // deployments. @@ -339,7 +339,7 @@ type NodePlacement struct { // +optional NodeSelector *metav1.LabelSelector `json:"nodeSelector,omitempty"` - // ... existing fields omitted + // ... 
existing fields omitted } ``` From 584f9dfeeeea3972a889fb4f33ac6404e87aa57c Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Fri, 25 Mar 2022 11:35:41 +0100 Subject: [PATCH 28/34] Indentation --- .../single-node/single-node-openshift-with-workers.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index b1a0d7e026..6b3a7471b0 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -330,11 +330,11 @@ type NodePlacement struct { // // kubernetes.io/os: linux // node-role.kubernetes.io/worker: '' - // - // When defaultPlacement is ControlPlane, the default is: - // - // kubernetes.io/os: linux - // node-role.kubernetes.io/master: '' + // + // When defaultPlacement is ControlPlane, the default is: + // + // kubernetes.io/os: linux + // node-role.kubernetes.io/master: '' // // +optional NodeSelector *metav1.LabelSelector `json:"nodeSelector,omitempty"` From 2b5392f977cc13630511f8fb983e59169068f1e0 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Fri, 25 Mar 2022 11:37:26 +0100 Subject: [PATCH 29/34] Indentation --- .../single-node/single-node-openshift-with-workers.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 6b3a7471b0..b875492386 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -324,9 +324,9 @@ type NodePlacement struct { // If set, the specified selector is used and replaces the default. // // If unset, the default depends on the value of the defaultPlacement - // field in the cluster config.openshift.io/v1/ingresses status. - // - // When defaultPlacement is Workers, the default is: + // field in the cluster config.openshift.io/v1/ingresses status. 
+ // + // When defaultPlacement is Workers, the default is: // // kubernetes.io/os: linux // node-role.kubernetes.io/worker: '' From bb8b7004b55ac5043c8e30f8336d2c37d57ad338 Mon Sep 17 00:00:00 2001 From: Doug Hellmann Date: Mon, 28 Mar 2022 09:23:20 -0400 Subject: [PATCH 30/34] update reviewer and approver metadata --- .../single-node/single-node-openshift-with-workers.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index b875492386..e11996da23 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -7,15 +7,16 @@ reviewers: - "@eranco74" - "@tsorya" - "@dhellmann" - - "@Miciah" + - "@Miciah" - edge networking - "@bparees" - - "@JoelSpeed" - - "@staebler" + - "@JoelSpeed" - API + - "@staebler" - installer - "@derekwaynecarr" + - "@cgwalters" - MCO approvers: - - TBD + - "@dhellmann" api-approvers: # in case of new or modified APIs or API extensions (CRDs, aggregated apiservers, webhooks, finalizers) - - TBD + - "@JoelSpeed" creation-date: 2022-01-06 last-updated: 2022-02-12 tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement From 184dff87266e9d9cef7c6348565fa0e0569ea1c4 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Mar 2022 15:43:20 +0200 Subject: [PATCH 31/34] Improve tests section, remove TODO --- .../single-node-openshift-with-workers.md | 31 +++++++++++++------ 1 file changed, 22 insertions(+), 9 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index e11996da23..4e6ce3a8b4 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -438,8 +438,21 @@ None that I can think of at the moment ### Test Plan +#### General tests + - Add unit tests to the Cluster Ingress Operator to make sure the -IngressController resource is generated as exepected. +IngressController resource and corresponding Deployment is generated as +exepected. + +- Add tests on both cloud / "none"-platform that check that a single-node cluster +with additional workers recovers after the single control-plane node reboots by +running conformance tests post-reboot. + +- Add tests on both cloud / "none"-platform that check that a single-node cluster +with additional workers recovers after an upgrade by running conformance tests +post-upgrade. + +#### Day-2 workers tests - Add periodic nightly tests which install single-node in the cloud, add a few worker nodes to it, then run conformance tests to make sure we don't run into @@ -448,17 +461,17 @@ any problems not described in this enhancement. - Add periodic nightly tests which install a single-node "none"-platform cluster, add worker nodes to it, and check that ingress traffic still works as expected and recovers even after the `router-default` pod gets deleted and rescheduled. -Make sure this is still true even after upgrades. -- Add tests on both cloud / "none"-platform that check that a single-node cluster -with additional workers recovers after the single control-plane node reboots by -running conformance tests post-reboot. 
+#### Day-1 workers tests -- Add tests on both cloud / "none"-platform that check that a single-node cluster -with additional workers recovers after an upgrade by running conformance tests -post-upgrade. +- Add periodic nightly tests which install a single control-plane node cluster +with a few additional workers in the cloud, then run conformance tests to make +sure we don't run into any problems not described in this enhancement. -TODO: Describe day-1-workers tests? +- Add periodic nightly tests which install a single control-plane "none"-platform +cluster with additional workers and check that ingress traffic still works as +expected and recovers even after the `router-default` pod gets deleted and +rescheduled. ### Graduation Criteria From 3b9ca8348334f256f639d93191f5e230012dc86f Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Mar 2022 15:43:36 +0200 Subject: [PATCH 32/34] Remove TODO - this value can be unset, and ingress operator should handle it --- .../single-node/single-node-openshift-with-workers.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 4e6ce3a8b4..09d9ea9a58 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -501,10 +501,7 @@ version before this enhancement to a later version which includes this enhancement will maintain their current behavior, this enhancement should not change anything for them. -If the value of `DefaultPlacement` is empty (TODO: Could this possibly happen -if the the Cluster Ingress Operator reads a resource created according to the -old Ingress CRD that didn't have this value? Not sure how defaults work in this -scenario, in any case, it's not crucial) the Cluster Ingress Operator should +If the value of `DefaultPlacement` is unset the Cluster Ingress Operator should make sure to treat it as if it were to have the value `Workers`. ### Version Skew Strategy From 27baec252606f21243ca6ea865d46b416c09f517 Mon Sep 17 00:00:00 2001 From: Omer Tuchfeld Date: Mon, 28 Mar 2022 15:57:43 +0200 Subject: [PATCH 33/34] Add alternative for direct values set by the installer --- .../single-node-openshift-with-workers.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md index 09d9ea9a58..d9ff96da5a 100644 --- a/enhancements/single-node/single-node-openshift-with-workers.md +++ b/enhancements/single-node/single-node-openshift-with-workers.md @@ -560,6 +560,18 @@ users. ## Alternatives +- Instead of adding this new `DefaultPlacement` field to the Ingress config CR, +add two new status fields to the Ingress config CR, one for the `replicas` +value recommended by the installation process and one for the `nodePlacement` +value recommended by the installation process. This way instead of the +installation process deciding on a generalized `defaultPlacement` with values +of `Workers` and `ControlPlane` it would instead give more specific +instructions to the Cluster Ingress Operator of how it thinks those unset +parameters should be defaulted. 
Due to the complexity of the nodePlacement
+parameter, having the installation process dictate its value directly requires
+the installation process to be aware of things like the [Linux label](https://github.com/openshift/cluster-ingress-operator/blob/a6c16b7eefe6d7045f3109cd265c4e891f5791ef/pkg/operator/controller/ingress/deployment.go#L569)
+that the Ingress Operator would otherwise set by default on Deployments.
+
 - Even when users need to add just one extra worker, require them to add yet
 another worker so they could just form a compact 3-node cluster where all nodes
 are both workers and control-plane nodes. This kind of topology is already

From 9ef9aac54541c72b764019dd756b7c7f725f6980 Mon Sep 17 00:00:00 2001
From: Omer Tuchfeld
Date: Mon, 28 Mar 2022 16:00:17 +0200
Subject: [PATCH 34/34] Add a paragraph about populating the fields in the
 IngressControllers themselves

---
 .../single-node/single-node-openshift-with-workers.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/enhancements/single-node/single-node-openshift-with-workers.md b/enhancements/single-node/single-node-openshift-with-workers.md
index d9ff96da5a..9e7b862b42 100644
--- a/enhancements/single-node/single-node-openshift-with-workers.md
+++ b/enhancements/single-node/single-node-openshift-with-workers.md
@@ -517,12 +517,15 @@ Ingress config CR, it is clear that users should not modify this field.
 CR, it does not add any new capabilities to OCP, or give any more flexibility
 than there already was.
 
-- For administrators and support engineers, the `IngressController` is still the
+For administrators and support engineers, the `IngressController` is still the
 source of truth and where you need to look if you seek to understand the router
 placement in practice. However, now the `replicas` and `nodePlacement` on
-`IngressController` may be empty, so administrators and support engineers
-can refer to the corresponding Deployment that got created if they wish to
-know the practical values.
+`IngressController` may be empty, so administrators and support engineers can
+refer to the corresponding Deployment that got created if they wish to know the
+practical values. This could potentially be improved if the Cluster Ingress
+Operator would always strive to reconcile these defaults to also appear in the
+`IngressController` CRs themselves. The decision is left to the Cluster Ingress
+Operator's implementation.
 
 #### Failure Modes
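
As a closing illustration, the single field this enhancement asks the installer
to set would end up in the Ingress config CR status roughly as below. This is a
sketch of the proposed API only; the object name `cluster` is assumed to be the
usual name of the cluster-scoped config object, and the value shown assumes a
"none"-platform single control-plane node installation:

```yaml
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
status:
  defaultPlacement: ControlPlane
```

On every other topology the installer would leave the field alone and the
`Workers` default would apply.
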