single-node production deployments #504
Conversation
This enhancement describes a new single-node cluster profile for production use in "edge" deployments that are not considered to be resource-constrained, such as telecommunications bare metal environments.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
@hexfusion FYI
2. Similarly, telco workloads typically require special network setups
   for a host to boot, including bonded interfaces, access to multiple
   VLANs, and static IPs. How do we anticipate configuring those?
3. The machine-config-operator works by (almost always) rebooting a host.
No. MCO is absolutely necessary as it is used by PAO (optional OLM operator) and NTO to apply the computed host OS and kernel tuning values. It is also used to allocate hugepages for example.
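For illustration, this is roughly the shape of a PerformanceProfile that PAO turns into MachineConfig and tuned changes; the profile name, CPU sets, and hugepage counts below are made-up example values, not something prescribed by this enhancement.

```
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: single-node-telco         # hypothetical name
spec:
  cpu:
    reserved: "0-1"               # CPUs kept for the host OS and cluster services (example)
    isolated: "2-31"              # CPUs dedicated to latency-sensitive workloads (example)
  hugepages:
    defaultHugepagesSize: 1G
    pages:
    - size: 1G
      count: 16                   # example allocation
  realTimeKernel:
    enabled: true                 # PAO asks MCO to switch the host to the RT kernel
  nodeSelector:
    node-role.kubernetes.io/master: ""
```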
Btw, you either need to describe a way to pre-deploy an image including optional OLM operators and reboots or allow day-2 operations that do the same. Not all Telco deployments are the same (RT vs. non-RT, different networking, different CNFs, different hugepages, different NUMA topology...).
### Open Questions

1. Telco workloads frequently require a realtime kernel. How will a
Not all telco workloads require the RT kernel. Today the user uses a day-2 procedure that involves either a MachineConfig or the Performance Addon Operator (which does that via MCO).
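As a minimal sketch of the MachineConfig route (the object name and role label here are illustrative):

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-realtime-kernel            # illustrative name
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  kernelType: realtime                       # MCO switches the host to the RT kernel (and reboots it)
```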
(Using RT kernel works "day 1" too)
1. Telco workloads frequently require a realtime kernel. How will a
   user specify whether to use the realtime or regular kernel? Should
   we assume they always want the realtime version?
2. Similarly, telco workloads typically require special network setups
OLM needs to be enabled as the sriov operator is installed that way.
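For context, installing it that way boils down to OLM objects along these lines; the namespace, channel, and catalog source names are typical values and may differ per release (an OperatorGroup in the target namespace is also needed):

```
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator
  namespace: openshift-sriov-network-operator   # typical namespace, may differ
spec:
  name: sriov-network-operator
  channel: stable                                # assumed channel name
  source: redhat-operators                       # catalog delivered via marketplace/OLM
  sourceNamespace: openshift-marketplace
```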
[without waiting for 3 master nodes](https://github.com/openshift/cluster-etcd-operator/blob/98590e6ecfe282735c4eff01432ae40b29f81202/pkg/etcdenvvar/etcd_env.go#L72))

In addition, some components are not relevant for this cluster
profile (e.g. console, cluster-autoscaler, marketplace?) and shouldn't
marketplace/OLM is necessary. The Performance Addon Operator, sriov, and ptp operators are all deployed that way.
+1 OLM is necessary for Telco mobile network use cases.
In addition, the following are not relevant:
- The console is optional
- The ingressVIP and apiVIP are not relevant in this type of cluster
Other known gaps:
- The metrics service should provide the ability to export to an external Kafka or Prometheus instance.
- No local logging is necessary, but centralized logging is expected (e.g. exporting logs to a centralized Kafka bus or Elasticsearch cluster). It is okay to maintain a short-term buffer (e.g. the past few hours) of logs in case of disconnection from the external logging target, so the relevant logs are still available after a disconnection.
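As a rough sketch of that forwarding expectation, assuming the cluster-logging operator's ClusterLogForwarder API were used (the broker URL and topic below are placeholders):

```
apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: central-kafka
    type: kafka
    url: tls://kafka.example.com:9093/edge-logs   # placeholder broker and topic
  pipelines:
  - name: forward-everything
    inputRefs:
    - application
    - infrastructure
    outputRefs:
    - central-kafka
```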
You need an installation that understands OLM's API surface, that does not necessarily mean you require OLM running actively on the machine at all times. I could see an alternative where OLM's controllers are run in a one-shot install / upgrade mode as opposed to constantly running. It may come down to where resource utilization needs to be cut.
If upgrades initially will happen by reimaging the machine, then is there a reason to have OLM continuously running and checking its catalog for updates? I think this depends on whether there are likely to be updates to the optional components independent of reimaging this single node cluster. The other purpose OLM serves if it is running is to provide general metrics about the optional components.
Ok, I think I can accept that. We need a way to install the operators, but there does not have to be a constant update-checking loop.
/cc @crawford
That end-to-end job should also be run against pull requests for
the operators and other components that are most affected by the new
profile, such as the etcd and auth operators.
- MCO + NTO (support for tuning and reboots) and sriov, ptp (OLM support)
Probably should make it clear here that "operators" will include both core payload and optional ones.
- `cluster-etcd-operator` will not deploy the etcd cluster without a minimum of 3 master nodes (can be changed by enabling `useUnsupportedUnsafeNonHANonProductionUnstableEtcd`)
- Even with the unsupported feature flag, `etcd-quorum-guard` still requires 3 nodes due to its replica count.
- `cluster-authentication-operator` will not deploy `OAuthServer` without a minimum of 3 master nodes (can be changed by enabling `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer`)
- `cluster-ingress-operator` deploys the router with 2 replicas. On a single node one will fail to start and the ingress will show as degraded.
Do we really need ingress in the RAN use case?
auth currently depends on functional ingress.
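For reference, the unsupported flags quoted above are normally set through each operator's unsupportedConfigOverrides stanza; a sketch, assuming the flag names in the quoted text are still current:

```
# Let cluster-etcd-operator deploy etcd with fewer than 3 masters (unsupported)
oc patch etcd cluster --type=merge --patch \
  '{"spec":{"unsupportedConfigOverrides":{"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'

# Same idea for the authentication operator's OAuth server
oc patch authentication.operator.openshift.io cluster --type=merge --patch \
  '{"spec":{"unsupportedConfigOverrides":{"useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer": true}}}'
```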
CI coverage informing the release. An end-to-end job using the profile
and running an appropriate subset of the standard OpenShift tests
will be created and configured to block accepting release images
unless it passes.
I would also expect an integration CI job that runs with the optional operators we expect will target this deployment topology. Launching against this environment will need to be a supported target topology in our common test infra for operators.
Additionally I would expect the ability to launch one of these single node deployments from cluster-bot
not be deployed by this profile.

The profile describes single-node, all-in-one, deployments, so there
is no need to support provisioning additional workers. The
A Telco requirement would be to add worker nodes to an all-in-one deployment
We're specifically calling that out as something we would not do with these types of clusters. What sort of use case would call for a single-node control plane with separate worker(s)?
I believe if there was a need for additional compute resources at a far edge location, the telco would be more likely to deploy a second independent single-node cluster rather than adding an additional worker to the existing single-node cluster, so that there is uniformity in implementation across their network.
  enhancement.
* This enhancement does not address high-availability for single-node
  deployments.
* This enhancement does not address in-place upgrades for this first
A Telco requirement will be to provide a mechanism to upgrade the machine without re-provisioning the infrastructure and applications
That does not match the requirements we've been given so far.
+1 for browsell - Bandwidth is often constrained in far edge telco use cases, and needing to re-provision and re-pull all of the packages is undesirable. Re-provisioning from a local cache might resolve that issue; however, there is also a desire to minimize downtime, which removing the need to re-provision would help with.
/assign
configured when deployed and used in the `single-node-production-edge`
deployments.

Although the environment is assumed to have significant resources, it
From a Telco point of view, optimising CPU usage of the infrastructure is the most important consideration.
Although the environment is assumed to have significant resources, it
is important to dedicate most of them to end-user workloads, rather
than cluster control plane or monitoring. Therefore, the cluster
profile will configure telemetry and logging to forward data, instead
Need to consider the case where the single node is disconnected from the centralised collection; it needs to buffer and forward when the connection is re-established.
I'm pretty skeptical of this. The entire point of the prometheus stack is to buffer collection. If you're inventing a new path for this that has to be recreated for the large variety of system data we collect, that's a red flag to me.
To be clearer, I'm not convinced the general single edge production node profile has the resource constraints AND the store and forward requirement described here (I can believe telecom edge does, just not all production edge). Can you make a stronger case for this being a general statement for all production edge?
I note this because monitoring is a fundamental component of OpenShift. It provides the loop by which insight into production performance is measured. Up until this proposal monitoring has been required to exist, and we assume that monitoring is a fundamental part of the platform, with components being self-monitored and self-managed. Saying "we will do this someplace else" removes the closed loop within OpenShift, limits how good operators can be at self-observation, duplicates a large amount of configuration, and except in very trivial integration scenarios (I just want app metrics, or just want a small subset of node metrics) is going to duplicate a lot of work.
Cluster monitoring talks to at least 6 on-node components (node exporter, kubelet, networking, machine config daemon, dns, and things like SRO). Which of those contain data that central monitoring needs? I would wager most. Which of the 23 core components must be stored and forwarded for alerting of platform health? I would wager most. So if we end up duplicating outside the platform a significant fraction of the scope of what the platform already gathers, then we're wasting engineering effort that would be better spent on efficiency.
Copying here for context: we've discussed with monitoring before that this is an almost infinitely tunable component in terms of CPU, memory, and disk space, and I'd expect to see justification that tuning is insufficient before we remove it:
- going from two replicas to one is half CPU and memory
- cutting retention in half reduces disk in half (and I believe memory, but that may not be true anymore)
- doubling the scrape interval reduces CPU in half
- cutting out half of the metric series scraped (of which 55% of the default cluster metrics are control plane and likely 25% are just bugs we can trivially fix as excessive cardinality) should halve memory and cpu and disk.
And I'm sure there are some more. Doing the three CPU ones of those might result in 1/8th the CPU and memory OOTB. I would be completely supportive of "efficient single node monitoring" (since that benefits everyone), but I'm generally not supportive of "monitoring off". A single prometheus should be able to run at about 0.1 or 0.2 core, potentially even before some of this tuning. Have we collected the data and tried the tuning before we jump to "turn off"?
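To make the tuning direction concrete, here is a sketch of the kind of cluster-monitoring-config changes being discussed; the retention value and remote-write endpoint are placeholders, and not every knob listed above (e.g. scrape interval) is exposed through this ConfigMap:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h                                               # shorter retention (placeholder)
      remoteWrite:
      - url: https://central-prometheus.example.com/api/v1/write   # placeholder central endpoint
```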
agreeing with what @smarterclayton has said so far, and cc'ing @simonpasquier. We have also discussed the above options already with @dhellmann and have an investigational spike on what we can actually achieve.
I think you need to be very careful about how you define 'resource-constrained', because even telco bare metal environments will have resource constraints based not only on compute resources, but more importantly on power, space and cooling, which often lead to the single-node cluster use case in the first place.
Sure. The point I was trying to make is we're not talking about a single-board system like a Raspberry Pi.
### Test Plan

In order to claim full support for this configuration, we must have
Today, we only support fault tolerant/HA configurations of OpenShift, and there is considerable control plane operator logic that assumes this requirement. I think it might be worth mentioning that making fault-tolerance optional is likely to require not just additional testing for the newly supported cluster configurations, but also additional testing effort to ensure that support for existing fault-tolerant configurations does not regress.
mmm...as of right now I think that the "e2e-agnostic" test is required across all core repos, and that's not going to change from our "default profile".
It would make sense to add a new `/test e2e-single-node` of course, and that'd be a periodic. Some repos might opt into running that always on PRs, or just on demand.
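As a rough illustration of what that could look like in a repo's ci-operator configuration (the cluster profile and workflow names here are assumptions, not existing CI assets):

```
tests:
- as: e2e-single-node
  cron: "0 */12 * * *"                        # run as a periodic rather than blocking every PR
  steps:
    cluster_profile: aws                      # assumed profile; a real job would target the edge topology
    workflow: openshift-e2e-aws-single-node   # hypothetical workflow name
```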
  single-node configurations for production use in environments with
  "reasonably significant" memory, storage, and compute resources.
* Clusters built using the `single-node-production-edge` profile
  should pass most Kubernetes and OpenShift conformance end-to-end
Hmm, a lot of the OpenShift e2es assume things like a registry working I think. Would we really be leaving all of that unchanged, or does this enhancement call for removing components (and teaching the test suite to handle that?)
We still need to do a lot of that analysis. The goal is to be as close as possible to the default deployment.
There was some discussion about the topic in #482. I'm guessing the goal for both of these will be identical, since in both cases you're running a limited set of functionality, in other words only core elements without addons.
  worker node.
* Many operators will be configured to reduce the footprint of their
  operands, such as by running fewer replicas.
* In-place upgrades will not be supported by the first iteration of
I think deploying a computer without a means to update it is irresponsible. Some people will wave their hands and say it only processes trusted input, is disconnected from the Internet etc etc. I still think it's irresponsible.
Or I guess this does say "in place" so perhaps for this use case (a bit like Code Ready) the idea is any important state is stored outside of the node, so reprovisioning it is a viable path for upgrades?
(Edit sorry, I see there's more about this below)
Yeah, that's the general idea. These sorts of deployments are expected to be part of a larger system that includes an orchestration tool outside of the cluster for managing 100s or 1000s of individual instances. Assuming a wipe-and-rebuild approach can be implemented within the other constraints like the length of change windows, it seemed reasonable to go that route instead of trying to make in-place work. On the other hand, if in-place isn't a big deal, maybe we don't need to make that assumption. So, it's definitely still up for discussion.
just noting there is a consideration for remote low bandwidth clusters in which we'd want to update only the delta vs. the entire image, which may be more feasible for an in-place upgrade vs. a full re-image.
I think the low bandwidth is not part of the enhancement. Here we should agree on what single node looks like. We will deal with further improvements (bandwidth, bootstrap removal) as part of other enhancements.
Regarding the delta - I assume that we don't recreate all the container images every release - so it should be solved by docker. Unless I'm wrong.
Regarding RHCOS - I have no clue - is there such functionality in place? @cgwalters ?
One reason ostree is popular is that quite a while ago we implemented a pretty good delta mechanism. We aren't using it for RHCOS but "base" FCOS does use it.
https://ostreedev.github.io/ostree/formats/#static-deltas
There'd be some work to enable this but not really hard.
One even more radical approach here would be to commit all of the container images into ostree as well - lifecycle bind everything into a single transactional update. Basically put the containers in e.g. `/usr/share/containers/<sha256>` and teach crio how to use the equivalent of `podman --root=/usr/share/containers/<sha256>` if it exists. Then we get ostree deltas for everything.
(But the "lifecycle binding" is also important - this way we have the old OS with old containers, or new OS with new containers, and no possibility of skew)
This is what some ostree users are doing today, although at least one I talked to was using systemd-nspawn - but same principle.
There's also https://blogs.gnome.org/alexl/2020/05/13/putting-container-updates-on-a-diet/ which is a clearer long term across-the-board win; not sure what the status of it is though.
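To make that "lifecycle bind" idea slightly more concrete, a rough sketch with hypothetical paths and image names (podman standing in for the crio equivalent described above):

```
# Hypothetical per-OS-commit container store baked into the ostree image
STORE=/usr/share/containers/example-commit-sha256     # stands in for the <sha256> path above

# Populate the store at image-build time, not on the running host
podman --root="$STORE" pull quay.io/example/cnf-workload:v1    # placeholder image

# At runtime, read images from that store instead of the default graph root
podman --root="$STORE" run --rm quay.io/example/cnf-workload:v1
```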
when those worker nodes lose communication with the cluster control
plane. The most significant problem is that if a node reboots while
it has lost communication with the control plane, it does not
restart any pods it was previously running until communication
Hmm...this seems like it shouldn't be too hard to fix. Just start daemonsets by default for example, and add an annotation to pods to do so?
(Although one problem right now is that crio currently wipes all containers across minor version upgrades...)
The bigger problem is with user workloads, especially any that also need to talk to the API to learn what to do (fetch a ConfigMap, etc.). It's likely possible to make all of that work, but at some point we would be working around kubernetes instead of taking advantage of it, and the application deployment for the remote workers would be different from centralized sites which means we don't meet the goal of having application management be as uniform as possible.
There's more background about why this approach was rejected in some internal documents, which we're working on publishing more widely as more people get involved in examining these requirements.
This adds annotations for the single-node-production-edge cluster profile. There's a growing requirement from several customers to enable creation of single-node (non-highly-available) OpenShift clusters. In stage one (following openshift/enhancements#504) there should be no implication on component logic. In the next stage, the component's behavior will match a non-high-availability profile if the customer is specifically interested in one. This PR is separate from the 'single-node-developer' work, which implements a different behavior and is currently at another stage of implementation. For more info, please refer to the enhancement link and participate in the discussion.
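For context, the cluster-profile mechanism works by annotating the manifests a component ships in the release payload; a sketch of what such an annotation looks like (the operator name is illustrative, and the exact annotation key is taken from the linked PRs and could still change):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator                                                # illustrative component
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-production-edge: "true"    # the new profile's annotation
```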
Why have so many PRs been opened for this post-feature-freeze, when a) we're in feature freeze and this is a new feature, and b) this enhancement isn't fully approved and has open discussion?
@kikisdeliveryservice FF was postponed by a week. The profile will be there; the question is what components it will include. So the PRs are relevant.
AFAIK, about FF there was an email that went out about the build waiting being delayed, not that FF was extended. Can you please send me (via email or Slack) official docs saying it has changed? That being said, my second issue still stands - this enhancement isn't fully approved & has open comments, but you're beginning to implement, which seems counter to the point of having an enhancement?
There is significant time pressure to prepare a proof-of-concept implementation of this work. The team is using the PRs along with tools like clusterbot to prepare a release image for that PoC. @romfreiman, I think it would be reasonable to tag all of the PRs as
oc patch -n openshift-ingress-operator ingresscontroller/default --type=merge --patch '{"spec":{"replicas": 1}}'
```

#### machine-config-operator
Are we expecting any day-2 configuration changes using MCO on the single-node cluster in its lifetime?
yep. Applying rt kernel, or other telco related configurations (performance tunings)
+1 to sinny's question as this is super important. There's an open question (#3 below) that mentions this, but it feels like this needs a lot more consideration & consensus, followed by memorialising that in the enhancement here (and probably updating the risks section as well), as opposed to just leaving it as a TBD question.
Applying any configuration (like rt kernel, kargs, files, etc) during install time shouldn't have much risk on single node. For any day-2 changes, MCO is going to drain all running workloads and then reboot (with few exceptions). For a production-level node, we need to discuss where running workloads will be scheduled when the drain occurs. If something goes wrong in between, there is no way to get must-gather to perform troubleshooting. I am sure there will be more issues which may come into the picture, as MCO has been designed keeping in mind that it will run on a multi-node cluster (with a minimum of 3 compute and control plane nodes).
- group: apps/v1
  kind: Deployment
  name: etcd-quorum-guard
  namespace: openshift-machine-config-operator
etcd-quorum-guard hasn't lived in the MCO namespace since July: openshift/machine-config-operator#1928
Should this be openshift-etcd?
cc: @hexfusion
yes correct
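A sketch of the corrected entry, keeping the shape of the quoted list but with the namespace the quorum guard now lives in:

```
- group: apps/v1
  kind: Deployment
  name: etcd-quorum-guard
  namespace: openshift-etcd    # moved out of openshift-machine-config-operator (openshift/machine-config-operator#1928)
```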
### Risks and Mitigations
given how much of a paradigm shift this is, i'd expect to see the risks articulated here...
the operators and other components that are most affected by the new
profile, such as the etcd and auth operators.

### Graduation Criteria
since this requires a bunch of non-trivial work and coordination, I'd like to see what the rollout plans for this look like.
   for a host to boot, including bonded interfaces, access to multiple
   VLANs, and static IPs. How do we anticipate configuring those?
3. The machine-config-operator works by (almost always) rebooting a
   host. Is that going to be OK in these single-node deployments?
feels like this merits a lot more discussion
- "@romfreiman" | ||
- "@markmc" | ||
reviewers: | ||
- TBD, probably all leads |
can we get this updated to make sure that affected operators/groups get to review this
this matches openshift/enhancements#504 and doesn't change existing behavior
From a monitoring standpoint, a single-node cluster profile wouldn't be sufficient to tune the components. Only a small fraction of the monitoring resources is deployed through CVO (e.g. only the namespace + CMO itself). The vast majority of the deployment is encoded into CMO and it can be partially customized via a configmap. To support the single node topology, we would have to expose a
It looks like we can do this without a profile. See #560 for details of the alternative approach.
This enhancement describes a new single-node cluster profile for
production use in "edge" deployments that are not considered to be
resource-constrained, such as telecommunications bare metal
environments.