single-node production deployments #504

Closed
wants to merge 12 commits

Conversation

dhellmann
Contributor

This enhancement describes a new single-node cluster profile for
production use in "edge" deployments that are not considered to be
resource-constrained, such as telecommunications bare metal
environments.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhellmann

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the `approved` label (Indicates a PR has been approved by an approver from all required OWNERS files.) Oct 19, 2020
@dhellmann
Contributor Author

/cc @smarterclayton @derekwaynecarr

@dhellmann
Contributor Author

@deads2k & @csrwng, based on other proposals, you may both be interested in this.

@romfreiman

@hexfusion FYI

2. Similarly, telco workloads typically require special network setups
for a host to boot, including bonded interfaces, access to multiple
VLANs, and static IPs. How do we anticipate configuring those?
3. The machine-config-operator works by (almost always) rebooting a host.
Contributor
@MarSik MarSik Oct 19, 2020

No. MCO is absolutely necessary as it is used by PAO (optional OLM operator) and NTO to apply the computed host OS and kernel tuning values. It is also used to allocate hugepages for example.

Contributor

Btw, you either need to describe a way to pre-deploy an image including optional OLM operators and reboots or allow day-2 operations that do the same. Not all Telco deployments are the same (RT vs. non-RT, different networking, different CNFs, different hugepages, different NUMA topology...).


### Open Questions

1. Telco workloads frequently require a realtime kernel. How will a
Contributor

Not all telco workloads require the RT kernel. Today the user follows a day-2 procedure that involves either a MachineConfig or the Performance Addon Operator (which does that via MCO).

Member

(Using RT kernel works "day 1" too)
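
For concreteness, the realtime/tuning configuration discussed here is normally expressed as a PerformanceProfile consumed by the Performance Addon Operator, which applies it through MCO/NTO. The sketch below is illustrative only: the API version, CPU ranges, and hugepage counts are placeholders, not values proposed by this enhancement.

```
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: single-node-ran            # placeholder name
spec:
  cpu:
    reserved: "0-1"                # cores kept for platform/housekeeping
    isolated: "2-31"               # cores dedicated to latency-sensitive workloads
  hugepages:
    defaultHugepagesSize: "1G"
    pages:
    - size: "1G"
      count: 16
  realTimeKernel:
    enabled: true                  # switches the node to the RT kernel (applied via an MCO reboot)
  nodeSelector:
    node-role.kubernetes.io/master: ""
```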

1. Telco workloads frequently require a realtime kernel. How will a
user specify whether to use the realtime or regular kernel? Should
we assume they always want the realtime version?
2. Similarly, telco workloads typically require special network setups
Contributor

OLM needs to be enabled as the sriov operator is installed that way.

[without waiting for 3 master nodes](https://github.com/openshift/cluster-etcd-operator/blob/98590e6ecfe282735c4eff01432ae40b29f81202/pkg/etcdenvvar/etcd_env.go#L72))

In addition, some components are not relevant for this cluster
profile (e.g. console, cluster-autoscaler, marketplace?) and shouldn't
Contributor
@MarSik MarSik Oct 19, 2020

marketplace/OLM is necessary. The Performance Addon Operator, SR-IOV, and PTP operators are all deployed that way.


+1 OLM is necessary for Telco mobile network use cases.

In addition, the following are not relevant:

  • The console is optional
  • The ingressVIP and apiVIP are not relevant in this type of cluster

Other known gaps:

  • Metrics service should provide the ability to be exported to an external Kafka or Prometheus instance.
  • No local logging necessary but centralized logging is expected (e.g. exporting logs to centralized Kafka bus or ElasticSearch cluster). It is okay to maintain a short term buffer (e.g. past few hours) for logging in case of disconnection from the external logging target so relevant logs are available after disconnection.

Member

You need an installation that understands OLM's API surface, but that does not necessarily mean you require OLM running actively on the machine at all times. I could see an alternative where OLM's controllers are run in a one-shot install/upgrade mode as opposed to constantly running. It may come down to where resource utilization needs to be cut.

If upgrades initially will happen by reimaging the machine, then is there a reason to have OLM continuously running and checking its catalog for updates? I think this depends on whether there are likely to be updates to the optional components independent of reimaging this single node cluster. The other purpose OLM serves if it is running is to provide general metrics about the optional components.

Contributor

Ok, I think I can accept that. We need a way to install the operators, but there does not have to be a constant update-checking loop.
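
For reference, the optional operators mentioned in this thread (Performance Addon Operator, SR-IOV, PTP) are delivered today as OLM Subscriptions. A minimal sketch of what installing one looks like; the channel, catalog source, and namespace below are illustrative rather than prescribed by this enhancement:

```
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sriov-network-operator
  namespace: openshift-sriov-network-operator   # illustrative namespace
spec:
  channel: stable                                # illustrative channel
  name: sriov-network-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```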

@dhellmann
Contributor Author

/cc @crawford


That end-to-end job should also be run against pull requests for
the operators and other components that are most affected by the new
profile, such as the etcd and auth operators.
Contributor

  • MCO + NTO (support for tuning and reboots) and sriov, ptp (OLM support)

Member

Probably should make it clear here that "operators" will include both core payload and optional ones.

- `cluster-etcd-operator` will not deploy the etcd cluster without a minimum of 3 master nodes (this can be changed by enabling `useUnsupportedUnsafeNonHANonProductionUnstableEtcd`)
- Even with the unsupported feature flag, `etcd-quorum-guard` still requires 3 nodes due to its replica count.
- `cluster-authentication-operator` will not deploy `OAuthServer` without a minimum of 3 master nodes (this can be changed by enabling `useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer`)
- `cluster-ingress-operator` deploys the router with 2 replicas. On a single node one will fail to start and the ingress will show as degraded.
Contributor

Do we really need ingress in the RAN use case?

Contributor

auth currently depends on functional ingress.
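
For illustration, the unsupported flags quoted above are the kind of override that would be toggled on the etcd and authentication operator resources. This is a sketch only, assuming both flags are read from the operators' `unsupportedConfigOverrides` field, the mechanism used for similar overrides today:

```
oc patch etcd cluster --type=merge \
  --patch '{"spec":{"unsupportedConfigOverrides":{"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
oc patch authentication.operator.openshift.io cluster --type=merge \
  --patch '{"spec":{"unsupportedConfigOverrides":{"useUnsupportedUnsafeNonHANonProductionUnstableOAuthServer": true}}}'
```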

CI coverage informing the release. An end-to-end job using the profile
and running an appropriate subset of the standard OpenShift tests
will be created and configured to block accepting release images
unless it passes.
Member

I would also expect an integration CI job that runs with the optional operators we expect will target this deployment topology. Launching against this environment will need to be a supported target topology in our common test infra for operators.

Additionally, I would expect the ability to launch one of these single-node deployments from cluster-bot.

not be deployed by this profile.

The profile describes single-node, all-in-one, deployments, so there
is no need to support provisioning additional workers. The


A Telco requirement would be to add worker nodes to an all-in-one deployment

Contributor Author

We're specifically calling that out as something we would not do with these types of clusters. What sort of use case would call for a single-node control plane with separate worker(s)?


I believe that if there were a need for additional compute resources at a far-edge location, the telco would be more likely to deploy a second independent single-node cluster rather than add a worker to the existing single-node cluster, so that there is uniformity in implementation across their network.

enhancement.
* This enhancement does not address high-availability for single-node
deployments.
* This enhancement does not address in-place upgrades for this first


A Telco requirement will be to provide a mechanism to upgrade the machine without re-provisioning the infrastructure and applications

Contributor Author

That does not match the requirements we've been given so far.


+1 for browsell - Bandwidth is often constrained in far-edge telco use cases, and needing to re-provision and re-pull all of the packages is undesirable. Re-provisioning from a local cache might resolve this issue; however, there is also a desire to minimize downtime, which would be reduced by removing the need to re-provision.

@derekwaynecarr
Member

/assign

configured when deployed and used in the `single-node-production-edge`
deployments.

Although the environment is assumed to have significant resources, it


From a Telco point of view, optimising CPU usage of the infrastructure is the most important.

Although the environment is assumed to have significant resources, it
is important to dedicate most of them to end-user workloads, rather
than cluster control plane or monitoring. Therefore, the cluster
profile will configure telemetry and logging to forward data, instead


We need to consider the case where the single node is disconnected from the centralised collection; it needs to buffer and forward when the connection is re-established.

Contributor

I'm pretty skeptical of this. The entire point of the prometheus stack is to buffer collection. If you're inventing a new path for this that has to be recreated for the large variety of system data we collect, that's a red flag to me.

Contributor

To be clearer, I'm not convinced the general single edge production node profile has the resource constraints AND the store and forward requirement described here (I can believe telecom edge does, just not all production edge). Can you make a stronger case for this being a general statement for all production edge?

Contributor

I note this because monitoring is a fundamental component of OpenShift. It provides the loop by which insight into production performance is measured. Up until this proposal monitoring has been required to exist, and we assume that monitoring is a fundamental part of the platform, with components being self-monitored and self-managed. Saying "we will do this someplace else" removes the closed loop within OpenShift, limits how good operators can be at self-observation, duplicates a large amount of configuration, and except in very trivial integration scenarios (I just want app metrics, or just want a small subset of node metrics) is going to duplicate a lot of work.

Cluster monitoring talks to at least 6 on-node components (node exporter, kubelet, networking, machine config daemon, dns, and things like SRO). Which of those contain data that central monitoring needs? I would wager most. Which of the 23 core components must be stored and forwarded for alerting of platform health? I would wager most. So if we end up duplicating outside the platform a significant fraction of the scope of what the platform already gathers, then we're wasting engineering effort that would be better spent on efficiency.

Contributor

Copying here for context: we've discussed with monitoring before that this is an almost infinitely tunable component in terms of CPU, memory, and disk space, and I'd expect to see justifications that tuning is insufficient before we remove it:

  • going from two replicas to one is half CPU and memory
  • cutting retention in half reduces disk in half (and I believe memory, but that may not be true anymore)
  • doubling the scrape interval reduces CPU in half
  • cutting out half of the metric series scraped (of which 55% of the default cluster metrics are control plane and likely 25% are just bugs we can trivially fix as excessive cardinality) should halve memory and cpu and disk.

And I'm sure there are some more. Doing the three CPU ones of those might result in 1/8th the CPU and memory OOTB. I would be completely supportive of "efficient single node monitoring" (since that benefits everyone), but I'm generally not supportive of "monitoring off". A single Prometheus should be able to run at about 0.1 or 0.2 core, potentially even before some of this tuning. Have we collected the data and done the tuning before we jump to "turn off"?

Contributor

Agreeing with what @smarterclayton has said so far, and cc'ing @simonpasquier. We have also discussed the above options already with @dhellmann and have an investigational spike on what we can actually achieve.
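
As a concrete starting point for that spike, some of the knobs listed above (retention, resource requests) are already exposed through the cluster-monitoring-config ConfigMap, while others (replica count, scrape interval, dropping series) would need CMO changes. A sketch with illustrative values only:

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 4h          # shorter retention -> less disk (value illustrative)
      resources:
        requests:
          cpu: 100m
          memory: 500Mi
```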

@sronanrh

This enhancement describes a new single-node cluster profile for
production use in "edge" deployments that are not considered to be
resource-constrained, such as telecommunications bare metal
environments.

I think you need to be very careful about how you define 'resource-constrained' because even telco bare metal environments will have resource constraints based not only on compute resources, but more importantly on power, space and cooling which are often leading to the single-node cluster use case in the first place.

@dhellmann
Contributor Author

This enhancement describes a new single-node cluster profile for
production use in "edge" deployments that are not considered to be
resource-constrained, such as telecommunications bare metal
environments.

I think you need to be very careful about how you define 'resource-constrained' because even telco bare metal environments will have resource constraints based not only on compute resources, but more importantly on power, space and cooling which are often leading to the single-node cluster use case in the first place.

Sure. The point I was trying to make is that we're not talking about a single-board system like a Raspberry Pi.


### Test Plan

In order to claim full support for this configuration, we must have
Contributor

Today, we only support fault tolerant/HA configurations of OpenShift, and there is considerable control plane operator logic that assumes this requirement. I think it might be worth mentioning that making fault-tolerance optional is likely to require not just additional testing for the newly supported cluster configurations but also that additional testing effort will be necessary to ensure that support for existing fault tolerant configurations does not regress.

Member

mmm...as of right now I think that the "e2e-agnostic" test is required across all core repos, and that's not going to change from our "default profile".

It would make sense to add a new /test e2e-single-node of course, and that'd be a periodic. Some repos might opt into running that always on PRs, or just on demand.

single-node configurations for production use in environments with
"reasonably significant" memory, storage, and compute resources.
* Clusters built using the `single-node-production-edge` profile
should pass most Kubernetes and OpenShift conformance end-to-end
Member

Hmm, a lot of the OpenShift e2es assume things like a registry working I think. Would we really be leaving all of that unchanged, or does this enhancement call for removing components (and teaching the test suite to handle that?)

Contributor Author

We still need to do a lot of that analysis. The goal is to be as close as possible to the default deployment.


There was some discussion about the topic in #482. I'm guessing the goal for both of these will be identical, since in both cases you're running a limited set of functionality, i.e. only core elements without add-ons.

worker node.
* Many operators will be configured to reduce the footprint of their
operands, such as by running fewer replicas.
* In-place upgrades will not be supported by the first iteration of
Member

I think deploying a computer without a means to update it is irresponsible. Some people will wave their hands and say it only processes trusted input, is disconnected from the Internet etc etc. I still think it's irresponsible.

Or I guess this does say "in place" so perhaps for this use case (a bit like Code Ready) the idea is any important state is stored outside of the node, so reprovisioning it is a viable path for upgrades?

Member

(Edit sorry, I see there's more about this below)

Contributor Author

Yeah, that's the general idea. These sorts of deployments are expected to be part of a larger system that includes an orchestration tool outside of the cluster for managing 100s or 1000s of individual instances. Assuming a wipe-and-rebuild approach can be implemented within the other constraints like the length of change windows, it seemed reasonable to go that route instead of trying to make in-place work. On the other hand, if in-place isn't a big deal, maybe we don't need to make that assumption. So, it's definitely still up for discussion.


just noting there is a consideration for remote low bandwidth clusters in which we'd want to update only the delta vs. the entire image, which may be more feasible for an in-place upgrade vs. a full re-image.


I think the low-bandwidth concern is not part of this enhancement. Here we should agree on what single node looks like. We will deal with further improvements (bandwidth, bootstrap removal) as part of other enhancements.


Regarding the delta - I assume that we don't recreate all the container images every release - so it should be solved by docker. Unless I'm wrong.
Regarding RHCOS - I have no clue - is there such functionality in place? @cgwalters ?

Member

One reason ostree is popular is that quite a while ago we implemented a pretty good delta mechanism. We aren't using it for RHCOS, but "base" FCOS does use it.

https://ostreedev.github.io/ostree/formats/#static-deltas

There'd be some work to enable this but not really hard.

One even more radical approach here would be to commit all of the container images into ostree as well - lifecycle bind everything into a single transactional update. Basically put the containers in e.g. /usr/share/containers/<sha256> and teach crio how to use the equivalent of podman --root=/usr/share/containers/<sha256> if it exists. Then we get ostree deltas for everything.

(But the "lifecycle binding" is also important - this way we have the old OS with old containers, or new OS with new containers, and no possibility of skew)

This is what some ostree users are doing today, although at least one I talked to was using systemd-nspawn - but same principle.

There's also https://blogs.gnome.org/alexl/2020/05/13/putting-container-updates-on-a-diet/ which is a clearer long term across-the-board win; not sure what the status of it is though.
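
For orientation, generating and inspecting a static delta on the build side looks roughly like the sketch below; the exact options should be checked against the ostree documentation linked above, and the commit IDs are placeholders:

```
# build-side sketch; OLD_COMMIT and NEW_COMMIT are placeholders
ostree static-delta generate --repo=./repo --from=OLD_COMMIT NEW_COMMIT
ostree static-delta list --repo=./repo   # verify the delta was produced
```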



when those worker nodes lose communication with the cluster control
plane. The most significant problem is that if a node reboots while
it has lost communication with the control plane, it does not
restart any pods it was previously running until communication
Member

Hmm...this seems like it shouldn't be too hard to fix. Just start daemonsets by default for example, and add an annotation to pods to do so?

Member

(Although one problem right now is that crio currently wipes all containers across minor version upgrades...)

Contributor Author

The bigger problem is with user workloads, especially any that also need to talk to the API to learn what to do (fetch a ConfigMap, etc.). It's likely possible to make all of that work, but at some point we would be working around kubernetes instead of taking advantage of it, and the application deployment for the remote workers would be different from centralized sites which means we don't meet the goal of having application management be as uniform as possible.

There's more background about why this approach was rejected in some internal documents, which we're working on publishing more widely as more people get involved in examining these requirements.

osherdp added a commit to osherdp/cluster-storage-operator that referenced this pull request Dec 6, 2020
This adds annotations for the single-node-production-edge cluster profile. There's a growing requirement from several customers to enable creation of single-node (non-highly-available) OpenShift clusters.
In stage one (following openshift/enhancements#504) there should be no implications for component logic.
In the next stage, the component's behavior will match a non-high-availability profile if the customer is specifically interested in one.
This PR is separate from the 'single-node-developer' work, which will implement a different behavior and is currently at a different stage of implementation.

For more info, please refer to the enhancement link and participate in the discussion.
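
For context, the annotations these PRs add follow the existing cluster-profile convention, where each CVO-managed manifest opts into the profiles it should be rendered for. A sketch based on the PR descriptions above; the Deployment name/namespace and the exact profile key are illustrative:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-storage-operator                 # illustrative manifest
  namespace: openshift-cluster-storage-operator
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-production-edge: "true"
```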
osherdp added a commit to osherdp/library-go that referenced this pull request Dec 6, 2020
osherdp added a commit to osherdp/cluster-csi-snapshot-controller-operator that referenced this pull request Dec 6, 2020
@kikisdeliveryservice
Contributor

Why have so many PRs been opened for this post-feature-freeze when a) we're in feature freeze and this is a new feature, and b) this enhancement isn't fully approved and has open discussion?

osherdp added a commit to osherdp/cluster-node-tuning-operator that referenced this pull request Dec 7, 2020
@romfreiman

Why have so many PRs been opened for this post-feature-freeze when a) we're in feature freeze and this is a new feature, and b) this enhancement isn't fully approved and has open discussion?

@kikisdeliveryservice FF was postponed by a week. The profile will be there; the question is what components it will include. So the PRs are relevant.

osherdp added a commit to osherdp/console-operator that referenced this pull request Dec 7, 2020
osherdp added a commit to osherdp/operator-marketplace that referenced this pull request Dec 7, 2020
osherdp added a commit to osherdp/operator-lifecycle-manager that referenced this pull request Dec 7, 2020
@kikisdeliveryservice
Contributor

Why have so many PRs been opened for this post-feature-freeze when a) we're in feature freeze and this is a new feature, and b) this enhancement isn't fully approved and has open discussion?

@kikisdeliveryservice FF was postponed by a week. The profile will be there; the question is what components it will include. So the PRs are relevant.

AFAIK, regarding FF there was an email that went out about the build waiting being delayed, not that FF was extended. Can you please send me (via email or Slack) official docs saying it has changed?

That being said, my second issue still stands - this enhancement isn't fully approved & has open comments, but you're beginning to implement, which seems counter to the point of having an enhancement?

@dhellmann
Contributor Author

That being said, my second issue still stands - this enhancement isn't fully approved & has open comments, but you're beginning to implement, which seems counter to the point of having an enhancement?

There is significant time pressure to prepare a proof-of-concept implementation of this work. The team is using the PRs along with tools like clusterbot to prepare a release image for that PoC.

@romfreiman, I think it would be reasonable to tag all of the PRs as WIP to avoid distracting teams who are finishing their 4.7 work.

oc patch -n openshift-ingress-operator ingresscontroller/default --type=merge --patch '{"spec":{"replicas": 1}}'
```

#### machine-config-operator
Contributor

Are we expecting any day-2 configuration changes using MCO on the single-node cluster in its lifetime?


Yep. Applying the RT kernel, or other telco-related configurations (performance tunings).

Contributor
@kikisdeliveryservice kikisdeliveryservice Dec 7, 2020

+1 to sinny's question as this is super important. There's an open question (#3 below) that mentions this, but it feels like this needs a lot more consideration & consensus, followed by memorialising that in the enhancement here (and probably updating the risks section as well), as opposed to just leaving it as a TBD question.

Contributor
@sinnykumari sinnykumari Dec 8, 2020

Applying any configuration (like the RT kernel, kargs, files, etc.) at install time shouldn't carry much risk on a single node. For any day-2 changes, MCO is going to drain all running workloads and then reboot (with few exceptions). For a production-level node, we need to discuss where the running workload will be scheduled when the drain occurs. If something goes wrong in between, there is no way to get a must-gather to perform troubleshooting. I am sure more issues will come into the picture, as MCO has been designed with the assumption that it runs on a multi-node cluster (with a minimum of 3 compute and control plane nodes).
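
To make the day-2 scenario concrete, the kind of change being discussed is a MachineConfig such as the sketch below, which MCO applies by draining and rebooting the node; the name and role label are illustrative:

```
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-realtime-kernel          # illustrative name
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  kernelType: realtime                     # MCO drains and reboots the node to apply this
```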

- group: apps/v1
kind: Deployment
name: etcd-quorum-guard
namespace: openshift-machine-config-operator
Contributor

etcd-quorum-guard hasn't lived in the MCO namespace since July: openshift/machine-config-operator#1928

Should this be openshift-etcd?

cc: @hexfusion

Contributor

yes correct




### Risks and Mitigations
Contributor

Given how much of a paradigm shift this is, I'd expect to see the risks articulated here...

the operators and other components that are most affected by the new
profile, such as the etcd and auth operators.

### Graduation Criteria
Contributor

Since this requires a bunch of non-trivial work and coordination, I would like to see what the rollout plans for this look like.

for a host to boot, including bonded interfaces, access to multiple
VLANs, and static IPs. How do we anticipate configuring those?
3. The machine-config-operator works by (almost always) rebooting a
host. Is that going to be OK in these single-node deployments?
Contributor

feels like this merits a lot more discussion

- "@romfreiman"
- "@markmc"
reviewers:
- TBD, probably all leads
Contributor

Can we get this updated to make sure that affected operators/groups get to review this?

osherdp added a commit to osherdp/cluster-storage-operator that referenced this pull request Dec 9, 2020
osherdp added a commit to osherdp/cluster-authentication-operator that referenced this pull request Dec 9, 2020
@simonpasquier
Contributor

From a monitoring standpoint, a single-node cluster profile wouldn't be sufficient to tune the components. Only a small fraction of the monitoring resources is deployed through CVO (e.g. only the namespace + CMO itself). The vast majority of the deployment is encoded into CMO, and it can be partially customized via a configmap. To support the single-node topology, we would have to expose a single-node option in the CMO config, and CVO could deploy a specific configmap in case the single-node cluster profile is enabled. Overall #555 seems a more straightforward approach for us.

@dhellmann
Contributor Author

It looks like we can do this without a profile. See #560 for details of the alternative approach.

Labels
approved - Indicates a PR has been approved by an approver from all required OWNERS files.