
extend design for how the CVO knows the cluster profile #543

Merged (2 commits) on Dec 3, 2020

Conversation

@guillaumerose (Contributor)

As required per openshift/api#782 (comment) and openshift/cluster-version-operator#404 (comment).

This is needed to cover all of the CVO's needs and the usage of cluster profiles with the installer.

@deads2k deads2k changed the title Add design for cluster profile Add design for how the CVO knows the cluster profile Nov 18, 2020
@deads2k deads2k changed the title Add design for how the CVO knows the cluster profile clarify design for how the CVO knows the cluster profile Nov 18, 2020
@deads2k deads2k changed the title clarify design for how the CVO knows the cluster profile extend design for how the CVO knows the cluster profile Nov 18, 2020
```
CLUSTER_PROFILE=[identifier]
```
This environment variable would have to be specified in the CVO deployment. When no `CLUSTER_PROFILE=[identifier]` variable is specified, the `default` cluster profile
Contributor:

the literal "default" doesn't look right to me

@smarterclayton (Contributor)

I am -1 to exposing the profile - it’s a grouping of deployment characteristics. Operators should be aware of the characteristic (single node), not the profile (CRC, single node production edge). I would be supportive of API changes that expose details onto other config, but not cluster version and profile.

I.e. “single master” in infrastructure status, if we had a really good reason for it.


A cluster profile is specified to the CVO as an identifier in either:
* an environment variable,
* the `status.ClusterProfile` property in `ClusterVersion`.
Contributor:

I don’t see a reason to do this because it exposes the wrong behavior for clients to trigger on. What are the use cases that you would read this value (ie, why are you trying to do this)? What characteristics of the profile are you trying to react to?

Contributor Author:

This solution was a reaction to the pushback from the installer team. See openshift/installer#3986 (comment).

I don't want to react to this profile. This was just a means to pass the cluster profile without having to change the installer.

1. `openshift-install create manifests`
2. patch the CV
3. `openshift-install create cluster`

Of course, I would be happier if I could pass it directly in install-config.yaml.

An argument for this new property is that it can be used in insights and telemetry.
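Sketched as a script, the flow above might look like the following. This is a hedged sketch: the `cvo-overrides.yaml` filename and the `clusterProfile` field are assumptions drawn from this discussion, not a shipped installer API.

```shell
# Step 1 (not run here): openshift-install create manifests --dir=mycluster
mkdir -p mycluster/manifests

# Step 2: patch the ClusterVersion manifest with the desired profile.
# Filename and field are illustrative assumptions, not a shipped API.
cat >> mycluster/manifests/cvo-overrides.yaml <<'EOF'
status:
  clusterProfile: single-node-developer
EOF

# Step 3 (not run here): openshift-install create cluster --dir=mycluster
```

The point of the sketch is that the profile is injected between manifest generation and cluster creation, with no install-config change.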

Member:
Is this focused on the CRC case at the moment? It is disabling insights and telemetry by default...

Member:
While we do phase 0 of the CRC-related enhancement, is the patch step enumerated above sufficient to make incremental progress without committing to an externally user-facing API? It seems like a reasonable first step to me.


We will need to use the cluster profile for #504 in 4.8, so it is not only CRC-oriented.

@deads2k (Contributor) commented Nov 20, 2020

> What are the use cases that you would read this value (ie, why are you trying to do this)?

There needs to be a way for the CVO to know which profile to apply in a self-hosted installation. Since the CVO is part of the payload, there isn't a non-API alternative that we see. This value will exist somewhere in the API and our only choice is about where that location is.

> I am -1 to exposing the profile - it’s a grouping of deployment characteristics.

This value is intended for use by the ClusterVersionOperator, not by other operators: that's why it is on the ClusterVersion, not on something like Infrastructure. But in a self-hosted cluster, that intent has to be preserved somewhere in the API; there isn't an external source of trust as there is in cases like externally managed clusters.

Separately, I do expect that both CRC and SingleNodeProduction are going to want divergent operator behavior, but this field isn't intended to describe that. It is narrowly focused on selecting which bits of the payload to apply.

@smarterclayton (Contributor) commented Nov 20, 2020

> There needs to be a way for the CVO to know which profile to apply in a self-hosted installation. Since the CVO is part of the payload, there isn't a non-API alternative that we see.

The CVO is running code that generates its own next config - that's just as easy to do with an env var as an API from the deployment/pod that creates itself?

@smarterclayton (Contributor)

I'm not opposed to the CVO having data it itself manages, I am super concerned about people abusing that. It's hard to have an API field that has godoc that says "You aren't allowed to use this", it's easy if the field doesn't exist. We've had similar problems with other "non API" constructs so I'm just thinking about preventing the problem altogether.

@guillaumerose (Contributor Author)

An env var is not that easy! The first time we introduce it, the outgoing CVO will not know about it and won't be able to templatize the manifest. It forces us to add the cluster profile to the struct in one release and use it in the CVO manifest in a second release. We can work around that, but it is still not easy.

Tried here: openshift/cluster-version-operator@e5ccafa. Update CI job:

```
Cluster did not complete upgrade: timed out waiting for the condition: Error loading manifests from /etc/cvo/updatepayloads/QGnAQsjGa670WqA9XFnWRg: error running preprocess on 0000_00_cluster-version-operator_03_deployment.yaml: failed to execute template: template: manifest:32:33: executing "manifest" at <.ClusterProfile>: can't evaluate field ClusterProfile in type payload.manifestRenderConfig
```
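That failure can be reproduced in isolation. A minimal sketch follows; the struct below is a stand-in for the outgoing CVO's `payload.manifestRenderConfig`, not the real type, and the manifest snippet is illustrative.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// oldRenderConfig stands in for the outgoing release's
// payload.manifestRenderConfig, which has no ClusterProfile field yet.
type oldRenderConfig struct {
	ReleaseImage string
}

// renderManifest mimics the CVO preprocess step: parse a manifest
// template and execute it against the render config.
func renderManifest(manifest string, cfg interface{}) error {
	tmpl, err := template.New("manifest").Parse(manifest)
	if err != nil {
		return err
	}
	return tmpl.Execute(io.Discard, cfg)
}

func main() {
	// A manifest from the incoming release referencing the new field.
	manifest := "value: {{ .ClusterProfile }}\n"
	err := renderManifest(manifest, oldRenderConfig{ReleaseImage: "example"})
	// Parsing succeeds, but execution fails: the old struct has no such field.
	fmt.Println(err)
}
```

This is why a two-phase release (or two sets of manifests) was proposed: the outgoing binary must already know the field before any manifest may reference it.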

@smarterclayton (Contributor)

> The first time we introduce it, the outgoing CVO will not know about it and won't be able to templatize the manifest. It forces us to add the cluster profile to the struct in one release and use it in the CVO manifest in a second release. We can work around that, but it is still not easy.

We do this fairly frequently for other things. You could do it and backport it to older releases. And then we just require a higher Z version to upgrade. We do it often enough that I would say it's just as easy to me as adding a new api field without the downsides of people being able to see what the profile is.

@guillaumerose (Contributor Author)

Should I change this enhancement to remove this field from ClusterVersion but add it to InstallConfig?

https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go#L62

@smarterclayton (Contributor)

Is it not already added to install config? How were you planning on telling the installer the profile of the cluster?

@guillaumerose (Contributor Author)

The installer team pushed back on this last July: openshift/installer#3986
The only way to make it work is either a ConfigMap or changing the ClusterVersion.

@smarterclayton (Contributor)

I think this proposal should include "how does a user specify which profile they want" and the details of that implementation. Setting an env var on the initial deployment is just as valid as a field on the API or a ConfigMap. The profile env var being supported throughout upgrade is already a requirement for Hypershift, so the CVO continuing to broaden support for it is the current path. If we have reasons not to expose the profile to other operators (except perhaps as role-specific variables), then the proposal would be to ensure that the CVO properly preserves the profile value via env in all three modes (Hypershift, self-hosted HA, and self-hosted singleton), and any arguments about why that won't work should be assessed (as we were doing).

@guillaumerose (Contributor Author) commented Nov 24, 2020

I updated the enhancement to include 2 design proposals:

  • one with only the env variable, which requires releasing in two phases,
  • one with the new property in ClusterVersion.

I hope I covered everything.

@derekwaynecarr (Member) left a comment:

My initial reaction is that the following:

1. `openshift-install create manifests`
2. patch the CV
3. `openshift-install create cluster`

is a reasonable step to unlock CRC scenarios without complicating the end-user-facing API while we learn/iterate. The install-config is not pertinent for the IBM Cloud scenario, so I feel like that path gives us the ability to innovate without being stuck to a given API.

enhancements/update/cluster-profiles.md
Cluster profile will be set like this env. variable.
Upgrade will have to preserve the initial cluster profile.

###### Hypershift
Member:

I would exclude this section.

Are you really more interested in capturing how CRC can signal to openshift-install? If so, let's focus on that scenario; Hypershift is not pertinent here per the IBM Cloud comment above.


@derekwaynecarr (Member)

@csrwng can you comment as well?

@csrwng (Contributor) commented Nov 24, 2020

@derekwaynecarr the issue with patching a manifest after the installer creates manifests is that there really is no good manifest to patch (they don't include the CVO deployment). The CVO deployment is generated by the bootkube script once the install is under way (https://github.com/openshift/installer/blob/ad31070e5a34ea92ce9792b2832fcbf313ec9832/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L56-L61). One possibility is to have the installer react to an env var for the profile that it then ends up passing to the `cvo render` invocation. At least that would not require a change to install-config.yaml. @guillaumerose, was something like that explored with the installer team?

@csrwng (Contributor) commented Nov 24, 2020

Sorry, I missed the part about patching the CV. If we patch the CV, it means that the CV would have to expose the profile, which I thought was something we initially wanted to avoid.

@guillaumerose (Contributor Author)

> One possibility is to have the installer react to an env var for the profile that it then ends up passing to the cvo render invocation.

This is a great idea. It keeps the cluster profile hidden, and it is a good step forward. The installer team didn't explore that.
It still requires the two-phase deployment, though.

As discussed with @csrwng, instead of releasing in two phases, we can have 2 sets of manifests: some with `{{ .ClusterProfile }}` and some without, just for the upgrade.

@guillaumerose (Contributor Author)

I removed the proposal with the new field in ClusterVersion. Let's now focus on what to change in the installer. 2 proposals:

  • a new field in install-config,
  • or a more-or-less documented env. variable (like `OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE`): `CLUSTER_PROFILE`

Thanks.

@romfreiman

Let's explore 2 use cases that we should address here:

  1. etcd bootstrap - should it render different manifests as a result of a specific profile?
  2. cluster-authentication-operator (as an example): https://github.com/openshift/cluster-authentication-operator/blob/adbb442cc7bbcb6a71f6118729d994a4f4dc8985/pkg/controllers/readiness/unsupported_override.go#L57. It has to behave differently in case of CRC/SNO.

In both cases it comes down not to a profile, but to a deployment type (which is derived from the profile). So I assume we plan that each component will have different types of manifests, depending on the profile 'grouping'. That will address the runtime mode (for example, a different CEO manifest that has 'DEPLOYMENT=NonHA' as an environment variable for the crc/sno profiles). So we can assume that the operators should not read the profile type from the cluster.
But how would it work for bootkube rendering: https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L74? It might be that we'll have to propagate env variables to each render command, right? Or should it be wrapped in bootkube logic with some if clauses that translate the profile name to operator behavior?
cc: @hexfusion

@guillaumerose (Contributor Author)

Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?
I don't know how many of the manifests from the CVO will collide with the ones rendered by these components. Are they still used after the bootstrap dance?


When upgrading, outgoing CVO will forward the cluster profile information to the incoming CVO with the environment variable.

`include.release.openshift.io/[identifier]=true` would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
Contributor:

@derekwaynecarr is `include.release.openshift.io` something we claimed as API? I see this used for IBM and something about high availability.

Member:

@mfojtik yes
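For reference, a release-payload manifest carrying this annotation family might look like the sketch below. This is illustrative only: the operator name and the profile identifier in the annotation key are assumptions, not taken from a shipped manifest.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  annotations:
    # The CVO renders this manifest only when its CLUSTER_PROFILE matches
    # the identifier embedded in the annotation key ("some-profile" here
    # is a placeholder for a real profile name).
    include.release.openshift.io/some-profile: "true"
```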

@eranco74 (Contributor)

> Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?
> I don't know how many of the manifests from the CVO will collide with the ones rendered by these components. Are they still used after the bootstrap dance?

They are not used once the bootstrap is completed.
So unless the static pods running on the bootstrap node make decisions (e.g. set some configuration assuming that the cluster will have 3 master nodes), there shouldn't be any reason the etcd member running on the bootstrap node needs to know the cluster profile.
@hexfusion thoughts?

@hexfusion (Contributor) commented Nov 30, 2020

> > Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?
> > I don't know how many of the manifests from the CVO will collide with the ones rendered by these components. Are they still used after the bootstrap dance?
>
> They are not used once the bootstrap is completed.
> So unless the static pods running on the bootstrap node make decisions (e.g. set some configuration assuming that the cluster will have 3 master nodes), there shouldn't be any reason the etcd member running on the bootstrap node needs to know the cluster profile.
> @hexfusion thoughts?

Ideally, the render command would be able to determine the desired state from the bootstrap node. All conditional decisions about the cluster profile would be made during render. So as long as we can determine the current profile in bootkube, we shouldn't have any problems.

@eranco74 (Contributor) commented Dec 1, 2020

> Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?

@guillaumerose, it seems that the answer is yes.

@guillaumerose (Contributor Author)

For the bootstrap process, I read that we will introduce a new CRD for that (cluster-config). So we are good with the current design.

@guillaumerose (Contributor Author)

@derekwaynecarr @smarterclayton @wking is the current design OK?

@staebler (Contributor) commented Dec 2, 2020

> I removed the proposal with the new field in ClusterVersion. Let's now focus on what to change in the installer. 2 proposals:
>
> * a new field in install-config,
> * or a more-or-less documented env. variable (like `OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE`) `CLUSTER_PROFILE`
>
> Thanks.

Here is my third proposal. The bootkube.sh script looks for a specific manifest file. If the file exists, the script adds its contents as the CLUSTER_PROFILE environment variable for the bootstrap CVO pod. This makes it something that a user must do deliberately, as a particular step in the installation process.
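A hedged sketch of that third proposal follows. The file path and the helper name are illustrative assumptions, not the final bootkube.sh implementation:

```shell
#!/bin/sh
# Sketch of the proposal: if a specific manifest file exists, its contents
# become the CLUSTER_PROFILE env var for the bootstrap CVO pod. If the file
# is absent, nothing is emitted and the CVO falls back to its default
# behavior. Path and function name are illustrative.
profile_env_for_cvo() {
    profile_file="$1"
    if [ -f "$profile_file" ]; then
        printf 'CLUSTER_PROFILE=%s' "$(cat "$profile_file")"
    fi
}

# Example (illustrative path):
# profile_env_for_cvo /opt/openshift/manifests/cluster-profile
```

Because the variable is only set when the file exists, supplying a profile remains an explicit, deliberate installation step.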

```
}
```

Option 2:
Member:

Option 2 is preferred as we iterate/learn, especially while the bootstrap host is still required.

Contributor:

I prefer option 2 as well.

Using the profile during install should trigger a warning. Also, it should be prefixed the same way other install flags are, like `OPENSHIFT_INSTALL_EXPERIMENTAL_CLUSTER_PROFILE`.

@romfreiman commented Dec 2, 2020

Also note we'll have to add this env to cluster-bot once we agree on the flow.

@ravidbro commented Dec 2, 2020

> Here is my third proposal. The bootkube.sh script looks for a specific manifest file. If the file exists, the script adds its contents as the CLUSTER_PROFILE environment variable for the bootstrap CVO pod. This makes it something that a user must do deliberately, as a particular step in the installation process.

And what will the user need to do to create that manifest?

@derekwaynecarr (Member)

Option 2 is my preference while we iterate; we need install/CVO to agree.

/assign @eparis @crawford

@guillaumerose (Contributor Author)

I removed option 1 and set the proper name for the env. variable in the installer.

I also changed the release phases a bit: if we only add ClusterProfile to manifestRenderConfig, we will not be able to add manifests that don't belong to the default profile in the next release. The outgoing CVO will load all manifests in the release image without any distinction. That would force CRC and others to wait for a third release to use it.
The bare minimum to do is to select only manifests with the default profile annotation + change manifestRenderConfig.

@eparis (Member) commented Dec 3, 2020

> staebler (2 minutes ago): I can live with option 2 on that first enhancement. No need to stress about that one.
>
> lalatenduM (< 1 minute ago): +1 as long as the profile does not appear in an end-user-facing API

Thus, from that:
/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eparis, guillaumerose


@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2020
@openshift-merge-robot openshift-merge-robot merged commit 68de0ef into openshift:master Dec 3, 2020