
extend design for how the CVO knows the cluster profile #543

Merged (2 commits) on Dec 3, 2020

Conversation

@guillaumerose (Contributor)

As required per openshift/api#782 (comment) and openshift/cluster-version-operator#404 (comment).

This is needed to cover all of the CVO's needs and the usage of cluster profiles with the installer.

@deads2k deads2k changed the title Add design for cluster profile Add design for how the CVO knows the cluster profile Nov 18, 2020
@deads2k deads2k changed the title Add design for how the CVO knows the cluster profile clarify design for how the CVO knows the cluster profile Nov 18, 2020
@deads2k deads2k changed the title clarify design for how the CVO knows the cluster profile extend design for how the CVO knows the cluster profile Nov 18, 2020
```
CLUSTER_PROFILE=[identifier]
```
This environment variable would have to be specified in the CVO deployment. When no `CLUSTER_PROFILE=[identifier]` variable is specified, the `default` cluster profile
Contributor:

the literal "default" doesn't look right to me

@smarterclayton (Contributor)

I am -1 to exposing the profile - it’s a grouping of deployment characteristics. Operators should be aware of the characteristic (single node), not the profile (CRC, single node production edge). I would be supportive of API changes that expose details onto other config, but not cluster version and profile.

I.e. “single master” in infrastructure status, if we had a really good reason for it.


A cluster profile is specified to the CVO as an identifier in either:
* an environment variable,
* the `status.ClusterProfile` property in `ClusterVersion`.
Contributor:

I don’t see a reason to do this because it exposes the wrong behavior for clients to trigger on. What are the use cases that you would read this value (ie, why are you trying to do this)? What characteristics of the profile are you trying to react to?

Contributor Author:

This solution was a reaction to the pushback from the installer team. See openshift/installer#3986 (comment).

I don't want to react to this profile. This was just a means to pass the cluster profile without having to change the installer.

1. `openshift-install create manifests`
2. patch the CV
3. `openshift-install create cluster`

Of course, I would be happier if I could pass it directly in install-config.yaml.

An argument for this new property is that it can be used in insights and telemetry.
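Sketched as a script, the flow above might look like the following. This is a hedged sketch: the `cvo-overrides.yaml` filename and the `clusterProfile` field are assumptions drawn from this discussion, not a shipped installer API.

```shell
# Step 1 (not run here): openshift-install create manifests --dir=mycluster
mkdir -p mycluster/manifests

# Step 2: patch the ClusterVersion manifest with the desired profile.
# Filename and field are illustrative assumptions, not a shipped API.
cat >> mycluster/manifests/cvo-overrides.yaml <<'EOF'
status:
  clusterProfile: single-node-developer
EOF

# Step 3 (not run here): openshift-install create cluster --dir=mycluster
```

The point of the sketch is that the profile is injected between manifest generation and cluster creation, with no install-config change.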

Member:
Is this focused on the CRC case at the moment? It is disabling insights and telemetry by default...

Member:
While we do phase 0 of the CRC-related enhancement, is the patch step enumerated above sufficient to make incremental progress without committing to an externally user-facing API? It seems like a reasonable first step to me.


We will need to use the cluster profile for #504 in 4.8, so it is not only CRC-oriented.

@deads2k (Contributor) commented Nov 20, 2020

> What are the use cases that you would read this value (ie, why are you trying to do this)?

There needs to be a way for the CVO to know which profile to apply in a self-hosted installation. Since the CVO is part of the payload, there isn't a non-API alternative that we see. This value will exist somewhere in the API and our only choice is about where that location is.

> I am -1 to exposing the profile - it’s a grouping of deployment characteristics.

This value is intended for use by the ClusterVersionOperator, not by other operators: that's why it is on the ClusterVersion, not on something like Infrastructure. But in a self-hosted cluster, that intent has to be preserved somewhere in the API; there isn't an external source of trust as there is in cases like externally managed clusters.

Separately, I do expect that both CRC and SingleNodeProduction are going to want divergent operator behavior, but this field isn't intended to describe that. It is narrowly focused on selecting which bits of the payload to apply.

@smarterclayton (Contributor) commented Nov 20, 2020

> There needs to be a way for the CVO to know which profile to apply in a self-hosted installation. Since the CVO is part of the payload, there isn't a non-API alternative that we see.

The CVO is running code that generates its own next config - that's just as easy to do with an env var as an API from the deployment/pod that creates itself?

@smarterclayton (Contributor)

I'm not opposed to the CVO having data it itself manages, I am super concerned about people abusing that. It's hard to have an API field that has godoc that says "You aren't allowed to use this", it's easy if the field doesn't exist. We've had similar problems with other "non API" constructs so I'm just thinking about preventing the problem altogether.

@guillaumerose (Contributor Author)

An env var is not that easy! The first time we introduce it, the outgoing CVO will not know about it and won't be able to templatize the manifest. It forces us to add the cluster profile to the struct in one release and use it in the CVO manifest in a second release. We can work around that, but it is still not easy.

Tried here: openshift/cluster-version-operator@e5ccafa. Update CI job:

```
Cluster did not complete upgrade: timed out waiting for the condition: Error loading manifests from /etc/cvo/updatepayloads/QGnAQsjGa670WqA9XFnWRg: error running preprocess on 0000_00_cluster-version-operator_03_deployment.yaml: failed to execute template: template: manifest:32:33: executing "manifest" at <.ClusterProfile>: can't evaluate field ClusterProfile in type payload.manifestRenderConfig
```
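That failure can be reproduced in isolation. A minimal sketch follows; the struct below is a stand-in for the outgoing CVO's `payload.manifestRenderConfig`, not the real type, and the manifest snippet is illustrative.

```go
package main

import (
	"fmt"
	"io"
	"text/template"
)

// oldRenderConfig stands in for the outgoing release's
// payload.manifestRenderConfig, which has no ClusterProfile field yet.
type oldRenderConfig struct {
	ReleaseImage string
}

// renderManifest mimics the CVO preprocess step: parse a manifest
// template and execute it against the render config.
func renderManifest(manifest string, cfg interface{}) error {
	tmpl, err := template.New("manifest").Parse(manifest)
	if err != nil {
		return err
	}
	return tmpl.Execute(io.Discard, cfg)
}

func main() {
	// A manifest from the incoming release referencing the new field.
	manifest := "value: {{ .ClusterProfile }}\n"
	err := renderManifest(manifest, oldRenderConfig{ReleaseImage: "example"})
	// Parsing succeeds, but execution fails: the old struct has no such field.
	fmt.Println(err)
}
```

This is why a two-phase release (or two sets of manifests) was proposed: the outgoing binary must already know the field before any manifest may reference it.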

@smarterclayton (Contributor)

> The first time we introduce it, the outgoing CVO will not know about it and won't be able to templatize the manifest. It forces us to add the cluster profile to the struct in one release and use it in the CVO manifest in a second release. We can work around that, but it is still not easy.

We do this fairly frequently for other things. You could do it and backport it to older releases. And then we just require a higher Z version to upgrade. We do it often enough that I would say it's just as easy to me as adding a new api field without the downsides of people being able to see what the profile is.

@guillaumerose (Contributor Author)

Should I change this enhancement to remove this field from ClusterVersion but add it to InstallConfig?

https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go#L62

@smarterclayton (Contributor)

Is it not already added to install config? How were you planning on telling the installer the profile of the cluster?

@guillaumerose (Contributor Author)

The installer team pushed back on this last July: openshift/installer#3986
The only way to make it work is either a ConfigMap or changing the ClusterVersion.

@smarterclayton (Contributor)

I think this proposal should include "how does a user specify which profile they want" and the details of that implementation. Setting an env var on the initial deployment is just as valid as a field on the API or a ConfigMap. The profile env var being supported throughout upgrade is already a requirement for Hypershift, so the CVO continuing to broaden support for it is the current path. If we have reasons not to expose the profile to other operators (except perhaps as role-specific variables), then the proposal would be to ensure that the CVO properly preserves the profile value via env in all three modes (Hypershift, self-hosted HA, and self-hosted singleton), and any arguments about why that won't work should be assessed (as we were doing).

@guillaumerose (Contributor Author) commented Nov 24, 2020

I updated the enhancement to include 2 design proposals:

  • one with only the env variable, which requires releasing in two phases,
  • one with the new property in ClusterVersion.

I hope I covered everything.

@derekwaynecarr (Member) left a comment:

My initial reaction is that the following:

1. `openshift-install create manifests`
2. patch the CV
3. `openshift-install create cluster`

is a reasonable step to unlock CRC scenarios without complicating the end-user-facing API while we learn/iterate. The install-config is not pertinent for the IBM Cloud scenario, so I feel like that path gives us the ability to innovate without being stuck to a given API.

enhancements/update/cluster-profiles.md
Cluster profile will be set like this env. variable.
Upgrade will have to preserve the initial cluster profile.

###### Hypershift
Member:

I would exclude this section.

Are you really more interested in capturing how CRC can signal to openshift-install? If so, let's focus on that scenario; Hypershift is not pertinent here per the IBM Cloud comment above.


@derekwaynecarr (Member)

@csrwng can you comment as well?

@csrwng (Contributor) commented Nov 24, 2020

@derekwaynecarr the issue with patching a manifest after the installer creates manifests is that there really is no good manifest to patch (they don't include the CVO deployment). The CVO deployment is generated by the bootkube script once the install is under way (https://github.com/openshift/installer/blob/ad31070e5a34ea92ce9792b2832fcbf313ec9832/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L56-L61). One possibility is to have the installer react to an env var for the profile that it then ends up passing to the `cvo render` invocation. At least that would not require a change to install-config.yaml. @guillaumerose, was something like that explored with the installer team?

@csrwng (Contributor) commented Nov 24, 2020

Sorry, I missed the part about patching the CV. If we patch the CV, it means that the CV would have to expose the profile, which I thought was something we initially wanted to avoid.

@guillaumerose (Contributor Author)

> One possibility is to have the installer react to an env var for the profile that it then ends up passing to the cvo render invocation.

This is a great idea. It keeps the cluster profile hidden, and it is a good step forward. The installer team didn't explore that.
It still requires the two-phase deployment, though.

As discussed with @csrwng, instead of releasing in two phases, we can have 2 sets of manifests: some with `{{ .ClusterProfile }}` and some without, just for the upgrade.

@guillaumerose (Contributor Author)

I removed the proposal with the new field in ClusterVersion. Let's now focus on what to change in the installer. 2 proposals:

  • a new field in install-config,
  • or a more-or-less documented env. variable (like `OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE`): `CLUSTER_PROFILE`

Thanks.

@romfreiman

Let's explore 2 use cases that we should address here:

  1. etcd bootstrap - should it render different manifests as a result of a specific profile?
  2. cluster-authentication-operator (as an example): https://github.com/openshift/cluster-authentication-operator/blob/adbb442cc7bbcb6a71f6118729d994a4f4dc8985/pkg/controllers/readiness/unsupported_override.go#L57. It has to behave differently in case of CRC/SNO.

In both cases it comes down not to a profile, but to a deployment type (which is derived from the profile). So I assume we plan that each component will have different types of manifests, depending on the profile 'grouping'. That will address the runtime mode (for example, a different CEO manifest that has 'DEPLOYMENT=NonHA' as an environment variable for the crc/sno profiles). So we can assume that the operators should not read the profile type from the cluster.
But how would it work for bootkube rendering: https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L74? It might be that we'll have to propagate env variables to each render command, right? Or should it be wrapped in bootkube logic with some if clauses that translate the profile name to operator behavior?
cc: @hexfusion

@guillaumerose (Contributor Author)

Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?
I don't know how many of the manifests from the CVO will collide with the ones rendered by these components. Are they still used after the bootstrap dance?


When upgrading, outgoing CVO will forward the cluster profile information to the incoming CVO with the environment variable.

`include.release.openshift.io/[identifier]=true` would make the CVO render this manifest only when `CLUSTER_PROFILE=[identifier]`
Contributor:

@derekwaynecarr is `include.release.openshift.io` something we claimed as API? I see this used for IBM and something about high availability.

Member:

@mfojtik yes
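For reference, a release-payload manifest carrying this annotation family might look like the sketch below. This is illustrative only: the operator name and the profile identifier in the annotation key are assumptions, not taken from a shipped manifest.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  annotations:
    # The CVO renders this manifest only when its CLUSTER_PROFILE matches
    # the identifier embedded in the annotation key ("some-profile" here
    # is a placeholder for a real profile name).
    include.release.openshift.io/some-profile: "true"
```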

@eranco74 (Contributor)

> Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?
> I don't know how many of the manifests from the CVO will collide with the ones rendered by these components. Are they still used after the bootstrap dance?

They are not used once the bootstrap is completed.
So unless the static pods running on the bootstrap node make decisions (e.g. set some configuration assuming that the cluster will have 3 master nodes), there shouldn't be any reason the etcd member running on the bootstrap node needs to know the cluster profile.
@hexfusion thoughts?

@hexfusion (Contributor) commented Nov 30, 2020

> > Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?
> > I don't know how many of the manifests from the CVO will collide with the ones rendered by these components. Are they still used after the bootstrap dance?
>
> They are not used once the bootstrap is completed.
> So unless the static pods running on the bootstrap node make decisions (e.g. set some configuration assuming that the cluster will have 3 master nodes), there shouldn't be any reason the etcd member running on the bootstrap node needs to know the cluster profile.
> @hexfusion thoughts?

Ideally, the render command would be able to determine the desired state from the bootstrap node. All conditional decisions about the cluster profile would be made during render. So as long as we can determine the current profile in bootkube, we shouldn't have any problems.

@eranco74 (Contributor) commented Dec 1, 2020

> Very good point. Should we say that all components that can render manifests take the CLUSTER_PROFILE env. variable into account?

@guillaumerose, it seems that the answer is yes.

@guillaumerose (Contributor Author)

For the bootstrap process, I read that we will introduce a new CRD for that (cluster-config). So we are good with the current design.

@guillaumerose (Contributor Author)

@derekwaynecarr @smarterclayton @wking is the current design OK?

@staebler (Contributor) commented Dec 2, 2020

> I removed the proposal with the new field in ClusterVersion. Let's now focus on what to change in the installer. 2 proposals:
>
> * a new field in install-config,
> * or a more-or-less documented env. variable (like `OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE`) `CLUSTER_PROFILE`
>
> Thanks.

Here is my third proposal. The bootkube.sh script looks for a specific manifest file. If the file exists, the script adds its contents as the CLUSTER_PROFILE environment variable for the bootstrap CVO pod. This makes it something that a user must do deliberately, as a particular step in the installation process.
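A hedged sketch of that third proposal follows. The file path and the helper name are illustrative assumptions, not the final bootkube.sh implementation:

```shell
#!/bin/sh
# Sketch of the proposal: if a specific manifest file exists, its contents
# become the CLUSTER_PROFILE env var for the bootstrap CVO pod. If the file
# is absent, nothing is emitted and the CVO falls back to its default
# behavior. Path and function name are illustrative.
profile_env_for_cvo() {
    profile_file="$1"
    if [ -f "$profile_file" ]; then
        printf 'CLUSTER_PROFILE=%s' "$(cat "$profile_file")"
    fi
}

# Example (illustrative path):
# profile_env_for_cvo /opt/openshift/manifests/cluster-profile
```

Because the variable is only set when the file exists, supplying a profile remains an explicit, deliberate installation step.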

```
}
```

Option 2:
Member:

Option 2 is preferred as we iterate/learn, especially while the bootstrap host is still required.

Contributor:

I prefer option 2 as well.

Using the profile during install should trigger a warning. Also, it should be prefixed the same way other install flags are, like `OPENSHIFT_INSTALL_EXPERIMENTAL_CLUSTER_PROFILE`.

@romfreiman commented Dec 2, 2020

Also note we'll have to add this env to cluster-bot once we agree on the flow.

@ravidbro commented Dec 2, 2020

> Here is my third proposal. The bootkube.sh script looks for a specific manifest file. If the file exists, the script adds its contents as the CLUSTER_PROFILE environment variable for the bootstrap CVO pod. This makes it something that a user must do deliberately, as a particular step in the installation process.

And what will the user need to do to create that manifest?

@derekwaynecarr (Member)

Option 2 is my preference while we iterate; we need install/CVO to agree.

/assign @eparis @crawford

@guillaumerose (Contributor Author)

I removed option 1 and set the proper name for the env. variable in the installer.

I also changed the release phases a bit: if we only add ClusterProfile to manifestRenderConfig, we will not be able to add manifests that don't belong to the default profile in the next release. The outgoing CVO will load all manifests in the release image without any distinction. That would force CRC and others to wait for a third release to use it.
The bare minimum to do is to select only manifests with the default profile annotation + change manifestRenderConfig.

@eparis (Member) commented Dec 3, 2020

> staebler (2 minutes ago): I can live with option 2 on that first enhancement. No need to stress about that one.
>
> lalatenduM (< 1 minute ago): +1 as long as the profile does not appear in an end-user-facing API

Thus, from that:
/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eparis, guillaumerose


@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 3, 2020
@openshift-merge-robot openshift-merge-robot merged commit 68de0ef into openshift:master Dec 3, 2020