Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for credentials management outside the openshift cluster for new platforms #709

Merged

Conversation

akhil-rane
Copy link
Contributor

@akhil-rane akhil-rane force-pushed the cco_manual_mode_is_default branch from 3435a57 to 04bbb13 Compare March 31, 2021 15:01
## Summary

The intent of this enhancement is to take the process of credentials management outside the OpenShift cluster for new
platforms. We propose to make *manual* mode as default for clusters on new platforms.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optionally supported by CLI tooling. (ccoctl)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgoodwin +1

also add "that follows the best practices of the hosting infrastructure at that time."

As part of this enhancement we plan to do the following:
* Set *manual* as default/preferred credentials mode for OpenShift
* Not store admin level cloud credentials inside a cluster
* Provide a tooling design for credentials management outside the cluster
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to tighten these up a little, I would restate the goals as:

  • Guide new platforms to start with Manual mode rather than Mint mode.
  • Provide optional tooling (ccoctl) for administrators to manage their credentials outside the cluster. (prep for install, reconcile, prep for upgrade)
  • Establish precedent for future direction of prefering Manual over Mint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree with above.


### Non-Goals

* We do not plan to go into design details on how this tool will be ported to cloud providers other than AWS.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing Mint mode for any currently supported cloud. (AWS, Azure, GCP)
Changing the default for any provider currently defaulting to Mint mode, though we may revisit in the future.


#### Story 3
As a OpenShift cluster administrator I should be able to install a OpenShift cluster, without providing admin credentials
to the installer, by injecting precreated Kubernetes Secrets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an OpenShift cluster administrator, I should be able to reconcile my credentials manually using local CLI tooling on an on-going basis.
As an OpenShift cluster administrator, I should be able to prep credentials for a cluster upgrade.


### Graduation Criteria

### Upgrade / Downgrade Strategy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section is more about how do we roll out this new functionality. In this case it would roll out smooth as we're not changing any behavior on an existing cluster. This is more about new platforms, and future direction of CCO, should we start changing the default behavior and recommended best practice.

The notes below are relevant though and maybe could jump up to the design section.


We intend to build an optional tool **ccoctl** which will handle credentials management of the cluster in *manual* mode.
The following is the set of requirements for the current prototype (for AWS). We can have something similar for other
platforms.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a plan here for how ccoctl is distributed. I am really wondering if this should be exposed in oc adm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume distribution of ccoctl is specific to an enhancement around that component. this enhancement is more focused on 'should each cluster be empowered to mint scoped cloud credentials' as a default posture moving forward.

@sjenning
Copy link
Contributor

sjenning commented Apr 6, 2021

This doesn't have a lot of detail, but I'm on board with the general idea of pre-creating creds outside the cluster, parsing the CredentialsRequests policy and secret straight out of the release payload, and the cluster not having highly privileged AWS access i.e. IAM. 👍


### User Stories

#### Story 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these really three separate stories? They seem more like steps of the same story. In other words, is it meaningful for a user to do Story 1 by itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I agree that this seems like a single "extract and locally reconcile" use-case. Probably also need an "and destroy when I'm done with them" tail for that use case.


#### Story 3
As a OpenShift cluster administrator I should be able to install a OpenShift cluster, without providing admin credentials
to the installer, by injecting precreated Kubernetes Secrets.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we envision a user doing this injection? Is it going to be through custom manifests added manually by the user to {install-dir}/manifests? I am concerned about the complexity of our recommended flow. Is it practical for the installer to do the job of the ccoctl tool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes manual mode today involved injecting via custom manifests, but ccoctl will generate everything for you and drop the files if desired.

Integrating right into openshift-install is interesting. If that is on the table that could be very useful, but we'd still want a ccoctl (or oc adm?) entrypoint for on-going maintenance I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for on-going maintenance. But it seems like a much better user experience to have the installer do the ccoctl work during install, rather than the preferred installation path requiring multiple steps with multiple tools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with above for credentials that are scoped to a single cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@staebler do we want to outline how that could work in this enhancement or leave it for later? If so, how could it work? :) Are you thinking new openshift-install subcommand, something integrated into installconfig, or other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think that it would be the default behavior of the installer. There would not be a sub-command. If the user wanted to opt out of having the installer creates the roles and secret manifests, then there would be an option in the install config to tell the installer to not create them. Could that be a new value for credentialsMode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC same thing is happening when we set credentialsMode to manual. We might need to add a different (default) mode where in installer does ccoctl work as part of install.

### User Stories

#### Story 1
As a OpenShift cluster administrator I should be able to extract the CredentialsRequest manifests from the release image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ccoctl tool going to be tied to a release image? Or will the user need to know the release image?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect that it will be tied to a release image and extracted with oc release commands. I don't think it will be tightly coupled in that it would often work with future releases fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't quite answer my (poorly phrased) question. When using ccoctl, is the expectation that the user would need to specify the release image from which to get the CredentialsRequests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, for install prep yes you will need to refer to the release image with the cred requests you will need. I am hoping we can run day 2 and work against the CredRequests in a live cluster/current kube context to help users reconcile.

* Taking the credentials management outside the cluster will create additional overhead for the customer to make sure all
the required infrastructure is in place before starting the installation process. Current tooling we have only supports
the AWS cloud, we do not have anything planned for other cloud providers.
* Push-button upgrades will not work in *manual* mode as the cluster no longer has the admin credentials to mint credentials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way that push-button upgrades could work for upgrades where there are no changes in CredentialsRequests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As designed today, no, you will always need to indicate that your cluster has been prepped for the next upgrade. We wanted users who opt into Manual mode to be familiar with this flow so it was decided this would be better for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the "users who opt into Manual mode" the users with existing clusters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would not change anything automatically on anyone, but yes I believe a current cluster admin could flip their cluster to manual mode if they took explicit action to do so.

* Taking the credentials management outside the cluster will create additional overhead for the customer to make sure all
the required infrastructure is in place before starting the installation process. Current tooling we have only supports
the AWS cloud, we do not have anything planned for other cloud providers.
* Push-button upgrades will not work in *manual* mode as the cluster no longer has the admin credentials to mint credentials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure that this has been discussed at length, but this is a huge drawback. Going forward we would be saying that, by default, we do not support push-button upgrades.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's definitely unfortunate, it does however seem to mirror feedback from the field. I don't think Mint mode is going away, maybe not the default in the future but even that is up in the air and perhaps Mint is always the default. Right now if you opt into Manual, this is what you are signing up for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That part that I am tripping up on is that the second sentence of the summary of this enhancement is "We propose to make manual mode as default for clusters on new platforms." That takes away the "opt-in" aspect of Manual mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, this is what we are proposing for a new cloud provider we expect to be added. All clusters on that platform would initially be in this mode. Mint may never happen on that platform, we're not sure.

Many providers today are passthrough only which is technically masking some of the problems we've tried to solve with manual mode, where nothing is actively preventing you from upgrading when your passthrough shared creds are lacking some new permission. So we're not forcing them to take action for upgrades but at the cost of potential breakage. I think these cloud providers tend to use pretty broad permissions so I don't recall seeing it bite anyone yet.

So I guess here, for new platforms, we are proposing that this is now the default, you do not have automated 4.x to 4.x+1 upgrades, you'll need to take action manually to clarify that you've prepared for it.

It should be noted that this does not apply to .z upgrades, 4.7.1 to 4.7.2 is automated without user interaction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this helps: the push-button upgrades are available as long as the cluster admin sets the appropriate annotation. Once the annotation is in place, the upgrade button will be available (whether the user has actually completed the necessary work will show up during/post upgrade).

The new requirement for doing some pre-upgrade steps is a direct consequence of the decision to move cloud/platform credentials management out of the cluster. Given this new directive, we're left with something needing to tell the cluster that is okay to perform an upgrade (since it can no longer self-service). We could encode this in the release image so that CCO/CVO can peek into the next version to know about the upgradeability...but I'm just brainstorming here and I don't think we've spent much time considering how to improve this new upgrade experience.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LalatenduMohanty @wking WDYT about including this logic in CVO?

Copy link
Member

@LalatenduMohanty LalatenduMohanty Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need for manual mode from customers. But are we sure we want to make it the default? Why not keeping mint mode defult but manual mode for customer who want this? May be this has been discussed before. I am asking because automatic upgrade was part of OCP/OTA longterm plan and making the manual mode as default is a big change. I hope product management has already signed-off on this.

I agree that we should separate Y stream updates from Z stream updates. We should keep the automatic updates enabled for Z stream updates if that is possible by changing CVO to validate if the cluster has right permissions, I am fine with the idea.

We would like to know more about the checks we need to add to CVO and then we can say more clearly if it will be feasible with respect to CVO's architecture.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the OpenShift product will get out of the business of self-managing cloud permissions/credentials. Mint mode would not be implemented today based on this directive. And based on this directive, supporting AWS Role-based/Pod Identity with short lived tokens (STS) is done outside of the cluster as CCO was not extended to support minting IAM Roles from inside the cloud-credential-operator.

This does not mean that we will take away Mint mode for the platforms where it exists today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the absence of #363, the currently recommended pattern is for information about 4.y to be back ported to 4.(y-1) operators, so they can intelligently say "ah, the current cluster state is not compatible with the information I've been taught about 4.y" and set Upgradeable=False. I'm not in favor about moving that responsibility into the CVO. For example, under some circumstances, the cred operator might be able to list permissions attached to existing credentials, see that they met the expanded 4.y criteria, and set Upgradeable=True. The CVO will never be able to perform that in-cluster cred permissions listing. I do really appreciate the just-works nature of mint and passthrough mode. Currently manual mode is a very small fraction of Telemetry-submitting clusters. But I also understand that handing clusters admin-level creds is a big ask. If cluster admins want to manually create expanded creds as needed (and ideally reduce perms if we reduce our asks), I am completely fine with that. But I would prefer the cred-operator have read-only perms sufficient to audit the creds vs. the cred-requests, regardless of who is granting the creds. I don't see any reason why cluster admins would be uncomfortable with such a read-only cred, but if and only if they are, the cred operator could fall back to the upgradeable-to flow, with Upgradeable=True and a reason like BlindManualMode with a message like I have no audit creds, so whether you have credentials in place for 4.(y+1) is entirely on you, cluster admins. Even then, the cred operator could go Upgradeable=False if a cred request it knew 4.(y+1) would ask for had no associated Secret at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For cases where the cred-operator lacks a sufficient cred to audit CredRequest Secrets, it would also be good to have CI around each CredRequest to show that the consuming operator complains clearly if the Secret it has is insufficient to perform a task. That would make life easier on admins in the completely manual mode to identify and recover if/when they flub a Secret.

@derekwaynecarr
Copy link
Member

/assign


#### Story 2
As a OpenShift cluster administrator I should be able to extract the CredentialsRequest manifests from the release image
and create Kubernetes Secrets in to satisfy all in-cluster component's CredentialsRequests.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

required for a specific desired installation topology.

I anticipate we evolve https://github.com/openshift/enhancements/blob/master/enhancements/single-node/cluster-high-availability-mode-api.md#cluster-operators

where controlPlaneTopology will have an option for "External"

in that case the permissions are only pertinent for what runs within the cluster itself.

@akhil-rane akhil-rane force-pushed the cco_manual_mode_is_default branch from 04bbb13 to 6bd880a Compare April 21, 2021 17:31
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from derekwaynecarr after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* Create/update credentials in the underlying cloud provider, and also create/update Kubernetes Secrets in the correct
namespaces to satisfy all CredentialsRequests in the new release.
* Set an appropriate annotation `cloudcredential.openshift.io/upgradeable-to` to a new upgradable version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the exact steps for installation and upgrades (keeping the POC for AWS in mind) here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will give more clarity around how the user experience is changing because of this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no requirement to use ccoctl, but with the tool it would look like:
oc adm release extract --credentials-requests --cloud=aws --to=/path/to/extract (for the release image being upgraded to)
ccoctl aws create-iam-roles --region=us-east-1 --credentials-requests-dir=/path/to/extract --name "some-name-here-used-to-name-iam-roles" --identity-provider-arn=ARN:for_AWS_IAM_IDENTITY_PROVIDER_HERE --output-dir=/path/to/manifests
oc apply -f (for each file in /path/to/manifests)
oc patch (to apply the upgradeable-to annotation)

* Taking the credentials management outside the cluster will create additional overhead for the customer to make sure all
the required infrastructure is in place before starting the installation process. Current tooling we have only supports
the AWS cloud, we do not have anything planned for other cloud providers.
* Push-button upgrades will not work in *manual* mode as the cluster no longer has the admin credentials to mint credentials
Copy link
Member

@LalatenduMohanty LalatenduMohanty Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need for manual mode from customers. But are we sure we want to make it the default? Why not keeping mint mode defult but manual mode for customer who want this? May be this has been discussed before. I am asking because automatic upgrade was part of OCP/OTA longterm plan and making the manual mode as default is a big change. I hope product management has already signed-off on this.

I agree that we should separate Y stream updates from Z stream updates. We should keep the automatic updates enabled for Z stream updates if that is possible by changing CVO to validate if the cluster has right permissions, I am fine with the idea.

We would like to know more about the checks we need to add to CVO and then we can say more clearly if it will be feasible with respect to CVO's architecture.

have changed.
* Create/update credentials in the underlying cloud provider, and also create/update Kubernetes Secrets in the correct
namespaces to satisfy all CredentialsRequests in the new release.
* Set an appropriate annotation `cloudcredential.openshift.io/upgradeable-to` to a new upgradable version.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add how this requirement is enforced? CCO sets Upgradeable=False whenever the next minor isn't included in the value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCO sets Upgradeable=False by default in Manual mode. When user is ready to upgrade (after making necessary changes to satisfy credentials requests in next version), they can cloudcredential.openshift.io/upgradeable-to:<version> to upgrade

@sdodson
Copy link
Member

sdodson commented Jul 1, 2021

Given this has already been implemented I've asked @akhil-rane to read it over and reconcile and differences between the enhancement and the implementation and then I'll merge this.

@akhil-rane akhil-rane force-pushed the cco_manual_mode_is_default branch from 6bd880a to b1720ea Compare July 1, 2021 20:45
@sdodson
Copy link
Member

sdodson commented Jul 2, 2021

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 2, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 2, 2021
@sdodson
Copy link
Member

sdodson commented Jul 2, 2021

/lgtm cancel
@akhil-rane please see the lint test failures, the linting jobs may be new since this PR was last updated, sorry.

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 2, 2021
@sdodson
Copy link
Member

sdodson commented Jul 7, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2021
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 7, 2021
@sdodson sdodson force-pushed the cco_manual_mode_is_default branch from 86b6ad9 to e6ac582 Compare July 7, 2021 19:42
@akhil-rane
Copy link
Contributor Author

@sdodson I think we need LGTM here

@sdodson
Copy link
Member

sdodson commented Jul 14, 2021

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 14, 2021
@openshift-merge-robot openshift-merge-robot merged commit f19cccc into openshift:master Jul 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.