-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for credentials management outside the openshift cluster for new platforms #709
Add proposal for credentials management outside the openshift cluster for new platforms #709
Conversation
enhancements/installer/credentials-management-outside-openshift-cluster.md
Outdated
Show resolved
Hide resolved
enhancements/installer/credentials-management-outside-openshift-cluster.md
Outdated
Show resolved
Hide resolved
enhancements/installer/credentials-management-outside-openshift-cluster.md
Outdated
Show resolved
Hide resolved
enhancements/installer/credentials-management-outside-openshift-cluster.md
Outdated
Show resolved
Hide resolved
3435a57
to
04bbb13
Compare
## Summary | ||
|
||
The intent of this enhancement is to take the process of credentials management outside the OpenShift cluster for new | ||
platforms. We propose to make *manual* mode as default for clusters on new platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optionally supported by CLI tooling. (ccoctl)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dgoodwin +1
also add "that follows the best practices of the hosting infrastructure at that time."
As part of this enhancement we plan to do the following: | ||
* Set *manual* as default/preferred credentials mode for OpenShift | ||
* Not store admin level cloud credentials inside a cluster | ||
* Provide a tooling design for credentials management outside the cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need to tighten these up a little, I would restate the goals as:
- Guide new platforms to start with Manual mode rather than Mint mode.
- Provide optional tooling (ccoctl) for administrators to manage their credentials outside the cluster. (prep for install, reconcile, prep for upgrade)
- Establish precedent for future direction of prefering Manual over Mint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agree with above.
|
||
### Non-Goals | ||
|
||
* We do not plan to go into design details on how this tool will be ported to cloud providers other than AWS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing Mint mode for any currently supported cloud. (AWS, Azure, GCP)
Changing the default for any provider currently defaulting to Mint mode, though we may revisit in the future.
|
||
#### Story 3 | ||
As a OpenShift cluster administrator I should be able to install a OpenShift cluster, without providing admin credentials | ||
to the installer, by injecting precreated Kubernetes Secrets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an OpenShift cluster administrator, I should be able to reconcile my credentials manually using local CLI tooling on an on-going basis.
As an OpenShift cluster administrator, I should be able to prep credentials for a cluster upgrade.
|
||
### Graduation Criteria | ||
|
||
### Upgrade / Downgrade Strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section is more about how do we roll out this new functionality. In this case it would roll out smooth as we're not changing any behavior on an existing cluster. This is more about new platforms, and future direction of CCO, should we start changing the default behavior and recommended best practice.
The notes below are relevant though and maybe could jump up to the design section.
|
||
We intend to build an optional tool **ccoctl** which will handle credentials management of the cluster in *manual* mode. | ||
The following is the set of requirements for the current prototype (for AWS). We can have something similar for other | ||
platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a plan here for how ccoctl is distributed. I am really wondering if this should be exposed in oc adm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assume distribution of ccoctl
is specific to an enhancement around that component. this enhancement is more focused on 'should each cluster be empowered to mint scoped cloud credentials' as a default posture moving forward.
This doesn't have a lot of detail, but I'm on board with the general idea of pre-creating creds outside the cluster, parsing the |
|
||
### User Stories | ||
|
||
#### Story 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these really three separate stories? They seem more like steps of the same story. In other words, is it meaningful for a user to do Story 1 by itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I agree that this seems like a single "extract and locally reconcile" use-case. Probably also need an "and destroy when I'm done with them" tail for that use case.
|
||
#### Story 3 | ||
As a OpenShift cluster administrator I should be able to install a OpenShift cluster, without providing admin credentials | ||
to the installer, by injecting precreated Kubernetes Secrets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we envision a user doing this injection? Is it going to be through custom manifests added manually by the user to {install-dir}/manifests? I am concerned about the complexity of our recommended flow. Is it practical for the installer to do the job of the ccoctl
tool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes manual mode today involved injecting via custom manifests, but ccoctl will generate everything for you and drop the files if desired.
Integrating right into openshift-install is interesting. If that is on the table that could be very useful, but we'd still want a ccoctl (or oc adm?) entrypoint for on-going maintenance I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for on-going maintenance. But it seems like a much better user experience to have the installer do the ccoctl work during install, rather than the preferred installation path requiring multiple steps with multiple tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with above for credentials that are scoped to a single cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@staebler do we want to outline how that could work in this enhancement or leave it for later? If so, how could it work? :) Are you thinking new openshift-install subcommand, something integrated into installconfig, or other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that it would be the default behavior of the installer. There would not be a sub-command. If the user wanted to opt out of having the installer creates the roles and secret manifests, then there would be an option in the install config to tell the installer to not create them. Could that be a new value for credentialsMode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC same thing is happening when we set credentialsMode
to manual. We might need to add a different (default) mode where in installer does ccoctl work as part of install.
### User Stories | ||
|
||
#### Story 1 | ||
As a OpenShift cluster administrator I should be able to extract the CredentialsRequest manifests from the release image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the ccoctl
tool going to be tied to a release image? Or will the user need to know the release image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We expect that it will be tied to a release image and extracted with oc release commands. I don't think it will be tightly coupled in that it would often work with future releases fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't quite answer my (poorly phrased) question. When using ccoctl
, is the expectation that the user would need to specify the release image from which to get the CredentialsRequests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, for install prep yes you will need to refer to the release image with the cred requests you will need. I am hoping we can run day 2 and work against the CredRequests in a live cluster/current kube context to help users reconcile.
* Taking the credentials management outside the cluster will create additional overhead for the customer to make sure all | ||
the required infrastructure is in place before starting the installation process. Current tooling we have only supports | ||
the AWS cloud, we do not have anything planned for other cloud providers. | ||
* Push-button upgrades will not work in *manual* mode as the cluster no longer has the admin credentials to mint credentials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way that push-button upgrades could work for upgrades where there are no changes in CredentialsRequests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As designed today, no, you will always need to indicate that your cluster has been prepped for the next upgrade. We wanted users who opt into Manual mode to be familiar with this flow so it was decided this would be better for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the "users who opt into Manual mode" the users with existing clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would not change anything automatically on anyone, but yes I believe a current cluster admin could flip their cluster to manual mode if they took explicit action to do so.
* Taking the credentials management outside the cluster will create additional overhead for the customer to make sure all | ||
the required infrastructure is in place before starting the installation process. Current tooling we have only supports | ||
the AWS cloud, we do not have anything planned for other cloud providers. | ||
* Push-button upgrades will not work in *manual* mode as the cluster no longer has the admin credentials to mint credentials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure that this has been discussed at length, but this is a huge drawback. Going forward we would be saying that, by default, we do not support push-button upgrades.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it's definitely unfortunate, it does however seem to mirror feedback from the field. I don't think Mint mode is going away, maybe not the default in the future but even that is up in the air and perhaps Mint is always the default. Right now if you opt into Manual, this is what you are signing up for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That part that I am tripping up on is that the second sentence of the summary of this enhancement is "We propose to make manual mode as default for clusters on new platforms." That takes away the "opt-in" aspect of Manual mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, this is what we are proposing for a new cloud provider we expect to be added. All clusters on that platform would initially be in this mode. Mint may never happen on that platform, we're not sure.
Many providers today are passthrough only which is technically masking some of the problems we've tried to solve with manual mode, where nothing is actively preventing you from upgrading when your passthrough shared creds are lacking some new permission. So we're not forcing them to take action for upgrades but at the cost of potential breakage. I think these cloud providers tend to use pretty broad permissions so I don't recall seeing it bite anyone yet.
So I guess here, for new platforms, we are proposing that this is now the default, you do not have automated 4.x to 4.x+1 upgrades, you'll need to take action manually to clarify that you've prepared for it.
It should be noted that this does not apply to .z upgrades, 4.7.1 to 4.7.2 is automated without user interaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this helps: the push-button upgrades are available as long as the cluster admin sets the appropriate annotation. Once the annotation is in place, the upgrade button will be available (whether the user has actually completed the necessary work will show up during/post upgrade).
The new requirement for doing some pre-upgrade steps is a direct consequence of the decision to move cloud/platform credentials management out of the cluster. Given this new directive, we're left with something needing to tell the cluster that is okay to perform an upgrade (since it can no longer self-service). We could encode this in the release image so that CCO/CVO can peek into the next version to know about the upgradeability...but I'm just brainstorming here and I don't think we've spent much time considering how to improve this new upgrade experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LalatenduMohanty @wking WDYT about including this logic in CVO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the need for manual mode from customers. But are we sure we want to make it the default? Why not keeping mint mode defult but manual mode for customer who want this? May be this has been discussed before. I am asking because automatic upgrade was part of OCP/OTA longterm plan and making the manual mode as default is a big change. I hope product management has already signed-off on this.
I agree that we should separate Y stream updates from Z stream updates. We should keep the automatic updates enabled for Z stream updates if that is possible by changing CVO to validate if the cluster has right permissions, I am fine with the idea.
We would like to know more about the checks we need to add to CVO and then we can say more clearly if it will be feasible with respect to CVO's architecture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that the OpenShift product will get out of the business of self-managing cloud permissions/credentials. Mint mode would not be implemented today based on this directive. And based on this directive, supporting AWS Role-based/Pod Identity with short lived tokens (STS) is done outside of the cluster as CCO was not extended to support minting IAM Roles from inside the cloud-credential-operator.
This does not mean that we will take away Mint mode for the platforms where it exists today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the absence of #363, the currently recommended pattern is for information about 4.y to be back ported to 4.(y-1) operators, so they can intelligently say "ah, the current cluster state is not compatible with the information I've been taught about 4.y" and set Upgradeable=False
. I'm not in favor about moving that responsibility into the CVO. For example, under some circumstances, the cred operator might be able to list permissions attached to existing credentials, see that they met the expanded 4.y criteria, and set Upgradeable=True
. The CVO will never be able to perform that in-cluster cred permissions listing. I do really appreciate the just-works nature of mint and passthrough mode. Currently manual mode is a very small fraction of Telemetry-submitting clusters. But I also understand that handing clusters admin-level creds is a big ask. If cluster admins want to manually create expanded creds as needed (and ideally reduce perms if we reduce our asks), I am completely fine with that. But I would prefer the cred-operator have read-only perms sufficient to audit the creds vs. the cred-requests, regardless of who is granting the creds. I don't see any reason why cluster admins would be uncomfortable with such a read-only cred, but if and only if they are, the cred operator could fall back to the upgradeable-to
flow, with Upgradeable=True
and a reason like BlindManualMode
with a message like I have no audit creds, so whether you have credentials in place for 4.(y+1) is entirely on you, cluster admins
. Even then, the cred operator could go Upgradeable=False
if a cred request it knew 4.(y+1) would ask for had no associated Secret at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cases where the cred-operator lacks a sufficient cred to audit CredRequest Secrets, it would also be good to have CI around each CredRequest to show that the consuming operator complains clearly if the Secret it has is insufficient to perform a task. That would make life easier on admins in the completely manual mode to identify and recover if/when they flub a Secret.
/assign |
|
||
#### Story 2 | ||
As a OpenShift cluster administrator I should be able to extract the CredentialsRequest manifests from the release image | ||
and create Kubernetes Secrets in to satisfy all in-cluster component's CredentialsRequests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
required for a specific desired installation topology.
I anticipate we evolve https://github.com/openshift/enhancements/blob/master/enhancements/single-node/cluster-high-availability-mode-api.md#cluster-operators
where controlPlaneTopology
will have an option for "External"
in that case the permissions are only pertinent for what runs within the cluster itself.
04bbb13
to
6bd880a
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Create/update credentials in the underlying cloud provider, and also create/update Kubernetes Secrets in the correct | ||
namespaces to satisfy all CredentialsRequests in the new release. | ||
* Set an appropriate annotation `cloudcredential.openshift.io/upgradeable-to` to a new upgradable version. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the exact steps for installation and upgrades (keeping the POC for AWS in mind) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will give more clarity around how the user experience is changing because of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no requirement to use ccoctl
, but with the tool it would look like:
oc adm release extract --credentials-requests --cloud=aws --to=/path/to/extract (for the release image being upgraded to)
ccoctl aws create-iam-roles --region=us-east-1 --credentials-requests-dir=/path/to/extract --name "some-name-here-used-to-name-iam-roles" --identity-provider-arn=ARN:for_AWS_IAM_IDENTITY_PROVIDER_HERE --output-dir=/path/to/manifests
oc apply -f (for each file in /path/to/manifests)
oc patch (to apply the upgradeable-to annotation)
* Taking the credentials management outside the cluster will create additional overhead for the customer to make sure all | ||
the required infrastructure is in place before starting the installation process. Current tooling we have only supports | ||
the AWS cloud, we do not have anything planned for other cloud providers. | ||
* Push-button upgrades will not work in *manual* mode as the cluster no longer has the admin credentials to mint credentials |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the need for manual mode from customers. But are we sure we want to make it the default? Why not keeping mint mode defult but manual mode for customer who want this? May be this has been discussed before. I am asking because automatic upgrade was part of OCP/OTA longterm plan and making the manual mode as default is a big change. I hope product management has already signed-off on this.
I agree that we should separate Y stream updates from Z stream updates. We should keep the automatic updates enabled for Z stream updates if that is possible by changing CVO to validate if the cluster has right permissions, I am fine with the idea.
We would like to know more about the checks we need to add to CVO and then we can say more clearly if it will be feasible with respect to CVO's architecture.
have changed. | ||
* Create/update credentials in the underlying cloud provider, and also create/update Kubernetes Secrets in the correct | ||
namespaces to satisfy all CredentialsRequests in the new release. | ||
* Set an appropriate annotation `cloudcredential.openshift.io/upgradeable-to` to a new upgradable version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add how this requirement is enforced? CCO sets Upgradeable=False
whenever the next minor isn't included in the value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CCO sets Upgradeable=False
by default in Manual mode. When user is ready to upgrade (after making necessary changes to satisfy credentials requests in next version), they can cloudcredential.openshift.io/upgradeable-to:<version>
to upgrade
Given this has already been implemented I've asked @akhil-rane to read it over and reconcile and differences between the enhancement and the implementation and then I'll merge this. |
6bd880a
to
b1720ea
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sdodson The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm cancel |
/lgtm |
86b6ad9
to
e6ac582
Compare
@sdodson I think we need LGTM here |
/lgtm |
/cc @dgoodwin @abhinavdahiya @staebler @joelddiaz @sdodson @derekwaynecarr