connected assisted installer #376
Conversation
/assign @bparees
/assign @crawford
/assign @abhinavdahiya
/assign @eparis
Force-pushed from ea66939 to d4c9dcc.
@dhellmann Not clear why I'm an assignee on this; is there a particular aspect you're looking for my feedback on?
The bot gave me your name. I assumed it had some way to pick you as a reviewer other than /dev/random but perhaps not?
Pretty much randomly selected from here: https://github.com/openshift/enhancements/blob/master/OWNERS. @sdodson this is probably more your area, or maybe you can pick someone more appropriate to be involved in reviewing this.
/unassign @bparees
I only skimmed so far, but I love the ideas behind this! This is building extensively on #210 and it'd be good to cross-link.
- Make initial deployment of usable and supportable clusters simpler.
- Move more infrastructure configuration from day 1 to day 2.
- Support connected on-premise deployments.
- Support existing IPI features, especially for day 2 cluster
nit: provisioner shouldn't matter for features, especially day-2 features. This should be:
- Support cluster-managed infrastructure features, especially on day 2.
Or something like that...
I'm not aware that it is currently possible to install a UPI bare metal cluster and then on day 2 use IPI to add workers. That's the goal here.
There are features that you get today on bare metal only if you used IPI and do not get with UPI, particularly power management via BMC. Given the direction we're going, it probably makes sense for us to stop referring to that as "existing IPI features" and get in the habit of using a more direct descriptor.
Are we really just disagreeing about the wording? Or is the description unclear?
I had to look up baseboard management controller, so take this with a grain of "Trevor has no idea what he's talking about" salt ;). But it seems to me that "I have an existing metal cluster and want to give its machine API controller the ability and authority to manage Machines as a day 2 operation" would be a different enhancement from (and possibly a prerequisite for) this install-time enhancement. Couldn't you set up that day-2 management without needing to call out to cloud.redhat.com? You'd already have a local cluster where you could host whatever the incoming machines would need for their first boot...
You're right, the two things are somewhat related.
The existing bare metal IPI workflow requires management controller credentials for each host so that the installer can automate the deployment. Under that flow the installer has all of the information required to turn the machine API on, so it does not make much sense to assume the machine API is disabled; we expect to always leave it enabled.
It also should be possible to use the UPI process to install a cluster and then add the machine API integration later. There's a separate set of work planned for that, regardless of the progress on this enhancement. For a UPI process, the user is going to have to do more work to set up the machine API, including BareMetalHost CRs with specific settings and connecting them to Machine CRs.
The important thing we're referring to here is that we can save some of the information available during the new automated installation process so that the user has less work to do to enable the machine API later. Adding the details of how that would work seemed like going too deep into planned implementation, but if we need more detail in this document we can add it.
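For illustration, a rough sketch of the kind of objects a UPI user would have to create by hand for that day-2 machine API setup: a BMC credentials Secret plus a BareMetalHost resource (metal3.io v1alpha1 API; the names, addresses, and credentials are placeholders, and the additional settings that tie the host to a Machine are omitted):

```yaml
# Placeholder BMC credentials referenced by the BareMetalHost below.
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-bmc-secret
  namespace: openshift-machine-api
type: Opaque
stringData:
  username: admin
  password: changeme
---
# Hand-written inventory record for one physical host.
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: "00:11:22:33:44:55"   # provisioning NIC of the host
  bmc:
    address: ipmi://192.168.111.10      # BMC endpoint used for power management
    credentialsName: worker-0-bmc-secret
```

The point here is that the assisted flow already collects most of this data, so it could be saved during installation instead of being re-entered by hand later.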
The current interface that decides whether the cluster-managed bare metal pieces are enabled is the platform, so when you do an IPI deployment using `platform: baremetal` you get those features.
However, the current UPI docs use `platform: none`, and there's no currently supported way to enable the managed bare metal pieces after initial deployment. There has been some work looking at how we might change that, ref openshift/machine-api-operator#560.
However, I don't think making day-2 enablement of managed bare metal possible is a prerequisite for this assisted-installer workflow, because AFAIK the plan is to use `platform: baremetal` even in the case where you choose the "boot it yourself" path. This also gives some other features like cluster-hosted load balancing and DNS, so it makes sense to reuse the same platform if possible. /cc @avishayt to keep me honest on plans here :)
Indeed, we use the installer with `platform: baremetal` to generate the ignition configs.
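For context, a minimal, abbreviated sketch of what the `platform: baremetal` stanza in `install-config.yaml` looks like for installer-provisioned bare metal today (values are placeholders, and several required fields such as the pull secret and SSH key are omitted):

```yaml
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
platform:
  baremetal:
    apiVIP: 192.168.111.5       # cluster-hosted API load balancing
    ingressVIP: 192.168.111.4   # cluster-hosted ingress
    hosts:
      - name: master-0
        role: master
        bootMACAddress: "00:11:22:33:44:55"
        bmc:
          address: ipmi://192.168.111.10
          username: admin       # placeholder credentials
          password: changeme
```

Presumably the assisted flow would generate an equivalent configuration on the user's behalf from the data gathered by the agents, rather than requiring the user to write it out.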
Force-pushed from d4c9dcc to d835d65.
Force-pushed from dd6c11b to ec23fbd.
As a cluster deployer, I want to install a production-ready OpenShift
cluster without committing to delegating all infrastructure control to
the installer or to the cluster, so I can adapt my existing admin
processes and infrastructure management tools instead of replacing
This story seems orthogonal to the enhancement. Folks can already do this with user-provisioned infrastructure, and afterwards delegate as much or as little of the infrastructure management as they want to the cluster, right? If there are missing delegation bits, sorting those out seems like it would be a day-2 issue, and this enhancement is about day-1 issues. Or am I missing a connection...?
We don't currently make it easy for them to add the bare metal machine API on day 2 (see discussion above). Being able to do that is a key requirement for this work. Perhaps that deserves its own enhancement?
Could you add some specific notes on the security of this design? There are a lot of attack vectors in remote installation and auto-join systems, but maybe there are details I'm missing that make this model secure. It would be helpful to have them explicitly noted.
When the ISO boots, it starts an agent that communicates with the REST
API for the assisted installer service running on `cloud.redhat.com`
to receive instructions. The agent registers the host with the
service, using the user's pull secret embedded in the ISO's ignition config |
@romfreiman do we plan to embed the user pull secret in the ISO?
Currently this isn't the case, and I think we can use some other token (that the cloud service generates) for authentication rather than the user pull secret.
In general I think it's preferable not to have user secrets / passwords in the ISO.
If we have concerns about holding user secrets server-side, this could be the service encrypting the pull secret to itself, storing the encrypted blob in the ISO, and just hanging on to the encryption key.
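To make the options above concrete, here is a hedged, Butane-style YAML sketch (this kind of config transpiles to the Ignition config baked into the ISO) of embedding an agent credential, whether that ends up being the raw pull secret, a service-issued token, or an encrypted blob only the service can decrypt; the file path and field name are hypothetical:

```yaml
variant: fcos
version: 1.1.0
storage:
  files:
    # Hypothetical location the discovery agent would read at boot.
    - path: /etc/assisted/agent-auth.json
      mode: 0600
      contents:
        inline: |
          {"auth_token": "<service-issued token or encrypted pull-secret blob>"}
```

Whichever credential is chosen, anyone with a copy of the ISO can read it, which is part of why the comments above prefer a narrowly scoped token over the full pull secret.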
When enough hosts are configured, the assisted installer application
replies to the agent on each host with the instructions it needs to
take part in forming the cluster. The assisted installer application
selects one host to run the bootstrap services used during
Regarding the bootstrap service: it's worth mentioning that, unlike all other OpenShift installers, the assisted installer doesn't use an auxiliary bootstrap node. Instead, it pivots the bootstrap host to become a master node once the control plane is running on two other master nodes. This flow reduces the minimum number of nodes to three.
> Instead, it will pivot the bootstrap to become a master node once the control plane is running on 2 other master nodes.
Do we run a descheduler by default now? In addition to the "did we clean out all the bootstrap-specific stuff?" concerns we had about pivoting bootstrap into a compute node, pivoting into a control-plane node adds "do we end up with most of the critical pieces all lumped together on the two born-as-control-plane machines?". Although I guess a few rounds of control-plane reboots during subsequent updates and config rollouts would wash away any initial lopsided balancing.
> In addition to the "did we clean out all the bootstrap-specific stuff?" concerns we had about pivoting bootstrap into a compute node,
This is a pretty easy problem to solve: we can add support to Ignition to make everything ephemeral at the OS level (i.e. mount /var as a tmpfs, /etc as an overlayfs, and everything else read-only). Then rebooting guarantees everything done at the filesystem level is gone (and there's no reason for an OpenShift install process to operate at the block level).
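As a rough illustration of the /var-as-tmpfs idea using only mechanisms that exist today (a systemd mount unit delivered through a Butane-style config, rather than the native Ignition support proposed above); this is a sketch only, since RHCOS boot ordering and ostree's use of /var may make a real implementation more involved:

```yaml
variant: fcos
version: 1.1.0
systemd:
  units:
    # Mount /var as tmpfs so everything written there is discarded on reboot.
    - name: var.mount
      enabled: true
      contents: |
        [Unit]
        Description=Ephemeral /var for a throwaway bootstrap host

        [Mount]
        What=tmpfs
        Where=/var
        Type=tmpfs
        Options=mode=0755

        [Install]
        WantedBy=local-fs.target
```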
This part of the discussion should probably move to #361.
This enhancement describes changes in and around the installer to assist with deployment on user-provisioned infrastructure. The use cases are primarily relevant for bare metal, but in the future may be applicable to cloud users who are running an installer UI directly instead of using a front-end such as `cloud.redhat.com` or the UI provided by their cloud vendor. Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
Force-pushed from ec23fbd to 8df2b11.
@avishayt @romfreiman maybe we can work on responding to this request together?
We are beginning to formalize this and will provide the information once we have it compiled.
/lgtm
## Alternatives
The telco/edge bare metal team is working on support for automating |
Suggested change:
- The telco/edge bare metal team is working on support for automating
+ The bare metal IPI team is working on support for automating
We cannot say "IPI" in external docs. We need to say "installer-provisioned infrastructure". But do we really have a metal team that ignores user-provisioned infrastructure?
Fixed in #382
The team working on the automation does not support user-provisioned deployments today. IIUC, the main installer team supports those.
Definitely think @ericavonb is spot on with this request. Perhaps it can be tracked as a TODO or graduation requirement if y'all want to move forward with merging what has been written so far (and seems largely agreed on).
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: crawford, dhellmann, markmc
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
This is a follow-up change to openshift#376 with some placeholders for discussion items that were not resolved before the previous PR merged. Signed-off-by: Doug Hellmann <dhellmann@redhat.com>
This enhancement describes changes in and around the installer to assist with deployment on user-provisioned infrastructure. The use cases are primarily relevant for bare metal, but in the future may be applicable to cloud users who are running an installer UI directly instead of using a front-end such as `cloud.redhat.com` or the UI provided by their cloud vendor.