
connected assisted installer #376

Merged

Conversation

dhellmann (Contributor)

This enhancement describes changes in and around the installer to
assist with deployment on user-provisioned infrastructure. The use
cases are primarily relevant for bare metal, but in the future may be
applicable to cloud users who are running an installer UI directly
instead of using a front-end such as cloud.redhat.com or the UI
provided by their cloud vendor.

dhellmann (Contributor, author)

/assign @bparees
/assign @crawford
/assign @abhinavdahiya
/assign @eparis

bparees (Contributor) commented Jun 15, 2020

@dhellmann not clear why I'm an assignee on this; is there a particular aspect you're looking for my feedback on?

dhellmann (Contributor, author)

> @dhellmann not clear why I'm an assignee on this; is there a particular aspect you're looking for my feedback on?

The bot gave me your name. I assumed it had some way to pick you as a reviewer other than /dev/random but perhaps not?

bparees (Contributor) commented Jun 15, 2020

Pretty much randomly selected from here: https://github.com/openshift/enhancements/blob/master/OWNERS

@sdodson probably more your area, or maybe you can pick someone more appropriate to be involved in reviewing this

/unassign @bparees

dhellmann (Contributor, author)

OK, thanks @bparees

/assign @sdodson

cgwalters (Member)

I only skimmed so far, but I love the ideas behind this!

This is building extensively on #210 and it'd be good to cross-link.

> - Make initial deployment of usable and supportable clusters simpler.
> - Move more infrastructure configuration from day 1 to day 2.
> - Support connected on-premise deployments.
> - Support existing IPI features, especially for day 2 cluster
Member

nit: provisioner shouldn't matter for features, especially day-2 features. This should be:

  • Support cluster-managed infrastructure features, especially on day 2.

Or something like that...

dhellmann (Contributor, author)

I'm not aware that it is currently possible to install a UPI bare metal cluster and then on day 2 use IPI to add workers. That's the goal here.

Member

There are features that you get today on bare metal only if you used IPI and do not get with UPI, particularly power management via BMC. Given the direction we're going, it probably makes sense for us to stop referring to that as "existing IPI features" and get in the habit of using a more direct descriptor.

dhellmann (Contributor, author)

Are we really just disagreeing about the wording? Or is the description unclear?

wking (Member) commented Jun 15, 2020

I had to look up baseboard management controller, so take this with a grain of "Trevor has no idea what he's talking about" salt ;). But it seems to me that "I have an existing metal cluster and want to give its machine API controller the ability and authority to manage Machines as a day 2 operation" would be a different enhancement from (and possibly a prerequisite for) this install-time enhancement. Couldn't you set up that day-2 management without needing to call out to cloud.redhat.com? You'd already have a local cluster where you could host whatever the incoming machines would need for their first boot...

dhellmann (Contributor, author)

You're right, the two things are somewhat related.

The existing bare metal IPI workflow requires management controller credentials for each host, so the installer can automate the deployment. Under that flow the installer has all of the information that would be required to turn the machine API on, so it does not make much sense to assume the machine API is disabled and we expect to always leave it enabled.

It also should be possible to use the UPI process to install a cluster and then add the machine API integration later. There's a separate set of work planned for that, regardless of the progress on this enhancement. For a UPI process, the user is going to have to do more work to set up the machine API, including BareMetalHost CRs with specific settings and connecting them to Machine CRs.

The important thing we're referring to here is that we can save some of the information available during the new automated installation process so that the user has less work to do to enable the machine API later. Adding the details of how that would work seemed like going too deep into planned implementation, but we can expand this document if more detail is needed.
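For illustration only, here is a minimal sketch of the kind of per-host record the automated flow could save for later machine-API enablement. The field names and types are hypothetical stand-ins, not the actual metal3 BareMetalHost API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative only: a rough stand-in for the per-host information the
// installation flow could preserve so that enabling the machine API later
// requires less manual work. The real data would live in metal3
// BareMetalHost CRs; these field names are hypothetical.
type SavedHost struct {
	Name           string `json:"name"`
	BMCAddress     string `json:"bmcAddress"`     // e.g. an ipmi:// or redfish:// URL
	BMCCredentials string `json:"bmcCredentials"` // reference to a secret, not the password itself
	BootMACAddress string `json:"bootMACAddress"`
}

func main() {
	host := SavedHost{
		Name:           "master-0",
		BMCAddress:     "redfish://10.0.0.10/redfish/v1/Systems/1",
		BMCCredentials: "master-0-bmc-secret",
		BootMACAddress: "52:54:00:aa:bb:cc",
	}
	out, _ := json.MarshalIndent(host, "", "  ")
	fmt.Println(string(out))
}
```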

hardys (Contributor) commented Jun 17, 2020

The current interface which decides if cluster-managed-baremetal pieces are enabled is the platform, so when you do an IPI deployment using platform: baremetal you get those features.

However the current UPI docs use platform: none and there's no currently supported way to enable the managed-baremetal pieces after initial deployment. There has been some work looking at how we might change that ref openshift/machine-api-operator#560

However, I don't think day-2 enablement of managed baremetal is a prerequisite for this assisted-installer workflow, because AFAIK the plan is to use platform: baremetal even in the case where you choose the "boot it yourself" path. This also gives some other features like cluster-hosted load balancing and DNS, so it makes sense to reuse the same platform if possible. /cc @avishayt to keep me honest on plans here :)

Reply

Indeed, we use the installer with platform: baremetal to generate the Ignition configs.

dhellmann force-pushed the assisted-installer branch from d4c9dcc to d835d65 on June 15, 2020 21:31
dhellmann force-pushed the assisted-installer branch 3 times, most recently from dd6c11b to ec23fbd on June 15, 2020 21:37
> As a cluster deployer, I want to install a production-ready OpenShift
> cluster without committing to delegating all infrastructure control to
> the installer or to the cluster, so I can adapt my existing admin
> processes and infrastructure management tools instead of replacing
Member

This story seems orthogonal to the enhancement. Folks can already do this with user-provisioned infrastructure, and afterwards delegate as much or as little of the infrastructure management as they want to the cluster, right? If there are missing delegation bits, sorting those out seems like it would be a day-2 issue, and this enhancement is about day-1 issues. Or am I missing a connection...?

dhellmann (Contributor, author)

We don't currently make it easy for them to add the bare metal machine API on day 2 (see discussion above). Being able to do that is a key requirement for this work. Perhaps that deserves its own enhancement?

ericavonb (Contributor)

Could you add some specific notes on the security for this design? There are a lot of attack vectors in remote installation and auto-join systems, but maybe there are details I'm missing that make this model secure. It would be helpful to have them explicitly noted.

> When the ISO boots, it starts an agent that communicates with the REST
> API for the assisted installer service running on `cloud.redhat.com`
> to receive instructions. The agent registers the host with the
> service, using the user's pull secret embedded in the ISO's ignition config
Contributor

@romfreiman do we plan to embed the user pull secret to the ISO?
Currently this isn't the case, and I think we can use some other token (that the cloud service generates) for authentication rather than the user pull secret.
In general I think it's preferable not to have user secrets / passwords in the ISO.
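As a rough sketch of what the agent's registration call might look like under that alternative, assuming a hypothetical endpoint path, payload shape, and a short-lived service-generated token (none of these names come from the actual assisted-installer API):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical registration call: the URL, payload, and expected status
// are illustrative assumptions, not the real assisted-installer REST API.
func registerHost(baseURL, token, hostID string) error {
	body, err := json.Marshal(map[string]string{"host_id": hostID})
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost,
		baseURL+"/api/assisted-install/hosts", bytes.NewReader(body))
	if err != nil {
		return err
	}
	// A service-generated token scoped to this installation, rather than
	// the user's full pull secret, limits the damage if the ISO leaks.
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusCreated {
		return fmt.Errorf("registration failed: %s", resp.Status)
	}
	return nil
}

func main() {
	if err := registerHost("https://cloud.redhat.com", "example-token", "host-0"); err != nil {
		fmt.Println(err)
	}
}
```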

Member

If we have concerns about holding user secrets server-side, this could be the service encrypting the pull secret to itself, storing the encrypted blob in the ISO, and just hanging on to the encryption key.
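A minimal sketch of that encrypt-to-itself idea using AES-GCM. The key handling and secret format here are illustrative assumptions, not the service's actual implementation:

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"io"
)

// Sketch of "the service encrypting the pull secret to itself": the
// service keeps only the key, and the ISO carries the opaque blob.
func sealPullSecret(key, pullSecret []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Prepend the nonce so the service can decrypt the blob later.
	return gcm.Seal(nonce, nonce, pullSecret, nil), nil
}

func main() {
	key := make([]byte, 32) // held server-side only, never shipped in the ISO
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		panic(err)
	}
	blob, err := sealPullSecret(key, []byte(`{"auths":{}}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("encrypted blob for the ISO: %x\n", blob)
}
```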

> When enough hosts are configured, the assisted installer application
> replies to the agent on each host with the instructions it needs to
> take part in forming the cluster. The assisted installer application
> selects one host to run the bootstrap services used during
Contributor

Regarding the bootstrap service:
I think it's worth mentioning that unlike all other OpenShift installers, the assisted installer doesn’t use an auxiliary bootstrap node.
Instead, it will pivot the bootstrap to become a master node once the control plane is running on 2 other master nodes.
This flow reduces the minimum number of nodes to 3.
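A rough sketch of the gating logic that pivot implies, assuming recent client-go APIs and a hypothetical kubeconfig path; the pivot step itself is left as a placeholder:

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Wait until two born-as-control-plane nodes are Ready before tearing
// down the bootstrap services and pivoting this host into the third
// master. The pivot itself is out of scope for this sketch.
func waitForTwoReadyMasters(clientset *kubernetes.Clientset) error {
	for {
		nodes, err := clientset.CoreV1().Nodes().List(context.TODO(),
			metav1.ListOptions{LabelSelector: "node-role.kubernetes.io/master"})
		if err != nil {
			return err
		}
		ready := 0
		for _, n := range nodes.Items {
			for _, c := range n.Status.Conditions {
				if c.Type == corev1.NodeReady && c.Status == corev1.ConditionTrue {
					ready++
				}
			}
		}
		if ready >= 2 {
			return nil
		}
		time.Sleep(30 * time.Second)
	}
}

func main() {
	// Hypothetical kubeconfig location on the bootstrap host.
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/kubernetes/kubeconfig")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	if err := waitForTwoReadyMasters(clientset); err != nil {
		panic(err)
	}
	fmt.Println("control plane ready on two masters; safe to pivot the bootstrap host")
}
```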

Member

> Instead, it will pivot the bootstrap to become a master node once the control plane is running on 2 other master nodes.

Do we run a descheduler by default now? In addition to the "did we clean out all the bootstrap-specific stuff?" concerns we had about pivoting bootstrap into a compute node, pivoting into a control-plane node adds "do we end up with most of the critical pieces all lumped together on the two born-as-control-plane machines?". Although I guess a few rounds of control-plane reboots during subsequent updates and config rollouts would wash away any initial lopsided balancing.

Member

> In addition to the "did we clean out all the bootstrap-specific stuff?" concerns we had about pivoting bootstrap into a compute node,

This is a pretty easy problem to solve: we can add support to Ignition to make everything ephemeral at the OS level (i.e. mount /var as a tmpfs, /etc as an overlayfs, and everything else read-only). Then rebooting guarantees everything done at the filesystem level is gone (and there's no reason for an OpenShift install process to operate at the block level).

dhellmann (Contributor, author)

This part of the discussion should probably move to #361.

dhellmann force-pushed the assisted-installer branch from ec23fbd to 8df2b11 on June 16, 2020 13:13
dhellmann (Contributor, author)

> Could you add some specific notes on the security for this design? There are a lot of attack vectors in remote installation and auto-join systems, but maybe there are details I'm missing that make this model secure. It would be helpful to have them explicitly noted.

@avishayt @romfreiman maybe we can work on responding to this request together?

avishayt

> Could you add some specific notes on the security for this design? There are a lot of attack vectors in remote installation and auto-join systems, but maybe there are details I'm missing that make this model secure. It would be helpful to have them explicitly noted.
>
> @avishayt @romfreiman maybe we can work on responding to this request together?

We are beginning to formalize this and will provide the information once we have it compiled.

crawford (Contributor) left a comment

/lgtm

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Jun 18, 2020

> ## Alternatives
>
> The telco/edge bare metal team is working on support for automating
Contributor

Suggested change:
- The telco/edge bare metal team is working on support for automating
+ The bare metal IPI team is working on support for automating

Member

We cannot say "IPI" in external docs. We need to say "installer-provisioned infrastructure". But do we really have a metal team that ignores user-provisioned infrastructure?

dhellmann (Contributor, author)

Fixed in #382

The team working on the automation does not support user-provisioned deployments today. IIUC, the main installer team supports those.

markmc (Contributor) commented Jun 19, 2020

> Could you add some specific notes on the security for this design? There are a lot of attack vectors in remote installation and auto-join systems, but maybe there are details I'm missing that make this model secure. It would be helpful to have them explicitly noted.
>
> @avishayt @romfreiman maybe we can work on responding to this request together?
>
> We are beginning to formalize this and will provide the information once we have it compiled.

Definitely think @ericavonb is spot on with this request. Perhaps it can be tracked as a TODO or graduation requirement if y'all want to move forward with merging what has been written so far (which seems largely agreed on).

/lgtm

openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, dhellmann, markmc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jun 19, 2020
openshift-merge-robot merged commit 3244219 into openshift:master on Jun 19, 2020
dhellmann added a commit to dhellmann/openshift-enhancements that referenced this pull request on Jun 19, 2020:

This is a follow-up change to openshift#376 with some placeholders for discussion items that were not resolved before the previous PR merged.

Signed-off-by: Doug Hellmann <dhellmann@redhat.com>