Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Baremetal: Support to send full ignition to masters #3276

Merged
merged 2 commits into from
Apr 16, 2020
Merged

Baremetal: Support to send full ignition to masters #3276

merged 2 commits into from
Apr 16, 2020

Conversation

kirankt
Copy link
Contributor

@kirankt kirankt commented Mar 11, 2020

Today, the installer uses a stub ignition that simply points to the full rendered ignition hosted by the bootstrap. However, the full ignition may contain configuration to configure network interfaces with bonding, vlans, etc.

That means the full ignition would need to be sent to the masters instead of the stub, because the hosts may not have network connectivity until after ignition is run. This PR enables the baremetal platform to optionally send full ignition to masters.

Depends on:
openshift-metal3/terraform-provider-ironic#38
metal3-io/baremetal-operator#455
openshift/cluster-api-provider-baremetal#65

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 11, 2020
@openshift-ci-robot
Copy link
Contributor

Hi @kirankt. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 11, 2020
@kirankt
Copy link
Contributor Author

kirankt commented Mar 11, 2020

/label platform/baremetal
/hold

@openshift-ci-robot openshift-ci-robot added the platform/baremetal IPI bare metal hosts platform label Mar 11, 2020
@metal3ci
Copy link

Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1612/

@hardys
Copy link
Contributor

hardys commented Mar 12, 2020

Looks good, but I think there are a few places where we can remove the old ignition_master stuff to only rely on the URL?

Also we need to start thinking about how to handle the user-data secrets that end up in the cluster (for both masters and workers) e.g check:

$ oc get secret worker-user-data -n openshift-machine-api -o jsonpath="{.data.userData}" | base64 --decode
{"ignition":{"config":{"append":[{"source":"https://[fd2e:6f44:5dd8:c956::5]:22623/config/worker" ...

We have an ordering problem here, because similar to the terraform assets, the providerSpec containing this is created in the installer, before the MCS on the bootstrap/cluster is available.

So we may need to modify the BareMetalHost schema to add a UserDataUrl or similar, which will require discussion in the upstream metal3 community

@kirankt
Copy link
Contributor Author

kirankt commented Mar 12, 2020

Looks good, but I think there are a few places where we can remove the old ignition_master stuff to only rely on the URL?

Also we need to start thinking about how to handle the user-data secrets that end up in the cluster (for both masters and workers) e.g check:

$ oc get secret worker-user-data -n openshift-machine-api -o jsonpath="{.data.userData}" | base64 --decode
{"ignition":{"config":{"append":[{"source":"https://[fd2e:6f44:5dd8:c956::5]:22623/config/worker" ...

We have an ordering problem here, because similar to the terraform assets, the providerSpec containing this is created in the installer, before the MCS on the bootstrap/cluster is available.

So we may need to modify the BareMetalHost schema to add a UserDataUrl or similar, which will require discussion in the upstream metal3 community

Regarding the schema addition, can't we just deduce the URL using the APIVIP and the familiar ignition port number?

@hardys
Copy link
Contributor

hardys commented Mar 12, 2020

Regarding the schema addition, can't we just deduce the URL using the APIVIP and the familiar ignition port number?

The problem isn't deducing the URL, it's that the cluster-api-provider-barmetal expects the full UserData (not a URL to download from) similar to the terraform provider, and at the point of creating the secret we only have the pointer/wrapper config (same as the situation with the terraform provider).

So, we probably need some new fields (same as the terraform provider) defined in the BMH schema, which can then be provided via the machineset provider spec, then some code to download the rendered config (which could either be in the cluster-api-provider or perhaps the baremetal-operator)

@kirankt
Copy link
Contributor Author

kirankt commented Mar 13, 2020

Looks good, but I think there are a few places where we can remove the old ignition_master stuff to only rely on the URL?

Also we need to start thinking about how to handle the user-data secrets that end up in the cluster (for both masters and workers) e.g check:

$ oc get secret worker-user-data -n openshift-machine-api -o jsonpath="{.data.userData}" | base64 --decode
{"ignition":{"config":{"append":[{"source":"https://[fd2e:6f44:5dd8:c956::5]:22623/config/worker" ...

We have an ordering problem here, because similar to the terraform assets, the providerSpec containing this is created in the installer, before the MCS on the bootstrap/cluster is available.

So we may need to modify the BareMetalHost schema to add a UserDataUrl or similar, which will require discussion in the upstream metal3 community

Regarding the schema addition, can't we just deduce the URL using the APIVIP and the familiar ignition port number?

Regarding the schema addition, can't we just deduce the URL using the APIVIP and the familiar ignition port number?

The problem isn't deducing the URL, it's that the cluster-api-provider-barmetal expects the full UserData (not a URL to download from) similar to the terraform provider, and at the point of creating the secret we only have the pointer/wrapper config (same as the situation with the terraform provider).

So, we probably need some new fields (same as the terraform provider) defined in the BMH schema, which can then be provided via the machineset provider spec, then some code to download the rendered config (which could either be in the cluster-api-provider or perhaps the baremetal-operator)

Got it. Thanks. Will get going on the BMH/CAPM3 side.

@kirankt
Copy link
Contributor Author

kirankt commented Mar 17, 2020

@hardys I've tried to render the ignition (based on your original example) in BMO before we create the config drive. Please take a look and let me know what you think.
https://github.com/kirankt/baremetal-operator/blob/3127ea4521598688689d4cd29fd99f7977870909/pkg/controller/baremetalhost/host_config_data.go#L92

@metal3ci
Copy link

Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1640/

@kirankt
Copy link
Contributor Author

kirankt commented Mar 17, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 17, 2020
@kirankt
Copy link
Contributor Author

kirankt commented Mar 31, 2020

@hardys is this a good time to remove the 'WIP' label from this PR? Or do we wait until the CAPBM PR merges? openshift/cluster-api-provider-baremetal#65

@stbenjam
Copy link
Member

stbenjam commented Apr 1, 2020

/ok-to-test

@openshift-ci-robot openshift-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 1, 2020
@kirankt kirankt changed the title WIP: Baremetal: Support to send full ignition to masters Baremetal: Support to send full ignition to masters Apr 1, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2020
@kirankt
Copy link
Contributor Author

kirankt commented Apr 1, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2020
@stbenjam
Copy link
Member

stbenjam commented Apr 1, 2020

@kirankt Have a look at the tf-fmt and golint failures, you'll have to address those.

Have a look at https://www.terraform.io/docs/commands/fmt.html

@stbenjam
Copy link
Member

stbenjam commented Apr 1, 2020

stuck

So far my attempt to test this, my masters seem stuck at boot and there's no CPU usage, not sure what's going on. Looking at terraform state, it does appear we sent the full ignition to Ironic. So that part is working.

@kirankt
Copy link
Contributor Author

kirankt commented Apr 15, 2020

@abhinavdahiya. Ping. Need your feedback/approval for this PR ASAP. We're trying to make it into the 4.5 release. If you're busy, please assign it to someone else in your team.

…r will track will track installer-specific TF changes made to the baremetal platform
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2020
@stbenjam
Copy link
Member

/test e2e-metal-ipi

@sdodson
Copy link
Member

sdodson commented Apr 15, 2020

Installer team advises against using a partial implementation of Ignition handling to resolve the stub to the MCS url and fetch that. If in the future the stub adds additional required functionality that will need to be considered by baremetal team. However this appears to be localized to baremetal therefore we'll approve this but please consider moving to a more complete Ignition handler to avoid future risk.

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sdodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 15, 2020
@stbenjam
Copy link
Member

/lgtm

@stbenjam
Copy link
Member

Thanks, we'll have discussions during 4.6 planning about how to move away from this.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 15, 2020
@stbenjam
Copy link
Member

/test e2e-aws-upgrade

@stbenjam
Copy link
Member

/test e2e-aws

@stbenjam
Copy link
Member

/test e2e-aws-upgrade

3 similar comments
@kirankt
Copy link
Contributor Author

kirankt commented Apr 16, 2020

/test e2e-aws-upgrade

@kirankt
Copy link
Contributor Author

kirankt commented Apr 16, 2020

/test e2e-aws-upgrade

@kirankt
Copy link
Contributor Author

kirankt commented Apr 16, 2020

/test e2e-aws-upgrade

@stbenjam
Copy link
Member

/test e2e-aws-upgrade
/test e2e-aws

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Apr 16, 2020

@kirankt: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-openstack 8c821d2 link /test e2e-openstack
ci/prow/e2e-aws-scaleup-rhel7 8c821d2 link /test e2e-aws-scaleup-rhel7

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kirankt
Copy link
Contributor Author

kirankt commented Apr 16, 2020

/test golint
/test verify-vendor

@openshift-merge-robot openshift-merge-robot merged commit 883bb3a into openshift:master Apr 16, 2020
hardys pushed a commit to hardys/installer that referenced this pull request May 12, 2020
This was merged in openshift#3276
but had the side-effect of breaking an interface some users were
already reliant on, where additional config is added to the
pointer ignition output via `create ignition-configs`

Downstream bz https://bugzilla.redhat.com/show_bug.cgi?id=1833483
reported this issue, and we've taken the decision to revert this
to resolve that interface regression, and look at other ways to
satisfy the requirements that led to openshift#3276

This reverts commit f67d61a.
hardys pushed a commit to hardys/installer that referenced this pull request Dec 2, 2020
This restores the work which was previously done via openshift#3276
but then reverted via openshift#3589 due to breaking users who customized
the pointer ignition config in IPI deployments.

A solution to that has been proposed via openshift#4413 - see openshift/enhancements#540 for more details.

Note that some additional changes beyond the initial implementation
were required due to the MCO now supporting multiple
ignition versions, thus this depends on openshift-metal3/terraform-provider-ironic#46

Co-Authored-By: Steven Hardy <shardy@redhat.com>
hardys pushed a commit to hardys/installer that referenced this pull request Dec 2, 2020
This restores the work which was previously done via openshift#3276
but then reverted via openshift#3589 due to breaking users who customized
the pointer ignition config in IPI deployments.

A solution to that has been proposed via openshift#4413 - see openshift/enhancements#540 for more details.

Note that some additional changes beyond the initial implementation
were required due to the MCO now supporting multiple
ignition versions, thus this depends on openshift-metal3/terraform-provider-ironic#46

Co-Authored-By: Steven Hardy <shardy@redhat.com>
michaelgugino pushed a commit to mgugino-upstream-stage/installer that referenced this pull request Dec 4, 2020
This restores the work which was previously done via openshift#3276
but then reverted via openshift#3589 due to breaking users who customized
the pointer ignition config in IPI deployments.

A solution to that has been proposed via openshift#4413 - see openshift/enhancements#540 for more details.

Note that some additional changes beyond the initial implementation
were required due to the MCO now supporting multiple
ignition versions, thus this depends on openshift-metal3/terraform-provider-ironic#46

Co-Authored-By: Steven Hardy <shardy@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. platform/baremetal IPI bare metal hosts platform size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants