Bug 2033953: Afterburn to try config-drive before Nova metadata #2903

Merged

Conversation

pierreprinetti
Member

@pierreprinetti pierreprinetti commented Jan 10, 2022

With this change, the afterburn-hostname service uses the 'openstack'
provider, which fetches userdata from the config drive, and only uses
the Nova metadata service as a fallback. This is especially useful in
network topologies where Nova-metadata is not guaranteed to be reachable
(e.g. when using baremetal (Ironic) instances).

Co-authored-by: Martin André <m.andre@redhat.com>
Co-authored-by: Pierre Prinetti <pierreprinetti@redhat.com>
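The config-drive-first lookup described in the commit message can be sketched as follows. This is a minimal Python illustration, not afterburn's actual Rust implementation; the config-drive path and metadata URL follow standard OpenStack conventions, but the mount point is an assumption:

```python
import json
import urllib.request
from pathlib import Path

# Standard OpenStack locations; the config-drive mount point is an assumption
# for illustration (afterburn mounts /dev/disk/by-label/config-2 itself).
CONFIG_DRIVE_META = Path("/run/config-2/openstack/latest/meta_data.json")
NOVA_META_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def fetch_hostname(config_drive_meta: Path = CONFIG_DRIVE_META,
                   metadata_url: str = NOVA_META_URL) -> str:
    """Prefer the config drive (no network needed); fall back to Nova metadata."""
    try:
        # 1. The config drive works even when the metadata service is
        #    unreachable (e.g. on Ironic bare-metal nodes).
        return json.loads(config_drive_meta.read_text())["hostname"]
    except (OSError, KeyError, ValueError):
        pass  # config drive absent or unreadable: fall through to the network
    # 2. Fall back to the Nova metadata service.
    with urllib.request.urlopen(metadata_url, timeout=10) as resp:
        return json.loads(resp.read())["hostname"]
```

When the config drive is present, the metadata service is never contacted, which is exactly the property that matters on network topologies where Nova metadata is unreachable.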
@openshift-ci openshift-ci bot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 10, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 10, 2022

@pierreprinetti: This pull request references Bugzilla bug 2033953, which is invalid:

  • expected the bug to target the "4.10.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2033953: Afterburn to try config-drive before Nova metadata

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pierreprinetti
Member Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Jan 10, 2022

@pierreprinetti: This pull request references Bugzilla bug 2033953, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

Requesting review from QA contact:
/cc @eurijon

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jan 10, 2022
@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jan 10, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 10, 2022

@pierreprinetti: This pull request references Bugzilla bug 2033953, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.10.0) matches configured target release for branch (4.10.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

Requesting review from QA contact:
/cc @eurijon

In response to this:

Bug 2033953: Afterburn to try config-drive before Nova metadata

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pierreprinetti
Member Author

/cc mandre

@openshift-ci openshift-ci bot requested a review from mandre January 10, 2022 15:24
@cgwalters
Member

This intersects a bit with a whole huge recent saga around hostnames; see https://bugzilla.redhat.com/show_bug.cgi?id=2008521#c40
and coreos/afterburn#509

A lot of confusion and overlap between "ownership" of the hostname. While this isn't new, the presence of this file is perpetuating a difference between FCOS and RHCOS behavior on OpenStack. (I think this is rooted in the uncertain availability of the metadata service across OpenStack deployments)

Another related topic; AIUI the reason for the current design is that availability of the config drive is racy. I don't know if that applies in Ironic or not.

@pierreprinetti
Member Author

@cgwalters
My intention here is only to make a tactical adjustment to the way a host fetches userdata: changing the behaviour from "fetch Nova metadata" to "fetch the config drive; if that fails, fetch Nova metadata".

I couldn't find documentation for Afterburn's providers, but here's the relevant code:

```rust
match provider {
    // ...
    "openstack" => openstack::try_config_drive_else_network(),
    "openstack-metadata" => box_result!(OpenstackProviderNetwork::try_new()?),
    // ...
}
```

Does this change have a larger blast radius than I expect?

/retest-required

@cgwalters
Member

Yes,
/approve
since I think this is probably not going to break anything.

But one other question: what is the role of Ironic in OCP going forward? I thought the move to use the CoreOS "Live ISO" approach for IPI metal meant we wouldn't have this "use openstack on bare metal" stuff in the future.

@pierreprinetti
Member Author

We want to offer OpenShift-on-OpenStack users the ability to use managed baremetal nodes within OpenStack. Beyond the value of leveraging tools the users may already know, a further benefit is supporting a mixed fleet of virtual and physical servers within a single OCP cluster. Does that make sense?

@mandre
Member

Thanks @pierreprinetti for the patch.
/lgtm

what is the role of Ironic in OCP going forward? I thought the move to use the CoreOS "Live ISO" approach for IPI metal meant we wouldn't have this "use openstack on bare metal" stuff in the future.

This is about allowing OpenShift on OpenStack deployments to leverage Ironic and provision BM worker nodes; it's unrelated to IPI metal.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 11, 2022
@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.


@mdbooth
Contributor

mdbooth commented Jan 11, 2022

This intersects a bit with a whole huge recent saga around hostnames; see https://bugzilla.redhat.com/show_bug.cgi?id=2008521#c40 and coreos/afterburn#509

A lot of confusion and overlap between "ownership" of the hostname. While this isn't new, the presence of this file is perpetuating a difference between FCOS and RHCOS behavior on OpenStack. (I think this is rooted in the uncertain availability of the metadata service across OpenStack deployments)

Another related topic; AIUI the reason for the current design is that availability of the config drive is racy. I don't know if that applies in Ironic or not.

The specific reason for the creation of a persistent hostname was a difference in kubelet's behaviour when using a legacy cloud provider vs an external cloud provider. In summary: when using a legacy cloud provider, kubelet gets its node name from the cloud provider; when using an external cloud provider, kubelet gets its node name from the hostname unless one is passed on the command line.

It turns out that in practice the hostname is not always identical to the value returned by the cloud provider. It's an upgrade issue if either:

  • Hostname changes on an already-provisioned host
  • Nodename changes for an already-provisioned kubelet

Consequently we decided to use afterburn to fetch the hostname from the cloud provider and pass it to kubelet via the command-line option, which means the node name remains the same across a legacy->external CCM upgrade, and we don't have to change the hostname.
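The wiring described above can be sketched as a systemd drop-in. This is illustrative only: the file paths, unit names, and environment-variable name are assumptions rather than the actual MCO-rendered config, though `--hostname-override` is kubelet's real flag for overriding the node name:

```
# /etc/systemd/system/kubelet.service.d/20-nodename.conf  (hypothetical path)
[Service]
# Afterburn writes cloud-provider metadata to an environment file;
# the variable name below is assumed for illustration.
EnvironmentFile=/run/metadata/afterburn
ExecStart=
ExecStart=/usr/bin/kubelet \
    --hostname-override=${AFTERBURN_OPENSTACK_HOSTNAME} \
    ...
```

The empty `ExecStart=` line is the standard systemd idiom for clearing the unit's original command before supplying the replacement.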

@mdbooth
Contributor

mdbooth commented Jan 11, 2022

/lgtm

Config drive with fallback to metadata is ideal for getting hostname.

@openshift-ci
Contributor

openshift-ci bot commented Jan 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, mandre, mdbooth, pierreprinetti

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@openshift-ci
Contributor

openshift-ci bot commented Jan 11, 2022

@pierreprinetti: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-single-node | 60854e7 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-aws-workers-rhel7 | 60854e7 | link | false | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-aws-disruptive | 60854e7 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-upgrade-single-node | 60854e7 | link | false | /test e2e-aws-upgrade-single-node |
| ci/prow/e2e-aws-workers-rhel8 | 60854e7 | link | false | /test e2e-aws-workers-rhel8 |
| ci/prow/e2e-vsphere-upgrade | 60854e7 | link | false | /test e2e-vsphere-upgrade |
| ci/prow/e2e-ovn-step-registry | 60854e7 | link | false | /test e2e-ovn-step-registry |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.


@openshift-merge-robot openshift-merge-robot merged commit 23bd462 into openshift:master Jan 12, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jan 12, 2022

@pierreprinetti: All pull requests linked via external trackers have merged:

Bugzilla bug 2033953 has been moved to the MODIFIED state.

In response to this:

Bug 2033953: Afterburn to try config-drive before Nova metadata

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pierreprinetti pierreprinetti deleted the afterburn_provider branch January 12, 2022 17:05
@pierreprinetti
Member Author

/cherry-pick release-4.9

@openshift-cherrypick-robot

@pierreprinetti: new pull request created: #2915

In response to this:

/cherry-pick release-4.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
7 participants