-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MULTIARCH-4569: aws: support multi-arch nodes #8698
MULTIARCH-4569: aws: support multi-arch nodes #8698
Conversation
@r4f4: This pull request references MULTIARCH-4569 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/cc @jeffdyoung |
Testing this locally with x86 controlPlane nodes and arm64 compute nodes:
|
Update: finally figured out why some platforms were failing in CI but not in my local tests:
When I made the rhcos asset multi-arch aware, I added to it unexported fields. That meant the json serializer was not able to read those values and the asset was serialized as Making the struct fields exported (i.e. Uppercase) should fix the problem. |
@r4f4: This pull request references MULTIARCH-4569 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
9f1f813
to
b03d715
Compare
b03d715
to
58dcf3e
Compare
Update: changed the first commit to point to |
/lgtm |
CI jobs are being added as part of openshift/release#54129. I'll update this PR's description with the link. |
@r4f4: This pull request references MULTIARCH-4569 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This is the first step to implement multi-arch clusters for AWS: allow users to specify a different architecture for compute nodes when the `MultiArchInstallAWS` feature gate is enabled.
58dcf3e
to
20e04e3
Compare
Update: rebased on top of current master since #8713 merged with the o/api bump. |
@r4f4: This pull request references MULTIARCH-4569 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/test ? |
@r4f4: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
A CI job was just added in the release repo. Let's try it out here: /test e2e-aws-ovn-heterogeneous |
@@ -52,101 +55,107 @@ func (i *Image) Dependencies() []asset.Asset { | |||
func (i *Image) Generate(ctx context.Context, p asset.Parents) error { | |||
if oi, ok := os.LookupEnv("OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE"); ok && oi != "" { | |||
logrus.Warn("Found override for OS Image. Please be warned, this is not advised") | |||
*i = Image(oi) | |||
*i = *MakeAsset(oi) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in the override case - the same image would be used for cp/computes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If we need to support 2 image overrides, then we'd have to either require the OPENSHIFT_INSTALL_OS_IMAGE
to have both images (e.g, separated by ,
) or we'd have to introduce a new env var for compute images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a change to the installer doc mentioning this is also needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have docs for that env var as it's not officially supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Prashanth684 can you envision a situation where we'd need to override both controlplane and compute images with distinct values for multi-arch clusters? I'd rather not touch this env var as it's used by agent and powervs (at least not in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not touch this env var as it's used by agent and powervs
it makes sense to address separately. Do they use it for installation - if so we have to make sure that it works or that we don't allow them to provision a cluster with multi-arch computes on day 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agent: used when generating the base ISO: https://github.com/openshift/installer/blob/master/pkg/asset/agent/image/baseiso.go#L149-L151
powervs: it seems it's abusing the envvar: https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/powervs/platform.go#L19-L24
we don't allow them to provision a cluster with multi-arch computes on day 0
The only way to bypass the heterogeneous validation in the installer is by setting the feature gate: d9e5e66
That only happens for AWS with this PR.
pkg/asset/rhcos/image.go
Outdated
} | ||
switch config.Platform.Name() { | ||
case aws.Name: | ||
if len(config.Platform.AWS.AMIID) > 0 { | ||
return config.Platform.AWS.AMIID, nil | ||
if ami := config.Platform.AWS.AMIID; len(ami) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could just call osimage twice in the generate rather than do this - and this is just returning the image matching the cp arch which will not work if the compute arch is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THB I don't even know why we have amiID
at the platform level if we can specify it for controlPlane/compute/defaultMachinePlatform. If someone specifies this amiID for nodes with different arches, it's more of user error than anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's like an optimization so they don't have to enter it twice :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickdillon Do you know what's the purpose of amiID
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickdillon Do you know what's the purpose of amiID here?
We probably need to do some archaeology to confirm, but I think this was the initial, non-machinepool implementation for custom AMI and was replaced by machinepools so it should be deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'll create a card for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some platforms, we will need to be able to get different RHCOS images based on the architecture of the nodes. Currently it's assumed that the same image is used for all nodes.
Use different instance types based on the node's architecture.
376679a
to
0847771
Compare
Update: addressed most of @Prashanth684's comments. |
are expected and we are going to skip them until https://issues.redhat.com/browse/MULTIARCH-4552 is complete. |
@r4f4: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/lgtm |
/label acknowledge-critical-fixes-only |
/label platform/aws |
@sadasu your comments should all be addressed now. Can you take another look? |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: patrickdillon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8b80710
into
openshift:master
[ART PR BUILD NOTIFIER] Distgit: ose-installer-altinfra |
[ART PR BUILD NOTIFIER] Distgit: ose-baremetal-installer |
[ART PR BUILD NOTIFIER] Distgit: ose-installer-terraform-providers |
[ART PR BUILD NOTIFIER] Distgit: ose-installer-artifacts |
This PR adds support for multi-arch nodes (amd64 controlPlane and arm64 compute, or vice-versa). To accomplish that we:
MultiArchInstallAWS
/hold
Depends on: