Fixes #3086 - amd64 images are pulled on all the architectures #3337

Merged: 1 commit, Oct 9, 2020
11 changes: 10 additions & 1 deletion pkg/pod/entrypoint_lookup_impl.go
@@ -18,6 +18,7 @@ package pod

 import (
 	"fmt"
+	"runtime"
 
 	"github.com/google/go-containerregistry/pkg/authn"
 	"github.com/google/go-containerregistry/pkg/authn/k8schain"
@@ -67,7 +68,15 @@ func (e *entrypointCache) Get(ref name.Reference, namespace, serviceAccountName
 		return nil, fmt.Errorf("error creating k8schain: %v", err)
 	}
 	mkc := authn.NewMultiKeychain(kc)
-	img, err := remote.Image(ref, remote.WithAuthFromKeychain(mkc))
+	// By default go-containerregistry pulls amd64 images.
+	// Setting correct image pull architecture based on the underlying platform
+	// _of the node that Tekton's controller is running on_. If the cluster
+	// is comprised of nodes of heterogeneous architectures, this might cause issues.
+	var pf = v1.Platform{
+		Architecture: runtime.GOARCH,
+		OS:           runtime.GOOS,
+	}
+	img, err := remote.Image(ref, remote.WithAuthFromKeychain(mkc), remote.WithPlatform(pf))
Member

This is probably the right band-aid for supporting multi-arch on homogeneous clusters short-term, but we should probably open an issue to further discuss heterogeneous clusters (where the controller may be running on a different platform than the pod gets scheduled on).

One problem with this in general is that different architectures could (technically) have different entrypoints, so to properly support multi-arch on heterogeneous clusters, I think we will need to pass through all of the arch-specific configs and have the entrypoint select the appropriate one based on where it is scheduled.

Member

To give an example of a subtle, but significant variation in config, when playing with multi-arch kaniko I rebuilt with Bazel and ended up with /kaniko/executor_{arch} as the entrypoint in the respective architecture images. So while my early intuition was that these should largely be the same, I don't think we can assume that.

Again: this probably shouldn't hold up this tactical change for homogeneous clusters, but is something that should be considered for heterogeneous multi-arch support.
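
A minimal sketch of the per-architecture selection discussed above, assuming the controller passes every platform's entrypoint through and the choice is made on the node where the pod runs. The map layout, `platformKey`, and `selectEntrypoint` are hypothetical names for illustration and are not part of Tekton or go-containerregistry; the arm64 path is an assumed expansion of the `/kaniko/executor_{arch}` pattern.

```go
package main

import (
	"fmt"
	"runtime"
)

// platformKey builds an "os/arch" key such as "linux/arm64".
func platformKey(goos, goarch string) string {
	return goos + "/" + goarch
}

// selectEntrypoint returns the entrypoint recorded for the given platform.
// The boolean is false when no config exists for that platform.
func selectEntrypoint(byPlatform map[string][]string, goos, goarch string) ([]string, bool) {
	ep, ok := byPlatform[platformKey(goos, goarch)]
	return ep, ok
}

func main() {
	// Hypothetical per-platform entrypoints, following the
	// /kaniko/executor_{arch} pattern from the comment above.
	kaniko := map[string][]string{
		"linux/amd64": {"/kaniko/executor_amd64"},
		"linux/arm64": {"/kaniko/executor_arm64"},
	}

	// Run on the node itself, runtime.GOOS/GOARCH describe the platform
	// the pod was actually scheduled on (assumption: this code runs in the pod).
	if ep, ok := selectEntrypoint(kaniko, runtime.GOOS, runtime.GOARCH); ok {
		fmt.Println("entrypoint:", ep)
	} else {
		fmt.Println("no entrypoint recorded for", platformKey(runtime.GOOS, runtime.GOARCH))
	}
}
```

Deferring the choice to the node avoids baking the controller's architecture into the pod spec, at the cost of carrying one entry per platform.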

Member

@mattmoor right, but we do fetch the image here to get the entrypoint from the config. So if the entrypoint is correctly set in the image to /kaniko/executor_{arch}, it shouldn't be a problem, should it? 🤔

Contributor

> it shouldn't be a problem, should it?

Only if the arch where the pod will be scheduled matches the arch of the node where this code is running to look up the entrypoint. If you have a controller on amd64 and schedule a task on arm, you could run into issues (assuming I understand correctly what this code is doing).

Member

Yeah, that's exactly the scenario I'm talking about. This is better than what's there today, so we should clean it up and merge it, but it doesn't close the book on multi-arch support :D
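
To make the failure scenario concrete, here is a rough sketch of a check that compares the platform used for the entrypoint lookup with the platform of the node a pod lands on. It assumes the node advertises the well-known `kubernetes.io/os` and `kubernetes.io/arch` labels; `lookupMatchesNode` is a hypothetical helper and is not something this PR adds.

```go
package main

import (
	"fmt"
	"runtime"
)

// Well-known Kubernetes node labels that report a node's platform.
const (
	archLabel = "kubernetes.io/arch"
	osLabel   = "kubernetes.io/os"
)

// lookupMatchesNode reports whether the platform used for the entrypoint
// lookup (goos/goarch) matches the platform advertised by the node labels.
func lookupMatchesNode(nodeLabels map[string]string, goos, goarch string) bool {
	return nodeLabels[osLabel] == goos && nodeLabels[archLabel] == goarch
}

func main() {
	// Hypothetical labels of the node a pod was scheduled on.
	armNode := map[string]string{osLabel: "linux", archLabel: "arm64"}

	// The controller resolves entrypoints for its own platform.
	fmt.Printf("controller platform: %s/%s\n", runtime.GOOS, runtime.GOARCH)
	fmt.Println("matches arm64 node:", lookupMatchesNode(armNode, runtime.GOOS, runtime.GOARCH))
}
```

On a controller running on amd64 the check reports a mismatch for the arm64 node, which is the case where the looked-up entrypoint may be wrong.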

Member

@vdemeester, Oct 7, 2020

> If you have a controller on amd64 and schedule a task on arm, you could run into issues (assuming I understand correctly what this code is doing).

Ah, indeed, I hadn't thought of that one 😓

Member

I think it's worth calling out this behavior in a comment in the code. Something like:

// By default go-containerregistry pulls amd64 images.
// Setting correct image pull architecture based on the underlying platform
// _of the node that Tekton's controller is running on_. If the cluster
// is comprised of nodes of heterogeneous architectures, this might cause issues.

Member

Made a suggested edit with this above, since I was adding a nit for spacing anyhow.

 	if err != nil {
 		return nil, fmt.Errorf("error getting image manifest: %v", err)
 	}