Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #3086 - amd64 images are pulled on all the architectures #3337

Merged
merged 1 commit into from
Oct 9, 2020

Conversation

bahetiamit
Copy link
Contributor

@bahetiamit bahetiamit commented Oct 6, 2020

Changes

This PR fixes an issue wherein even if the multi-arch image is available, always amd64 image is getting pulled (#3086)

This happens as go-containerregistry package pulls amd64 image by default (Reference - https://github.com/google/go-containerregistry/blob/9f2724c4b962e4eb0630f8611f9382fb83e4e2e7/pkg/v1/remote/options.go#L113 )

Although the change is generic and applicable for all other architectures, I was able to successfully test using following e2e test cases on POWER (ppc64le) platform for which ubuntu, busybox and alpine/git images are required and they are multi-arch images.

  1. TestWorkingDirCreated
  2. TestWorkingDirIgnoredNonSlashWorkspace
  3. TestEntrypointRunningStepsInOrder
  4. TestGitPipelineRun
  5. TestGitPipelineRunFail
  6. TestTaskRun_EmbeddedResource

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you
review them:

  • Includes tests (if functionality changed/added)
  • Includes docs (if user facing)
  • Commit messages follow commit message best practices
  • Release notes block has been filled in or deleted (only if no user facing changes)

See the contribution guide for more details.

Double check this list of stuff that's easy to miss:

The entrypoint lookup will pull the image for the controller's architecture instead of amd64 to better support different architectures

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 6, 2020

CLA Check
The committers are authorized under a signed CLA.

@tekton-robot tekton-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Oct 6, 2020
@tekton-robot tekton-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 6, 2020
@tekton-robot
Copy link
Collaborator

Hi @bahetiamit. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

I literally hit this last night and was going to debug and fix this today. I love OSS :)

Comment on lines 70 to 75
//By default go-containerregistry pulls amd64 images.
//Setting correct image pull architecture based on the underlying platform
var pf = v1.Platform {
Architecture: runtime.GOARCH ,
OS : "linux",
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: gofmt -s -w this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatted the change.

Architecture: runtime.GOARCH ,
OS : "linux",
}
img, err := remote.Image(ref, remote.WithAuthFromKeychain(mkc), remote.WithPlatform(pf))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably the right band-aid for supporting multi-arch on homogeneous clusters short-term, but we should probably open an issue to further discuss heterogeneous clusters (where the controller may be running on a different platform than the pod gets scheduled on).

One problem with this in general is that different architectures could (technically) have different entrypoints, so to properly support multi-arch on heterogeneous clusters, I think we will need to pass through all of the arch-specific configs and have the endpoint select the appropriate one based on where it schedules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To give an example of a subtle, but significant variation in config, when playing with multi-arch kaniko I rebuilt with Bazel and ended up with /kaniko/executor_{arch} as the entrypoint in the respective architecture images. So while my early intuition was that these should largely be the same, I don't think we can assume that.

Again: this probably shouldn't hold up this tactical change for homogeneous clusters, but is something that should be considered for heterogeneous multi-arch support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmoor right, but we do fetch the image here to get the entrypoint from the config. So if the entrypoint is correctly set in the image to /kaniko/executor_{arch}, it shouldn't be a problem, should it ? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it shouldn't be a problem, should it ?

Only if the arch where the pod will be scheduled matches the arch of the node where this code is running to look up the entrypoint. If you have a controller on amd64 and schedule a task on arm, you could run into issues (assuming I understand correctly what this code is doing).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's exactly the scenario I'm talking about. This is better than what's there today, so we should clean it up and merge it, but it doesn't close the book on multi-arch support :D

Copy link
Member

@vdemeester vdemeester Oct 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a controller on amd64 and schedule a task on arm, you could run into issues (assuming I understand correctly what this code is doing).

Ah, indeed, I didn't thought of that one 😓

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth calling out this behavior in a comment in the code. Something like:

// By default go-containerregistry pulls amd64 images.
// Setting correct image pull architecture based on the underlying platform
// _of the node that Tekton's controller is running on_. If the cluster
// is comprised of nodes of heterogeneous architectures, this might cause issues.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a suggested edit with this above, since I was adding a nit for spacing anyhow.

@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

/cc @imjasonh

@imjasonh
Copy link
Member

imjasonh commented Oct 6, 2020

/kind feature

@tekton-robot tekton-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 6, 2020
//Setting correct image pull architecture based on the underlying platform
var pf = v1.Platform {
Architecture: runtime.GOARCH ,
OS : "linux",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps for good measure this should also be runtime.GOOS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Made the required changes.

@mattmoor
Copy link
Member

mattmoor commented Oct 6, 2020

FWIW with this and some other changes I was able to perform a kaniko build on my RPi4 🤩

It was crazy slow, but I will try graviton next 😉

@imjasonh
Copy link
Member

imjasonh commented Oct 7, 2020

@bahetiamit thanks for making this change! You'll need to sign the LF CLA to get this merged.

@bahetiamit
Copy link
Contributor Author

@bahetiamit thanks for making this change! You'll need to sign the LF CLA to get this merged.

@imjasonh My bad. I am working towards it. As this needs to be corporate CLA, it is taking some time.

@snehlatamohite
Copy link

We have CLA approvals just need to confirm manager at our end, I have started confirmation on this.

"github.com/google/go-containerregistry/pkg/authn"
"github.com/google/go-containerregistry/pkg/authn/k8schain"
"github.com/google/go-containerregistry/pkg/name"
v1 "github.com/google/go-containerregistry/pkg/v1"
"github.com/google/go-containerregistry/pkg/v1/remote"
lru "github.com/hashicorp/golang-lru"
"k8s.io/client-go/kubernetes"
"runtime"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically goimports wants this up under fmt with a line of whitespace before the github.com imports.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because of gofmt - golang/go#37719. Do you want me to change ordering? For this repo, we use goimports tool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt moved it down because you removed the blank line. May as well stick it back up at the top and add the newline while we are waiting on CLA 😅

Sort of annoying that different go tools futz with the same parts, but in inconsistent ways... thanks 🙏

@mattmoor
Copy link
Member

mattmoor commented Oct 8, 2020

This LGTM. Any update on the CLA @bahetiamit ?

@bahetiamit
Copy link
Contributor Author

This LGTM. Any update on the CLA @bahetiamit ?

@mattmoor Request to CLA manager is sent. May be in next few hours I will be in whitelist & we should be good to go.

@bahetiamit
Copy link
Contributor Author

@mattmoor EasyCLA is now in place.

Copy link
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test
/lgtm

@tekton-robot tekton-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. lgtm Indicates that a PR is ready to be merged. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 9, 2020
@mattmoor
Copy link
Member

mattmoor commented Oct 9, 2020

/assign @imjasonh @vdemeester

Copy link
Member

@imjasonh imjasonh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 9, 2020
Copy link
Member

@vdemeester vdemeester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a release-note part 😉
/meow

@tekton-robot
Copy link
Collaborator

@vdemeester: cat image

In response to this:

Added a release-note part 😉
/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2020
@tekton-robot tekton-robot merged commit e2f528b into tektoncd:master Oct 9, 2020
barthy1 added a commit to barthy1/pipeline that referenced this pull request Oct 12, 2020
- bump alpine-git version to the latest one to be able to use multi-arch image
- remove some s390x e2e and examples tests from excluded list, the bug was fixed by tektoncd#3337
- add kubectl image replacement for s390x

Signed-off-by: Yulia Gaponenko <yulia.gaponenko1@de.ibm.com>
barthy1 added a commit to barthy1/pipeline that referenced this pull request Oct 12, 2020
- bump alpine-git version to the latest one to be able to use multi-arch image
- remove some s390x e2e and examples tests from excluded list, the bug was fixed by tektoncd#3337
- add kubectl image replacement for s390x

Signed-off-by: Yulia Gaponenko <yulia.gaponenko1@de.ibm.com>
barthy1 added a commit to barthy1/pipeline that referenced this pull request Oct 12, 2020
- bump alpine-git version to the latest one to be able to use multi-arch image
- remove some s390x e2e and examples tests from excluded list, the bug was fixed by tektoncd#3337
- add kubectl image replacement for s390x

Signed-off-by: Yulia Gaponenko <yulia.gaponenko1@de.ibm.com>
barthy1 added a commit to barthy1/pipeline that referenced this pull request Oct 12, 2020
- bump alpine-git version to the latest one to be able to use multi-arch image
- remove some s390x e2e and examples tests from excluded list, the bug was fixed by tektoncd#3337
- add kubectl image replacement for s390x

Signed-off-by: Yulia Gaponenko <yulia.gaponenko1@de.ibm.com>
tekton-robot pushed a commit that referenced this pull request Oct 12, 2020
- bump alpine-git version to the latest one to be able to use multi-arch image
- remove some s390x e2e and examples tests from excluded list, the bug was fixed by #3337
- add kubectl image replacement for s390x

Signed-off-by: Yulia Gaponenko <yulia.gaponenko1@de.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants