Fails to build simple arm64 containers on amd64 host #90

Closed
dackroyd opened this issue May 25, 2022 · 5 comments · Fixed by #91

@dackroyd
Contributor

We have run into issues building for multiple architectures (currently a separate build + push for each) when the arm64 image is being built. Our Concourse workers are all amd64, and we want to build for both amd64 and arm64.

In the case of the arm64 build, this fails with an error like:

error: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c <RUN command>" did not complete successfully: exit code: 1
time="2022-05-25T12:07:33Z" level=fatal msg="failed to build: build: exit status 1"
time="2022-05-25T12:07:33Z" level=fatal msg="failed to run task: exit status 1"
failed

I've managed to boil this down to a simple case which fails reproducibly:

FROM alpine:3.15

RUN apk add --no-cache \
            bash~=5

This fails with:

... trimmed ...
 > [2/2] RUN apk add --no-cache             bash~=5:
#5 0.320 fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/main/aarch64/APKINDEX.tar.gz
#5 1.168 fetch https://dl-cdn.alpinelinux.org/alpine/v3.15/community/aarch64/APKINDEX.tar.gz
#5 2.088 (1/4) Installing ncurses-terminfo-base (6.3_p20211120-r0)
#5 2.102 (2/4) Installing ncurses-libs (6.3_p20211120-r0)
#5 2.130 (3/4) Installing readline (8.1.1-r0)
#5 2.147 (4/4) Installing bash (5.1.16-r0)
#5 2.201 Executing bash-5.1.16-r0.post-install
#5 2.206 ERROR: bash-5.1.16-r0.post-install: script exited with error 1
#5 2.208 Executing busybox-1.34.1-r5.trigger
#5 2.257 1 error; 8 MiB in 18 packages
------
Dockerfile:3
--------------------
   2 |     
   3 | >>> RUN apk add --no-cache \
   4 | >>>             bash~=5
   5 |     
--------------------
error: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/sh -c apk add --no-cache             bash~=5" did not complete successfully: exit code: 1
time="2022-05-25T12:07:33Z" level=fatal msg="failed to build: build: exit status 1"
time="2022-05-25T12:07:33Z" level=fatal msg="failed to run task: exit status 1"
failed

Additional Notes:

  • Concourse version: v7.6.0
  • oci-build-task version: 0.10.0
@dackroyd
Contributor Author

On further investigation, I found that the Dockerfile for oci-build-task uses moby/buildkit:0.9.3, which it was upgraded to back in December 2021. The 0.10.0 release of this image was fairly significant, and one of the listed changes is "QEMU embedded emulators have been updated to v6.2.0" (moby/buildkit#2634).

After rebuilding oci-build-task with this image upgraded to 0.10.3 and running our builds against that custom version, the issues we were encountering are resolved, i.e. the builds complete successfully. I'll raise a change to apply that upgrade here.
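
For anyone checking which image they are running, here is a minimal sketch in Python of testing whether a moby/buildkit tag is at or past 0.10.0, the release whose notes mention the QEMU v6.2.0 update. The function names are hypothetical, and it assumes tags look like 0.9.3 or v0.10.3, optionally with a suffix such as -rootless. The parts are compared as integers because a plain string comparison would sort 0.9.3 after 0.10.0.

```python
def parse_version(tag: str) -> tuple:
    """Turn a tag like 'v0.10.3' or '0.9.3-rootless' into (0, 10, 3) / (0, 9, 3)."""
    # Drop any '-suffix' and a leading 'v' (assumed tag format, see lead-in).
    core = tag.split("-")[0].lstrip("v")
    return tuple(int(part) for part in core.split("."))


def has_updated_qemu(tag: str) -> bool:
    """True if the tag is >= 0.10.0, the release quoted above as updating QEMU."""
    return parse_version(tag) >= (0, 10, 0)


if __name__ == "__main__":
    for tag in ("0.9.3", "0.10.3"):
        print(tag, has_updated_qemu(tag))
```

This only compares version numbers; it says nothing about whether a particular build will pass under emulation.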

dackroyd added a commit to dackroyd/oci-build-task that referenced this issue May 25, 2022
Fixes concourse#90

Signed-off-by: David Ackroyd <daveo.ackroyd@gmail.com>
@dackroyd
Contributor Author

I’ll need to do a bit more digging tomorrow to isolate the affected hosts better, as I’ve not tested beyond our Concourse workers (which failed) and a quick check on an Intel MacBook (which passed, but needs to be confirmed). The issue title I’ve given suggests the problem is more widespread than I think it actually is in practice.

@dackroyd
Contributor Author

Got to do a little more testing today, and confirmed that without the buildkit:0.10.3 upgrade:

  • Intel MacBook: consistent pass
  • Concourse workers: consistent fail

With the 0.10.3 upgrade in place, both the Concourse workers and the Intel MacBook passed the build.

@xtremerui
Contributor

xtremerui commented Oct 20, 2022

@dackroyd today in our own CI we encountered the same error:

#9 [linux/arm64 2/2] RUN apt-get update
#0 0.141 .buildkit_qemu_emulator: /bin/sh: Invalid ELF image for this architecture
#9 ERROR: process "/dev/.buildkit_qemu_emulator /bin/sh -c apt-get update" did not complete successfully: exit code: 255

with oci-build-task v0.11, which I believe contains buildkit 0.10.4. We are trying to build the concourse/dev image with these oci-build-task params:

  IMAGE_PLATFORM: linux/arm64,linux/amd64
  OUTPUT_OCI: true

on a worker with arch x86_64.

When merging #91, we didn't have a chance to test it the way we can now. Could you try concourse/oci-build-task:0.11 to verify it still works in your case? Thank you!

@dackroyd
Contributor Author

dackroyd commented Oct 20, 2022

@xtremerui it looks like you're running into a different issue here, i.e. Invalid ELF image for this architecture

This suggests to me that the build is running for the linux/arm64 architecture, but the image architecture is linux/amd64, so the /bin/sh binary is x86_64 and the emulation fails (as sh is not an arm64 binary here).
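
One way to check that theory is to look at the ELF header of the binary itself. The following is a minimal sketch in Python, not something oci-build-task or buildkit provides: the function names are hypothetical, and only the two architectures discussed in this thread are mapped. It reads the `e_machine` field, which the ELF format stores as a 16-bit value at byte offset 18 (62 for x86-64, 183 for AArch64).

```python
import struct

# e_machine values defined by the ELF format for the two architectures in this thread.
ELF_MACHINES = {0x3E: "amd64", 0xB7: "arm64"}


def elf_arch(header: bytes) -> str:
    """Return the architecture recorded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18, right after e_ident and e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, f"unknown (e_machine={machine:#x})")


def binary_arch(path: str) -> str:
    """Read just enough of the file at `path` to report its architecture."""
    with open(path, "rb") as f:
        return elf_arch(f.read(20))
```

Calling `binary_arch` on the `/bin/sh` taken from the base image's filesystem (the path to an extracted copy is up to you) would show whether that binary matches the platform the build is targeting.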

I have had some success with the updated buildkit version, to the point that simple builds now work. However, another emulation issue is currently blocking anything non-trivial for us: tonistiigi/binfmt#112
