runtime: multi-arch build via qemu fails to exec go binary #68976

Open
bjohnso5 opened this issue Aug 20, 2024 · 52 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. telemetry x/telemetry issues

@bjohnso5

Go version

go version 1.23.0 linux/arm64

Output of go env in your module/workspace:

I'm unable to provide the output of `go env` as it fails with the same telemetry fork/exec error.

What did you do?

Our automated image build process fails to perform any step that invokes the go binary with the following error:

can't start telemetry child process: fork/exec /usr/local/go/bin/go: invalid argument

The Dockerfile is here, and is being built via a script that invokes docker buildx with multiple platforms, like:

docker buildx build --platform=linux/amd64,linux/arm64 --file 1.23/Dockerfile

It seems that there is something inherent in the qemu arm64 environment that renders go unable to fork itself to complete the telemetry setup. I'm fairly confident it's something specific to the 1.23 release as 1.22.6 builds successfully using the same setup today.

What did you see happen?

Failures to invoke any go command

What did you expect to see?

A successful install and configuration of go 1.23.0 in a multi-arch docker build.

@ianlancetaylor ianlancetaylor changed the title fork/exec failure: multi-arch build via qemu fails to exec go binary syscall: multi-arch build via qemu fails to exec go binary Aug 20, 2024
@ianlancetaylor ianlancetaylor added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 20, 2024
@fearfate

fearfate commented Aug 21, 2024

I got the same error when building an arm64 image on an amd64 host machine with buildx:

docker buildx create --use --name=baker --driver docker-container  --platform=linux/amd64 --platform=linux/arm64 
docker buildx build --builder baker --platform=linux/amd64 --platform=linux/arm64  -t {tag} --push .

Then I tried running the image manually with docker run -it --rm --platform linux/arm64 {tag}.

After unpacking the archive, I got the same error: can't start telemetry child process: fork/exec /usr/local/go/bin/go: invalid argument. But after I ran chmod a+x ${GOROOT}/bin/*, it worked, even though the command did not actually change any permissions. However, adding this command to the Dockerfile did not fix the error.

Dockerfile example:

FROM almalinux:9.4-20240530

ENV GOROOT=/usr/local/go \
    GOLANG_VERSION=1.23.0 \
    GOPATH=/go

ENV PATH=$GOPATH/bin:$PATH:$GOROOT/bin

RUN set -eox pipefail \
    && dnf install -y curl \
    && mkdir -p "${GOROOT}" "$GOPATH/src" "$GOPATH/bin" && chmod -R 1777 "$GOPATH" \
    && curl -sSL "https://go.dev/dl/go${GOLANG_VERSION}.linux-$(cat < /etc/arch).tar.gz" | tar -zxvC ${GOROOT} --strip-components=1 \
#    && chmod a+x ${GOROOT}/bin/* \
    && go version

WORKDIR $GOPATH

@seankhliao
Member

With the circleci dockerfile, I get a segfault in gcc cc1 rather than something in go directly:

 > [linux/arm64 5/5] RUN	go install "golang.org/x/vuln/cmd/govulncheck@v1.1.3" && go clean -cache -modcache && rm -rf "/home/circleci/go/pkg":
0.120 + go install golang.org/x/vuln/cmd/govulncheck@v1.1.3
0.528 go: downloading golang.org/x/vuln v1.1.3
1.116 go: downloading golang.org/x/telemetry v0.0.0-20240522233618-39ace7a40ae7
1.120 go: downloading golang.org/x/mod v0.19.0
1.120 go: downloading golang.org/x/tools v0.23.0
1.171 go: downloading golang.org/x/sync v0.7.0
50.74 # net
50.74 gcc: internal compiler error: Segmentation fault signal terminated program cc1
50.74 Please submit a full bug report,
50.74 with preprocessed source if appropriate.
50.74 See <file:///usr/share/doc/gcc-11/README.Bugs> for instructions.
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load

 2 warnings found (use docker --debug to expand):
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 27)
 - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (line 28)
Dockerfile:48
--------------------
  46 |     USER circleci
  47 |     
  48 | >>> RUN	go install "golang.org/x/vuln/cmd/govulncheck@v${GOVULNCHECK_VERSION}" && go clean -cache -modcache && rm -rf "${GOPATH}/pkg"
  49 |     
--------------------
ERROR: failed to solve: process "/dev/.buildkit_qemu_emulator /bin/bash -exo pipefail -c go install \"golang.org/x/vuln/cmd/govulncheck@v${GOVULNCHECK_VERSION}\" && go clean -cache -modcache && rm -rf \"${GOPATH}/pkg\"" did not complete successfully: exit code: 1

@mgabeler-lee-6rs

Might this have something to do with apparmor or other host "controls"? I was able to run the multiarch build on an Ubuntu 22.04 host using Docker 24.0.7 (from Ubuntu's packages) without errors, and inside the resulting arm64 container was able to run without errors:

  • go version
  • go telemetry on
  • go install golang.org/x/telemetry/cmd/gotelemetry@latest
  • gotelemetry on
  • gotelemetry upload (didn't have anything to upload, unsurprisingly)

@prattmic
Member

Could you run the failing command under strace -F so we can see exactly which system call is failing?

@prattmic
Member

cc @golang/telemetry

@findleyr findleyr added this to the Go1.23.1 milestone Aug 21, 2024
@findleyr
Contributor

CC @matloob

Independent of the root cause, a failure to start the telemetry child process shouldn't prevent the go command from being used.

@bjohnso5
Author

> Could you run the failing command under strace -F so we can see exactly which system call is failing?

It appears the ptrace function(s) aren't implemented in the emulation environment (screenshot omitted).

@bjohnso5
Author

Not sure if this is helpful, but I'm attaching two strace -f output files from the linux/arm64 golang:1.23.0 and golang:1.22.6 official images running go env. Note that these are in the successful case, but I'm hoping it might help with comparison if required.

go1.22.6_go_env_strace.txt
go1.23_go_env_strace.txt

@dmitshur
Contributor

dmitshur commented Aug 21, 2024

Moved to Go1.24 milestone since this needs to be fixed on the main branch first (for Go 1.24), before being considered for backporting. Please use the usual process (https://go.dev/wiki/MinorReleases) to create a separate backport tracking issue in the Go1.23.1 milestone.

@findleyr It's important that issues in the minor milestones are the backport kind with a CherryPickCandidate label, otherwise we might miss them in our release meeting review. Thanks.

@dmitshur dmitshur modified the milestones: Go1.23.1, Go1.24 Aug 21, 2024
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 21, 2024
@findleyr
Contributor

Thanks again @dmitshur.

@gopherbot please backport this issue to 1.23: it is a regression that breaks the go command in certain environments.

@gopherbot
Contributor

Backport issue(s) opened: #68995 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Contributor

Change https://go.dev/cl/607595 mentions this issue: telemetry: do not crash parent if child could not be started

nirui added a commit to nirui/sshwifty that referenced this issue Aug 24, 2024
…ng/go#68976). We'll disable bugged `linux/arm64` and `linux/arm` Docker image builds for now to avoid the failure, and re-enable it after Golang team fixed the problem.
@findleyr findleyr added the telemetry x/telemetry issues label Aug 26, 2024
gopherbot pushed a commit to golang/telemetry that referenced this issue Aug 27, 2024
Instead of calling log.Fatal if the child could not be started, call
log.Print. Various factors in the user's environment could cause the
child to not be able to start, but that shouldn't crash the parent
process (usually the go command).

Change other fatals into prints with early returns when attempting to
start the child.

Reset the crash output file to clean up if the child process could not
be started and crashmonitoring is enabled.

Updates golang/go#68976

Change-Id: I42f55dc90f68f91b272a7ebf64d2a4a3b00815c7
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/607595
Commit-Queue: Michael Matloob <matloob@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
@gopherbot
Contributor

Change https://go.dev/cl/609195 mentions this issue: gopls: update x/telemetry to pick up recent bug fixes

@gopherbot
Contributor

Change https://go.dev/cl/609196 mentions this issue: [gopls-release-branch.0.16] gopls: update x/telemetry to pick up recent bug fixes

@gopherbot
Contributor

Change https://go.dev/cl/609256 mentions this issue: cmd: vendor golang.org/x/telemetry@e553cd4b

@prattmic
Member

prattmic commented Sep 4, 2024

FWIW, Debian bookworm has QEMU 7.2 in its repository, which is probably why so many people have hit this.

To reproduce this issue:

go.mod:

module example.com/app

go 1.23

main.go:

package main

import (
        "fmt"
        "os"
        "os/exec"
)

func main() {
        if os.Getenv("TEST_SUBPROCESS") == "1" {
                fmt.Println("Hello from child")
                return
        }

        exe, err := os.Executable()
        if err != nil {
                panic(err)
        }

        cmd := exec.Command(exe)
        cmd.Env = append(cmd.Environ(), "TEST_SUBPROCESS=1")
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        if err := cmd.Run(); err != nil {
                panic(err)
        }
}

Dockerfile:

FROM debian:bookworm

RUN apt-get update && apt-get install -y qemu-user-static

COPY app app

CMD ["qemu-x86_64-static", "app"]

$ go build   # outside the container
$ docker build -t issue68976_qemu .
$ docker run --rm -it issue68976_qemu

Without https://go.dev/cl/592078:

panic: fork/exec /app: invalid argument

goroutine 1 [running]:
main.main()
        /usr/local/google/home/mpratt/Downloads/issue68976_qemu/main.go:25 +0x17a

With https://go.dev/cl/592078:

Hello from child

@kevkevinpal

kevkevinpal commented Sep 5, 2024

We're having a similar issue here: https://github.com/stakwork/sphinx-tribes/actions/runs/10724756996/job/29741116565

Will update if we find a workaround.

Update:
We fixed it with this Dockerfile change: stakwork/sphinx-tribes@acca2f6

@alicethorne-ab

We're also having this issue with Go 1.23.1, using a Bookworm-based image. This includes both the can't start telemetry child process and error obtaining buildID for go tool compile errors. It would be much appreciated to have this fix backported to Go 1.23.x.

@the-hotmann

the-hotmann commented Sep 7, 2024

Same here. I'm building on a Debian host, but the image is Alpine-based.
I have had this error since 1.23.1; I did not have it with 1.23.0.

go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument

@matloob
Contributor

matloob commented Sep 9, 2024

@alicethorne-ab I'd like to confirm: are you seeing a fatal can't start telemetry child process error? In 1.23.1 we still log the error, but it shouldn't be a fatal error anymore.

@Enrico204

Enrico204 commented Sep 9, 2024

I have the same problem. I can see the error can't start telemetry child process in my log, but as you said, it is not fatal, so it is working properly.

Unfortunately, now I see the same error @the-hotmann is referring to when I try to use go install or go run. That error is fatal (exit code 1).

$ go run ./test.go
can't start telemetry child process: fork/exec /usr/local/go/bin/go: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument

$ cat test.go
package main

import "fmt"

func main() {
        fmt.Println("Hello world");
}

$ arch
aarch64

Here is the output of go install:

$ go install github.com/amacneil/dbmate/v2@v2.4.0
go: downloading github.com/amacneil/dbmate/v2 v2.4.0
go: downloading github.com/joho/godotenv v1.5.1
go: downloading github.com/urfave/cli/v2 v2.25.5
go: downloading github.com/lib/pq v1.10.9
go: downloading github.com/go-sql-driver/mysql v1.7.1
go: downloading github.com/ClickHouse/clickhouse-go/v2 v2.10.0
go: downloading github.com/ClickHouse/ch-go v0.56.0
go: downloading github.com/andybalholm/brotli v1.0.5
go: downloading github.com/pkg/errors v0.9.1
go: downloading go.opentelemetry.io/otel/trace v1.16.0
go: downloading go.opentelemetry.io/otel v1.16.0
go: downloading github.com/google/uuid v1.3.0
go: downloading github.com/paulmach/orb v0.9.2
go: downloading github.com/shopspring/decimal v1.3.1
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading github.com/cpuguy83/go-md2man/v2 v2.0.2
go: downloading github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673
go: downloading github.com/russross/blackfriday/v2 v2.1.0
go: downloading github.com/go-faster/city v1.0.1
go: downloading github.com/go-faster/errors v0.6.1
go: downloading github.com/klauspost/compress v1.16.5
go: downloading github.com/pierrec/lz4/v4 v4.1.17
go: downloading github.com/segmentio/asm v1.2.0
go: downloading golang.org/x/sys v0.8.0
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument
go: error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument

To reproduce:

$ podman run -it --rm --platform linux/arm64 golang:1.23.1
# printf "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello world\");\n}\n" > test.go
# go run ./test.go

EDIT: added --platform linux/arm64 to the command; it was missing (lost in the copy-paste).

@samstride

@prattmic, it looks like that patch has a conflict. Any idea whether this will make it into Go 1.23.2?

BTW, I tried updating qemu to a newer version on the Debian 12 image but that didn't seem to work either.

@prattmic
Member

It will likely be in 1.23.2.

What is the newer version of QEMU that did not work?

@gopherbot
Contributor

Change https://go.dev/cl/612218 mentions this issue: [release-branch.go1.23] os: add clone(CLONE_PIDFD) check to pidfd feature check

@samstride

samstride commented Sep 12, 2024

@prattmic, I tried this:

# Debian 12 with go 1.23.1
FROM golang:1.23.1

RUN go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest && \
    go install github.com/securego/gosec/v2/cmd/gosec@latest && \
    go install golang.org/x/vuln/cmd/govulncheck@latest

The pipeline step with tonistiigi/binfmt:qemu-v8.1.5 looks like this:

build-golang:
  stage: build
  image: docker:latest
  services:
    - name: docker:dind
  script:
    - docker run --privileged --rm tonistiigi/binfmt:qemu-v8.1.5 --install all
    - docker context create multiarch-build
    - docker buildx create multiarch-build --name multiarch --driver docker-container --bootstrap --use
    - docker buildx build --push --platform linux/amd64,linux/arm64 -f Dockerfile.golang -t some-tag .

And I get this error during the multi-arch build:

error obtaining buildID for go tool compile: fork/exec /usr/local/go/pkg/tool/linux_arm64/compile: invalid argument

mickael-kerjean added a commit to mickael-kerjean/filestash that referenced this issue Sep 18, 2024
Go 1.23 has a couple of issues on arm that are documented in golang/go#68976.
Until this is solved we have to remove the arm build, because we need
partitioned cookies, which are a Go 1.23 feature
@andreadelfino

I switched from multiarch/qemu-user-static (QEMU 7.2) to a custom image with QEMU 9.1 from debian:unstable and my multi-arch builds worked seamlessly. Uploaded here for reference: https://github.com/andreadelfino/qemu-user-static

It's probably just a workaround, and I still need to extensively test to see if everything downstream works as before.

bwplotka added a commit to GoogleCloudPlatform/alertmanager that referenced this issue Sep 20, 2024
See golang/go#68976

Signed-off-by: bwplotka <bwplotka@gmail.com>
bwplotka added a commit to GoogleCloudPlatform/prometheus-engine that referenced this issue Sep 20, 2024
See golang/go#68976

Signed-off-by: bwplotka <bwplotka@google.com>