Multistage docker build fails with unexpected EOF #40993

Closed
steinybot opened this issue May 18, 2020 · 14 comments
Labels
area/builder kind/bug Bugs are bugs. The cause may or may not be known at triage time so debugging may be needed. version/19.03

Comments

@steinybot
steinybot commented May 18, 2020

Description

Steps to reproduce the issue:

DOCKER_BUILDKIT=1 docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --cache-from "community-build-image:latest" \
  --file docker/Dockerfile \
  --tag "community-build-image:latest" \
  .

Describe the results you received:

...
 => CACHED [stage-1 10/11] COPY project project                                                                                                                   0.0s
 => CACHED [stage-1 11/11] RUN source "/root/.bashrc" &&   nvm use &&   sdk env &&   sbt update Test/update npmUpdate Test/npmUpdate                              0.0s
------
 > importing cache manifest from community-build-image:latest:
------
unexpected EOF
[1]+  Exit 2

Describe the results you expected:

The build should succeed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:22:56 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:30:32 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.76-linuxkit
 Operating System: Amazon Linux 2 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 1.944GiB
 Name: 64112fa3bd2a
 ID: Z2SE:SWKL:JJGS:IRNP:5SPC:UKDY:WQ7Q:Z7U2:Z3GP:PN2F:5YKU:DCPP
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 21
  Goroutines: 41
  System Time: 2020-05-18T21:45:23.4842333Z
  EventsListeners: 0
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional environment details (AWS, VirtualBox, physical, etc.):

I first encountered this when trying to build a Docker image in AWS CodeBuild using the image aws/codebuild/amazonlinux2-x86_64-standard:3.0.

I built the image myself from https://github.com/aws/aws-codebuild-docker-images/tree/master/al2/x86_64/standard/3.0 and I can reproduce it reliably.

Here is the daemon log file:
dockerd-logfile.txt

It looks as though the root cause is a stack overflow.

@thaJeztah
Member

@tonistiigi ptal

@tonistiigi
Member

Please provide a reproducer.

@steinybot
Author

This seems to do it without requiring any additional files:

FROM amazonlinux:latest AS sdks

# These are required for SDKMAN.
# `which` should not be required but it is.
# See https://github.com/sdkman/sdkman-cli/issues/759.
RUN yum install -y \
  gzip \
  tar \
  unzip \
  which \
  zip

RUN touch "/root/.bashrc"

RUN curl -s 'https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh' | bash

# The version isn't a real parameter but it serves to invalidate the cache.
RUN curl -s 'https://get.sdkman.io?version=5.8.1' | bash

RUN echo '14.1.0' > '.nvmrc'
RUN printf 'java=14.0.1.hs-adpt\nsbt=1.3.10\n' > '.sdkmanrc'

RUN source "/root/.bashrc" && nvm install

RUN source "/root/.bashrc" && sdk install java '14.0.1.hs-adpt'
RUN source "/root/.bashrc" && sdk install sbt
# This just double checks that the versions are the same as the .sdkmanrc file.
RUN source "/root/.bashrc" && sdk env


FROM amazonlinux:latest

RUN yum install -y \
  curl \
  git \
  openssh

COPY --from=sdks '/root/.nvm' '/root/.nvm'

@steinybot
Author

I can't seem to reproduce this on my Mac, only within that amazonlinux2 container.

@tonistiigi
Member

Please provide all the steps that are needed to reproduce. I would guess you don't get the above error just by running docker build . with default options and the above Dockerfile. If it needs a specific machine from AWS, that is fine, but again you need to provide the steps for getting that environment and the configuration that is causing you issues.

@steinybot
Author

steinybot commented May 20, 2020

First you need to build the AWS CodeBuild image (or just run the final docker build step in AWS CodeBuild, which is how I encountered this issue to begin with):

$ git clone https://github.com/aws/aws-codebuild-docker-images.git
$ cd aws-codebuild-docker-images/al2/x86_64/standard/3.0/
$ docker build -t aws/codebuild/amazonlinux2-x86_64-standard:3.0 .

Get the files to reproduce it (I don't know what is relevant):

$ cd ..
$ git clone https://github.com/steinybot/bug-reports.git
$ cd bug-reports
$ git checkout moby/unexpected-eof

Now run the container (this is the long way but it works):

$ docker run -it --detach --name codebuild2 --mount type=bind,source="$(pwd)",target=/src --privileged=true --entrypoint /bin/bash aws/codebuild/amazonlinux2-x86_64-standard:3.0
$ docker start codebuild2
$ docker attach codebuild2
bash-4.2# cd /src
bash-4.2# dockerd -D -l debug &>dockerd-logfile.txt &
bash-4.2# DOCKER_BUILDKIT=1 docker build --build-arg BUILDKIT_INLINE_CACHE=1 --cache-from "community-build-image:latest" --file docker/Dockerfile --tag "community-build-image:latest" .

An interesting thing is that once I have reproduced the issue, I can also reproduce it with the smaller self-contained Dockerfile above, but if I start with a clean container, it requires the larger one with additional files. Trivial changes, like removing curl from the package install command, prevent the error.

@steinybot
Author

@tonistiigi, are these steps sufficient to reproduce the problem?

Perhaps this issue belongs in https://github.com/moby/buildkit instead. The problem doesn't happen if I remove DOCKER_BUILDKIT=1 and BUILDKIT_INLINE_CACHE=1, so it appears to be caused by enabling BuildKit.

@tonistiigi
Member

@steinybot Yes, I can reproduce now.

I think this might be fixed with moby/buildkit#1382

@tonistiigi
Member

tonistiigi commented May 26, 2020

In case you prefer a workaround to upgrading: what triggers this case is copying the same files from a stage where they were already copied, as in https://github.com/steinybot/bug-reports/blob/moby/unexpected-eof/docker/Dockerfile#L20-L21 and https://github.com/steinybot/bug-reports/blob/moby/unexpected-eof/docker/Dockerfile#L41-L42

That creates a loop: the COPY source is cached under two cache locations (the stage and the context), so the first copy ends up with a cache based on its own stage's result.

If you remove --from=sdks from https://github.com/steinybot/bug-reports/blob/moby/unexpected-eof/docker/Dockerfile#L41-L42, this build should work fine again. You also need to delete your previous cache, which already contains the loops.
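The pattern described above can be sketched as a minimal Dockerfile. This is a hypothetical illustration, not the contents of the linked Dockerfile: the file name tool-versions.txt and the /root paths are placeholders. The second stage keeps the problematic COPY --from=sdks line commented out next to the suggested workaround of copying the file from the build context instead.

```dockerfile
# Hypothetical sketch: tool-versions.txt and the /root paths are placeholders.
FROM amazonlinux:latest AS sdks
# First copy: the file comes from the build context.
COPY tool-versions.txt /root/tool-versions.txt
RUN cat /root/tool-versions.txt

FROM amazonlinux:latest
# Pattern reported to trigger the cache loop: copying the same file again,
# this time from the sdks stage, which itself copied it from the build context.
# COPY --from=sdks /root/tool-versions.txt /root/tool-versions.txt
# Workaround: drop --from=sdks and copy the file from the build context.
COPY tool-versions.txt /root/tool-versions.txt
```

Even with the workaround in place, an image previously built with BUILDKIT_INLINE_CACHE=1 and used with --cache-from (community-build-image:latest in the original report) still carries the loop, so that cache has to be deleted before building again.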

@steinybot
Copy link
Author

Thanks a lot for the update! I'll give that a go. Hopefully I can remove the cache from AWS CodeBuild.

@thaJeztah
Member

Opened moby/buildkit#1514 to backport the fix to the BuildKit "docker-19.03" / v0.6.x branch. After that's merged, it needs to be vendored into the 19.03 branch of this repository to resolve the issue for 19.03.

thaJeztah added a commit to thaJeztah/docker that referenced this issue Jul 8, 2020
….6.4-15-gdc6afa0f)

full diff: moby/buildkit@a7d7b7f...dc6afa0

- solver: avoid recursive loop on cache-export
    - fixes moby/buildkit#1336 --export-cache option crashes buildkitd on custom frontend
    - fixes moby/buildkit#1313 Dockerd / buildkit in a infinite loop and burning cpu
    - fixes / addresses moby#41044 19.03.9 goroutine stack exceeds 1000000000-byte limit
    - fixes / addresses moby#40993 Multistage docker build fails with unexpected EOF

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Member

Opened #41185 to vendor the fix in the 19.03 branch

docker-jenkins pushed a commit to docker-archive/docker-ce that referenced this issue Jul 9, 2020
….6.4-15-gdc6afa0f)

full diff: moby/buildkit@a7d7b7f...dc6afa0

- solver: avoid recursive loop on cache-export
    - fixes moby/buildkit#1336 --export-cache option crashes buildkitd on custom frontend
    - fixes moby/buildkit#1313 Dockerd / buildkit in a infinite loop and burning cpu
    - fixes / addresses moby/moby#41044 19.03.9 goroutine stack exceeds 1000000000-byte limit
    - fixes / addresses moby/moby#40993 Multistage docker build fails with unexpected EOF

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Upstream-commit: e7c2b106ec7785fcb54b1cf80258a2bea25ed020
Component: engine
@Antiarchitect

@thaJeztah Seems like this can be closed too by 19.03.13 :)

@thaJeztah
Member

Good catch, yes. Looks like GitHub didn't auto-close. Should be fixed by #41185

Thanks!
