
Wrong dockerfile used when building two images in the same context folder #1368

Closed
MichaelKim0407 opened this issue Feb 19, 2020 · 39 comments · Fixed by #2081

@MichaelKim0407

When building different images with different Dockerfiles in the same build context, the wrong Dockerfile is used (probably due to a caching problem).

I created a minimal reproduction here. Please see the README in that repo.

@tonistiigi
Member

Interesting. BuildKit uses the same method as rsync, based on file metadata, for file transfers. Because you do a fast git clone in your test script, the files (often) end up with the same timestamp (and the rest of the metadata), which is what causes this.

E.g. you can change your test script to

#!/bin/bash

rm -rf .test
git clone .git .test
cd .test

mkdir foo
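# both Dockerfiles share the same size and (thanks to the fast clone) often the same
# timestamp, so the metadata check skips the second copy and foo/Dockerfile keeps a's contents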
rsync a/Dockerfile foo/
rsync b/Dockerfile foo/
cat foo/Dockerfile

to see the same behavior: because both Dockerfiles share the same size and (often) the same timestamp, the metadata check treats foo/Dockerfile as already up to date, so the second copy is skipped and the file keeps the contents of a/Dockerfile.
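A quick way to confirm the metadata collision in the reproduction repo (a sketch, assuming GNU stat and the a/ and b/ layout from the linked reproduction):

stat -c '%s %Y %n' a/Dockerfile b/Dockerfile   # size, mtime, name; both lines typically match after a fast clone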

Need to think about this.

@MichaelKim0407
Author

Thanks for the explanation. So essentially, the underlying copying method BuildKit uses (i.e. the rsync-like transfer) thinks they are the same file because it only checks the timestamp and file size?

For context, git clone is a realistic scenario because I discovered this problem in CI builds.

In this case I think it's fair to ask why BuildKit is trying to copy apparently different files into the same destination?

I played around with the test script a little bit, and it seems that if I change the build path to an absolute path (/path/to/.test/ instead of .), the error goes away. But if I change both commands to use absolute paths, the error reappears.
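For reference, the invocations I'm varying look roughly like this (a sketch; the image names are placeholders, and the a/ and b/ paths follow the reproduction repo):

# baseline: both builds share the context "."
docker build -t image-a -f a/Dockerfile .
docker build -t image-b -f b/Dockerfile .
# replacing "." with an absolute path in one of the commands makes the error go away;
# using absolute paths in both commands brings it back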

I have no idea about BuildKit's implementation, but it makes me speculate:

  • BuildKit is rsyncing the dockerfile and ignore file into a temporary folder before building
  • The path of the temporary folder is deterministic, so subsequent builds can reuse assets
  • The path calculation takes the literal value of build path into consideration

If my speculations are true, I would make the following suggestions:

  • This has nothing to do with the bug itself, but wouldn't it make sense to always calculate the temporary path from absolute paths?
  • When rsyncing the dockerfile and ignore file, check absolute_path(--file) instead, because if the value is different, we know for sure they are different files, and rsyncing them into the same destination doesn't make sense.

Of course these are my speculations, as I haven't looked into BuildKit's source code yet.

Although I'm not a Go developer, I'll be happy to contribute if you can confirm these and think the changes will be simple enough. Please let me know how you want to proceed!

@tonistiigi
Member

I played around with the test script a little bit, and it seems that if I change the build path to an absolute path (/path/to/.test/ instead of .), the error goes away. But if I change both commands to use absolute paths, the error reappears.

Yes, there are cases where the CLI side will copy the Dockerfile before sending it. That produces a new timestamp, so this issue doesn't appear.

Although this seems like a simple fix, it's not really a good idea to always force a copy like this on the CLI side, as it would make the Dockerfile an exceptional case. If the build needs another file (.dockerignore, or something new later), it would hit the same issue.

We could make it possible for the frontend to choose what algorithm to use for specific sources. E.g. a Dockerfile is always small, so it wouldn't be wasteful to always copy it, or to do a full cryptographic checksum over the data.

The transfer logic itself lives in the https://github.com/tonistiigi/fsutil repository.

@MichaelKim0407
Author

I'm not quite following... Are files always sent to the same folder, no matter what is being built? Or are they sent to a temporary folder that has some reuse mechanism?

@tonistiigi
Member

They are sent to folders so that they can be reused after a build has completed. The frontend can control the index mechanism for different files, e.g. Dockerfiles go to a different place than the build context. There is also separation based on the workdir so that different projects don't collide and make the rsync-style transfer meaningless (I don't remember if the docker CLI implements this).

@MichaelKim0407
Author

MichaelKim0407 commented Feb 21, 2020

Just to be clear, frontend is the client (docker cli), and backend is the builder (BuildKit)?

And frontend decides where files go to, and backend decides how to copy the files?

(Sorry, I'm not a Docker developer... I'm not sure what "index mechanism" means here.)

I know BuildKit is experimental, but I'm using it specifically for the Dockerfile.dockerignore behavior, which I find very useful for building multiple images in the same project. All I know about docker builds is that files are sent somewhere, so that builds don't actually run against the source files.

I would say, from an end user perspective, that while making things fast is nice, reliability is more important for building images. I don't expect my image to break because the build process somehow used the wrong files.

Thanks for the explanation, though!

@tonistiigi
Member

And frontend decides where files go to, and backend decides how to copy the files?

A frontend is a high-level builder component. E.g. Dockerfiles are built with the Dockerfile frontend, buildpacks with the buildpack frontend. The BuildKit core is a lower-level API shared by frontends. Frontends are daemon-side and may run in containers.

@MichaelKim0407
Author

Ok, makes sense. Thank you for your patience.

Although I'd very much love to help, it appears that I don't know the process well enough to be able to contribute anything meaningful. So I'll stop bothering you with questions...

Right now my workaround (for anyone else who runs into this) is using whitespace and comments to make sure that different files don't have the same size.
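Something along these lines (a sketch, assuming the a/ and b/ layout from the reproduction repo; the appended comments only need to give the files different sizes):

# pad each Dockerfile with a differently sized comment so the files no longer share a size
echo "# padding for service a (workaround for moby/buildkit#1368)" >> a/Dockerfile
echo "# slightly longer padding for service b (workaround for moby/buildkit#1368)" >> b/Dockerfile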

@WolfspiritM

This hit me as well in a CI environment, at least I think this was the reason. I had changed a file in the repository that's ignored in .dockerignore, and then I was building many images with docker-compose. The result was an image that was named service-a but used the Dockerfile of service-b. That can be really dangerous, as it might not be obvious and might not produce errors until the image is started and does strange/unexpected things.

@russellcardullo

@tonistiigi As a workaround for this issue we've been running docker builder prune in between builds where the Dockerfiles have the same size and timestamp (but different contents). However, this slows down builds and eliminates caching.

Can you think of a nicer way to work around this? I suppose I could force the Dockerfiles to all have different sizes and timestamps, but is there an easier workaround?
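For reference, the workaround amounts to something like this (a sketch; the image names and Dockerfile paths are placeholders, and it throws away the builder cache between the two builds, which is exactly why it is slow):

docker build -t service-a -f a/Dockerfile .
docker builder prune -f   # drop the cached build state before the next build
docker build -t service-b -f b/Dockerfile .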

@nocive

nocive commented Nov 23, 2020

Hitting this one as well.
It becomes especially severe when using monolithic repositories with many applications, where the likelihood of this happening increases dramatically :(

@nocive

nocive commented Nov 26, 2020

Can you think of a nicer way to work around this? I suppose I could force the Dockerfiles to all have different sizes and timestamps, but is there an easier workaround?

Would also love to hear about cleaner ways to work around this for the time being. I've tried virtually everything, including using a contextual .dockerignore file for each app with something like:

**/Dockerfile
!/this/app/Dockerfile

But that also doesn't seem to be taken into account when the copy of the cached Dockerfile happens.

Ensuring each Dockerfile has a unique file size is error-prone and doesn't guarantee the issue can't occur anyway when the Dockerfiles are changed by multiple developers switching between different branches.

@nocive

nocive commented Dec 3, 2020

For anyone interested, a more elegant way of working around the issue for the time being is to pass your Dockerfile contents via stdin with cat Dockerfile | docker build -f - ., in which case the cache issue will not occur.
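For two images sharing the same build context this looks something like the following (a sketch; the image names and the a/ and b/ paths are placeholders):

# the Dockerfile is passed on stdin instead of being synced as part of the context,
# so the stale cached copy is never consulted
cat a/Dockerfile | docker build -t image-a -f - .
cat b/Dockerfile | docker build -t image-b -f - .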

Unfortunately, such a solution is not suitable if you have no way to intercept your docker build wrapper (e.g. docker-compose).

cc @russellcardullo

@kiorky

kiorky commented Dec 3, 2020

To be honest, since I also found that workaround of streaming the Dockerfile via stdin in moby/moby#41743,
I also found a way to hijack Compose with this pseudo-code, to factor things out in the meantime:

service=mysuperservice
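# extract the service's build args from `docker-compose config` and turn them into --build-arg flags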
# with yq3
buildargs=$(docker-compose config\
    |docker run -i --rm mikefarah/yq:3 yq r - services.${service}.build.args -j\
    |docker run -i --rm imega/jq ".|to_entries[]|\" --build-arg \(.key)=\(.value)\"" -j)
# with latest yq
buildargs=$(docker-compose config\
    |docker run -i --rm mikefarah/yq e ".services.${service}.build.args" - -j\
    |docker run -i --rm imega/jq ".|to_entries[]|\" --build-arg \(.key)=\(.value)\"" -j)

cat Dockerfile | docker build -t mysuperimage $buildargs -f - .

@sirosen

sirosen commented Dec 22, 2020

I hit this on day two of a new project, trying to set up builds for multiple different Linux distros for testing.
I have a setup something like this:

ubuntu1604/Dockerfile
ubuntu1804/Dockerfile
ubuntu2004/Dockerfile

all of which are identical except for the FROM line, which (because it only varies by the tag on Canonical's images) is the same length in every file.

I suspect this is a pretty common pattern for those who are using containers to do multi-platform testing.

We could make it possible for the frontend to choose what algorithm to use for specific sources. Eg. dockerfile is always small so wouldn't be wasteful to always copy it, or do a full crypto checksum over data.

I would welcome an option like this. Something to the effect of --syncmode=checksum. The raw data going into an image is often very small, and hashing it is easily worth the cost.
I would argue for checksum as the default and timestamp + size (call it --syncmode=stat?) as an option for those who want to avoid hashing very large files.

@petrzjunior

I find this bug highly annoying, because it messes up the cache in a very unobvious way and is hard to debug. Timestamps are obviously messed up by git, but there is not much one can do about that in a CI environment. Any chance this could be resolved?

@nocive

nocive commented Jan 20, 2021

@petrzjunior it doesn't seem this is getting the attention it deserves.

I would say that, more than annoying, this is deeply problematic, as it can lead to building the completely wrong application artifact. I would also argue this isn't such an uncommon use case, or unlikely to happen, if you're dealing with multi-app repositories: one can easily end up with Dockerfiles that have the same file size even though they have different contents, and, like you stated, timestamps are messed up by git, which also leads to all files having the same timestamp.

What we ended up doing in some cases was marking problematic files (those that share the same build context) with a well-known string (e.g. BUILDKIT_ISSUE_1368), then iterating over all the matching files and giving each one a unique timestamp, i.e.:

files="$(git grep -l BUILDKIT_ISSUE_1368 '**/Dockerfile')"
d=$(date +%s)
i=0
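# give every marked Dockerfile a distinct mtime (current time plus a unique per-file offset)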
for file in $files; do
    i=$(( i + 1 ))
    #echo "patching $file"
    touch -d @$(( d + i )) "$file"
done

@martinussuherman

I stumbled upon this bug today and didn't realize that the build was actually done with the wrong Dockerfile (and the wrong base image, i.e. the wrong arch) until I pulled and ran it. Here is my repo's GitHub Actions run with the problem here and here

Agree with @petrzjunior that this is annoying and could easily be overlooked, since the build itself succeeds.
+1 on @sirosen's idea of an option to use checksums; @nocive's idea is a nice one too.

Regards,

Martinus

martinussuherman added a commit to martinussuherman/alpine that referenced this issue Apr 6, 2021
@podlesh

podlesh commented May 31, 2022

I can confirm: this is not fixed in the latest Docker Desktop for Mac (Docker Desktop 4.8.2 (79419), engine version 20.10.14).

docker builder prune -f works as a workaround

@nocive

nocive commented Jun 2, 2022

Please reopen /cc @thaJeztah @tonistiigi

@thaJeztah
Member

The issue tracker in this repository tracks issues to be fixed in the master/main branch of this repository (which contains the linked fix). Looking at the linked PR, #2081, it looks like the version of BuildKit used in Docker 20.10 does not (yet) have that fix, but perhaps @tonistiigi has insight into whether it's safe to backport (or best left for the upcoming 22 release of Docker Engine).

@nocive

nocive commented Jun 3, 2022

@thaJeztah tyvm for your reply!

I was actually convinced the patch had already made it into one of the 20.10.x versions; that's why I was asking for the issue to be reopened.

if it's safe to backport

I believe you've asked that question at least twice in the past and got no answer.

The lack of importance given to this issue is frustrating, as it's quite a severe problem, even if the probability of being affected by it is somewhat low. It was originally reported in February 2020, and more than two years later there's still no mainstream Docker version that includes the fix.

Can we maybe get more eyeballs here and some additional insight into when this will actually ship with Docker?

@nocive

nocive commented Jul 13, 2022

ping

catusax added a commit to catusax/mesh-gen that referenced this issue Aug 1, 2022
touch Dockerfile so buildx can pick the right Dockerfile
moby/buildkit#1368
@mpietras

Ran into this as well in our build process... what an odd one to track down.

@nocive

nocive commented Sep 11, 2022

still reproducible with docker 20.10.18... (using https://github.com/MichaelKim0407/buildkit-wrong-dockerfile)

@nocive

nocive commented Sep 15, 2022

glad to see this reopened 🎉

@artyomb

artyomb commented Oct 20, 2022

+1, still suffering from the bug

@nocive

nocive commented Nov 21, 2022

bump?

@michielbdejong

This is affecting GitHub's default production configuration across their entire Codespaces product, since they use an affected version of BuildKit by default:
https://github.com/orgs/community/discussions/38878

What do you suggest the GitHub Codespaces team should do about this?
Maybe it would be safest if they set DOCKER_BUILDKIT=0 as a default across their product, until they are ready to upgrade to the next Docker version?

@nocive

nocive commented Jan 3, 2023

it's 2023! 🎉

@petrzjunior

petrzjunior commented Jan 3, 2023

I once again searched through the Docker repos to figure out why this has not been shipped in Docker yet.
The issue has been fixed since July 2021 in BuildKit 0.9.0, but Docker seems to still be using BuildKit 0.8. They are actually bumping the version quite regularly in the master branch, but it never gets cherry-picked into a release.
The closest one to be published right now is PR moby/moby#43239, assigned to the 23.0.0 milestone, which is probably the next major release. They even have an RC from last week (moby/moby#44700), so let's hope it makes it into the release!

@nocive

nocive commented Feb 3, 2023

Docker 23.0.0 just landed; can someone confirm whether this is fixed there? I sadly can't check at the moment.

@jimmidyson

Looking at moby/moby@56ea588, this is included in the v23.0.0 tag 🎉
