-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] ADD/COPY scripting improvements #2917
Comments
Have you seen moby/moby#35639 (there's also a longer discussion in moby/moby#15858)? The issue proposes a
|
Yes, I think the Wouldn't hurt to read up on how a regular |
Using But, you are precisely correct - apparently, I'm lacking The problem with the above/original proposal is that without Side note - there is another method to consider - include/exclude dirs, e.g. you copy the root directory with From what I can see now, Let me look into it - I'll try to swap Minimalistic "flattening mode" tescase (btw, note how cp complains on conflicts in a flattening mode, while buildkit is silent): Test script source code#!/usr/bin/env bash
TAG=addcopy_scripting_test2
CONTAINER=$TAG
TEST_ROOT=$(mktemp -d)
cd $TEST_ROOT
pwd
mkdir -p ./src/sub{1,2} ./dst
touch ./src/sub{1,2}/f{1,2}
cat << DOCKERFILE > dockerfile
# syntax=dyefimov/dockerfile:addcopy-scripting-improvements-labs
FROM scratch
WORKDIR /dst
ARG SOURCES
COPY --preserve-top-dir --split-sources \$SOURCES .
CMD ["sleep","3600"]
DOCKERFILE
echo "Source tree:"
find ./src | sort
test() {
echo "===================================="
echo "Testing arg: $1"
2>/dev/null 1>&2 docker container rm -f $CONTAINER
2>/dev/null 1>&2 docker buildx build --load --no-cache -t $TAG --build-arg=SOURCES="$1" $TEST_ROOT
2>/dev/null 1>&2 docker container create --name=$CONTAINER $TAG
echo "Container tree: "
docker container export $CONTAINER | tar tf - | grep dst/ | sort
2>/dev/null 1>&2 docker container rm -f $CONTAINER
2>/dev/null 1>&2 docker image rm -f $TAG
echo "cp tree: "
rm -rf ./dst/*
cp $1 ./dst/
find ./dst | sort
echo "cp --parents tree: "
rm -rf ./dst/*
cp --parents $1 ./dst/
find ./dst | sort
}
test "src/sub1/f1"
test "src/sub2/f1"
test "src/sub*/f1 src/sub*/f2" And test script output: $ ./proposal_poc2.sh
/tmp/tmp.lr18fzJ9vb
Source tree:
./src
./src/sub1
./src/sub1/f1
./src/sub1/f2
./src/sub2
./src/sub2/f1
./src/sub2/f2
====================================
Testing arg: src/sub1/f1
Container tree:
dst/
dst/f1
cp tree:
./dst
./dst/f1
cp --parents tree:
./dst
./dst/src
./dst/src/sub1
./dst/src/sub1/f1
====================================
Testing arg: src/sub2/f1
Container tree:
dst/
dst/f1
cp tree:
./dst
./dst/f1
cp --parents tree:
./dst
./dst/src
./dst/src/sub2
./dst/src/sub2/f1
====================================
Testing arg: src/sub*/f1 src/sub*/f2
Container tree:
dst/
dst/f1
dst/f2
cp tree:
cp: will not overwrite just-created './dst/f1' with 'src/sub2/f1'
cp: will not overwrite just-created './dst/f2' with 'src/sub2/f2'
./dst
./dst/f1
./dst/f2
cp --parents tree:
./dst
./dst/src
./dst/src/sub1
./dst/src/sub1/f1
./dst/src/sub1/f2
./dst/src/sub2
./dst/src/sub2/f1
./dst/src/sub2/f2 |
Yes, I found around 5-7 issues, some of them like 5+ years old, all discussing the same or close functionality. Maybe we should consolidate all of them here (point to this very proposal via comment) to bring more opinions, and hopefully distill this into something meaningful and finally fix this annoyance?
That sounds like a complex task and a whole new layer of complexity (source of bugs). Can't we keep it simple but yet flexible? What do you think if we just extend the proposed method with a proper quotation support? e.g. split by separator, while quoted paths not splitted but unquoted. |
Excellent ❤️
I'm sure this one has seen some discussion before as well, #2853, moby/moby#15771, it just looks to me like the discussion has stalled around a lack of people wanting to/able to put in the time to put together a contribution.
Could be worth making sure that similar issues that would be solved by similar solutions are linked. I think though it's probably worth discussing individual flags separately from each other, since I think they're mostly independent.
This is exactly the thing that scares me 😄 Dockerfiles don't implement a fully sh-compatible quote parser, for some fairly good reasons - it gets quite tricky to handle the small details, and the existing structure of the parser makes it quite challenging to do well. Someone else might have some better thoughts than me here, I'm not as aware of the historical context for some of this stuff as others. Slightly related - one thought that I did have - if we're adding more flags to more places (other candidates for this include |
I am just wondering whether this limitation is documented? Are there other cases where build arugment substitution is not working right now? I recall that initially usage was fairly limited, but I had an impression that it improved over time, but it's not like I can recall of concrete changes where substitutions became possible. |
I don't disssagree with the proposal being useful, but I do believe that currently very similar idea can be implemented by specifying alternative build context, or pehrpaps (somewhat more hack-y) context preparation, which could involve generated |
I don't think it's documented anywhere. We do argument substitution pretty much everywhere, the case where it's not working atm is in doing word splitting, like how POSIX sh does - i.e. turning
Not quite sure what this means - I can't easily see how this would help with something like a
I remember some discussion on this internally. The summary as I remember it: it's a good work-around, but it has a couple issues - ExecOp caches much more poorly than the corresponding FileOp, and doesn't work with constructs like |
In fact, that's exactly what I'm trying to avoid here. Of cause, everything in software engineering is a trade of, and there are many ways to achieve the same goal. I'm pushing here on "Linux philosophy", "Simplicity", and "Flexibility". You might be interested to look at #2893 where I posted a more detailed real-world use case that brought me to this proposal. We definitely can do the above the other "hacky" way, but IMHO the change here is very cheep for the benefits we gain.
The root cause is the reversed order of operations in buildkit compared to POSIX shell (POSIX way: parameter expansion, then word splitting). Swapping the order in buildkit seems to be a horrible idea. Proof: frontend/dockerfile/dockerfile2llb/convert.go
That's what I tried first - it's possible - but I was incapable of avoiding cache-misses for file changes not relevant for a particular RUN command. Tbh not sure if I explored all the capabilities of the RUN --mount. Some more details here: #2893 |
There's also a test-case/example (but in relation to permissions for parent directories which could be fixed with the
Yes, that's a tricky one indeed; when discussing some things internally, I also suggested someone to use ... which is what led the internal discussion to the |
@DYefimov sorry for the ping, was wondering if you'd managed to get any further with your PoC? If you've got a working |
@jedevc No worries. |
Not sure why all of a sudden vscode decided to reset my @jedevc @thaJeztah |
No worries, thanks @DYefimov! Will continue the conversation over there 😄 No worries about the tests, that's what CI is for 🎉 |
@tonistiigi @crazy-max @thaJeztah And what are your thoughts on the second proposition here, the |
Hello, One more thing that would be amazing, is to allow filtering out directories that get ADD/COPY-ed. (My use case is that I don't want to pollute the top level of a monorepo, and there's no support for nested .dockerignores.) Basically something like Thanks for considering! |
@PAStheLoD related, but tracked in #2853 |
related;
parents
option to dockerfile COPY command moby#35639Abstract
Currently the following construct is impossible:
Therefore severely limiting metaprogramming capabilities of buildkit and dockerfiles.
The most valuable application of such a construct is dockerfile target reusability, e.g. you can build lots of similar images that differ only by filesystem content without code duplication or any external tools like templating engines.
You can check the real world use case in more detail at #2893 where this RFC originates from.
To achieve such functionality without introducing any breaking change while retaining backward compatibility, two optional flags
--split-sources
and--preserve-top-dir
forADD
andCOPY
instructions are proposed.Implementation details
To make the proposed syntax possible, two separate changes have to be made.
--preserve-top-dir
Due to historical reasons
ADD
andCOPY
instructions behave differently from the well known *nixcp
utility when the sources list contains a directory.COPY ./src_dir ./
will copy only the contents ofsrc_dir
which is analogous to$ cp -R ./src_dir/* ./
.There is no direct way to copy the directory itself in buildkit/dockerfile right now (similarly to
$ cp -R ./src_dir ./
) . When you pair this issue with variable expansion (e.g.COPY $SOURCES ./
) or when there is a need to copy multiple directories with one instruction while excluding some other sibling directories, the flexibility/transparency of the code is lost and many programming patterns rendered impossible.This issue is known since 2015: moby/moby#15858
The following change is proposed: add
--preserve-top-dir
optional flag toADD
andCOPY
commands, to preserve the top level directory on copying, otherwise if flag is not setADD
/COPY
will behave as usual.COPY ./src_dir ./
<->$ cp -R ./src_dir/* ./
COPY --preserve-top-dir ./src_dir ./
<->$ cp -R ./src_dir ./
--split-sources
Historically variable expansion in buildkit happens after word splitting in the dockerfile parsing logic. Hence it is impossible now to supply multiple sources to
ADD
/COPY
via a single variable. In POSIX compliant shells the order of operations is reversed.Swapping the order in buildkit would be harmful and dangerous. Therefore the following change is proposed: add
--split-sources
optional flag toADD
andCOPY
commands, that when set will split every element in sources array after variables are expanded the second time.Similar approach is used by the
env
GNU coreutils utility. Historically in linux kernel users were able to pass only one optional argument via shebang. Starting from version 8.30 ofenv
utility-S
option is available:POC
Compiled frontend @hub.docker.com: dyefimov/dockerfile:addcopy-scripting-improvements-labs
Feature branch @github.com: DYefimov/buildkit/tree/addcopy-scripting-improvements
POC test script:
POC test script output:
The text was updated successfully, but these errors were encountered: