Laundry list of smells from this resource #190
There may be value in adopting one of the alternatives that are emerging -- buildah, img, kaniko, etc. There is also a nascent "Container Builder Interface" (CBI) being spruiked which aims to unify these under a common API. We can definitely seek advice and input from folks in the Google container tooling teams (tagging @mattmoor, @dlorenc and @imjasonh), who are attacking this stuff from a lot of angles. @sclevine is also exploring this problem in the context of buildpacks, including looking closely at the various options I linked. Edit: I should also tag @bparees from Red Hat. They've had to wrestle with Dockerfiles-vs-sanity for OpenShift a bit.
@jchesterpivotal Yeah, I think that's the future. I'm playing with Kaniko. So the goal with Kaniko would be to run in a regular unprivileged container and not require spinning up a daemon. The downside I suppose would be that there's no caching, which a lot of people seem to care about. There's a somewhat nebulous open issue for it: GoogleContainerTools/kaniko#102. I think we'd at least need the ability to support the
In terms of
Also, there's a lot to be said for breaking First, we don't really have reusable tasks as a native concept yet, and it would be a useful one to ship. Second, they can use different bits of software for different tasks. A registry resource could use
Hello, great to see that this resource may get a big update/refactor. My two cents: having different resources makes sense in my view. So build, run, push/pull could be different resources. We have started using Concourse for some pipelines, so I am not that experienced with it, but our pipelines seem hacky regarding running tests etc.
Thanks a lot for organizing this issue @vito. As @vito rightly points out, one of the big issues with this resource is the way that cache is abstracted. A standalone docker daemon has pretty good cache defaults and people are used to them. I think we should emulate "workstation docker" behavior as best as we can, whether from a central cache or per worker cache. Per worker cache seems like the most natural fit for concourse, and I think would still yield fairly good performance (e.g. frequently built images could likely be in cache). I do not know if concourse supports my ideal cache for docker, or for any other immutable data (because the docs are sparse https://concourse-ci.org/caching-and-retention.html; https://concourse-ci.org/volume-internals.html ). But I would love to see persistent cache that is retained across builds and possibly pipelines for immutable data and only for immutable data. For immutable data such as docker layers or precisely versioned libraries, there are only benefits to caching data between pipelines from an end-user perspective. I think CI/CD is an area that warrants extra complexity in order to achieve higher performance because humans wait on CI/CD. Reducing feedback latency is a big win for development and operations velocity. Here is my wishlist for this resource, but similar things can be said for any immutable resources:
To re-use build steps within a dedicated resource, such as in
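As a concrete reference point for the per-worker caching idea above: Concourse task caches are probably the closest existing mechanism. They are scoped to a single task in a single pipeline on a single worker, so they fall short of the cross-pipeline wishlist, but the declaration is tiny. A minimal sketch (the cache path name here is arbitrary):

platform: linux
image_resource:
  type: docker-image
  source: {repository: alpine}
caches:
- path: layer-cache   # persisted on the worker across builds of this task
run:
  path: sh
  args: ["-ec", "du -sh layer-cache"]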
@chrishiestand Yeah, I agree with just about all of that. I'd be really interested in seeing a fork of this resource that is pointed at a centrally deployed Docker daemon, instead of spinning up its own. That would at least get you a lot of the desired caching behavior for free.

The downside of this approach is it suddenly adds a hard dependency on a Docker daemon, which isn't commonly deployed standalone, at least in my experience. A lot of people are going to want to forward their existing

So you could deploy one on its own and configure it to listen on TCP (maybe with TLS?), and then point your pipelines at it. You'd get much better caching behavior, at the expense of your builds now having to upload their context to a remote Docker daemon as part of the build. Probably a net benefit, but unfortunately it takes a bit more work to set up.

I think it's a worthwhile experiment, and shouldn't be too hard to implement as a fork of this resource. It should just be a matter of ripping out all the Docker daemon setup and the params related to it, and then targeting the configured remote daemon with the CLI.
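To make the shape of that concrete, here is a rough sketch of a task pointed at such a daemon. The host name and image name are made up, and TLS is left out for brevity; the docker CLI simply follows the DOCKER_HOST environment variable:

platform: linux
image_resource:
  type: docker-image
  source: {repository: docker}   # official image that ships the docker CLI
inputs:
- name: source
params:
  DOCKER_HOST: tcp://docker.internal:2375   # hypothetical central daemon
run:
  path: sh
  args:
  - -ec
  - |
    # the build context in source/ gets uploaded to the remote daemon
    docker build -t example/app source/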
In terms of alternative implementations, I would think about splitting it into two parts. The first would be a

Having the split means that I can also come up with further permutations without needing a single resource to become increasingly busy:
Finally, the split opens up seams to allow the introduction of layer-by-layer logic. Right now images are treated as an all-or-nothing affair, but in the near future it will become more common for efficiency and security purposes to think about layers instead of images. Breaking apart the build task from the registry resource will help.

I'm making the assumption that redistributable tasks exist and are based on more than shared conventions. I think splitting up docker logic would be a decent way to explore what would be necessary for a higher-level Task concept.

Edit: I also just noticed I already said all of this previously, but more concisely.
Kaniko won't help with caching yet, but I still think it's worth trying, at least to remove the Docker-in-Docker aspect while preserving portability. For what it's worth, there has since been more discussion in GoogleContainerTools/kaniko#102 which shows that their goals there are pretty well aligned with Concourse's (caching
Fully agreed. See recent discussion in #185 - there's currently way too much friction when it comes to redistributable tasks, which forces people to use resources for things they shouldn't be used for. So I like the idea of splitting this up, because there's already clearly a need for build-without-push. (I'm also happy to see any proposal which cuts this resource up into smaller pieces, so that alone is great.)
@jchesterpivotal Here's a proof-of-concept using Kaniko to build, as a Concourse task:

---
platform: linux
image_resource:
type: docker-image
source: {repository: gcr.io/kaniko-project/executor}
inputs:
- name: source
outputs:
- name: image
run:
path: /kaniko/executor
args:
- --dockerfile=/tmp/build/e55deab7/source/Dockerfile
- "--destination=doesnt/matter:test"
- --context=/tmp/build/e55deab7/source
- --tarPath=/tmp/build/e55deab7/image/image.tar

Full usage/output here: https://gist.github.com/vito/ab90980adbeaf000739fff6a64e0407d

One gotcha is I had to use absolute paths for the various flags, because the first thing Kaniko does is

I also ran into this issue when trying to build a multi-stage image: GoogleContainerTools/kaniko#245
I've started to implement a new resource for fetching images from a registry: https://github.com/concourse/registry-image-resource

See the README for comparison/goals/etc. So far so good. Testflight and TopGun are both passing with it as a drop-in replacement for
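For anyone who wants to kick the tires in a pipeline, the swap is mostly a type change. A minimal sketch, using only the source fields that show up later in this thread (repository/tag/username/password) with placeholder values; older Concourse versions may need a resource_types entry for the new type first:

resources:
- name: some-image
  type: registry-image   # previously: docker-image
  source:
    repository: alpine
    tag: "3.8"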
It's super nice this is being discussed. If it helps, the workaround I have for now is:
This is for our external images; the testing step is basically why I am using a task instead of the old docker-image resource. The main pain points for me are caching when it comes to batch building images. I have one folder for all images I use inside concourse and the whole folder is built on changes (internal stuff, image testing not such a priority). I just now started looking at improving this a bit and the caching comes up now :-) So far I think all proposals here could improve stuff for me. I will try to use the new resource here to do some experiments, possibly even look at kaniko. I will continue looking at multi-arch builds, so this might be a good chance for me to spend some more time on this. I hope this can be useful input in the design process here, although admittedly I'm not sure that aarch64/amd64 dual builds are much of a priority for others.
I redid @vito's kaniko experiments; for me there are a few disadvantages:
In many ways I think that a resource for building based on dind and the go bindings could be a good compromise. I don't feel that depending on dind is a huge issue in itself when it comes to building. The current resource is multi-arch capable by using the alpine repo/architecture ecosystem. The caching in my mind is somewhat independent, and it doesn't seem like there are easy solutions for it. I discovered that in my scenario the biggest issue is that I have a slow connection to Docker Hub. I'll experiment a bit, but maybe this can be better solved by always using a local registry (to which there is a good connection) and having that proxy Docker Hub, so the caching functionality is dealt with there instead (at least in my mind that keeps the architecture clean). In that case the split into registry and build resources makes a lot of sense: we keep the components the way they were envisioned, we avoid implementing caching logic ourselves, and we can get rid of the command line output parsing (if there still is any). I'll try to take a look at @vito's registry resource and experiment with registry proxying/mirroring (it turns out it's a thing: https://docs.docker.com/registry/recipes/mirror/)
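For reference, the pull-through cache from that recipe is only a few lines of config on a stock registry (host names below are placeholders; see the linked Docker docs for the authoritative version):

# config.yml for a plain registry deployment, turning it into a
# pull-through cache of Docker Hub
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io
# the daemon doing the builds is then started with
# --registry-mirror=https://mirror.internal:5000 (placeholder host)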
Re: caching, GoogleContainerTools/kaniko#300 is the current best idea for enabling fast incremental builds in kaniko. To me, relying on dind means you simply cannot build on a Kubernetes cluster unless you trust every user on the cluster. By removing the need to trust each user, you can get closer to providing a build service on a single large multitenant cluster instead of having to isolate each user on their own cluster, which unnecessarily wastes resources.
@neumayer - what kaniko image did you use? Nothing we ship as officially supported from the kaniko repo should require the privileged flag.
@dlorenc I think you are right, I must have used some old image at some point in my experiments; I don't seem to be able to reproduce the privilege issue.
I whipped up a task that builds a Docker image using

What's neat is caching was pretty easy to support via Concourse task caches, and they're easy to invalidate if needed with

We're using this in our new pipeline for the

Note that this doesn't yet support running without
@vito with concourse builder, do we consider the laundry list done? If not, could we extract the items not done into a new task list for better tracking? Thx!
I recently tried adopting these new container building tools into my pipelines. I settled on a pretty simple approach using Kaniko in a task. Very similar to @vito's proof-of-concept example above (but having the benefit of two more years of Kaniko development). Notable:
I feel like this is a meaningful improvement over the current-state docker-image-resource and over oci-build-task. Is there anything beyond sharing some pipeline recipes that would benefit the community? Perhaps replacing the oci-build-task recommendation in the registry-image-resource README?
@lnhrdt Awesome! Is there anywhere I can take a look and try it out? Cool that it works without
Hey @vito, here's a self-contained pipeline that demonstrates the approach. Add your own registry creds and resources:

resources:
- name: test-image
type: registry-image
source:
repository: ((registry-host))/test
username: ((registry-username))
password: ((registry-password))
jobs:
- name: image
plan:
- task: create-image-source
config:
platform: linux
image_resource:
type: docker-image
source:
repository: alpine
outputs:
- name: image-source
run:
path: sh
args:
- -ecu
- |
cat > image-source/message.txt <<EOF
abc123
EOF
cat > image-source/Dockerfile <<EOF
FROM alpine
COPY message.txt /message.txt
EOF
- task: build-image
input_mapping:
context: image-source
config:
platform: linux
image_resource:
type: docker-image
source:
repository: gcr.io/kaniko-project/executor
inputs:
- name: context
outputs:
- name: image
run:
path: /kaniko/executor
args:
- --context=context
- --reproducible
- --tarPath=image/image.tar
- --no-push
- --destination=noop # https://git.io/JJDXu
- put: test-image
params:
image: image/image.tar

other thoughts

In my last comment I was mistaken about supporting caching. Kaniko has a caching mechanism, but it works by caching in a repository. It's not compatible with the Concourse task caching paradigm. There are some recent issues in their repo that discuss adding support for "local caching" which would allow us to cache in a Concourse task (GoogleContainerTools/kaniko#1148, GoogleContainerTools/kaniko#923). I don't know how fast ideas move in the project but I'm keeping an eye on those.

I thought about swapping out buildkit for kaniko in oci-build-task too, but I ended up thinking that simply replacing it with a "task recipe" would be better because I didn't need to extend their image to add anything. Using Kaniko in a Concourse task was just a matter of mapping inputs to the right arguments.

In the future this seems like a fantastic use case for concourse/rfcs#7 but since that's still in idea land, I thought a "task recipe" could help some folks out. That
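For reference, the repository-backed caching mentioned above is kaniko's --cache flag pair; roughly, with a made-up cache repository (treat this as a sketch, not a tested recipe):

run:
  path: /kaniko/executor
  args:
  - --context=context
  - --destination=((registry-host))/test
  - --cache=true
  - --cache-repo=((registry-host))/test-cache   # layer cache lives in a registry, not on the worker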
@lnhrdt Sweet, thanks for sharing the code and the context! I'll poke around. It kind of sounds like Kaniko could provide a solution for concourse/oci-build-task#1 which has been blocked on

FYI: you may be interested in checking out the Prototypes RFC - it'll become the foundation for all shareable executable "things" in the future: resources, var sources, notifications, in addition to arbitrary actions with the
A hint for anyone who gets to this before I can: Kaniko expects full digests in the

However, we could have a pattern like this:

FROM ${alpine_image:-alpine}

This will work locally (Dockerfiles respect the

I like that this makes the cache usage more explicit, though it is a bit more verbose.
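A hypothetical wiring of that pattern in a task, going by the flags already used in this thread. The digest file name and the debug-tag shell are assumptions, so treat this as a sketch rather than a recipe:

- task: build-image
  config:
    platform: linux
    image_resource:
      type: registry-image
      source: {repository: gcr.io/kaniko-project/executor, tag: debug}  # debug tag ships a busybox shell
    inputs:
    - name: context   # Dockerfile here starts with: ARG alpine_image / FROM ${alpine_image:-alpine}
    - name: alpine    # a registry-image get of the base image, used for its digest
    outputs:
    - name: image
    run:
      path: /busybox/sh
      args:
      - -ec
      - |
        # pin the base image to the exact digest that was fetched
        # (assumes the resource wrote it to alpine/digest)
        /kaniko/executor \
          --context=context \
          --no-push \
          --destination=noop \
          --tarPath=image/image.tar \
          --build-arg=alpine_image=alpine@"$(cat alpine/digest)"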
This is a train of thought. Notes mainly. Based on @lnhrdt's work, I've tried the caching mechanism (pipeline included below). The warmer will pull images from an available container image registry. The issue with caching is that the built image
The manifest:
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"config": {
"mediaType": "application/vnd.docker.container.image.v1+json",
"size": 3352,
"digest": "sha256:56def654ec22f857f480cdcc640c474e2f84d4be2e549a9d16eaba3f397596e9"
},
"layers": [
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 26701612,
"digest": "sha256:171857c49d0f5e2ebf623e6cb36a8bcad585ed0c2aa99c87a055df034c1e5848"
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 852,
"digest": "sha256:419640447d267f068d2f84a093cb13a56ce77e130877f5b8bdb4294f4a90a84f"
},
{
"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
"size": 162,
"digest": "sha256:61e52f862619ab016d3bcfbd78e5c7aaaa1989b4c295e6dbcacddd2d7b93e1f5"
}
]
}
When generating the image, for example:
[{"Config":"sha256:2bc60245855ee71a32c1377437833a38b9371db85a0c27c453ac0d23febbda57","RepoTags":["first"],"Layers":["be79478c86a9fce917b32c6d275e2086022d8951d7aa3d2070617c2b4d6644c7.tar.gz","69ab70f4eb4b0b13e03640757fb2ececa1d31d933d52d9651acc8679320bbf3f.tar.gz","3c829501b976555ea693ab2581721b1dc8e7bb62872732d2ccc0bfea99d17422.tar.gz","36a89f438795720740424e61aa398474b9c97b4db3806502d054e2830a9da37f.tar.gz","a7ed845d8b0dc083d5ba40f2e59900f6e48a76b63d0b9055efe5b981ae96766b.tar.gz"]}]
The pipeline:
jobs:
- name: create-container-images
plan:
- task: create-dockerfiles
config:
platform: linux
image_resource:
type: registry-image
source:
repository: ubuntu
outputs:
- name: context
run:
path: bash
args:
- -c
- |
set -eux
cat > context/Dockerfile.1 <<EOF
FROM ubuntu
RUN apt-get -y update
RUN apt-get -y install vim
EOF
cat > context/Dockerfile.2 <<EOF
FROM first
RUN apt-get -y install curl
EOF
- task: cache
config:
platform: linux
image_resource:
type: registry-image
source:
repository: gcr.io/kaniko-project/warmer
caches:
- path: cache
outputs:
- name: cache
run:
path: /kaniko/warmer
args:
- --cache-dir=cache
- --image=ubuntu
- task: build-packages
config:
platform: linux
image_resource:
type: registry-image
source:
repository: gcr.io/kaniko-project/executor
inputs:
- name: cache
- name: context
- name: images
optional: true
outputs:
- name: images
- name: cache
run:
path: /kaniko/executor
args:
- --context=context
- --reproducible
- --cache-dir=cache
- --no-push
- --destination=first
- --dockerfile=context/Dockerfile.1
- --tarPath=images/1.tar
- --image-name-with-digest-file=images/1.json
- task: build-binaries
config:
platform: linux
image_resource:
type: registry-image
source:
repository: gcr.io/kaniko-project/executor
inputs:
- name: cache
- name: context
- name: images
optional: true
outputs:
- name: images
- name: cache
run:
path: /kaniko/executor
args:
- --context=context
- --reproducible
- --cache-dir=cache
- --no-push
- --destination=second
- --dockerfile=context/Dockerfile.2
- --tarPath=images/2.tar
- --image-name-with-digest-file=images/2.json
ensure:
task: inspect
config:
platform: linux
image_resource:
type: registry-image
source:
repository: ubuntu
inputs:
- name: cache
- name: context
- name: images
run:
path: bash
args:
- -c
- |
set -eux
sleep 1000
As a follow-up to #190 (comment) - I was able to use the same build-arg-to-FROM approach

With this feature implemented we're much closer to fully deprecating
@lnhrdt, thanks. I am able to build images using Kaniko in Concourse now.
This resource hasn't aged well. We've long suspected a new approach may be necessary, potentially including a complete rewrite and/or breaking it up into separate resource types, each focused on a specific workflow.
I'm going to use this issue as a dumping ground of sorts for now, in hopes that it eventually leads to a more focused direction that can skewer a few of these piranhas at once.
- The put scripts are written in .bash and are subject to the docker CLI output format changing. Some things are already broken because of this.
- cache: true even pulls down the image willy-nilly, preventing Concourse caching of the image (ironically). (See "load_cache or similar support" #121, "Multi stages build cache work only with last stage" #148, "Add cache_from parameter" #188.)
- apt-get update && apt-get -y install foo will need the cache busted in order to get a new foo.
- It's hard to write Dockerfiles with Concourse-fetched dependencies while also having the image easy to build locally during development (you'd be missing all the dependencies).