
Buildkit support with buildx #43

Merged · 2 commits · Sep 23, 2022

Conversation

@jedevc (Contributor) commented May 12, 2022

Heya 👋

This isn't in a completed state yet, but thought it might be quicker to have a discussion around a PoC I was hacking around with, than to have a long-winded proposal 😄

TL;DR, proposal: official images could opt in to buildkit with a syntax like:

Buildkit: true

In this case, instead of building using docker build, the image is built using docker buildx build. At some point, the default builder will likely switch to buildkit, but it might be nice to start allowing some images to opt-in sooner, to reduce the complexity of adapting many images later down the road (ideally no changes would be needed in any of the Dockerfiles, but I haven't had the time to manually review every single one of them 😢). Additionally, this could open up some nice opportunities, like being able to use the buildkit parser for extracting FROM data (since it's backwards compatible with older dockerfiles), using the new imagetools for manifest management, etc.

Regarding caching, buildkit has a few cache backends, and while they might be worth investigating in the future, for a first pass I think the inline cache option is the most similar to what the other builder uses - this means that no separate cache policy needs to exist; it can stay the same. Additionally, buildkit has good support for caching multi-stage images, which might be nice to investigate later down the road 🎉
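As a rough illustration of what that first pass could look like (a minimal sketch, not the final implementation - the tag and cache names are placeholders):

// Minimal sketch: assemble the "docker buildx build" arguments using the
// inline cache backend, importing cache from previously pushed tags.
package builder

func buildxBuildArgs(tag string, cacheFrom []string) []string {
    args := []string{
        "buildx", "build",
        "--tag", tag,
        "--cache-to", "type=inline", // embed cache metadata into the image itself
    }
    for _, cache := range cacheFrom {
        args = append(args, "--cache-from", "type=registry,ref="+cache)
    }
    args = append(args, "-") // build context is piped in via stdin
    return args
}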

This would let image maintainers use some newer buildkit features if they want, and hopefully unlock some future performance gains (though I haven't done any benchmarking to test yet).

Looking forward to hearing what you think!

@jedevc marked this pull request as draft May 12, 2022 13:15
@tianon (Member) commented May 12, 2022

Thanks! I agree this is a good place to have a discussion. 😄

Some initial thoughts:

  1. I guess the intent here is to allow us to start phasing in BuildKit slowly over time and eventually switch the default for everyone (once there's a higher general comfort level)?

  2. functionally, what's the difference between DOCKER_BUILDKIT=1 docker build ... and docker buildx build ...? just (potentially) the version of BuildKit that ends up being invoked?

  3. does BuildKit have a way we could restrict the set of allowable syntax= directives (or I guess even enforce a specific one)? are there other features of BuildKit that might potentially pose review, support, or security implications that we should look to restrict/control? (for example, is it possible for a Dockerfile to request a mount from the host?)

@jedevc (Contributor, Author) commented May 13, 2022

Point by point:

  1. That's the high-level intent, yes - aside from just comfort level, there are a few fiddly technical constraints that mean it's not feasible to switch everyone over yet, notably Windows support and some cases of feature parity (though thankfully those are less common now, and I'd be surprised if any of the official images relied on that behavior).
  2. I think buildx is the way to go, as it doesn't aim to provide flag-level feature parity with the build command - it's got quite a different interface. Additionally, it'll provide greater control over things like cache. Another thing it does do is manage different "builders" - for now we can use the default docker builder, but there's functionality for connecting to different builders, which might be something to investigate in the future? I think they'll both use the same version of buildkit behind the scenes, but buildx should provide a more complete view into some buildkit features which might be useful at some point :)
  3. Yes, we can override the syntax= directive by passing --build-arg BUILDKIT_SYNTAX=docker/dockerfile:VERSION to buildx. We'd probably want to set it to docker/dockerfile:latest or docker/dockerfile:1 (or maybe allow configuration via an environment variable/flag?) - see the sketch after this list.
    Re: other security things - buildkit's RUN bind mounts are probably not a concern, since they can only read directories from inside the context and don't support write-back. By design, there shouldn't be any features that let arbitrary host files be accessed. Of course, buildkit (like the other builder) can make external HTTP requests, so it would be able to access services on the builder network - however, I don't think buildkit increases this risk.
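For illustration, pinning the frontend could be a small helper along these lines (a sketch only - the helper name and the docker/dockerfile:1 default are assumptions, not settled decisions):

// Sketch: pin the Dockerfile frontend regardless of any syntax= directive in
// the Dockerfile itself, by passing the BUILDKIT_SYNTAX build-arg.
package builder

func withPinnedSyntax(args []string, syntax string) []string {
    if syntax == "" {
        syntax = "docker/dockerfile:1" // assumed default
    }
    return append(args, "--build-arg", "BUILDKIT_SYNTAX="+syntax)
}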

You did mention mounts though, which reminded me - buildkit also keeps a separate private cache in /var/lib/buildkit, so some functionality for cache pruning will need to be added, but it should just be a simple invocation of docker buildx prune (with some parameters). This shouldn't affect build reproducibility; I think we should be free to purge it whenever we want.

"--cache-to", "type=inline",
}
for _, cache := range caches {
args = append(args, "--cache-from", "type=registry,ref="+cache)
@jedevc (Contributor, Author) commented on the diff hunk above:

So I don't love how I've done this here. Essentially, in the current implementation I've told buildkit to extract cache from all the tags of the previous build of that image - which could be a few.
I can't use the bashbrew/cache:digest tag to import cache from, since that changes on each update to the GitCommit - is there a stable place that I could easily access "the last build of this manifest entry"?

@jedevc (Contributor, Author) added:

In some other experiments, I've hacked around this by "guessing" a stable tag from the list of tags - it feels like it might be nice to have a stable Name field for each entry that attempts to stay consistent between updates and builds (even if the tags are changing).

@jedevc (Contributor, Author) commented May 17, 2022

should just be a simple invocation of docker buildx prune

So I'm not quite sure where this would go - my best guess would be somewhere in docker-library/oi-janky-groovy as part of the CI multiarch scripts, as opposed to this tool directly? I can't quite find where the logic for tidying up cache lives.
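For what it's worth, a minimal sketch of that pruning step, wherever it ends up living (any retention parameters like --keep-storage would still need to be decided):

// Sketch: prune BuildKit's private cache under /var/lib/buildkit;
// --force skips the interactive confirmation prompt.
package builder

import "os/exec"

func pruneBuildkitCache() error {
    return exec.Command("docker", "buildx", "prune", "--force").Run()
}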

@jedevc (Contributor, Author) commented May 26, 2022

Heya @tianon 👋

Everything's a bit busy atm, but was wondering if you'd had a chance to think about this PR/had any concerns? Was looking at how to move this forward 🎉

@yosifkit (Member) commented Jun 4, 2022

My initial thoughts on buildkit builds, after doing some research and testing:

cache: how to correctly invalidate and how to get the cache for the "next" build

Pushing cache to a local registry is going to increase overall maintenance, since now there is a service to be maintained and cleaned regularly. Calculating which cache in the registry is "useful" is very hard. There is not a unique identifier for each tag grouping, and that is not something we want to push onto the image maintainers; nor is there a way to get the previous set of tags that could be considered for build cache (e.g. for php, any tag that is from the same parent is valid for build cache, even across PHP versions and variants like fpm or cli).

The most correct approach is that cache can and should be used for all images within a given repo that have at least one common parent image. Ideally we'd use a network-mountable disk that works for /var/lib/docker and buildkit build cache (like EBS volumes). Perhaps an operator could be responsible for creating and deleting the set of needed EBS volumes. This would be scoped to the parent/repo combination (like debian:bullseye + php or debian:buster + php). Since this set of parent/child combinations is expensive to calculate, it should also be available to jobs for figuring out what to build and triggering children (maybe a config object generated by the operator?).

With moby/buildkit#1368 still not included in a Docker release, and the same problem still possible for all the other files in the context, we must do buildkit builds via a context piped to stdin (e.g. git archive [...] | docker buildx [..]) since that seems to always work correctly (this is how bashbrew currently works anyway).
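As a sketch of that piped-context approach (placeholder names; this mirrors what bashbrew already does with the classic builder rather than describing new behaviour):

// Sketch: pipe "git archive" output into "docker buildx build -" so the
// build context always comes from stdin.
package builder

import (
    "os"
    "os/exec"
)

func buildFromGitArchive(commit, tag string) error {
    archive := exec.Command("git", "archive", "--format=tar", commit)
    build := exec.Command("docker", "buildx", "build", "--tag", tag, "-")

    pipe, err := archive.StdoutPipe()
    if err != nil {
        return err
    }
    build.Stdin = pipe
    build.Stdout, build.Stderr = os.Stdout, os.Stderr

    if err := archive.Start(); err != nil {
        return err
    }
    if err := build.Run(); err != nil {
        return err
    }
    return archive.Wait()
}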

Dockerfile Syntax:

  • --link
    There are basically zero cases where this could benefit the official images; assuming we have a multistage build, if the base image is changing because of a rebuild, then the image in which the binaries were built is also changing.
  • --mount
    bind makes it harder to enumerate the build context; it could lead to more opaque/obfuscated Dockerfiles, so we will likely disallow using it for official images. Allowing tmpfs for /tmp could be useful to simplify cleanup for many maintainers.
  • --network
    Do we have a reason to give official images the ability to change networking per RUN (like none)?
  • Here-Documents
    If we allow it, it could improve readability of some Dockerfiles.
  • --security
    Since we build images of 3rd-party-controlled software, that's not something we would allow for official-images (at least it is disallowed by default).

The overall decision on syntax additions/changes is that we'll have to disallow them by default until we have specific reasons to accept them.

build output

--progress=plain is "better" than the tty version, but mucks about with the order of steps. I'd still like a simpler output like the "classic" builder (esp if parallel is turned off).

parallel

I do not know of any official images that would benefit from parallel build steps. They are designed so that each step builds upon the previous one (especially RUN lines). We take advantage of CPU cores by using -j when we can (which, without any CPU limits applied, means that nothing else can co-build on the same machine).

@jedevc (Contributor, Author) commented Jun 7, 2022

Thanks for your insight @yosifkit, much appreciated! ❤️

Cache

The current bane of my existence 😢 - if only cache could be simple 😄

Pushing cache to a local registry is going to increase overall maintenance, since now there is a service to be maintained and cleaned regularly

I think I might have accidentally been unclear with regard to cache - buildkit still has a local cache, similar to the old builder. We could simply continue to use that for now; it should behave similarly to the existing cache and not add any additional maintenance overhead for the time being. However, this comes with the disclaimer that this cache is still per-node and won't sync across multiple nodes - for multi-node caching, we'd need some form of cache storage.

There are a couple of storage options for that (rough flag sketches after the list):

  • We could use the registry/inline backend for cache, and push the cache alongside the image to docker hub - if we run the cache in min mode, then this shouldn't add any additional data, it should just attach some new manifests and annotations into the image index.
  • We could also use a network filesystem and use the local cache backend to write the cache to a directory, which can then be synced across nodes - this could be on top of EBS, or whatever storage service we want.
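For concreteness, roughly what those two options translate to on the buildx command line (a sketch; the ref and path values are placeholders):

// Sketch of the flags the two cache backends above correspond to.
package builder

// registry-type cache pushed in min mode (cache metadata only):
var registryCacheArgs = []string{
    "--cache-to", "type=registry,ref=example/cache:tag,mode=min",
    "--cache-from", "type=registry,ref=example/cache:tag",
}

// local directory cache, e.g. on a network-mounted filesystem:
var localCacheArgs = []string{
    "--cache-to", "type=local,dest=/mnt/buildkit-cache",
    "--cache-from", "type=local,src=/mnt/buildkit-cache",
}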

I have also seen a few interesting approaches that attempt to sync the /var/lib/buildkit directory across nodes to get some form of shared cache - though I'm not personally a huge fan.

How is this currently solved for the official images program? Are all the right images pulled directly before the build to ensure that they're present on each node to populate the cache, or is there some more complex logic behind it?

The most correct is that cache can and should be used for all images within a given repo that have at least one common parent image

Working out how to connect different builds to use the same cache is quite difficult - even assuming we have some way of calculating cache dependencies efficiently, some of the complex images could possibly require many different cache dependencies. I'm not quite sure if buildkit has any (soft/hard) limits on this 😕

we must do buildkit builds via context piped to stdin (e.g. git archive [...] | docker buildx [..]) since that seems to always work correctly

Aha, yup that makes sense - we can get buildx to do exactly the same thing here. I think if we were to use the buildkit git contexts at some point, we'd be safe as well, since those would use unique commit hashes in their url, and so couldn't be cached at all.

But agreed, sticking with piped contexts seems like the way to go for now 😄

Syntax

  • --link - as I understand it --link is most useful when doing a lot of copying from various different stages, like when the multistage graph is less a linked-list and more of a generic DAG. I think you're right, this pattern is probably going to be less common in official images, where usually only single units of software are installed per-image.
  • --mount syntax is definitely tricky to read. I think there's still some benefits to allowing it, since the mounted files aren't actually built into any layer in the image, so cleanup of any source files could become a little easier - that said, I'm not an official image maintainer, so I'm fairly ambivalent 😄
  • --network might be somewhat useful to disable network operations during a specific RUN operation, if it was desired to more thoroughly isolate parts of the build. Aside from that, it doesn't seem particularly applicable.
  • I'd love to see heredocs used (I'm biased though 😄) - it feels like they could definitely be used to clean up more complex Dockerfiles like python.

I think it makes sense to allow using new syntax selectively - though I'd prefer to have those controls as a review check by maintainers, instead of a buildkit feature flag, since maintaining feature flags might quickly become a pain with a matrix of potential controls to test.

build output + parallel

Think that makes sense! Nothing to add here.

Multi-arch builds

Something I haven't mentioned yet, but might be worth discussing - using buildkit, we could consider combining some of the different templated Dockerfiles into single ones. We can still keep using native builds while doing this, with exactly the same scheduling mechanism as used today in Jenkins, but it could simplify some of the templating magic going on and improve readability.

This could potentially look something like:

FROM --platform=$BUILDPLATFORM scratch AS builder
ADD $BUILDPLATFORM.tar.gz /
CMD ["bash"]

FROM amd64-builder AS amd64
FROM arm64-builder AS arm64

# special case for s390x
FROM builder AS s390x
RUN echo "do things"

FROM ${TARGETARCH}${TARGETVARIANT}

Then, based on what platform this is built for, the right stages will be selected and built - curious what you think about something like this for some of the simpler templated builds; I think it could help simplify some Dockerfiles in some cases (though definitely not everywhere).

@jedevc (Contributor, Author) commented Jun 28, 2022

I've pushed a smaller commit based on our conversations; it should be fairly self-explanatory 🎉

A couple notes:

  • Will continue trying to follow up re: logs and better parallelism control. I don't think either of these is necessarily a blocker; I'm happy to handle them as follow-ups, or to wait 😄
    • The parallel defaults shouldn't slow anything down: since most of the official images are single-stage, only one instruction will be processed at a time anyway - parallel builds only come into play with multiple stages.
  • I've forced DOCKER_BUILDKIT=0 in the plain docker build function (a sketch follows this list) - that will probably end up being necessary at some point in a future docker release.
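A minimal sketch of that forced classic-builder path (the function name is illustrative):

// Sketch: pin the classic builder for plain "docker build" so behaviour
// doesn't change once Docker's default builder flips to BuildKit.
package builder

import (
    "os"
    "os/exec"
)

func classicBuildCommand(args ...string) *exec.Cmd {
    cmd := exec.Command("docker", append([]string{"build"}, args...)...)
    cmd.Env = append(os.Environ(), "DOCKER_BUILDKIT=0")
    return cmd
}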

If you want, I'm then happy to follow-up and document the new functionality in separate PRs in the right official-images repos once this is in a ready-to-go state 👍

@jedevc marked this pull request as ready for review June 28, 2022 11:42
@tianon (Member) commented Jul 12, 2022

Sorry for the delay!

Here are some thoughts I'd love to discuss more in the spirit of not letting "the perfect review" be the enemy of "useful feedback" that might lead to more useful discussion: 😅 ❤️

  • I think instead of a pure boolean option, we might want to consider something more general that could be flexible later if we needed to change "how things build" again, like Builder: buildx vs Builder: classic or something? 🤔

  • it'll (unfortunately) need to also tie into the global values/per-architecture logic inside the parser such that we could have nasty things like s390x-Builder: foo and a global Builder: bar value in the top-level entry in the library files that gets the correct precedence based on the build 🙈

  • this also probably needs to tie into the "bashbrew cache hash", since changing this value should result in a new attempted build (the builds are not considered equal) 👀 Looking for places we reference GitFetch or GitCommit within the codebase should help you find more places this should probably be injected (which will also help with the previous point), and I think it thus makes sense to include it in the same whitespace block within type Manifest2822Entry struct as GitRepo, GitFetch, etc. (I'd just make sure they're in the same order everywhere, i.e. Builder comes after File consistently; a rough struct sketch follows this list) 😅

  • rather than hard-coding docker/dockerfile:1 as a constant, I think we ultimately want to be more specific (docker/dockerfile:1.4.2, for example), and thus maybe we need to make that something we could specify external to the compiled bashbrew binary -- perhaps some way to control that for the whole program with a buildkit-syntax file we source into an environment variable further down in the pipeline? Maybe that needs to be part of the bashbrew configuration? Perhaps we should introduce something like a library/.meta or library/.config file inside a "bashbrew library" that has configuration like that which applies to a whole "library"?
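To make the third point concrete, a rough sketch of where a Builder field could sit (only Builder is new here; the other fields are just the existing ones referenced in this thread, and the exact shape is up to the implementation):

// Sketch only - not the actual bashbrew definition.
package manifest

type Manifest2822Entry struct {
    GitRepo   string
    GitFetch  string
    GitCommit string
    File      string
    Builder   string // e.g. "classic" or "buildkit"; empty means the default
}

A non-empty Builder would then feed into the same "unique bits" used for the cache hash, alongside GitCommit and File.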

@jedevc (Contributor, Author) commented Jul 13, 2022

I think instead of a pure boolean option, we might want to consider something more general that could be flexible later

✔️ Agreed - I've proposed "classic" and "buildkit". I think buildx here is an implementation detail, since that might easily change in the future to just use a normal docker build, so I don't think it needs to be exposed to image maintainers.

tie into the global values/per-architecture logic inside the parser

✔️ have done this

needs to tie into the "bashbrew cache hash" since changing this value should result in a new attempted build

✔️ Aha, yup, good point - I've added this to the "unique bits" now. I've specifically only added it if we detect a non-empty value to preserve the contents of the cache hash for unchanged manifests, so only modified manifests with the new Builder key set modify this.

I also caught a couple of places where it needs to be used alongside File and Commit, etc.

rather than hard-coding docker/dockerfile:1 as a constant

❓ Would using an environment variable here be sufficient, to allow overriding a default of docker/dockerfile:1? Then this could be set up as a property of the build environment, instead of bashbrew itself. I've implemented that; let me know if you'd rather it read from a config file, etc.
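As a sketch, the environment-variable approach could look something like this (the variable name here is a placeholder, not necessarily what the PR implements):

// Sketch: let the build environment override the pinned Dockerfile frontend,
// defaulting to docker/dockerfile:1.
package builder

import "os"

func buildkitSyntax() string {
    if syntax := os.Getenv("BASHBREW_BUILDKIT_SYNTAX"); syntax != "" {
        return syntax
    }
    return "docker/dockerfile:1"
}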

@tianon (Member) left a review:

Really, really appreciate your patience, @jedevc 🙇

Just one really tiny nit and I think this is ready 👀

(I'm happy to update it if you need me to - the delay is totally my fault 😅)

Review threads (outdated, resolved): cmd/bashbrew/cmd-build.go, cmd/bashbrew/docker.go

Commit messages:
This patch adds a new `Builder` entry to the RFC2822 image manifest
files, allowing individual image components to opt into buildkit
support, using docker buildx.

Additionally, docker build is changed to explicitly set
DOCKER_BUILDKIT=0 to force a non-buildkit build in preparation for
the switching default in any upcoming docker versions.

Signed-off-by: Justin Chadwell <me@jedevc.com>
Signed-off-by: Justin Chadwell <me@jedevc.com>