Docker Buildkit caching? #875
Comments
Thanks for creating the issue and answering all our questions. I think this is a cool idea, anything to make builds faster is :) Currently we try and be smart about the order of the statements in the generated I read https://blog.mobyproject.org/introducing-buildkit-17e056cc5317 for a quick introduction to what it is and then looked at the repository. With Docker v19.x you get BuildKit "by default" but need to enable it. We use docker-py to talk to Docker, and docker-py does not yet support BuildKit. This issue has some details on what is hard about adding support and where to help make it happen.
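For builds that go through the `docker` CLI (not docker-py), enabling BuildKit can be as small as setting the `DOCKER_BUILDKIT=1` environment variable. The helper below is a minimal sketch of that; `buildkit_env` is a hypothetical name and not part of repo2docker.

```python
import os


def buildkit_env(base_env=None):
    """Return a copy of the environment with BuildKit enabled for `docker build`.

    `buildkit_env` is a hypothetical helper name used for illustration only.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # DOCKER_BUILDKIT=1 asks the docker CLI to use the BuildKit builder.
    env["DOCKER_BUILDKIT"] = "1"
    return env
```

It could then be passed along as `subprocess.run(["docker", "build", "."], env=buildkit_env())`.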
Not sure what to do about the docker-py and buildx issues, but I did manage to get renv working with docker buildx (after stumbling around for quite a while :) https://github.com/howisonlab/test_repo_buildx_renv I tried to keep things as close to REES by using an unchanged
The ffsync issue in docker/docker-py#2230 (comment) does seem like a long-term blocker. And the main driver for it was docker-compose, but the advice is to use the CLI for that? So that thread reads to me as though docker-py development has pretty much stopped? Buildx bake seems like it's an option? I guess whether to depend on buildx comes down to the long-term road map for Docker, although I'm pretty sure that buildx is part of that, if not becoming the default build setup in the fairly near future. I have no special knowledge though!
I just came back to this idea after having a super positive experience with buildx build caches for a large, repeated conda install with small changes between rebuilds. I think it would be a massive win for one of the major sources of build time on Binder. It seems docker-py is not likely to support BuildKit any time soon, so we should look into building with the CLI. The main hurdle, I think, is that we currently construct a tarball for the build context in-memory, whereas buildx needs an extracted on-disk directory (which it will then re-serialize and re-send). One way this might work is to take what we already have, and:
This is probably the shortest route to "it works", though there would be quite a few duplicate files (the whole repo, for one), and we might lose some ownership info we encode, but it ought to work. It could also make debugging a lot easier than it is now, as there would be a directory on disk where one could edit and debug with A second option would be to skip the tarfile, build locally, and put all our staged-in files in a special
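The first route described above (keep the in-memory tarball, extract it to disk, then hand the directory to the CLI) might be sketched roughly as follows. This is an illustration under assumptions, not repo2docker's implementation: `extract_context` and `buildx_command` are hypothetical names, and the sketch only constructs the `docker buildx build` argument list rather than running it.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def extract_context(tar_bytes, dest_dir):
    """Unpack an in-memory build-context tarball into an on-disk directory.

    Hypothetical helper: assumes the tarball was produced by our own code,
    so its member paths are trusted.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        tar.extractall(dest_dir)
    return Path(dest_dir)


def buildx_command(context_dir, tag, executable="docker"):
    """Build the argument list for a CLI build of the extracted context.

    `--load` asks buildx to load the result into the local image store.
    """
    return [executable, "buildx", "build", "--load", "--tag", tag, str(context_dir)]
```

A caller could extract into a `tempfile.TemporaryDirectory()` and pass the resulting list to `subprocess.run`; keeping the directory around instead would give the on-disk copy for debugging mentioned above.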
That's pretty much what I'm doing in https://github.com/manics/repo2docker-podman/ Would this be a good use case for #848 (both my above projects rely on it), and putting buildx in a new engine?
I was exploring using multi-stage docker builds with buildkit and was impressed by the concurrency performance. Is invoking the CLI from repo2docker (via a new interface) still thought to be the preferred method to support it? The docker-py/buildkit issue linked above has not seen much progress.
#848 was merged a while ago and has been included in several releases, so I think it's the best way to develop or experiment with a new container engine. We can discuss later whether it should be merged into the core repo2docker or kept as an optional add-on. I've no idea what the best way to implement BuildKit is. If you go down the CLI exec route, that's what I did with podman https://github.com/manics/repo2podman/blob/main/repo2podman/podman.py so you might be able to do a search and replace to get started?
Great, thanks @manics!
@manics I was able to get repo2docker --engine podman --PodmanEngine.podman_executable=docker This required patching the json output format in repo2podman a little, and telling docker to default to buildx. For the former, I'll create an issue in repo2podman. It's just kind of funny using the repo2podman plugin to get repo2docker to actually run
Proposed change
It would be wonderful to use the Docker BuildKit caching capabilities. These enable incremental addition of packages, so one can add a single package to lists like requirements.txt. That invalidates the standard Docker layer caching, but the layer is quickly rebuilt because the compilation results are reused from the cache, without triggering entire rebuilds. The actual building happens in a special Docker container (which retains the caches). https://docs.docker.com/develop/develop-images/build_enhancements/
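As a rough illustration of the kind of caching meant here, BuildKit's Dockerfile frontend supports `RUN --mount=type=cache`, which keeps a directory (for example pip's download and wheel cache) across builds even when the layer itself is invalidated. The sketch below renders such a Dockerfile from Python; `render_dockerfile`, the base image, and the cache path are illustrative assumptions, not what repo2docker generates.

```python
def render_dockerfile(base_image="python:3.11-slim", cache_dir="/root/.cache/pip"):
    """Render a Dockerfile whose pip install step uses a BuildKit cache mount.

    Hypothetical helper: the base image and cache directory are assumptions
    chosen for illustration.
    """
    lines = [
        # Opt in to the BuildKit Dockerfile frontend so --mount is understood.
        "# syntax=docker/dockerfile:1",
        f"FROM {base_image}",
        "COPY requirements.txt /tmp/requirements.txt",
        # The cache mount persists between builds, so editing requirements.txt
        # re-runs this step but reuses previously downloaded or built wheels.
        f"RUN --mount=type=cache,target={cache_dir} \\",
        "    pip install -r /tmp/requirements.txt",
    ]
    return "\n".join(lines) + "\n"
```

Building the rendered file requires BuildKit to be enabled; with the classic builder the `--mount` flag is rejected.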
This question includes links to examples for Python building, but points out that it doesn't work for R building (which apparently is going to need something like renv to work).
https://stackoverflow.com/questions/59253392/using-docker-buildkit-caching-with-r-packages
Alternative options
I don't know enough to know if repo2docker is already doing something awesome to reuse compilation here. Perhaps a Docker layer per package build? I wonder if that would cause issues, though.
Who would use this feature?
Anyone adding a package would benefit from much quicker rebuilds. Should also help with builds in a place like mybinder.
How much effort will adding it take?
I haven't yet looked at the repo2docker build code, so I don't know. Mea Culpa. Biggest issue is that this requires fairly recent Docker and changes the build process so that builds happen in a Docker container rather than on the host.
Who can do this work?
I could help test, but have not yet dived into the repo2docker code.