Switch to buildx to build airflow images #20664
This is a VAST simplification of our docker build caching (using the modern BuildKit plugin). I still have to do some more testing, but on top of removing several hundred lines of Bash that I wrote when BuildKit did not have all the capabilities we needed, it is also very stable and robust, and it opens the path to multi-architecture images (for ARM). I would also love to merge #20238 (optimization of Dockerfiles), which this change is based on.
cc: @Bowrna @edithturn: this will make your job much easier, as it removes a lot of the "caching" complexity. That's why I did not want you to start looking at it earlier: I knew we were going to get it WAAAAY simpler with BuildKit.
I’m not too familiar with buildx in the first place and will dismiss my request for review.
What's buildx vs buildkit?
Essentially the same. Buildx is a plugin you can install in docker to make more of BuildKit's capabilities available through the "docker buildx build" command (plus a number of management commands). You do not need it to run "buildkit-enabled" builds, but you do need it for cache management. An example of that is our prod image. It is a multi-segmented image, so preparing a good cache for the builds requires a dedicated cache-export command. That capability was missing in BuildKit when I last checked (about a year ago): you had to use some strange combination of then-unreleased tools from "moby" to build and refresh the cache. Now, with the plugin, it's a "breeze" to manage and prepare the cache. In our case users will not have to install the buildx plugin, but it will (eventually) have to be available on our self-hosted runners to refresh the cache on main builds (I will add it), and on your machine any time you want to refresh the cache manually. See https://docs.docker.com/engine/reference/commandline/buildx_build/
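As a rough illustration of the kind of cache-refresh command described above, here is a minimal sketch. The cache reference `registry.example.com/airflow/prod:cache` and the helper name `refresh_cache_cmd` are hypothetical, and using the `type=registry` cache exporter is an assumption; the thread does not show the exact command the project uses. The `--cache-from` and `--cache-to` flags (with `mode=max` to export cache for all stages of a multi-stage build) are documented `docker buildx build` options.

```shell
#!/bin/sh
set -eu

# Hypothetical cache location (assumption, not the project's real registry path).
CACHE_REF="${CACHE_REF:-registry.example.com/airflow/prod:cache}"

# Print a buildx command that reuses the remote cache and re-exports it.
# mode=max asks BuildKit to export cache for all stages, not only the final one.
refresh_cache_cmd() {
    cache_ref="$1"
    context="${2:-.}"
    printf 'docker buildx build --cache-from type=registry,ref=%s --cache-to type=registry,ref=%s,mode=max %s\n' \
        "${cache_ref}" "${cache_ref}" "${context}"
}

# Dry run by default: only print the command. Set RUN=1 to actually execute it.
cmd="$(refresh_cache_cmd "${CACHE_REF}")"
echo "${cmd}"
if [ "${RUN:-0}" = "1" ]; then
    eval "${cmd}"
fi
```

Printing the command first keeps the sketch safe to run on a machine without the buildx plugin; executing it needs push access to whatever registry holds the cache.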
It also has all the nice things about building multi-platform images. The nice thing about buildx is that you can have multiple builders, for example different builders for different platforms, or a "build cache server" where you build such caches, organized so that each builder is completely separated from the "docker engine" it runs on. Each builder runs as a separate container and has its own private image "storage", so an image built by the builder is not visible in the docker engine it runs on. It is a pretty nice way to organize your builds when you have a multi-platform, multi-branch, multi-whatever case. Initially it seems much more complex than the original docker build system, but it is actually very intuitive.
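The per-platform builder setup described above could look roughly like the sketch below. The builder name `airflow-arm` and the helper `builder_cmds` are hypothetical. The commands use the documented `docker buildx create` (`--name`, `--driver`, `--platform`) and `docker buildx build` (`--builder`, `--platform`, `--load`) options. With the `docker-container` driver the build result stays inside the builder unless it is exported, for example with `--load` or `--push`.

```shell
#!/bin/sh
set -eu

# Print the commands that create a dedicated builder for one platform
# and run a build with it. Names passed in are illustrative only.
builder_cmds() {
    name="$1"
    platform="$2"
    # The docker-container driver runs the builder as a separate container.
    echo "docker buildx create --name ${name} --driver docker-container --platform ${platform}"
    # --load copies the result from the builder into the local docker engine.
    echo "docker buildx build --builder ${name} --platform ${platform} --load ."
}

# Dry run: print the commands for a hypothetical ARM builder.
builder_cmds airflow-arm linux/arm64
```

Running `docker buildx ls` afterwards lists the configured builders and the platforms each one supports.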
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
FYI: I will run a few more tests in my fork, just to make sure everything is fine with caching of the build-image workflow results, and will merge it when I come back from Slovakia (going for a few days), so that I am around to support any problems with Breeze. So there is still some time for reviews :)
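For reference, the rebase-and-push step mentioned by the bot can be sketched as follows. The remote names `upstream` and `origin` and the branch name `my-feature-branch` are assumptions; substitute whatever your clone uses. The helper `rebase_cmds` is hypothetical and only prints the git commands.

```shell
#!/bin/sh
set -eu

# Print the git commands for rebasing a PR branch onto the latest main
# and updating the PR. All three arguments are caller-supplied names.
rebase_cmds() {
    upstream="$1"
    fork="$2"
    branch="$3"
    echo "git fetch ${upstream} main"
    echo "git rebase ${upstream}/main"
    # --force-with-lease refuses to overwrite remote commits you have not seen.
    echo "git push --force-with-lease ${fork} ${branch}"
}

# Dry run with assumed remote and branch names.
rebase_cmds upstream origin my-feature-branch
```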
"BuildKit" is a much more modern docker build mechanism. It supports multi-architecture builds, which makes it suitable for our future ARM support. It also has a nicer UI, much more sophisticated caching mechanisms, and better support for multi-segment builds.

BuildKit was promoted to official quite a while ago and it is rather stable now. We can also now install the BuildKit plugin for docker, which adds capabilities for building and managing cache using dedicated builders (previously the BuildKit cache was managed using rather complex external tools).

This gives us an opportunity to vastly simplify our build scripts, because BuildKit has a much more robust caching mechanism than the old docker build (which forced us to pull images before using them as cache).

We had a lot of complexity involved in efficient caching, but with BuildKit all of that can be vastly simplified and we can get rid of:

* keeping base python images in our registry
* keeping build segments for the prod image in our registry
* keeping manifest images in our registry
* deciding when to pull or pull&build the image (not needed now, we can always build the image with --cache-from and buildkit will pull cached layers as needed)
* building the image when performing pre-commit (instead we simply encourage users to rebuild the image via the breeze command)
* pulling the images before building
* a separate 'build' cache kept in our registry (not needed any more, as buildkit allows keeping the cache for all segments of a multi-segmented build in a single cache)
* the manual spinner (the nice animated tty UI of buildkit eliminates the need for it)
* and a number of other complexities.

Depends on apache#20238
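To make the "no more pulling before building" point concrete, here is a hedged side-by-side sketch. The image name and both helper functions are hypothetical, and this is not the project's actual build script. It relies on two documented behaviours: the classic builder only uses `--cache-from` images that are already present locally, while BuildKit can import cache metadata from a registry reference and fetch only the layers it needs.

```shell
#!/bin/sh
set -eu

# Classic builder: the cache image has to be pulled explicitly first.
legacy_build_cmds() {
    image="$1"
    echo "docker pull ${image}"
    echo "DOCKER_BUILDKIT=0 docker build --cache-from ${image} -t ${image} ."
}

# BuildKit via buildx: a single command, cached layers are fetched on demand.
buildkit_build_cmds() {
    image="$1"
    echo "docker buildx build --cache-from type=registry,ref=${image} -t ${image} ."
}

# Dry run with a hypothetical image name (assumption).
IMAGE="registry.example.com/airflow/ci:latest"
legacy_build_cmds "${IMAGE}"
buildkit_build_cmds "${IMAGE}"
```

The registry image must have been built with cache metadata exported (for example inline cache or a `--cache-to` export) for the BuildKit variant to find anything to reuse.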
"BuildKit" is a much more modern docker build mechanism. It supports multi-architecture builds, which makes it suitable for our future ARM support. It also has a nicer UI, much more sophisticated caching mechanisms, and better support for multi-segment builds.

BuildKit was promoted to official quite a while ago and it is rather stable now. We can also now install the BuildKit plugin for docker, which adds capabilities for building and managing cache using dedicated builders (previously the BuildKit cache was managed using rather complex external tools).

This gives us an opportunity to vastly simplify our build scripts, because BuildKit has a much more robust caching mechanism than the old docker build (which forced us to pull images before using them as cache).

We had a lot of complexity involved in efficient caching, but with BuildKit all of that can be vastly simplified and we can get rid of:

* keeping base python images in our registry
* keeping build segments for the prod image in our registry
* keeping manifest images in our registry
* deciding when to pull or pull&build the image (not needed now, we can always build the image with --cache-from and buildkit will pull cached layers as needed)
* building the image when performing pre-commit (instead we simply encourage users to rebuild the image via the breeze command)
* pulling the images before building
* a separate 'build' cache kept in our registry (not needed any more, as buildkit allows keeping the cache for all segments of a multi-segmented build in a single cache)
* the manual spinner (the nice animated tty UI of buildkit eliminates the need for it)
* and a number of other complexities.

Depends on #20238 (cherry picked from commit ad28f69)
This feature had been removed when recent BUILDX improvements were added. This PR removes the remnants of it. Follow up after apache#20664
"BuildKit" is a much more modern docker build mechanism. It supports
multi-architecture builds, which makes it suitable for our future ARM
support. It also has a nicer UI, much more sophisticated caching
mechanisms, and better support for multi-segment builds.
BuildKit was promoted to official quite a while ago and it is
rather stable now. We can also now install the BuildKit plugin for docker,
which adds capabilities for building and managing cache using dedicated
builders (previously the BuildKit cache was managed using rather
complex external tools).
This gives us an opportunity to vastly
simplify our build scripts, because BuildKit has a much more robust caching
mechanism than the old docker build (which forced us to pull images
before using them as cache).
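We had a lot of complexity involved in efficient caching,
but with BuildKit all of that can be vastly simplified and we can
get rid of:
* keeping base python images in our registry
* keeping build segments for the prod image in our registry
* keeping manifest images in our registry
* deciding when to pull or pull&build the image (not needed now, we can
  always build the image with --cache-from and buildkit will pull cached
  layers as needed)
* building the image when performing pre-commit (instead
  we simply encourage users to rebuild the image via the breeze command)
* pulling the images before building
* a separate 'build' cache kept in our registry (not needed any more,
  as buildkit allows keeping the cache for all segments of a multi-segmented
  build in a single cache)
* the manual spinner (the nice animated tty UI of buildkit eliminates
  the need for it)
* and a number of other complexities.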
Please take a look only at the last commit as it is based on #20238 #20679