Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use docker buildx to build base Linux images #5975

Merged

Conversation

antoninbas
Copy link
Contributor

We use "docker buildx" when building the base Linux images for Antrea. This gives us access to all the BuildKit capabilities, in particular exporting the build cache to the registry as a separate image, when using the docker-container driver. Exporting the cache means that the base images won't need to be rebuilt from scratch every time (whether it is for local builds or CI builds).

Using buildx with Docker requires Docker Engine 19.03 or newer, but that version came out in 2019. Note that we already require buildx / BuildKit since merging #5918.

Note that in recent Docker versions (starting with Docker Engine 23.0 and Docker Desktop 4.19), "docker build" is an alias for "docker buildx build" (buildx is the default build client). However, explicitly using "docker buildx build" lets us support older versions as well.

While exporting the cache to the registry requires using the docker-container driver, and will not work with the default docker driver, developers are not supposed to export the cache anyway. This is reserved for Github CI jobs.

We also add a new "--no-cache" option to force re-building from scratch. That option should not be required for normal workflows.

The Makefile is improved to reduce verbosity.

Fixes #5944

tnqn
tnqn previously approved these changes Feb 9, 2024
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines -199 to -205
ifneq ($(NO_PULL),)
docker build -t antrea/test -f build/images/test/Dockerfile $(DOCKER_BUILD_ARGS) .
else
docker build --pull -t antrea/test -f build/images/test/Dockerfile $(DOCKER_BUILD_ARGS) .
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great improvement

@rajnkamr
Copy link
Contributor

rajnkamr commented Feb 9, 2024

Images built with the docker-container driver must be explicitly loaded with --load unlike docker load command, like
docker buildx build --load -t --builder=container .
In ci jobs couple of places , the base image antrea ubuntu getting loaded using docker load, IMO, it might impact those ?

@antoninbas
Copy link
Contributor Author

@rajnkamr

Images built with the docker-container driver must be explicitly loaded with --load unlike docker load command, like
docker buildx build --load -t --builder=container .

--load is just an alias for -o type=docker / -output type=docker, and I use the latter in the build scripts

@antoninbas
Copy link
Contributor Author

/test-all

@antoninbas
Copy link
Contributor Author

Many contributors are on vacation because of Chinese New Year, so we will wait to merge it in case it creates any disruption in CI.

@@ -76,7 +78,10 @@ case $key in
esac
done

THIS_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
if ! check_for_buildx; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are unable to upgrade docker on the current cloud node to support buildx, I believe the current change will cause all cloud jobs to fail in the image building stage. There may be two potential solutions to this problem: 1. Merge this PR after we complete the Jenkins migration (#5759) and upgrade docker on all testbeds. 2. Continue using the docker build command for docker that do not support buildx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XinShuYang

Using buildx with Docker requires Docker Engine 19.03 or newer, but that version came out in 2019. Note that we already require buildx / BuildKit since merging #5918.

This PR does not introduce new version requirements compared to #5918. docker buildx has been supported since 19.03, which is almost 5 years old. Are you saying that the some builds are currently broken because of #5918, with a Docker version older than 19.03 and no possibility of upgrade?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docker version is 20.10.7, it showed 'buildx' is not a docker command.

Copy link
Contributor

@XinShuYang XinShuYang Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying that the some builds are currently broken because of #5918

#5918 caused issue #5941 due to the --mount compatibility problem.
This issue #5941 has been fixed by #5937, in this PR I enabled DOCKER_BUILDKIT to support --mount in dockerfile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably as simple as setting the DOCKER_CLI_EXPERIMENTAL variable, in order to support these older versions.
Let me update the PR accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made that change. Could you try running ./hack/build-antrea-linux-all.sh on one of the workers? Or maybe I can just trigger Jenkins jobs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe I can just trigger Jenkins jobs?

Yes, you can trigger the cloud job manually with custom repo and branch name. e.g. https://jenkins.antrea-ci.rocks/job/cloud-antrea-gke-conformance-net-policy/build?delay=0sec

This is probably as simple as setting the DOCKER_CLI_EXPERIMENTAL variable, in order to support these older versions.

I tried to trigger one job in https://jenkins.antrea-ci.rocks/job/cloud-antrea-gke-conformance-net-policy/696/. Maybe the failure is caused by the result of check_for_buildx, should we run export DOCKER_BUILDKIT=1 and export DOCKER_CLI_EXPERIMENTAL=enabled before running check_for_buildx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That didn't work. It's possible that the buildx plugin is not available at all if docker was installed using distribution packages and not using the Docker installation script, but I would need to ssh into the Jenkins worker to check.

Is the issue only for the public cloud Jenkins jobs (AKS / GKE / EKS)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cloud node was unavailable for ssh access before the migration, but after several attempts on jenkins script, I successfully upgraded the docker. The gke tests now pass with the current code changes so I believe we can proceed with merging this PR.

fi
done
elif [ "$DISTRO" == "ubi" ]; then
docker pull $PLATFORM_ARG centos:centos7
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, is there any plan to upgrade this centos version? I found that centOS 7 will be EOL on June 2024.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be done as part of #5725

-t antrea/cni-binaries:$CNI_BINARIES_VERSION \
--build-arg CNI_BINARIES_VERSION=$CNI_BINARIES_VERSION \
--build-arg BUILD_TAG=$BUILD_TAG .
function docker_build_and_push() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to the one in ovs/build.sh, maybe they can be refined and moved to build-utils.sh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about it originally, and I don't feel strongly either way, but it felt to me like the script was more readable this way, rather than have to pass Docker build args as a function argument.

@antoninbas
Copy link
Contributor Author

/test-all

We use "docker buildx" when building the base Linux images for
Antrea. This gives us access to all the BuildKit capabilities, in
particular exporting the build cache to the registry as a separate
image, when using the docker-container driver. Exporting the cache means
that the base images won't need to be rebuilt from scratch every time
(whether it is for local builds or CI builds).

Using buildx with Docker requires Docker Engine 19.03 or newer, but that
version came out in 2019. Note that we already require buildx / BuildKit
since merging antrea-io#5918.

Note that in recent Docker versions (starting with Docker Engine 23.0
and Docker Desktop 4.19), "docker build" is an alias for "docker buildx
build" (buildx is the default build client). However, explicitly using
"docker buildx build" lets us support older versions as well.

While exporting the cache to the registry requires using the
docker-container driver, and will not work with the default docker
driver, developers are not supposed to export the cache anyway. This is
reserved for Github CI jobs.

We also add a new "--no-cache" option to force re-building from
scratch. That option should not be required for normal workflows.

The Makefile is improved to reduce verbosity.

Fixes antrea-io#5944

Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@antoninbas antoninbas force-pushed the use-docker-buildx-and-registry-cache branch from abd9e12 to 1d90444 Compare February 26, 2024 19:08
Copy link
Contributor

@XinShuYang XinShuYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XinShuYang
Copy link
Contributor

/test-all

@antoninbas antoninbas merged commit 0db2efb into antrea-io:main Feb 27, 2024
50 of 55 checks passed
@antoninbas antoninbas deleted the use-docker-buildx-and-registry-cache branch February 27, 2024 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

build chain (hack/build-antrea-linux-all.sh) always rebuilds base images even when available in registry
5 participants