
Update BuildKit configurations to enhance compatibility #5937

Merged: 1 commit into antrea-io:main from the buildkit branch on Feb 2, 2024

Conversation

@XinShuYang (Contributor) commented Jan 29, 2024

  • Add BuildKit cache cleanup in CI.
  • Declare DOCKER_BUILDKIT when building images, for Docker compatibility.

fixes #5941
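
For illustration, this is roughly what declaring DOCKER_BUILDKIT looks like at a docker build call site (a minimal sketch; the image tag is a placeholder, not taken from this PR):

    # Enable BuildKit for this build; on older Docker releases it is not the
    # default builder, and Dockerfile features such as RUN --mount require it.
    DOCKER_BUILDKIT=1 docker build -t antrea/antrea-ubuntu:latest .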

@XinShuYang XinShuYang force-pushed the buildkit branch 3 times, most recently from e5ab0ee to 6669b9b on January 30, 2024 02:14
@XinShuYang XinShuYang marked this pull request as ready for review January 30, 2024 02:16
@XinShuYang XinShuYang requested a review from tnqn January 30, 2024 02:19
@XinShuYang (Contributor Author):

/test-all

@XinShuYang (Contributor Author):

/test-all

Comment on lines 273 to 274

    free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
    free_space_threshold=40
    if [[ $free_space -lt $free_space_threshold ]]; then
        # If cleaning up dangling images unused in the last hour doesn't free up sufficient disk space,
        # we will have to clean up all buildkit cache to release enough disk space.
        docker buildx prune -af > /dev/null
    fi
    docker buildx du
    set -e
Contributor:

There is code duplication; I feel it's better to move it into a shell file, maybe docker-utils.sh?

Contributor Author:

Added it to utils.sh.
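
A possible call site for the helper, as a sketch (the utils.sh path and surrounding job script are assumptions; only the function name comes from this thread):

    # Hypothetical CI job snippet: load the shared helpers, then reclaim disk
    # space before building images.
    source ./ci/jenkins/utils.sh
    check_and_cleanup_buildkit_cache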

@luolanzone (Contributor) left a comment:

one question about the cache cleanup.

Comment on lines 21 to 23

    # If cleaning up dangling images unused in the last hour doesn't free up sufficient disk space,
    # we will have to clean up all buildkit cache to release enough disk space.
Contributor:

Do we really need to clean up all the cache? I think the cache can still help speed up image building. I saw there is an option --keep-storage; could you check whether it's useful in this case? Thanks.

Contributor Author:

@luolanzone We can use this parameter to reduce the cache to 10GB during the first-round cleanup. If the storage space is still insufficient, we will have to clean up all the build cache.

@XinShuYang XinShuYang force-pushed the buildkit branch 4 times, most recently from 47fba78 to 8709a85 on January 30, 2024 08:42
@tnqn (Member) left a comment:

Not sure if this is the right way to fix it. @antoninbas may be more familiar with this.

@@ -296,7 +297,7 @@ function deliver_antrea_multicluster {
     chmod -R g-w build/images/ovs
     chmod -R g-w build/images/base

-    DOCKER_REGISTRY="${DOCKER_REGISTRY}" ./hack/build-antrea-linux-all.sh --pull
+    DOCKER_REGISTRY="${DOCKER_REGISTRY}" DOCKER_BUILDKIT=1 ./hack/build-antrea-linux-all.sh --pull
Contributor:

Given that ./hack/build-antrea-linux-all.sh assumes that BuildKit is enabled, we could just set DOCKER_BUILDKIT=1 in the script directly, to avoid all these individual changes?
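
A sketch of that suggestion, assuming it is placed near the top of hack/build-antrea-linux-all.sh:

    # Force BuildKit for every docker build this script runs, so individual
    # callers no longer need to set the variable themselves.
    export DOCKER_BUILDKIT=1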

Contributor:

BTW, I have been thinking of using docker buildx build directly as I believe it offers more command-line options (and better caching support), so it would make sense to update Docker everywhere.

Contributor Author:

> BTW, I have been thinking of using docker buildx build directly as I believe it offers more command-line options (and better caching support), so it would make sense to update Docker everywhere.

Can I add buildx support in a separate PR once I have upgraded Docker on all testbeds? I can include this target in the issue (#5937 (comment)).
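
For reference, an illustrative docker buildx build invocation of the kind being discussed (the registry ref and tag are placeholders; the flags are standard buildx options, not something introduced by this PR):

    # Build with an external registry cache; classic "docker build" has no
    # equivalent of --cache-to.
    docker buildx build \
        --cache-from type=registry,ref=registry.example.com/antrea/build-cache \
        --cache-to type=registry,ref=registry.example.com/antrea/build-cache,mode=max \
        -t antrea/antrea-ubuntu:latest \
        --load .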

Comment on lines 17 to 32

    function check_and_cleanup_buildkit_cache() {
        free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
        free_space_threshold=40
        if [[ $free_space -lt $free_space_threshold ]]; then
            # If cleaning up unused dangling images doesn't free up sufficient disk space,
            # we will have to reduce the buildkit cache to 10GB to release enough disk space.
            docker buildx prune -af --keep-storage=10gb > /dev/null
            free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
            if [[ $free_space -lt $free_space_threshold ]]; then
                # If the first round cleanup doesn't free up sufficient disk space,
                # we will have to clean up all buildkit cache to release enough disk space.
                docker buildx prune -af > /dev/null
            fi
        fi
        docker buildx du
    }
Contributor:

I thought we considered some logic like this previously but decided not to use it? Am I misremembering?

Contributor Author:

For the previous code changes, we didn't want to remove all images because doing so could delete images needed by other concurrent jobs on the same testbed. However, the BuildKit cache can be safely deleted without causing any image build errors; the only difference is build speed.

Contributor:

So this is because there is no equivalent command to docker buildx prune in legacy docker? Is docker builder prune different from this one?

I wonder if we should just unconditionally run docker buildx prune -af --keep-storage=10gb instead of checking for free space first?

Contributor Author:

> So this is because there is no equivalent command to docker buildx prune in legacy docker? Is docker builder prune different from this one?

I tried both commands, and docker builder prune can achieve the same result as docker buildx prune.

> I wonder if we should just unconditionally run docker buildx prune -af --keep-storage=10gb instead of checking for free space first?

Updated, I am okay with both methods.

Contributor:

I meant only using docker builder prune -af --keep-storage=10gb, and removing the rest of the logic. However, I am fine with the current approach. Let's see if it eliminates our space issues.

Comment on lines 137 to 140

    docker images | grep -E 'mc-controller|antrea-ubuntu' | awk '{print $3}' | xargs -r docker rmi -f || true
    # Clean up dangling images generated in previous builds.
    docker image prune -f --filter "until=24h" || true > /dev/null
    check_and_cleanup_buildkit_cache
Contributor:

should we upgrade Docker on all build machines, and avoid the 2 cases (legacy docker / buildkit)?

Contributor Author:

Hi @antoninbas, since we are working on migrating the cloud testbeds (some of which can only be accessed by Jenkins now), I would like to restore the Jenkins jobs first. I will create a separate PR to remove all legacy commands once I have upgraded Docker on all testbeds. If you agree, I can create an issue to track it.

Contributor:

Sounds good

@XinShuYang XinShuYang force-pushed the buildkit branch 3 times, most recently from 6ef606f to 7a21195 on January 31, 2024 06:18
@XinShuYang (Contributor Author):

/test-all
/test-windows-all

    if [[ $free_space -lt $free_space_threshold ]]; then
        # If cleaning up unused dangling images doesn't free up sufficient disk space,
        # we will have to reduce the buildkit cache to 10GB to release enough disk space.
        docker buildx prune -af --keep-storage=10gb > /dev/null
Contributor:

so all Jenkins workers can already run docker buildx?

Contributor Author:

No, it doesn't work with some legacy Docker versions, although the command failure won't block the CI pipeline. However, given that docker builder prune achieves the same function and is supported on all Docker releases, I have updated it for better compatibility.
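
To illustrate the compatibility point (a sketch; both flags are standard docker builder prune options):

    # docker builder prune ships with the core Docker CLI, so it also works on
    # legacy installations that lack the buildx plugin; --keep-storage retains
    # up to 10GB of cache for future builds.
    docker builder prune -af --keep-storage=10gb > /dev/null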

@@ -14,6 +14,19 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+function check_and_cleanup_buildkit_cache() {
Contributor:

should we replace buildkit_cache with docker_build_cache in the function name?

Contributor Author:

Yes, updated.

    function check_and_cleanup_buildkit_cache() {
        free_space_threshold=40

        docker builder prune -af --keep-storage=10gb > /dev/null
Contributor:

I hope that in the general case this is enough and we can keep some cache around to speed up future builds.

Contributor Author:

I feel that reducing the cache size doesn't guarantee that the most recently used cache entries survive. I would prefer to continue with an LRU-style method based on the time filter. What do you think?

Contributor:

Maybe we should start with the size one, actually. I imagine that the implementation is "smart" enough to delete old objects first. Otherwise, if we don't trigger a job for 1h, we lose all the cache and it becomes useless.

If you prefer, you can go back to your first version:

    if free_space < threshold
        docker builder prune -af --keep-storage=10gb > /dev/null
    if free_space < threshold
        docker builder prune -af > /dev/null

I think the size filter is a better fit for us (for the build cache).
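
For reference, the two pruning strategies being weighed here, as a sketch (both flags are standard docker builder prune options):

    # Time-based (LRU-style): drop cache entries not used within the last hour.
    docker builder prune -af --filter "until=1h" > /dev/null

    # Size-based: let BuildKit choose what to evict, but cap the cache at 10GB.
    docker builder prune -af --keep-storage=10gb > /dev/null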

Contributor Author:

Sure, I am okay with it.

antoninbas previously approved these changes Feb 1, 2024
docs/minikube.md (Outdated)

@@ -26,7 +26,7 @@ minikube start --cni=antrea.yml --network-plugin=cni

 These instructions assume that you have built the Antrea Docker image locally
 (e.g. by running `make` from the root of the repository, or in case of arm64 architecture by running
-`DOCKER_BUILDKIT=1 ./hack/build-antrea-ubuntu-all.sh --platform linux/arm64`).
+`./hack/build-antrea-ubuntu-all.sh --platform linux/arm64`).
Contributor:

since you are modifying this line, the current name of the script is ./hack/build-antrea-linux-all.sh, it's not ./hack/build-antrea-ubuntu-all.sh anymore :)

Contributor Author:

Updated, thanks :)

@@ -14,6 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+function check_and_cleanup_docker_build_cache() {
+    docker builder prune -af --filter="until=1h" > /dev/null
Contributor:

you changed the filter from size-based to time-based?

Contributor Author:

Replied in #5937 (comment)

* Add BuildKit cache cleanup in CI
* Declare DOCKER_BUILDKIT when building image for docker compatibility.

fixes antrea-io#5941

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
@luolanzone (Contributor) left a comment:

LGTM

@antoninbas (Contributor):

/test-all

@tnqn tnqn merged commit 075efef into antrea-io:main Feb 2, 2024
48 of 53 checks passed
luolanzone pushed a commit to luolanzone/antrea that referenced this pull request Jun 17, 2024
tnqn pushed a commit that referenced this pull request Jun 17, 2024

Successfully merging this pull request may close these issues:

BuildKit Error for --mount Option in Dockerfile
4 participants