Update BuildKit configurations to enhance compatibility #5937
Conversation
e5ab0ee to 6669b9b
/test-all
/test-all
ci/test-conformance-eks.sh
Outdated
free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
free_space_threshold=40
if [[ $free_space -lt $free_space_threshold ]]; then
    # If cleaning up dangling images unused in the last hour doesn't free up sufficient disk space,
    # we will have to clean up all buildkit cache to release enough disk space.
    docker buildx prune -af > /dev/null
fi
docker buildx du
set -e
There is some code duplication; I feel it's better to move it into a shell file, maybe docker-utils.sh?
Added it to utils.sh.
66e47d9 to 0d51aef
One question about the cache cleanup.
ci/jenkins/utils.sh
Outdated
# If cleaning up dangling images unused in the last hour doesn't free up sufficient disk space,
# we will have to clean up all buildkit cache to release enough disk space.
Do we really need to clean up all of the cache? I think the cache can still help speed up image builds. I saw there is a --keep-storage option; could you check if it's useful in this case? Thanks.
@luolanzone We can use this parameter to reduce the cache to 10GB during the first round of cleanup. If the storage space is still insufficient, we will have to clean up all of the build cache.
47fba78 to 8709a85
Not sure if this is the right way to fix it. @antoninbas may be more familiar with this.
ci/jenkins/test-mc.sh
Outdated
@@ -296,7 +297,7 @@ function deliver_antrea_multicluster {
chmod -R g-w build/images/ovs
chmod -R g-w build/images/base

DOCKER_REGISTRY="${DOCKER_REGISTRY}" ./hack/build-antrea-linux-all.sh --pull
DOCKER_REGISTRY="${DOCKER_REGISTRY}" DOCKER_BUILDKIT=1 ./hack/build-antrea-linux-all.sh --pull
Given that ./hack/build-antrea-linux-all.sh assumes that BuildKit is enabled, we could just set DOCKER_BUILDKIT=1 in the script directly, to avoid all these individual changes?
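As a rough sketch (the exact placement inside the script is an assumption), that would amount to a single line near the top of hack/build-antrea-linux-all.sh:

# The script assumes BuildKit, so enable it unconditionally for legacy Docker clients.
export DOCKER_BUILDKIT=1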
BTW, I have been thinking of using docker buildx build directly as I believe it offers more command-line options (and better caching support), so it would make sense to update Docker everywhere.
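For illustration only (the image tag and cache directory below are made-up placeholders, not Antrea's actual build setup), a docker buildx build invocation with explicit cache control could look like:

docker buildx build \
    --platform linux/amd64 \
    --cache-from type=local,src=/tmp/antrea-buildx-cache \
    --cache-to type=local,dest=/tmp/antrea-buildx-cache,mode=max \
    -t antrea/antrea-ubuntu:latest .

This per-build cache import/export is the kind of option that plain docker build does not expose.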
> BTW, I have been thinking of using docker buildx build directly as I believe it offers more command-line options (and better caching support), so it would make sense to update Docker everywhere.

Can I add buildx support in a separate PR once I have upgraded Docker on all testbeds? I can include this target in the issue of #5937 (comment).
ci/jenkins/utils.sh
Outdated
function check_and_cleanup_buildkit_cache() {
    free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
    free_space_threshold=40
    if [[ $free_space -lt $free_space_threshold ]]; then
        # If cleaning up unused dangling images doesn't free up sufficient disk space,
        # we will have to reduce the buildkit cache to 10GB to release enough disk space.
        docker buildx prune -af --keep-storage=10gb > /dev/null
        free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
        if [[ $free_space -lt $free_space_threshold ]]; then
            # If the first round cleanup doesn't free up sufficient disk space,
            # we will have to clean up all buildkit cache to release enough disk space.
            docker buildx prune -af > /dev/null
        fi
    fi
    docker buildx du
}
I thought we considered some logic like this previously but decided not to use it? Am I misremembering?
For the previous code changes, we didn't want to remove all images because doing so may delete images needed by other concurrent jobs on the same testbed. However, the BuildKit cache can be safely deleted without causing any image build errors; the only difference is build speed.
So this is because there is no equivalent command to docker buildx prune in legacy Docker? Is docker builder prune different from this one?
I wonder if we should just unconditionally run docker buildx prune -af --keep-storage=10gb instead of checking for free space first?
> So this is because there is no equivalent command to docker buildx prune in legacy Docker? Is docker builder prune different from this one?

I tried both commands; docker builder prune achieves the same result as docker buildx prune.

> I wonder if we should just unconditionally run docker buildx prune -af --keep-storage=10gb instead of checking for free space first?

Updated, I am okay with both methods.
I meant only using docker builder prune -af --keep-storage=10gb, and removing the rest of the logic. However, I am fine with the current approach. Let's see if it eliminates our space issues.
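For reference, the simplified helper being suggested would reduce to something like this (a sketch, not the code in this PR; the function name mirrors the one introduced later in the conversation):

function check_and_cleanup_docker_build_cache() {
    # Cap the Docker build cache at ~10GB on every run, without checking free disk space first.
    docker builder prune -af --keep-storage=10gb > /dev/null
}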
ci/jenkins/test-mc.sh
Outdated
docker images | grep -E 'mc-controller|antrea-ubuntu' | awk '{print $3}' | xargs -r docker rmi -f || true
# Clean up dangling images generated in previous builds.
docker image prune -f --filter "until=24h" || true > /dev/null
check_and_cleanup_buildkit_cache
Should we upgrade Docker on all build machines, and avoid the 2 cases (legacy Docker / BuildKit)?
Hi @antoninbas, since we are working on migrating cloud testbeds (some of which can only be accessed by Jenkins now), I would prefer to recover the Jenkins jobs first. I will create a separate PR to remove all legacy commands once I have upgraded Docker on all testbeds. If you agree, I can create an issue to track it.
Sounds good
6ef606f to 7a21195
/test-all
ci/jenkins/utils.sh
Outdated
if [[ $free_space -lt $free_space_threshold ]]; then
    # If cleaning up unused dangling images doesn't free up sufficient disk space,
    # we will have to reduce the buildkit cache to 10GB to release enough disk space.
    docker buildx prune -af --keep-storage=10gb > /dev/null
So all Jenkins workers can already run docker buildx?
No, it doesn't work with some legacy Docker versions, although the command failure won't block the CI pipeline. However, given that docker builder prune achieves the same result and is supported on all Docker releases, I have updated it for better compatibility.
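For reference, the two forms take the same flags here (the 10gb value is just the one discussed above):

docker buildx prune -af --keep-storage=10gb     # requires the buildx plugin
docker builder prune -af --keep-storage=10gb    # also works with older Docker releases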
ci/jenkins/utils.sh
Outdated
@@ -14,6 +14,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.

function check_and_cleanup_buildkit_cache() {
Should we replace buildkit_cache with docker_build_cache in the function name?
Yes, updated.
ci/jenkins/utils.sh
Outdated
function check_and_cleanup_buildkit_cache() {
    free_space_threshold=40

    docker builder prune -af --keep-storage=10gb > /dev/null
I hope that in the general case this is enough, and we can keep some cache around to speed up future builds.
I feel that reducing the cache size doesn't guarantee the most recently used cache entries are kept. I would prefer to continue using an LRU-style approach based on the time filter. What do you think?
Maybe we should start with the size-based one actually. I imagine that the implementation is "smart" enough to delete old objects first. Otherwise, if we don't trigger a job for 1h, we lose all the cache and it becomes useless.
If you prefer, you can go back to your first version:
if free_space < threshold
    docker builder prune -af --keep-storage=10gb > /dev/null
    if free_space < threshold
        docker builder prune -af > /dev/null
I think the size filter is a better fit for us (for the build cache).
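Rendered as shell (reusing the free-space check and the 40GB threshold from the PR as an example), that earlier two-step version is roughly:

free_space_threshold=40
free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
if [[ $free_space -lt $free_space_threshold ]]; then
    # First pass: keep up to ~10GB of build cache.
    docker builder prune -af --keep-storage=10gb > /dev/null
    # Re-check, and drop the whole cache only if space is still low.
    free_space=$(df -h -B 1G / | awk 'NR==2 {print $4}')
    if [[ $free_space -lt $free_space_threshold ]]; then
        docker builder prune -af > /dev/null
    fi
fi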
Sure, I am okay with it.
docs/minikube.md
Outdated
@@ -26,7 +26,7 @@ minikube start --cni=antrea.yml --network-plugin=cni

These instructions assume that you have built the Antrea Docker image locally
(e.g. by running `make` from the root of the repository, or in case of arm64 architecture by running
`DOCKER_BUILDKIT=1 ./hack/build-antrea-ubuntu-all.sh --platform linux/arm64`).
`./hack/build-antrea-ubuntu-all.sh --platform linux/arm64`).
Since you are modifying this line: the current name of the script is ./hack/build-antrea-linux-all.sh, not ./hack/build-antrea-ubuntu-all.sh anymore :)
Updated, thanks :)
9d3736f to 33dabc9
ci/jenkins/utils.sh
Outdated
@@ -14,6 +14,18 @@
# See the License for the specific language governing permissions and
# limitations under the License.

function check_and_cleanup_docker_build_cache() {
    docker builder prune -af --filter="until=1h" > /dev/null
You changed the filter from size-based to time-based?
Replied in #5937 (comment)
* Add BuildKit cache cleanup in CI
* Declare DOCKER_BUILDKIT when building image for docker compatibility.

fixes antrea-io#5941

Signed-off-by: Shuyang Xin <gavinx@vmware.com>
LGTM
/test-all
fixes #5941