This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix ci/docker_cache.py #18056

Merged
leezu merged 2 commits into apache:master from leezu:fixbuildpy on Apr 15, 2020

Conversation

@leezu (Contributor) commented on Apr 14, 2020:

Description

#17984 added a cache_intermediate option to build_docker, enabling developers to speed up local Docker builds during development. Because no default value was provided for the new parameter, the change broke the use of the build_docker function in other places.

Reference: Failing ci/jenkins/restricted-docker-cache-refresh on master after merge: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-docker-cache-refresh/detail/master/2789/pipeline

Also fixes #18061.
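
To make the failure mode concrete, here is a minimal sketch (hypothetical, simplified signatures, not the actual ci/build.py code) of why a new parameter without a default breaks pre-existing callers such as ci/docker_cache.py, and why defaulting it to False preserves the old behaviour:

    # Hypothetical, simplified signatures for illustration only.
    def build_docker_old(platform: str, no_cache: bool, cache_intermediate: bool) -> str:
        """cache_intermediate has no default: every caller must now pass it."""
        return f"built {platform} (cache_intermediate={cache_intermediate})"

    def build_docker_fixed(platform: str, no_cache: bool, cache_intermediate: bool = False) -> str:
        """Defaulting to False restores the behaviour from before #17984."""
        return f"built {platform} (cache_intermediate={cache_intermediate})"

    # A call site that predates the flag and therefore does not pass it:
    try:
        build_docker_old("ubuntu_cpu", no_cache=True)
    except TypeError as err:
        print(err)  # missing 1 required positional argument: 'cache_intermediate'

    print(build_docker_fixed("ubuntu_cpu", no_cache=True))  # works; intermediate caching stays off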

@mxnet-bot commented:

Hey @leezu, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [edge, windows-cpu, centos-gpu, unix-cpu, unix-gpu, website, centos-cpu, sanity, windows-gpu, clang, miscellaneous]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@@ -71,7 +71,7 @@ def get_docker_binary(use_nvidia_docker: bool) -> str:

 def build_docker(platform: str, docker_binary: str, registry: str, num_retries: int, no_cache: bool,
-                 cache_intermediate: bool) -> str:
+                 cache_intermediate: bool = False) -> str:

@marcoabreu (Contributor) commented on Apr 14, 2020:
This has to be true. Otherwise, a single change will result in a full rebuild, drastically increasing build times.

@leezu (PR author) replied:
I don't mind changing it to True, but the existing behaviour before introducing the flag was False.

@leezu (PR author):
One issue I noted during local development when setting this to True is that my disk can fill up very quickly, because Docker will not clean any of the cached layers automatically.

@marcoabreu:
Well, there has been some back and forth between refactors; maybe somebody removed it :/. Initially, I designed it so that intermediary layers were kept, since that's the whole point of the partial cache. This test validates that assumption: https://github.com/apache/incubator-mxnet/blob/master/ci/test_docker_cache.py#L184

Note that this test suite is not run in CI...

@marcoabreu:
Well, you can't have everything: either partial caching with manual cleanup, or automatic cleanup but no partial caching. I think it is fine to expect that people use docker purge every now and then.

@marcoabreu:
Yeah it should be possible. It will just cause issues if the test fails since that might mean that stuff gets left dangling, but due to the mentioned lifecycle, that shouldn't have a big impact.

Feel free to enable it.

@marcoabreu:
Good catch! I can't fully recall what the reasoning around --rm=false was, but I recall that I wrote tests to cover all these unknowns. The rule of thumb is that if the tests pass, we're good to go (given they use the same parameters, of course).

@leezu (PR author):
CI does not run docker purge automatically. But since the lifecycle of the Unix slaves is very short, it's not an issue.

If the intermediate containers are only kept on the slave, and we only generate them in PRs that change the Dockerfile, I think we don't need to use them at all on CI. --rm=false then seems only useful for development.
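
For context, a rough sketch (assumed command assembly, not the exact ci/build.py code) of how a flag like cache_intermediate maps onto the docker build invocation: --rm=false tells Docker to keep the intermediate containers created for each build step instead of removing them after a successful build, which is also why disk usage grows until they are cleaned up manually.

    from typing import List

    def docker_build_cmd(docker_binary: str, platform: str, tag: str,
                         no_cache: bool, cache_intermediate: bool) -> List[str]:
        # Assumed Dockerfile layout and build context; only the flag handling matters here.
        cmd = [docker_binary, "build", "-f", f"docker/Dockerfile.build.{platform}", "-t", tag]
        if no_cache:
            cmd.append("--no-cache")    # ignore all cached layers
        if cache_intermediate:
            cmd.append("--rm=false")    # keep per-step intermediate containers around
        cmd.append(".")
        return cmd

    print(" ".join(docker_build_cmd("docker", "ubuntu_cpu", "mxnetci/build.ubuntu_cpu",
                                    no_cache=False, cache_intermediate=True)))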

@marcoabreu:
Yep, exactly. But please make sure that the tests pass before merging; otherwise we might break the partial caching.

@leezu (PR author):
This PR does not change the status quo. There is no risk.

@leezu merged commit 2c4732b into apache:master on Apr 15, 2020.
@leezu deleted the fixbuildpy branch on Apr 15, 2020 at 05:31.
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request on Jul 6, 2020:
* Fix Julia tests always testing master branch instead of PR branches.

* Fix ci/docker_cache.py

* Fix Dockerfiles for nightly tests
Development

Successfully merging this pull request may close these issues:

Julia tests always test master branch of MXNet instead of the PR branch
3 participants