conda clean command - what flags should we use? #861
You beat me to it. I had planned on making this issue, and had also noticed the `-f` flag.
IIRC, @jakirkham and I set up some of the initial clean commands a long time ago, probably when we were still on conda 3.x. I have little doubt that there are better options today. #406 has been open for some time as a thought about how to test the behavior of the various packages and language kernels. Maybe putting some lightweight notebook tests in place before making the change would provide some confidence that the changes are not detrimental? Or maybe that's too much work. Perhaps some simple unit test additions against the base container, ensuring that a package can be imported (e.g., jupyter) and that a new package can be installed and then imported in the built image, would be enough?
Perhaps we could add some tests to the datascience-notebook where we do something similar to the test below from base-notebook, but run a command that verifies various import statements work. docker-stacks/base-notebook/test/test_container_options.py, lines 37 to 47 in 185a9f7
Following the discussion in issue jupyter#861, this changes the `conda clean` flags.
With the `-s`/`--source-cache` flag, the problem is that conda-build isn't installed, and I'm not sure it makes sense to install something extra just to try to reduce size. I have a branch for this issue now with the initial changes. I'll have an additional commit to test the `-f` flag.
@echowhisky I reduced my bloated datascience-notebook image, built from scratch, from 3.6 GB to 3.4 GB by going for the `-f` flag.
@parente got an idea on how to test various python import statements as part of a test? If I could get a bit more confident about how to go about it, I'd be happy to add such a test along with a PR changing the `conda clean` command parameters. Furthermore, would import statements for various packages be a good enough test to catch breakage? I'm not confident about these things at all... @betatim, it is my understanding from what you said, without technically understanding it, that you think it would make sense to include the `-f` flag.
My docker image built with the `-f` flag…
@consideRatio If you're planning on writing a pytest, you could mimic what we do in the base-notebook tests: launch a docker container and run checks against it.
This commit adds the additional `-f` force flag to all uses of `conda clean --all` throughout the repo. Size should be smaller, but still testing whether anything breaks. See issue jupyter#861.
@consideRatio I would start with a single pytest setup like https://github.com/jupyter/docker-stacks/blob/master/base-notebook/test/test_container_options.py#L8 where the test body is something like the following pseudocode:
This page in the docs (https://jupyter-docker-stacks.readthedocs.io/en/latest/contributing/tests.html) describes where to put the test file so that it executes against the correct image. You'll need to create a pipenv/venv/conda env containing the contents of the test requirements file.
We have been doing what I think is the equivalent of this in repo2docker.
I would not be surprised if there is a bug in how Docker reports image sizes when hard-links are involved. This has happened before (moby/moby#9283).
I finally learned more about hard-links: https://en.wikipedia.org/wiki/Hard_link
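A small self-contained illustration of why hard-links confuse size accounting (the file names here are arbitrary stand-ins for conda's `pkgs/` and env entries): two directory entries point at one inode, so per-name sizes sum to more than the disk actually holds.

```python
import os
import tempfile

def hardlink_demo(nbytes=1024):
    """Create a file plus a hard link to it; return (per-name size, unique inodes, link count)."""
    d = tempfile.mkdtemp()
    pkgs_copy = os.path.join(d, "pkgs_copy")  # stand-in for the package-cache entry
    envs_copy = os.path.join(d, "envs_copy")  # stand-in for the environment's entry
    with open(pkgs_copy, "wb") as f:
        f.write(b"x" * nbytes)
    os.link(pkgs_copy, envs_copy)  # hard link: a second name, not a second copy
    sizes = {os.path.getsize(p) for p in (pkgs_copy, envs_copy)}
    inodes = {os.stat(p).st_ino for p in (pkgs_copy, envs_copy)}
    return sizes.pop(), len(inodes), os.stat(pkgs_copy).st_nlink

size, unique_inodes, nlink = hardlink_demo()
# Each name reports the full size, but both share one inode on disk,
# so `du` counts the data once while a naive per-file sum counts it twice.
print(size, unique_inodes, nlink)  # -> 1024 1 2
```

This is why removing the package cache with `-f` frees less space than summing the per-directory numbers would suggest: much of `pkgs/` was hard-linked into the environment already.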
For comparison, `du -sch *` in `/opt/conda` for the image built with the old flags:

```
jovyan@6e6dbadc1fe3:/opt/conda$ du -sch *
182M bin
2.4M compiler_compat
8.0K condabin
7.3M conda-meta
4.0K envs
44K etc
13M include
312M lib
8.0K LICENSE.txt
56K man
135M pkgs
412K sbin
3.5M share
12K shell
24K ssl
12K x86_64-conda_cos6-linux-gnu
655M total
```

And then when I built the same container with the `-f` flag:

```
jovyan@145c591d5eea:/opt/conda$ du -sch *
182M bin
2.4M compiler_compat
8.0K condabin
7.3M conda-meta
4.0K envs
44K etc
13M include
312M lib
8.0K LICENSE.txt
56K man
796K sbin
39M share
24K shell
336K ssl
12K x86_64-conda_cos6-linux-gnu
556M total
```

Even though some directories report larger in the second build (e.g. `share` goes from 3.5M to 39M, since their hard-linked files are no longer attributed to `pkgs`), the total drops by about 100 MB once the 135M `pkgs` directory is gone. Not sure any of this is super important, but it was a hard-link related artifact that tripped me up at first. In the end, using the `-f` flag shaved roughly 100 MB off `/opt/conda` here.

The issue with the R environment failing seems to have resolved itself, but if it turns back up it might be fixable.

I've updated my fork's commits (167c686) to be in sync with the latest updates to master (2662627) but haven't submitted a PR yet, waiting to see if there's additional testing coming. If I can get my head around it a bit better and carve out the time, I'll try to contribute to the testing pieces as well.
@echowhisky, do you think we could raise a simplified version of this with the Docker team as an issue and see if they have any thoughts?
@echowhisky @consideRatio Given what we found in repo2docker, I think it's OK to proceed with the `-f` flag.
@jakirkham check out jupyterhub/repo2docker#666 (comment) for a "minimal" example on hardlinks. I think I read somewhere in the conda docs that conda will try to hardlink what it can from the package cache into the environments.
@jakirkham - the info I quoted above wasn't a Docker issue, just the normal reduction from tweaking the `conda clean` flags. There are very likely some additional size savings that can be squeezed out, possibly using other techniques like cleaning out all of the compiled python (.pyc) files and some of the other pieces mentioned in the conversation you referenced, but that probably deserves its own issue thread.
If the Docker Hub tag size reports are to be believed, PR #867 caused the following changes in image size, starting in tag 4d7dd95017ed:
The Spark images have not changed in size because they failed to rebuild on Docker Hub (#871 is addressing this). I don't have an explanation for how the tensorflow image increased in size, but maybe there's evidence in the build manifests (https://github.com/jupyter/docker-stacks/wiki/tensorflow-notebook-4d7dd95017ed vs https://github.com/jupyter/docker-stacks/wiki/tensorflow-notebook-2662627f26e0). I used this notebook / binder to fetch the stats from Docker Hub.
Thanks for the summary @parente! Closing time? No clue about how tensorflow-notebook increased in size; perhaps some package got itself a version bump?
Thanks for the discussion and work put in here, folks. Less is more FTW! |
change flags on "conda clean" from "-tipsy" to "--all -f -y". See jupyter/docker-stacks#861.
Hello everyone! I just discovered this thread after running into the same question.
@echowhisky noted that `conda clean` with the `-f` or `--force-pkgs-dirs` flag can significantly reduce image sizes, but we do not fully grasp the consequences of doing so. I created this issue to gather discussion about which flags we should use in this command.

I found the following documentation about the flags: https://conda.io/projects/conda/en/latest/commands/clean.html#Removal%20Targets

The concrete question we need to answer is, more specifically: what flags do we pass to the clean command? The flags determine what gets cleaned up.

Currently we are using the `-tipsy` flags, but I failed to find any documentation about them; I think they are deprecated in favor of other flags. So we should probably update this no matter what.

**Usage of `conda clean` building up to datascience-notebook**

Base notebook:
docker-stacks/base-notebook/Dockerfile, lines 83 to 105 in ae5f7e1

Minimal notebook:
None

Scipy notebook:
docker-stacks/scipy-notebook/Dockerfile, line 47 in ae5f7e1

Datascience notebook:
docker-stacks/datascience-notebook/Dockerfile, line 68 in ae5f7e1

**UPDATE**

Whoops, `-tipsy` is simply the combined list of flags `-t -i -p -s -y`:

- `-t` or `--tarballs`
- `-i` or `--index-cache`
- `-p` or `--packages`
- `-s` or ... that one is not documented; it probably relates to `--source-cache`
- `-y` or `--yes`

Then I ask myself:

- Do we want the `-l` or `--lock` flag that is part of `--all`?
- Do we want the `-f` or `--force-pkgs-dirs` flag that is not part of `--all`?
- Do we want the `-s` or `--source-cache` flag, which is deprecated?
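For reference, a minimal sketch of what an updated Dockerfile step could look like under the `--all -f -y` option discussed here. This is an illustration only, not the stacks' actual install lists; the package names are placeholders, and `fix-permissions`/`$CONDA_DIR` follow docker-stacks conventions:

```dockerfile
# Sketch only: install and clean in the same RUN layer so the removed
# caches never land in the image. `--all -f -y` additionally wipes the
# package cache dirs (`-f` / `--force-pkgs-dirs`), unlike plain `--all`.
RUN conda install --quiet --yes notebook jupyterlab && \
    conda clean --all -f -y && \
    fix-permissions $CONDA_DIR
```

Splitting the install and the clean into separate `RUN` lines would leave the cached tarballs baked into the earlier layer, so the clean would not shrink the image at all.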