[MRG] Remove the conda package cache as we can't hardlink to it #666
Found https://github.com/wagoodman/dive as a way to look at layers, and through that found 3a6e4b4.
@@ -69,9 +69,12 @@ fi

# Clean things out!
conda clean -tipsy
rm -rf /srv/conda/pkgs
I wonder if this entry is the same as adding the -f or --force-pkgs-dirs command line argument of conda clean. Really not confident about this, but see #667.
Good pointers. I did some googling about caches and temporary files used by conda for #638. Mostly you find "disinformation" or confused people :-/
My thoughts from back then are in #638 (comment). I think we can't use hardlinks within a docker image (maybe we could within a single layer?), so there is no benefit to having /srv/conda/pkgs available for later layers that install the same conda package. On a normal filesystem we'd just hardlink to /srv/conda/pkgs if it contains the package we are installing, but on docker we can't do that :(
In conclusion I think we should try out -f to see if that has the same effect as rm -rf /srv/conda/pkgs. Relevant discussion: jupyter/docker-stacks#861
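To make the hardlink argument above concrete, here is a minimal sketch of what hardlinking looks like on a normal filesystem. The directory names `pkgs` and `env` and the file `lib.bin` are hypothetical stand-ins for a package cache and an environment; nothing here calls conda or docker, and it says nothing about how files are stored across image layers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for a package cache ("pkgs")
# and an environment ("env"); removed again when the script exits.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/pkgs" "$workdir/env"

# Put a file into the "package cache".
printf 'pretend this is a library\n' > "$workdir/pkgs/lib.bin"

# Hardlink it into the "environment": a second name for the same inode,
# so no second copy of the data is written.
ln "$workdir/pkgs/lib.bin" "$workdir/env/lib.bin"

# Both paths report the same inode number, and the link count is 2.
inode_cache="$(ls -i "$workdir/pkgs/lib.bin" | awk '{print $1}')"
inode_env="$(ls -i "$workdir/env/lib.bin" | awk '{print $1}')"
links_before="$(ls -l "$workdir/env/lib.bin" | awk '{print $2}')"

# Removing the "package cache" only drops one of the two names;
# the data stays reachable through the environment path.
rm -rf "$workdir/pkgs"
links_after="$(ls -l "$workdir/env/lib.bin" | awk '{print $2}')"

echo "inode in cache: $inode_cache, inode in env: $inode_env"
echo "link count before: $links_before, after: $links_after"
```

If the files in an environment really were hardlinks to the cache, deleting the cache would free almost nothing on such a filesystem, which is why a size reduction from `rm -rf /srv/conda/pkgs` is worth investigating.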
Investigating hardlinks in docker images a bit.
This is a Now I am not completely sure anymore why removing the package cache actually reduces the image size. Maybe conda doesn't actually use hardlinks after all?
I'm out of my depth about this; I don't understand hardlinks yet, for starters. I need to read up :p
I think we should merge this, track the discussion in jupyter/docker-stacks#861, and then make a new PR to implement the things learned from the discussion there. Or someone else who is confident they understand how all this works chimes in and we can implement "the right thing" straight away :)
The section "Linking packages from package cache into environments" on https://www.anaconda.com/understanding-and-improving-condas-performance/ explains the conda behavior.
I believe this is correct after reading the above documentation and the results of your experiment.
It is my understanding that the added remove pkg folder thing is what
Using
Hmmm... One R pytest failed in build 1407.
And now both stencila-r and R pytest failed in build 1408: stencila-r logs
Restarted both failed builds. They have been failing over the last few days due to (I think) networking problems or slow servers. So for now I am hitting "restart"; if this keeps happening we need to investigate how to make them more resilient.
Checks pass now!
…install [MRG] Remove the conda package cache as we can't hardlink to it
This shrinks our "base" image by about 50MB plus 40MB from the second commit.
You can see the size of the "install miniconda" layer in various scenarios here:
master
https://microbadger.com/images/betatim/r2d-minimal-python (330MB)
Maybe there is more saving potential somewhere; this was low-hanging fruit. If someone knows a tool to see which files get created by each layer, we could check for other temporary files left over that can be deleted.
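As a rough stand-in for a per-layer tool, one could snapshot the file listing before and after a single build step and compare the two. The sketch below is only an assumption about how such a check might look: the helper name `list_new_files` and the demo paths are hypothetical, and it inspects a plain directory rather than actual image layers.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_new_files DIR CMD [ARGS...]
# Hypothetical helper: run CMD and print the regular files under DIR
# that exist afterwards but did not exist before.
list_new_files() {
  local dir="$1"
  shift
  local before after
  before="$(mktemp)"
  after="$(mktemp)"
  find "$dir" -type f | LC_ALL=C sort > "$before"
  "$@" > /dev/null                      # run the step being inspected
  find "$dir" -type f | LC_ALL=C sort > "$after"
  # comm -13 keeps only the lines unique to the second (after) listing.
  LC_ALL=C comm -13 "$before" "$after"
  rm -f "$before" "$after"
}

# Demo on a scratch directory: the "step" leaves one temporary file behind.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
touch "$demo_dir/existing.txt"
new_files="$(list_new_files "$demo_dir" sh -c "echo tmp > '$demo_dir/leftover.tmp'")"
echo "files created by the step:"
echo "$new_files"
```

Run against an install prefix inside a single build step, a listing like this could point at leftover caches or temporary files worth deleting in that same step.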