Shrink Docker Image #47
Ooh, nice. cc @quasiben who may have experience here
Only thought is that @jakirkham pointed out to me the The above has been more recently split into two files:
&& find /opt/conda/ -type f,l -name '*.a' -delete \
&& find /opt/conda/ -type f,l -name '*.pyc' -delete \
&& find /opt/conda/ -type f,l -name '*.js.map' -delete \
&& rm -rf /opt/conda/pkgs
Am a little wary of this last `rm` line. The tarballs should have already been cleaned by `conda clean -t`. The rest of this content here should be unpacked tarballs, which are actively in use. I don't think we should remove those.
They're hardlinked, so the actual files are still there. I was surprised that this reduced the image size - I wonder if when docker compresses a layer it doesn't catch that hardlinks are identical and thus duplicates the data (the savings here are non-negligible). The savings might also be all the duplicate files left after prefix-rewriting (files that need prefixes rewritten are effectively copies). Conda is robust to a missing package cache, so I don't see this being a problem at runtime (I also tested installing/upgrading packages at runtime and things seemed to work).
Yep, they are hard-linked. So things may work ok.
Am also surprised to hear this decreased image size. I wonder to what extent the union filesystem used impacts this and to what extent Docker itself does. FWIW in my search to answer these questions, I came across this old PR ( moby/moby#16960 ).
There has been some discussion about this in Jupyter docker-stacks. In particular, this comment seems relevant.
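One way to probe the hard-link question raised in this thread is to split the bytes under the package cache by link count. The following is only a sketch: it assumes GNU `find` (whose `-printf` accepts `%n` for the link count and `%s` for the size in bytes), the helper name `pkg_cache_link_report` is hypothetical, and `/opt/conda/pkgs` is simply the path from the diff. Files with a link count of 1 are standalone copies, which would include originals whose installed counterparts were copied rather than linked during prefix rewriting.

```shell
# Report how many bytes under a directory belong to files that have other
# hard links (link count > 1) versus standalone files (link count == 1).
# Assumes GNU find; pkg_cache_link_report is a hypothetical helper name.
pkg_cache_link_report() {
    cache_dir="${1:-/opt/conda/pkgs}"
    # %n = number of hard links, %s = size in bytes
    find "$cache_dir" -type f -printf '%n %s\n' \
        | awk '$1 > 1  { linked += $2 }
               $1 == 1 { standalone += $2 }
               END { printf "hardlinked_bytes=%.0f standalone_bytes=%.0f\n", linked, standalone }'
}

# Example invocation inside the image:
#   pkg_cache_link_report /opt/conda/pkgs
```

If most of the bytes turn out to be standalone, the savings from removing the cache would be explained by genuine duplicates rather than by how Docker stores hard links.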
&& conda clean -tipsy
&& conda update conda -y \
&& conda clean -tipsy \
&& find /opt/conda/ -type f,l -name '*.a' -delete \
It would be great to get a list of the top 10(?) large static libraries. We may decide that the conda packages of these should split out the static libraries so they don't wind up getting installed.
openblas is the biggest one by far.
Yep, that one I would have guessed. Am curious about the rest as there may be some things we don't know about.
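A list like the one requested here could be produced with a short pipeline run before the `-delete` step. This is a sketch under assumptions: it relies on GNU `find` for `-printf`, the helper name `largest_static_libs` is hypothetical, and the default prefix is the `/opt/conda/` path from the diff.

```shell
# Print the N largest static libraries under a prefix, largest first,
# one "<size in bytes> <path>" pair per line.
# Assumes GNU find; largest_static_libs is a hypothetical helper name.
largest_static_libs() {
    prefix="${1:-/opt/conda}"
    count="${2:-10}"
    find "$prefix" -type f -name '*.a' -printf '%s %p\n' \
        | sort -rn \
        | head -n "$count"
}

# Example invocation inside the image:
#   largest_static_libs /opt/conda 10
```

Unlike the cleanup commands, this only inspects regular files (`-type f`), so symlinks to static libraries are not counted twice.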
&& conda update conda -y \
&& conda clean -tipsy \
&& find /opt/conda/ -type f,l -name '*.a' -delete \
&& find /opt/conda/ -type f,l -name '*.pyc' -delete \
These will probably be regenerated on first import. My guess is the actual files are smallish ~1KB. That said, would be curious to know if that matches with your experience or not. For instance how much space do all of the pyc files take up?
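The question about total `.pyc` size can be answered by summing file sizes before the delete step runs. The sketch below assumes GNU `find` (for `-printf '%s'`); the helper name `pyc_total_bytes` is hypothetical and the default prefix is taken from the diff.

```shell
# Print the combined size, in bytes, of all *.pyc files under a prefix.
# Assumes GNU find; pyc_total_bytes is a hypothetical helper name.
pyc_total_bytes() {
    prefix="${1:-/opt/conda}"
    find "$prefix" -type f -name '*.pyc' -printf '%s\n' \
        | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Example invocation inside the image:
#   pyc_total_bytes /opt/conda
```

This reports apparent file sizes, so the space actually freed on disk may be somewhat larger because each small file still occupies at least one filesystem block.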
&& conda clean -tipsy \
&& find /opt/conda/ -type f,l -name '*.a' -delete \
&& find /opt/conda/ -type f,l -name '*.pyc' -delete \
&& find /opt/conda/ -type f,l -name '*.js.map' -delete \
This seems like a good idea as I'm guessing people are not debugging JavaScript code in the container. Would encourage you to raise an issue in conda-forge about doing this in all cases. There may be some people that want these for debugging, in which case we can split them out. However if no one wants these files, we could start just removing them.
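Thanks for working on this Jim. Gave this a somewhat detailed review. Not sure if that is what you were looking for.

It was useful to see that things like `.js.map` files are pain points, which I wasn't aware of. Would be great if we can raise some of these to conda-forge so we can address them.

As to Ben's point, we could use `tini` instead of `dumb-init`. It is used by both conda-forge and Jupyter's docker-stacks. The binary is a little more than half the size of `dumb-init`.

There are probably other files we could strip if we want to get more aggressive about cleaning. For example, `pkgconfig` and `cmake` directories, header files, pip cache (if used), etc.

We could also look at more barebones base images like `busybox` if the image size itself is too large.

As I probably don't know enough about the motivations for making the image smaller, I'm not sure what the best advice between minimalism and functionality is here, but there are some thoughts. Hope that is helpful.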
> As I probably don't know enough about the motivations for making the image smaller

When using Dask on distributed clusters the size of the image affects startup time (we have to move the image to the node that is about to run a dask worker). Small image sizes make things feel more responsive.
…On Tue, Apr 30, 2019 at 3:11 PM jakirkham ***@***.***> wrote:
Thanks for working on this Jim. Gave this a somewhat detailed review. Not
sure if that is what you were looking for.
It was useful to see that things like .js.map files are pain points,
which I wasn't aware of. Would be great if we can raise some of these to
conda-forge so we can address them.
As to Ben's point, we could use tini instead of dumb-init. It is used by both
conda-forge and Jupyter's docker-stacks. The binary is a little more than
half the size of dumb-init.
There are probably other files we could strip if we want to get more
aggressive about cleaning. For example, pkgconfig and cmake directories,
header files, pip cache (if used), etc.
We could also look at more barebones base images like busybox if the
image size itself is too large.
As I probably don't know enough about the motivations for making the image
smaller, I'm not sure what the best advice between minimalism and
functionality is here, but there are some thoughts. Hope that is helpful.
Checking in. What's the status here? Also, @jhamman you may be interested in these changes for Pangeo.
People here may want to take a look at #49
We use
@jcrist @jakirkham is there anything left to do here? Should this be merged? If so, would one of you mind merging?
@jcrist, please feel free to merge when you are ready.
I'm working on this today, will push updates later.
This does a few cleanups after a conda install to reduce image size. In order of importance:

- Remove `*.pyc` files
- Remove static libraries
- Remove package cache
- Remove `*.js.map` files (more important with jupyter-lab extensions)

Overall this drops the image size from `834 MB` to `616 MB`.
- Use `--freeze-installed` to not update base images
- Use tini in dask base image instead of dumb-init
- Reduce image size of notebook image as well.
184ca00 to 94288a0
This may be ready for merge. A few additional changes:
This got things down to:
The notebook image could be made smaller further by the following optional changes:
The last 2 may still be wanted features, but I can't think of a good reason to have vim installed so we may want to still handle that here.
I went ahead and added a commit removing vim (can revert if needed). Down to 1.3 GB now for the notebook image. I think this PR is done for now, looking for review.
@jakirkham - if I was to create a general issue for splitting out static libraries in conda-forge, where would I put that?
It can always be installed later when debugging via conda.
- Remove unminified bokeh js
- Cleanup jupyterlab staging files
Couldn't resist, got things even smaller (lots smaller for the notebook image):
With these changes we're down to:
I'm happy with this for now.
I have verified that things still work fine with the docker-compose setup. Planning on merging tomorrow if no more comments.
Woot.
The webpage repo is the best place for this sort of thing.
Since people are already using these images, I'm not sure if we can do significant changes. I played around with using alpine linux last night, as well as not depending on the Jupyter docker stacks. These images are around 1/2 - 1/3 the size of their counterparts here, and should be drop in replacements. Repo is here: https://github.com/jcrist/alpine-dask-docker
Wow. I'm surprised that you were able to get things to be that small.
…On Wed, May 29, 2019 at 12:03 PM Jim Crist ***@***.***> wrote:
Since people are already using these images, I'm not sure if we can do
significant changes. I played around with using alpine linux last night, as
well as not depending on the Jupyter docker stacks. These images are around
1/2 - 1/3 the size of their counterparts here, and should be drop in
replacements. Repo is here: https://github.com/jcrist/alpine-dask-docker
This does a few cleanups after a conda install to reduce image size. In order of importance:

- Remove `*.pyc` files
- Remove static libraries
- Remove package cache
- Remove `*.js.map` files (more important with jupyter-lab extensions)

Overall this drops the image size from `834 MB` to `616 MB`.

So far I've only applied this to the base image; I'd expect the savings to be higher on the notebook image.