-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
images: some refactoring and version bumps for arm64 compatible Dockerfiles #423
images: some refactoring and version bumps for arm64 compatible Dockerfiles #423
Conversation
@@ -1,18 +1,19 @@ | |||
FROM python:3.8.6-slim-buster as dependencies | |||
FROM python:3.8.11-slim-buster as dependencies |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General maintenance keeping the base Python version updated
LABEL MAINTAINER="Jim Crist-Harif" | ||
|
||
RUN apt-get update \ | ||
&& apt-get install -y tini \ | ||
&& apt-get clean \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apt-get clean is automatically run in this and most images via a config (/etc/apt/apt.conf.d/docker-clean).
aiohttp==3.7.2 \ | ||
aiohttp==3.7.4 \ | ||
colorlog \ | ||
cryptography \ | ||
traitlets==4.3.3 \ | ||
traitlets==5.0.5 \ | ||
pyyaml \ | ||
kubernetes-asyncio==11.0.0 | ||
kubernetes-asyncio==12.1.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Systematic version updates as I had to update aiohttp in order to have a arm64 (aka. aarch64) compatible conda package in the dask-gateway image.
@@ -2,27 +2,34 @@ | |||
FROM debian:buster-slim as miniconda | |||
LABEL MAINTAINER="Jim Crist-Harif" | |||
|
|||
ARG CONDA_VERSION=py38_4.8.3 | |||
ARG CONDA_SHA256=879457af6a0bf5b34b48c12de31d4df0ee2f06a8e68768e5758c3293b2daf688 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removal of checksums as amd64 / arm64 will have different checksums and it is a bit tedious to keep track of both.
# - Create user dask | ||
RUN useradd -m -U -u 1000 dask | ||
|
||
# - Install tini | ||
# - Install miniconda build dependencies | ||
# - Download miniconda and check the sha256 checksum | ||
# - Install miniconda | ||
RUN apt-get update \ | ||
&& apt-get install -y tini wget bzip2 \ | ||
&& rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split this apart from a section below making so many different things in the same RUN layer that it became a bit hard to read.
@@ -32,39 +39,40 @@ RUN useradd -m -U -u 1000 dask \ | |||
&& echo "aggressive_update_packages: []" >> /home/dask/.condarc \ | |||
&& find /opt/conda/ -follow -type f -name '*.a' -delete \ | |||
&& /opt/conda/bin/conda clean -afy \ | |||
&& chown -R dask:dask /opt/conda \ | |||
&& apt-get autoremove --purge -y wget bzip2 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By not removing wget / bzip2 the image will not grow further because adding + removing won't help if its not done in the same layer, and since it is no longer in the same layer I don't remove these binaries later.
The reason for having one more RUN statements was for readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a rough idea of how much larger the image is now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My rough idea is that it is minimal (<1MB) and not worth worrying about. I did not confirm this thoroughly in an e2e build comparison though.
$ apt-cache --no-all-versions show wget | grep '^Size: '
Size: 348824
$ apt-cache --no-all-versions show bzip2 | grep '^Size: '
Size: 34064
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so it seems like 992 kB + 195 kB.
$ apt-cache --no-all-versions show wget | grep Installed-Size
Installed-Size: 992
$ apt-cache --no-all-versions show bzip2 | grep Installed-Size
Installed-Size: 195
ARG DASK_VERSION=2021.7.1 | ||
ARG DISTRIBUTED_VERSION=2021.7.1 | ||
ARG DASK_VERSION=2021.8.1 | ||
ARG DISTRIBUTED_VERSION=2021.8.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just bumped these while I was at it so all the conda/pip packages installed were updated across now that I had to update at least aiohttp.
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability as separation between the build steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally looks good to me!
We recently added Arm support to the Dask docker images in dask/dask-docker#166.
I wonder if there are any learnings we can take from that PR, or anything we can reuse to ensure a bit more consistency between projects.
Ah one key thing to keep consistent is a transition to mamba and mambaforge to install it. I'd be happy to help with that as well but it felt like out of scope for this PR to do that also. Oh @holdenk and I worked on these kinds of things in jupyter/docker-stacks also btw! :) 👋 Happy to see you contribute here as well @holdenk!!! Want to help by contributing a review of this PR also? |
I can address the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this, one question about image sizes inline.
I'd love it if we could come up with a policy about what library versions to use in the Dockerfiles, to avoid having repeated conversations about versions. Is there any reason not to use the latest version of all libraries (dask, distributed, aiohttp, etc) at the time of release?
One maybe special consideration is Python itself. Right now we're on 3.8.6 (bumping to 3.8.11). Is there any reason to keep that on 3.8? Or should we move to 3.9?
@@ -32,39 +39,40 @@ RUN useradd -m -U -u 1000 dask \ | |||
&& echo "aggressive_update_packages: []" >> /home/dask/.condarc \ | |||
&& find /opt/conda/ -follow -type f -name '*.a' -delete \ | |||
&& /opt/conda/bin/conda clean -afy \ | |||
&& chown -R dask:dask /opt/conda \ | |||
&& apt-get autoremove --purge -y wget bzip2 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a rough idea of how much larger the image is now?
UPDATE: @TomAugspurger I created #437 regarding this question. Let's move this discussion to there. I think a good policy is to use a requirements.in file that is automatically refrozen to requirements.txt by dependabot, and then we install such requirements.txt in the dockerfile. That helps users get info about what version has been used in each release via git history, we avoid a maintenance burden, and we get security warnings via automation etc. We have a mix of conda/pip in these files though, making my idea a bit troublesome. I'd consider going 100% pip in these images or 100% conda and hope there is a good tool like there is for pip that dependabot can use for an environment.yml file |
Go for merge on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @consideRatio!
Thanks @jacobtomlinson @TomAugspurger @jcrist for your review efforts! |
This is a step along the way for #421 about compatibility on arm64 machines like Raspberry Pi.