Replace Alpine-based image with a Debian-based image #21
I haven't had a chance to test this, but reading through your changes this looks like a great direction. 💯
It looks like the final, compressed image size is 5MB smaller than the current image from master! Nice!
It might also be informative to build this image locally and look at its `docker image history`,
which will show the sizes of all the intermediate layers. That can be useful for identifying large layers for further inspection.
I was slightly worried that not having a Python 2 install would break some repos/uses of the image which expected `python2`
to exist, but I didn't actually find any of those when searching the github.com/nextstrain repos.
@@ -143,6 +97,13 @@ RUN pip3 install --requirement=/nextstrain/fauna/requirements.txt
# accessible and importable.
RUN pip3 install --editable /nextstrain/augur

# Install additional "full" dependencies for augur.
# TODO: these versions should be updated in augur's setup.py to work with the
# pip install nextstrain-augur[full] approach.
👍 for updating setup.py to declare these extras and then using that extra tag here.
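For illustration, here is a minimal sketch of how an extra such as `full` could be declared and then requested. The dict contents, the `cvxopt` pin, and the helper `requirement_for` are assumptions made for this example, not Augur's actual metadata.

```python
# Hypothetical excerpt in the style of a setup.py. The extra's contents and
# the version pin are illustrative assumptions, not Augur's real metadata.
EXTRAS_REQUIRE = {
    "full": [
        "cvxopt ==1.*",  # assumed pin: any 1.x release
    ],
}


def requirement_for(package, extra=None):
    """Build the requirement string pip accepts, e.g. 'pkg[extra]'."""
    return f"{package}[{extra}]" if extra else package


# In a real setup.py the dict would be passed along as
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE).
print(requirement_for("nextstrain-augur", "full"))
```

With something like this in place, the Dockerfile would only need to name the extra, and the version pins would live in one place.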
@huddlej I really want to merge this and leave Alpine behind, but I'm hesitant because it seems like a large underlying change that could break existing
@tsibley I feel exactly the same way! I would prefer for this image to get more rigorous testing before we merge. For local builds, it should be as easy as this, right?
Maybe @lmoncla would be able to try it out since she just got ncov builds up and running. I'll ping her about this.
That An important caveat is that builds run via
I've just tested this PR's Docker Image with the current Please let me know if you have other things you need to test. Thanks
Thank you for testing this out, @jstoja! Would you be willing to try the same type of test with our mumps build? We think this mumps build will not work with the image defined in this PR, but it would be helpful to get confirmation about how this build is failing (assuming it does so with the example data).
No problem, I'll test this one too. Small additional comment, I can see that in the
Right, the CI tests in nextstrain/cli run in Ubuntu (e.g. the commands inside .travis.yml), but when the example build using Docker is run, it'll run inside the container, which would be Debian with this PR. I don't think this is an issue for testing, and generally we do expect that the container OS won't match the host OS. Are there particular future issues you think might crop up?
The only issue I can foresee is the dynamic libraries being different between host and container, which might cause problems. This should be VERY rare, though, so it's probably not a huge issue.
So I've been testing this a bit more and what fails currently is an I'll try to set this up and continue the testing.
Rebased on top of the latest @huddlej I'd still love to pick a week to make this the default image (by merging it) and be ready to quickly turn around hot fixes if/when issues arise. We could switch the ncov build to it earlier than that, I suppose, to try to shake out any issues before pushing it out to external users.
Sounds great, @tsibley! Any time Wednesday-Friday of next week should work.
6850d98 to 71fc8bd
This helps us test the new image internally before making it the default.¹ ¹ nextstrain/docker-base#21
This week was unexpectedly scattered for me due to childcare, so let's plan on the coming week? My thought is to:
and then watch the regular builds next week on
If we don't see any problems, then
It would also still be good to follow up with: Lines 96 to 101 in 71fc8bd
The current state of Augur has slightly different versions: I don't think I have enough recent context here to know if bumping the Augur deps and cutting a release is all we're talking about, or if there's more to it than that. Can you weigh in?
Alpine Linux uses musl libc instead of glibc which means it cannot use most pre-built Python wheels for glibc-based Linux distributions [1]. As a result, Python packages need to be built from source, leading to longer build times and increased complexity in the Dockerfile for more complex packages. For complex packages like opencv-python, building from source is especially non-trivial. This commit ports the existing Alpine-based Dockerfile to a minimal Debian-based file with pre-installed Python 3 (the official Python Docker image) and removes dependencies that were originally required to build Python packages from sources (e.g., cvxopt) where installation from a wheel is now possible. This commit also removes all explicit references to Python 2 dependencies, as fauna is compatible enough with Python 3 for downloading to work and sacra was previously removed. [1] docker-library/docs#904
(Rebased onto latest master to fix merge conflict before merge.)
Thank you for pushing this forward with a great plan, @tsibley! The only notable version difference in these dependencies is the cvxopt version. I've tested cvxopt with versions 1.2.X and actually use versions 1.X in the Augur bioconda recipe. I can update the Augur setup.py to reflect this. Is there a Docker-specific reason why we don't just run
Excellent!
With the update to Augur's The original reason Augur's deps were explicitly denormalized into the Dockerfile here was to avoid lengthy image rebuilds. That was done because several of the deps didn't have wheels for Alpine, so they had to be compiled from source. With the new image base, we should be able to rely on wheels, so installing the deps every time won't be much of a burden and is worth the simplicity of having a single source of truth.
Perfect! It's all coming full-circle in my mind. :)
Making the switch to
# Install Node deps, build Auspice, and link it into the global search path. A
# fresh install is only ~40 seconds, so we're not worrying about caching these
# as we did the Python deps. Building auspice means we can run it without
# hot-reloading, which is time-consuming and generally unnecessary in the
# container image. Linking is equivalent to an editable Python install and
# used for the same reasons described above.
- RUN cd /nextstrain/auspice && npm install && npm run build && npm link
+ RUN cd /nextstrain/auspice && npm update && npm install && npm run build && npm link
@tsibley I'm sorry I don't remember. Since I know very little about the npm ecosystem, it seems likely that I saw this pattern somewhere during work on this PR and copied it while trying to debug build issues.
No worries!
This was introduced incidentally, and seemingly unintentionally¹, in "Replace Alpine-based image with a Debian-based image" (ce07dad), but no one noticed. `npm update` is unwarranted because it's intended for maintainers of a package, not downstream user installs, which is the role we have here. Its effect was to bump the minimum versions in Auspice's package.json to the latest available (while still respecting SemVer constraints) and then install all of Auspice's deps into node_modules/.² That's unnecessary because we then run `npm install`, which unlike `npm update`, also runs pre/post-installation steps Auspice includes. ¹ <#21 (comment)> ² Notably, `npm update` was added when the image had npm v5, which updates both package.json and package-lock.json. We currently have npm v6, which does the same. Subsequent versions, e.g. npm v8, stopped updating package.json and only update package-lock.json.
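To make "latest available while still respecting SemVer constraints" concrete, here is a rough sketch of how a caret range (npm's default when saving a dependency) bounds the versions an update may pick. It assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release tags; the function names and the version list are made up for illustration and are not npm's implementation.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release support)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base):
    """Exclusive upper bound of ^base: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(version, base):
    """True if `version` falls inside the range ^base."""
    v, b = parse(version), parse(base)
    return b <= v < caret_upper_bound(b)


def newest_allowed(base, available):
    """Newest version in `available` that ^base permits, or None."""
    allowed = [v for v in available if satisfies_caret(v, base)]
    return max(allowed, key=parse) if allowed else None


# Hypothetical registry contents for a dependency declared as ^1.2.3.
print(newest_allowed("1.2.3", ["1.2.3", "1.4.0", "1.9.1", "2.0.0"]))  # prints 1.9.1
```

Under these assumptions, an update would move the declared minimum from 1.2.3 to 1.9.1 but never to 2.0.0.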
Alpine Linux uses musl libc instead of glibc which means it cannot use most
pre-built Python wheels for glibc-based Linux distributions [1]. As a result, Python
packages need to be built from source, leading to longer build times and
increased complexity in the Dockerfile for more complex packages. For complex
packages like opencv-python, building from source is especially non-trivial.
This commit ports the existing Alpine-based Dockerfile to a minimal Debian-based
file with pre-installed Python 3 (the official Python Docker image) and removes
dependencies that were originally required to build Python packages from
sources (e.g., cvxopt) where installation from a wheel is now possible.
This commit also removes all explicit references to Python 2 dependencies, as
fauna is compatible enough with Python 3 for downloading to work and sacra is
not actively under development.
This new image has been tested with the zika-tutorial, zika, and seasonal-flu
repositories.
[1] docker-library/docs#904
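As a rough illustration of the wheel compatibility point, the sketch below checks whether a wheel's platform tag targets a given libc family. It relies on two facts: `manylinux` tags are built against glibc, and `musllinux` tags (PEP 656) are built against musl. The function names are hypothetical, and pip's real tag matching also considers versions and architectures.

```python
import platform


def libc_compatible(platform_tag, libc):
    """Return True if a wheel platform tag does not conflict with `libc`.

    `libc` is "glibc" or "musl". This only looks at the libc family and
    ignores glibc/musl versions and CPU architecture.
    """
    if platform_tag.startswith("manylinux"):
        return libc == "glibc"
    if platform_tag.startswith("musllinux"):
        return libc == "musl"
    # Other tags (e.g. "any" for pure-Python wheels) carry no libc constraint.
    return True


def detect_libc():
    """Best-effort guess of the interpreter's libc via the standard library.

    platform.libc_ver() reports "glibc" on glibc systems; it may not
    recognize musl, so anything else is reported as-is or as "unknown".
    """
    name, _version = platform.libc_ver()
    return name or "unknown"


print(libc_compatible("manylinux1_x86_64", "musl"))   # prints False
print(libc_compatible("manylinux1_x86_64", "glibc"))  # prints True
```

On a musl-based image the first case applies, so pip has to fall back to a source distribution and compile it.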