Idea: multi-stage docker build for a leaner result #64

mcg1969 · 2019-10-08T04:52:01Z

@AlbertDeFusco check this out. I was experimenting with Docker multi-stage builds to produce a slimmed-down Docker image.

In short, you build the Docker image the way it is currently done, then build a second image that copies over nothing but /home/anaconda from the first. The final image therefore does not contain the original Miniconda installation. Multi-stage builds also mean you don't have to be so darn careful about cleaning up after yourself, since nothing left behind in the first stage reaches the final image unless you copy it.

The one trick, of course, is that you no longer have anaconda-project run available. So what do you replace it with? Well, you grab the command itself from anaconda-project list-commands, and stick that in a launch script along with manual activation of the environment.
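
To make the extraction step concrete, here is a minimal sketch of the same pipeline run against sample text. The sample table is an assumption about what anaconda-project list-commands prints (a header, a row of = signs, then one row per command with the command line in the second column), and the command python app.py is hypothetical.

```shell
#!/bin/bash
# Turn the first row after the "====" separator into an "exec ..." line.
# Reads list-commands-style text on stdin.
first_command_as_exec() {
    grep -A1 '=' | tail -n 1 | sed 's@^[^ ]*@exec@'
}

# Sample input; the layout is an assumption, not captured from a real run.
sample_listing() {
    cat <<'EOF'
Commands for project: /home/anaconda

Name     Description
====     ===========
default  python app.py
EOF
}

# Prints the line that would be appended to the launch script.
sample_listing | first_command_as_exec
```

The sed expression swaps the command name in the first column for exec and leaves the rest of the row untouched. Note that this only yields a usable line if the second column really is the command line; if it holds a free-text description instead, the result will not be runnable.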

When I tried it, this is what I got, and it works! Now, the one wrinkle is that it currently only works with the default command, and it only works if the environment doesn't have post-activate scripts to run. But both of these issues can be addressed.

# The base image is flexible; it simply needs to be able
# to support Anaconda-built glibc binaries.
FROM centos:7

# Miniconda is a minimal Python environment, consisting only of Python
# and the conda package manager. Here the installer is downloaded directly
# from repo.anaconda.com by the ADD statement below; it could instead be
# hosted in the same directory as this Dockerfile. The only additional
# package we install in the environment is anaconda-project.
ADD https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh miniconda.sh

ENV LANG=en_US.UTF-8 \
    LC_ALL=en_US.UTF-8
COPY condarc project.tar.gz /src/
RUN yum install -y bzip2 && \
    bash miniconda.sh -b -p /opt/conda && \
    cp /src/condarc /opt/conda/.condarc && \
    source /opt/conda/bin/activate && \
    conda config --set auto_update_conda False --system && \
    conda install anaconda-project --yes && \
    useradd -M anaconda && \
    mkdir /home/anaconda && \
    chown anaconda:anaconda /home/anaconda
USER anaconda
WORKDIR /home/anaconda
RUN source /opt/conda/bin/activate && \
    tar xfz /src/project.tar.gz --strip-components=1 && \
    anaconda-project --verbose prepare && \
    printf '#!/bin/sh\nbin=$(compgen -G envs/*/bin)\nexport PATH=$bin:$PATH\n' > .launch.sh && \
    anaconda-project list-commands | grep -A1 '=' | tail -n 1 | sed 's@^[^ ]*@exec@' >> .launch.sh && \
    chmod +x .launch.sh

# The base image is flexible; it simply needs to be able
# to support Anaconda-built glibc binaries.
FROM centos:7
RUN useradd anaconda
USER anaconda
WORKDIR /home/anaconda
COPY --from=0 /home/anaconda .
CMD ./.launch.sh
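
One way the post-activate limitation might be addressed is to have the launch script source the environment's activation hooks itself. The sketch below assumes the standard conda layout, where hooks live in etc/conda/activate.d/*.sh under the environment prefix; the function name activate_env is hypothetical, and real hooks may expect other variables that a full conda activate would set.

```shell
#!/bin/bash
# Manually "activate" a conda environment: set CONDA_PREFIX, put the
# environment's bin directory on PATH, and source any post-activate hooks.
activate_env() {
    env_dir=$1
    export CONDA_PREFIX="$env_dir"
    export PATH="$env_dir/bin:$PATH"
    for hook in "$env_dir"/etc/conda/activate.d/*.sh; do
        # The glob stays literal when nothing matches, so check the file exists.
        [ -f "$hook" ] && . "$hook"
    done
    return 0
}
```

In the generated .launch.sh, this would be called with the prepared environment's directory (for example envs/default, assuming the project uses that environment name) before the exec line.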


mcg1969 · 2019-10-08T04:52:32Z

Honestly this should go into anaconda-project, not ae5-tools.
cc: @jbednar @jsignell

AlbertDeFusco · 2019-10-08T12:46:06Z

That's cool. Does this work with environment variables set in the anaconda-project.yml file?

mcg1969 · 2019-10-08T13:28:48Z

Probably not. I suspect that what I really need is some sort of anaconda-project feature to generate the startup commands.

jbednar · 2019-10-08T18:29:22Z

> Well, you grab the command itself from anaconda-project list-commands

What happens when there are multiple commands? I'd prefer to have anaconda-project available in the final Docker image, so that someone who has the Docker image hasn't lost any functionality compared to the .zip file, only gained it. But sure, it makes sense to eliminate anything else specific only to the build process, not the final result.

> I suspect that what I really need is some sort of anaconda-project feature to generate the startup commands

It seems to me that all of the docker-image generation code could be part of anaconda-project rather than ae5-tools; it seems like an alternative way to package up a project, similar to a zip file but with other affordances...

AlbertDeFusco · 2019-10-08T18:32:30Z

I'm on board with developing anaconda-project dock

mcg1969 · 2019-10-08T18:35:55Z

> What happens when there are multiple commands?

If you want to be able to use the same Docker container to run all commands, that's a different use case. And it effectively requires installing the full anaconda-project environment inside it.

But I don't think that's what people should do. Rather, I think there should be a separate Docker image for each command. If the Dockerfile is designed correctly, you can have these different Docker images share the same environment layer (assuming they use the same environment for each command). Still, a Docker container is supposed to have a somewhat immutable function, which suggests to me that it needs to "focus", if you will, on each command.
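
As a rough illustration of the per-command idea, the build stage could write one launch script per command instead of only one for the default. This is a sketch under the same assumption about the list-commands layout as the original pipeline (rows after the ==== line are a name followed by the command line); the function name write_launch_scripts and the .launch-NAME.sh naming are hypothetical, and the environment activation lines are left out for brevity.

```shell
#!/bin/bash
# Write one launch script per command listed on stdin.
# Assumed input layout: rows after the "====" line are "<name>  <command line>".
write_launch_scripts() {
    out_dir=$1
    awk -v dir="$out_dir" '
        seen && NF >= 2 {
            name = $1
            cmd = $0
            sub(/^[^ ]+ +/, "", cmd)      # drop the name column
            file = dir "/.launch-" name ".sh"
            printf "#!/bin/sh\nexec %s\n", cmd > file
            close(file)
        }
        /^=+ +=+/ { seen = 1 }            # separator row: commands follow
    '
    for script in "$out_dir"/.launch-*.sh; do
        # Skip the literal pattern if no scripts were written.
        [ -f "$script" ] && chmod +x "$script"
    done
    return 0
}
```

Each per-command image could then differ only in its final CMD line (for example CMD ./.launch-test.sh), which leaves the copied environment layer identical across images.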

> It seems to me that all of the docker-image generation code could be part of anaconda-project rather than ae5-tools

Agreed. However, note that as currently constituted, the ae5-tools approach doesn't require anaconda-project to be a dependency of ae5-tools. Rather, it is installed into the docker containers themselves.

jbednar · 2019-10-08T18:43:44Z

> I think there should be a separate Docker image for each command.

We're probably imagining different scenarios here. I'm imagining something like the various projects on https://examples.pyviz.org, where each project is designed to be some reproducible content, with some commands allowing the project to be tested, some allowing it to be deployed, and some allowing it to be opened as a notebook for the user to explore. See e.g. https://github.com/pyviz-topics/examples/blob/master/attractors/anaconda-project.yml . I'm imagining someone being able to pass around a Docker image that by default does one thing, but which can also be tested by running other commands. Having separate Docker images doesn't seem like it would work, because part of the point is to test that the (first) Docker image is complete and runnable.

mcg1969 · 2019-10-08T18:47:06Z

That sounds like a reasonable workflow for anaconda-project but for ae5-tools the deployment model is more constrained.
