Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IA-4839] [DO NOT MERGE] Terra on Azure (ToA) base jupyter docker image #483

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

LizBaldo
Copy link
Collaborator

@LizBaldo LizBaldo commented Apr 11, 2024

JIRA Ticket: https://broadworkbench.atlassian.net/browse/IA-4839

Our current offering of Jupyter Notebook on Terra on Azure is currently done through the Azure DSVM (Data Science VM). This PR is proposing a base docker image (similar to the one provided on Google) as an alternative to the DSVM to provide the following improvements:

  • Make the image GPU-enabled, while staying as lean as possible (it is currently at ~3GB uncompressed) so it can be pulled and deployed without needing any form of caching
  • Isolate the user python environment from the jupyter server python environment
  • Package the Jupyter Terra offering into an isolated container that can be isolated from the other containers (Welder, Proxy, etc.) so we can safely run it as root and let user perform any operation they might need

Note: Even though this container is meant to work for both Azure and Google, the focus is to make it work on Azure first. This means that it might not work immediately out of the box as a custom image on Google. I am working on deploying this container via Terra next, and might still iterate on the dockerfile if (more like when) I find bugs.

Comment on lines +81 to +89
docker run \
--env GOOGLE_PROJECT \
--volume "${{ steps.auth.outputs.credentials_file_path }}":/tmp/credentials.json:ro \
--env GOOGLE_APPLICATION_CREDENTIALS="/tmp/credentials.json" \
--volume $GITHUB_WORKSPACE/terra-base-jupyter/tests:/tests \
--workdir=/tests \
--entrypoint="" \
terra-base-jupyter:smoke-test \
/bin/sh -c "pip3 install pytest; pytest"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to either drop this into a scripts/run tests (or something like this) and/or in combination use docker compose to set this up for the next person working on this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I would love to brainstorm that with you because that might slightly fall outside of this PR scope. But the question here is how can we easily test these docker images? I am not convinced that the current GHAs that we have is the way to go

terra-base-jupyter/custom/test/test-requirements.txt Outdated Show resolved Hide resolved
# `brew install eslint`
# `brew install npm`
# `npm i`
# `eslint --fix *.js`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to either:

  • tie this in to a pre-commit local hook?, or
  • make the build (GHA) fail if these are not run?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely yes. These are very outdated scripts that I left in there for now, but i am not sure if we even need it. They might be gone in a sprint or two once I am done setting this up on a BEE 🤞

terra-base-jupyter/README.md Outdated Show resolved Hide resolved
terra-base-jupyter/README.md Show resolved Hide resolved
terra-base-jupyter/README.md Show resolved Hide resolved
Copy link
Contributor

@lucymcnatt lucymcnatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks sound and well-commented! Just left a few questions

terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Outdated Show resolved Hide resolved
terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Outdated Show resolved Hide resolved
terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Outdated Show resolved Hide resolved
Copy link

@SHuang-Broad SHuang-Broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the heroic work in putting all these together @LizBaldo !

I am no expert in the Terra IA system, so only focused on the dockerfile itself.

The biggest concern I have, which I don't have a solution for, is python 3.10 is hardcoded in many places, which might make maintenance a little difficult down the road.

terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Show resolved Hide resolved
terra-base-jupyter/Dockerfile Outdated Show resolved Hide resolved
terra-base-jupyter/Dockerfile Outdated Show resolved Hide resolved
terra-base-jupyter/README.md Outdated Show resolved Hide resolved
terra-base-jupyter/conda-environment.yml Show resolved Hide resolved
## create new conda environments on top of it. The important part is that jupyter IS NOT installed
## in the base environment to provide isolation between the user environment, and the jupyter server
## to avoid cross-contamination
COPY conda-environment.yml .

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly the goal here, maybe using conda-pack can help further reduce the image size.
When we build dockers that uses conda to manage envs, we use conda-pack combined with multi-stage builds to reduce image sizes.
Example here:
https://github.com/broadinstitute/long-read-pipelines/blob/4b50b3857d33fd195461e5eb5c8a83d7fe6dda27/docker/lr-papermill-base/Dockerfile#L8

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add to this, it'd be good to have an understanding how large each layer is, before optimizing the sizes further.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good call out. I will add a detail of how big each layer is in the Readme. Regarding conda-pack, I definitely get why you are using it, but since the plan is to have the majority of Terra user use this base image, I was thinking about setting up the base conda environment directly.
The use case of building a custom image on top of it is a bit of an edge case, and so the base image might not be perfectly curated for your need. Would this be a problem?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this would not be a problem, just an optimization (which could be evil as it might be pre-mature). OTH, our experience with conda-pack is that it reduces the image size further.
This is where we borrowed the lesson. https://pythonspeed.com/articles/conda-docker-image-size/

I can give it a try once the other components are steady, actually since I'm the one mostly interested in it.

@LizBaldo LizBaldo changed the title [IA-4839] Terra on Azure (ToA) base jupyter docker image [IA-4839] [DO NOT MERGE] Terra on Azure (ToA) base jupyter docker image May 3, 2024
ENV PATH "${PATH}:${JUPYTER_HOME}/bin"

# Install Poetry, set up the virtual environment for jupyter to run and then cleanup / uninstall poetry
RUN curl -sSL https://install.python-poetry.org | POETRY_HOME=$POETRY_HOME $BASE_PYTHON_PATH \
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I was thinking about is maybe not using the user python to install the jupyter server, but instead use the system one to provide another layer of isolation


# Set the python version and corresponding conda installer
ENV PYTHON_VERSION 3.10
ENV CONDA_INSTALLER https://repo.anaconda.com/miniconda/Miniconda3-py310_23.5.1-0-Linux-x86_64.sh
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants