Avoid code duplication in Dockerfiles for Jupyter notebook images #321

jlewi · 2018-03-01T19:15:32Z

We build two versions of the Jupyter notebook images; one for CPU and one for GPU.

There's a lot of code duplication (e.g. list of python packages to install).

To avoid this duplication we should create a common shell script that can be invoked in the Dockerfile to perform the steps that should be the same.

This will help ensure there's less divergence over time.

jlewi · 2018-03-03T21:20:06Z

/cc @ojarjur

This change moves almost all of the commands run during the building of our CPU and GPU Docker images into a single, shared script called `install.sh`. There are currently only two differences between the two image types (those being the base image and the version of the tensorflow build to install). The rest of the contents of the Dockerfiles is meant to be identical. However, maintaining two nearly-identical sets of instructions is not practical. It requires that each change be duplicated in both files, and we've seen that, in practice, this doesn't happen reliably and the files drift over time. By placing the bulk of the common code into a shared file, we can reduce the risk of that happening in the future, while also reducing the amount of work developers need to do when making a change. However, this change does have one downside; since all of the install steps happen in a single Docker `RUN` command, modifying any of them will result in the entire image having to be rebuilt. This fixes kubeflow#321

This change eliminates the duplication between the two Dockerfiles (one for CPU-only instances and one for GPU-enabled instances) by eliminating the second file. The two files were meant to differ in two ways: 1. The base image used (ubuntu vs. nvidia/cuda) 2. The tensorflow package installed (tf-nightly vs. tf-nightly-gpu) It turns out that we can just make both of those arguments (defined by an `ARG` statement with a default value), and then the two files can be merged into one. The default values chosen for the args are the ones for the CPU image, so running `docker build .` without any `--build-arg` flags will produce the CPU image. To build a GPU image, pass in the corresponding values for the `BASE_IMAGE` and `TF_PACKAGE` arguments using `--build-arg` flags. This fixes kubeflow#321

* Consolidate down to a single Dockerfile This change eliminates the duplication between the two Dockerfiles (one for CPU-only instances and one for GPU-enabled instances) by eliminating the second file. The two files were meant to differ in two ways: 1. The base image used (ubuntu vs. nvidia/cuda) 2. The tensorflow package installed (tf-nightly vs. tf-nightly-gpu) It turns out that we can just make both of those arguments (defined by an `ARG` statement with a default value), and then the two files can be merged into one. The default values chosen for the args are the ones for the CPU image, so running `docker build .` without any `--build-arg` flags will produce the CPU image. To build a GPU image, pass in the corresponding values for the `BASE_IMAGE` and `TF_PACKAGE` arguments using `--build-arg` flags. This fixes #321 * Move the TF_PACKAGE arg down to right before it is used * Update the release steps to use the consolidated Dockerfile

…low#321) * Add doc for tfjob and pytorch examples in Katib * Add contents * Fix README * Fix link to examples in README * Fix README * Add information about Katib UI and status of StudyJob * Add Ambassador information

jlewi added area/jupyter Issues related to Jupyter good first issue help wanted priority/p2 labels Mar 1, 2018

ojarjur mentioned this issue Mar 5, 2018

Avoid code duplication between the two Dockerfiles #354

Closed

ojarjur mentioned this issue Mar 6, 2018

Consolidate down to a single Dockerfile #366

Merged

k8s-ci-robot closed this as completed in #366 Mar 7, 2018

elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022

add new community meeting hosts (kubeflow#321)

3951ca2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid code duplication in Dockerfiles for Jupyter notebook images #321

Avoid code duplication in Dockerfiles for Jupyter notebook images #321

jlewi commented Mar 1, 2018

jlewi commented Mar 3, 2018

Avoid code duplication in Dockerfiles for Jupyter notebook images #321

Avoid code duplication in Dockerfiles for Jupyter notebook images #321

Comments

jlewi commented Mar 1, 2018

jlewi commented Mar 3, 2018