This repository has been archived by the owner on May 4, 2020. It is now read-only.

Use multistage Docker build for worker base docker #43

Open
Panaetius opened this issue Aug 22, 2018 · 3 comments

Comments

@Panaetius
Member

Currently the worker base image is around 3 GB in size, so installing everything takes quite long (longer than Helm's default 5-minute timeout).

All the temporary files, build tools, etc. remain inside the image, even though they are not needed later on.

We should use multistage builds and copy only what is needed from one stage to the next.

That should reduce the size by a lot.

@liehe
Member

liehe commented Aug 23, 2018

I ran du -h -d 1 / in a container of the image mlbench/mlbench_worker:mlbench-worker-base. The results are:

32K     ./run
4.0K    ./opt
4.0K    ./tmp
50M     ./var
0       ./dev
0       ./sys
4.0K    ./boot
25M     ./lib
4.0K    ./home
du: cannot access './proc/23/task/23/fd/3': No such file or directory
du: cannot access './proc/23/task/23/fdinfo/3': No such file or directory
du: cannot access './proc/23/fd/4': No such file or directory
du: cannot access './proc/23/fdinfo/4': No such file or directory
12K     ./proc
4.0K    ./mnt
17M     ./root
3.5M    ./sbin
4.0K    ./media
2.3M    ./etc
4.0K    ./lib64
4.0K    ./srv
2.9G    ./usr
7.3M    ./bin
44K     ./.sshd
4.0K    ./app
14M     ./.openmpi
3.6G    ./conda
4.7M    ./vision
8.0K    ./ssh-key
6.5G    .

Most of the space is consumed by the /usr and /conda directories. /usr is large because nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 is used as the base image of mlbench/mlbench_worker:mlbench-worker-base. See the same du output for that base image:

28K     /run
4.0K    /opt
4.0K    /tmp
8.6M    /var
0       /dev
0       /sys
4.0K    /boot
24M     /lib
4.0K    /home
du: cannot access '/proc/15/task/15/fd/3': No such file or directory
du: cannot access '/proc/15/task/15/fdinfo/3': No such file or directory
du: cannot access '/proc/15/fd/4': No such file or directory
du: cannot access '/proc/15/fdinfo/4': No such file or directory
12K     /proc
4.0K    /mnt
12K     /root
3.5M    /sbin
4.0K    /media
1.8M    /etc
4.0K    /lib64
4.0K    /srv
2.5G    /usr
7.3M    /bin
2.6G    /

This part is necessary in order to use NVIDIA's driver. As for /conda, we already run conda clean --all, so we cannot shrink it much if we want to keep large packages such as PyTorch and torchvision in the image.

So multistage builds may not drastically reduce the size in this sense. Instead, the build would even be slower, because we would need to copy large directories.
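The du listings above can be reproduced without the /proc errors and with the largest directories at the bottom. This is only a minimal sketch: the function name top_dirs is made up for illustration, and the flags assume GNU coreutils (du -x, sort -h).

```shell
# List first-level directory sizes under the given path, largest last.
# -x keeps du on one filesystem, so mounts such as /proc, /sys and /dev are skipped.
# 2>/dev/null hides errors for files that disappear while du is running.
top_dirs() {
    du -x -h -d 1 "$1" 2>/dev/null | sort -h
}
```

Inside a container, top_dirs / should show /usr and /conda just above the total in the last line.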

@Panaetius
Member Author

I'm not sure we need conda. We could use pip instead and copy only the Python site-packages folder to a new Docker stage. That might already cut down the size, since the conda lib folder is 1.1 GB and contains many libraries we don't need, like Qt. This might also be due to us installing opencv, which depends on ffmpeg (video encoding) and Qt (GUI library), neither of which we use. I'm not sure what we need opencv for, but removing it would already reduce the size by 700 MB.

We can also remove all the dev packages installed in the beginning, since we don't need gcc and so on later on.
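Roughly, the idea above could look like the following. This is an untested sketch, not the project's actual Dockerfile. Several things in it are assumptions: the runtime base tag, the file name Dockerfile.multistage, the /install path, and the package list. It also leaves out OpenMPI, sshd and everything else the real worker image needs. It installs with pip --target into one directory and copies only that directory into the second stage.

```shell
# Write a two-stage Dockerfile sketch (names and paths are illustrative).
cat > Dockerfile.multistage <<'EOF'
# Stage 1: devel image with compilers, used only to install Python packages.
FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 AS builder
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential python3 python3-pip
# Install everything into a single directory that can be copied out.
RUN pip3 install --no-cache-dir --target=/install torch torchvision

# Stage 2: runtime image without gcc, headers or pip caches.
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 && \
    rm -rf /var/lib/apt/lists/*
# Only the installed packages cross the stage boundary.
COPY --from=builder /install /install
ENV PYTHONPATH=/install
EOF
```

Building it with docker build -f Dockerfile.multistage -t worker-base:multistage . (the tag is made up) and comparing the result in docker images would show how much is actually saved. Multistage builds need Docker 17.05 or newer.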

I don't think it's much slower. It's the base image, so we don't need to rebuild it every 5 minutes, and copying 6 GB adds at most 2 minutes on a non-SSD, or 15 seconds on a modern SSD. But it makes cleaning up a lot easier than having to track everything ourselves.
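As a back-of-the-envelope check of those copy times: the throughput figures below (about 50 MB/s and 400 MB/s) are not measurements. They are simply the rates implied by the two estimates, and copy_seconds is a made-up helper.

```shell
# Estimated seconds to copy SIZE_MB at THROUGHPUT_MB_PER_S (integer arithmetic).
copy_seconds() {
    echo $(( $1 / $2 ))
}

copy_seconds 6000 50     # non-SSD at ~50 MB/s  -> 120 (2 minutes)
copy_seconds 6000 400    # SSD at ~400 MB/s     -> 15
```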

I think realistically, we can reduce the size by 1-2 GB.

@liehe liehe self-assigned this Aug 24, 2018
@Panaetius Panaetius removed this from the v0.1.0 milestone Aug 24, 2018
@liehe liehe removed their assignment Aug 24, 2018
@tlin-taolin
Collaborator

tlin-taolin commented Aug 27, 2018 via email

@Panaetius Panaetius removed the Worker label Oct 9, 2018