This repository has been archived by the owner on Oct 18, 2022. It is now read-only.

Docker images contain 20MB of deleted /var/lib/apt/lists/ files #90

Closed
edmorley opened this issue Jun 15, 2017 · 26 comments

Comments

@edmorley

edmorley commented Jun 15, 2017

Hi! Before I dive in, I just want to say thank you for maintaining these images :-)

So I happened to notice this section present in the Dockerfiles (generated here):

# delete all the apt list files since they're big and get stale quickly
RUN rm -rf /var/lib/apt/lists/*
# this forces "apt-get update" in dependent images, which is also good

I agree it's a good idea to remove these files to force a later "apt-get update"; however, the comment about saving space is not correct, since deleting files in a later layer does not free the space they already occupy in the earlier layer that added them. The comment seems to have been copy-pasted from this script (which doesn't run across multiple layers, so it actually does save space).
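The point about layers can be illustrated without Docker. The following is a minimal sketch, assuming the OCI-style layer model in which each layer is its own tar archive and a deletion is recorded as a ".wh."-prefixed whiteout entry; the file name example_Packages and the 1 MiB size are made up for the demonstration.

```shell
#!/bin/sh
# Sketch only: simulates two image layers as plain tar archives.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Layer 1: stand-in for the base archive, carrying a 1 MiB apt list file
# (hypothetical name and size).
mkdir -p "$work/layer1/var/lib/apt/lists"
dd if=/dev/zero of="$work/layer1/var/lib/apt/lists/example_Packages" \
   bs=1024 count=1024 2>/dev/null
tar -C "$work/layer1" -cf "$work/layer1.tar" var

# Layer 2: the later "rm" is stored as an empty whiteout marker.
# Layer 1 itself is never rewritten.
mkdir -p "$work/layer2/var/lib/apt/lists"
: > "$work/layer2/var/lib/apt/lists/.wh.example_Packages"
tar -C "$work/layer2" -cf "$work/layer2.tar" var

layer1_bytes=$(( $(wc -c < "$work/layer1.tar") ))
layer2_bytes=$(( $(wc -c < "$work/layer2.tar") ))
total_bytes=$(( layer1_bytes + layer2_bytes ))

echo "layer 1 (adds the file):    $layer1_bytes bytes"
echo "layer 2 (deletes the file): $layer2_bytes bytes"
echo "total shipped:              $total_bytes bytes"
```

The total is larger than layer 1 alone: the deletion hides the file from the final filesystem, but the bytes still ship with the image.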

Rather than just correcting the comment, it would be best to avoid the 20MB wasted space in the first place.

The files in /var/lib/apt/lists/ come from the base image archive from Canonical, which is directly extracted using the ADD command's tar file support. This cannot be switched to the curl/untar/delete pattern used in downstream images, since until the base archive is extracted there are no binaries in the image to use. As such, the removal of /var/lib/apt/lists/ needs to occur prior to the Docker build process.

This example shows the Ubuntu 16.04 image being reduced from 118MB to 97.6MB by doing exactly that...

#!/bin/bash

# Fetch base archive and Dockerfile used for the existing Ubuntu 16.04 image
curl -fLO https://partner-images.canonical.com/core/xenial/current/ubuntu-xenial-core-cloudimg-amd64-root.tar.gz
curl -fLO https://raw.githubusercontent.com/tianon/docker-brew-ubuntu-core/dist-amd64/xenial/Dockerfile

# Prepare a slimmed down version
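# (GNU tar is required for --delete; "-f -" makes tar act as a stdin-to-stdout filter explicitly)
gzip -dc ubuntu-xenial-core-cloudimg-amd64-root.tar.gz | tar -f - --delete --wildcards 'var/lib/apt/lists/*' | gzip > rootfs-minimised.tar.gz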
sed 's/ubuntu-xenial-core-cloudimg-amd64-root\.tar\.gz/rootfs-minimised\.tar\.gz/' Dockerfile > Dockerfile-new

# Compare the before/after
docker build -t ubuntu-16.04-test:before .
docker build -t ubuntu-16.04-test:after -f Dockerfile-new .
docker images ubuntu-16.04-test

Output:

REPOSITORY          TAG                 IMAGE ID            CREATED                  SIZE
ubuntu-16.04-test   after               7b258205a6b1        Less than a second ago   97.6MB
ubuntu-16.04-test   before              65cb86c05710        13 seconds ago           118MB
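Besides comparing image sizes, the repacked archive can be checked directly. This is a hedged sketch that runs the same filter pipeline on a tiny stand-in archive and lists what survives; it assumes GNU tar (for --delete), and the archive contents here are hypothetical rather than taken from the real Ubuntu tarball.

```shell
#!/bin/sh
# Sketch only: requires GNU tar; operates on a throwaway stand-in archive.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Build a small stand-in for the upstream rootfs archive (hypothetical contents).
mkdir -p "$work/rootfs/var/lib/apt/lists" "$work/rootfs/etc"
echo "stale index" > "$work/rootfs/var/lib/apt/lists/example_Packages"
echo "demo" > "$work/rootfs/etc/hostname"
tar -C "$work/rootfs" -cf - var etc | gzip > "$work/rootfs.tar.gz"

# Same approach as the script above: decompress, drop the list files, recompress.
gzip -dc "$work/rootfs.tar.gz" \
  | tar -f - --delete --wildcards 'var/lib/apt/lists/*' \
  | gzip > "$work/rootfs-minimised.tar.gz"

# List the members that remain in the repacked archive.
remaining=$(gzip -dc "$work/rootfs-minimised.tar.gz" | tar -tf -)
echo "$remaining"

if echo "$remaining" | grep -q 'var/lib/apt/lists/example_Packages'; then
    echo "list files are still present" >&2
    exit 1
fi
echo "no apt list files remain"
```

Against the real archive, the equivalent check would be to list rootfs-minimised.tar.gz and look for any member under var/lib/apt/lists/.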

I guess the question will be whether to store both the original base archive and the processed one in this repo (so people can still use the hashes and compare), or whether to just store the processed one.

Also, I think it's worth pushing the upstream maintainers of these base images to remove the APT lists from them, which will avoid all of this busywork. Perhaps this size-reduction use-case is a more compelling one for them than that outlined here:
https://bugs.launchpad.net/cloud-images/+bug/1685399

Many thanks!

@tianon
Owner

tianon commented Jun 15, 2017 via email

@edmorley
Author

edmorley commented Jun 16, 2017

Thank you for the reply!

> Unfortunately, repacking the tarballs is out of the question

Out of curiosity, could you elaborate? Is it for reproducibility?

> the best we can do here is fix the comment and file an issue upstream (perhaps linking to the upstream issue in our comment?)

Sounds good on both counts! Would you mind filing upstream? (I'd have to create a launchpad account just for this)

> If this does get fixed upstream, we can leave our "RUN" line in place as a fallback,
> ... (especially since the upstream tarballs have flip-flopped on this a few times in the past, IIRC)

I think it would be good to make upstream regressions for this more obvious. So perhaps either:

  • using rm -rfv /var/lib/apt/lists/* (ie with the verbose flag, which will make it clear from the logs whether anything was deleted)
  • replacing the rm entirely with an assertion that the directory is empty. (In this case, if upstream did regress, the workflow would be to file an upstream issue, replace the assertion with the rm again and wait for upstream to fix before reverting back.)
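The second option could look roughly like the sketch below. The helper name assert_dir_empty is hypothetical, and the demonstration runs against a throwaway directory rather than the real /var/lib/apt/lists.

```shell
#!/bin/sh
# Sketch only: assert_dir_empty is a hypothetical helper, not part of the images.
# Succeeds only when the directory exists and has no entries (including dotfiles).
assert_dir_empty() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "assertion failed: $dir is not a directory" >&2
        return 1
    fi
    if [ -n "$(ls -A "$dir")" ]; then
        echo "assertion failed: $dir is not empty:" >&2
        ls -A "$dir" >&2
        return 1
    fi
}

# Demonstration against a temporary directory.
demo=$(mktemp -d)
assert_dir_empty "$demo" && echo "empty directory: assertion passes"
touch "$demo/example_Packages"
assert_dir_empty "$demo" || echo "non-empty directory: assertion fails as intended"
rm -rf "$demo"
```

In a Dockerfile, the same idea might be written as a single line such as RUN [ -z "$(ls -A /var/lib/apt/lists)" ], so that the build fails loudly if upstream starts shipping list files again.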

@tianon
Owner

tianon commented Jun 22, 2017

> Out of curiosity, could you elaborate? Is it for reproducibility?

Partially for reproducibility, but more because Canonical has asked to be the "source of truth" for the bits that become "Ubuntu" on the Docker Hub (I'm merely a proxy for their builds).

> Sounds good on both counts! Would you mind filing upstream? (I'd have to create a launchpad account just for this)

Yeah, I'll go ahead and file a launchpad issue -- having this directory be explicitly empty by design in the tarballs they provide isn't something they've ever committed to, so I didn't want to assume that it was a "bug" (I'm not 100% sure whether we're the only consumers of these particular tarballs). But if we file an issue and they're warm to the idea (and modify their tarballs), then I'd be more amenable to making it an assertion of emptiness instead. 👍

@tianon
Owner

tianon commented Jun 22, 2017

@edmorley
Author

Thank you for filing that. Since there has been no response to the ticket, I was going to register for a launchpad account and leave another comment explaining further, however my attempts to log in using a newly created Ubuntu One account failed with "Oops! Sorry, something just went wrong in Launchpad." :-(

@edmorley
Author

edmorley commented Aug 10, 2017

Is there anyone we can ping to get some movement on that launchpad ticket?

@Foorack

Foorack commented Aug 18, 2017

@edmorley I was able to log in to Launchpad with my Ubuntu One account today. I recommend trying again to see whether it works now. 👍

@edmorley
Author

Ah, I forgot to update the above - someone kindly helped fix the issue with my account since then :-)

@edmorley
Author

Does anyone have any contacts at Ubuntu who might be able to help get some traction in the launchpad issue?
https://bugs.launchpad.net/cloud-images/+bug/1699913

@pranas

pranas commented Jan 3, 2018

Looking forward to Ubuntu taking this stuff out of their tarball. Until that happens, the new --squash option for docker build would be a good workaround to get rid of it. Or a multi-stage build.

@mickare

mickare commented Feb 27, 2018

I support @pranas's suggestion. A simple multi-stage build does save ~26MB.
No "repacking" needed and removed artifacts are "cleared".

Btw. I don't see any drawbacks of a single-layer base image. 🤔
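For readers unfamiliar with the multi-stage workaround mentioned above, here is a minimal sketch. The stage name base, the image tag, and the output directory are illustrative choices, not something taken from this repository; the script only writes the Dockerfile and prints a build command instead of running Docker. Note that the maintainer explains later in this thread why the official images do not use this approach.

```shell
#!/bin/sh
# Sketch only: writes an illustrative Dockerfile; does not invoke docker.
set -eu
out=$(mktemp -d)

cat > "$out/Dockerfile" <<'EOF'
# Stage 1: start from the existing image and delete the apt lists.
FROM ubuntu:16.04 AS base
RUN rm -rf /var/lib/apt/lists/*

# Stage 2: copy the cleaned filesystem into an empty image as one layer,
# so the deleted files are not carried along in an earlier layer.
FROM scratch
COPY --from=base / /
# Metadata is not inherited from the first stage, so restate the command.
CMD ["/bin/bash"]
EOF

echo "Wrote $out/Dockerfile; to try it, run:"
echo "  docker build -t ubuntu-16.04-test:multistage $out"
```

The trade-off is that the result is a single flattened layer built from a COPY, which is exactly where questions about preserved file metadata come in.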

@edmorley
Author

> Does anyone have any contacts at Ubuntu who might be able to help get some traction in the launchpad issue?
> https://bugs.launchpad.net/cloud-images/+bug/1699913

Still no reply in that launchpad issue after 10 months sadly.

Does anyone have any idea who else we can CC to it? It's not clear where the code for generating the cloud images lives, so I can't even fix it myself 😞

@edmorley
Author

edmorley commented Jun 7, 2018

Success! 🎉
https://bazaar.launchpad.net/~ubuntu-core-dev/livecd-rootfs/trunk/revision/1675

@tianon
Owner

tianon commented Jul 26, 2018

Indeed, this is fixed. 👍

@tianon tianon closed this as completed Jul 26, 2018
@mwhudson
Collaborator

Only for cosmic+ though! I should see about backporting some of the recent changes to at least bionic

@ondrajz

ondrajz commented Sep 4, 2018

@mwhudson Using the multi-stage build workaround seems pretty good for the non-cosmic images.

@tianon
Owner

tianon commented Jun 3, 2019

We have no plans to move Ubuntu to a multi-stage build, because such builds don't preserve permissions well enough to be usable for a full distro rootfs.

@tianon
Owner

tianon commented Jun 3, 2019

Please see the background for this in https://bugs.launchpad.net/cloud-images/+bug/1699913. I can assure you that Docker is very efficient at managing empty layers -- there is no extra transport cost for this line for versions of the tarball which do not have any contents there.

In fact, I do intend to instead adjust that line at some point to verify that the directory is in fact empty instead of forcing it to be so.

If you need further help understanding the justification, please refer to that Launchpad issue and the Docker Community Forums, the Docker Community Slack, or Stack Overflow (as the issues here are not intended as a support forum).
