RUN true in Containerfile touches mtime of /run, creates layer #5950

Open
allisonkarlitskaya opened this issue Jan 28, 2025 · 2 comments

@allisonkarlitskaya

I found the related issues containers/podman#14577 and containers/podman#14582 when searching for duplicates.

tl;dr: Running a read-only command in a RUN unexpectedly modifies the container image and creates a new layer. Even with a workaround that prevents the modification, the layer is still created, but it is empty (its ID is the SHA-256 of an empty .tar file). Each RUN creates an additional layer.

Consider this Containerfile:

FROM fedora

RUN sleep 1
RUN stat /run
RUN sleep 1
RUN stat /run

Each stat command reports a different mtime for /run. I think that's because, to mount /run/.containerenv for the duration of each RUN command, we create and later delete a temporary file in /run to serve as the bind-mount point, and that changes the directory's mtime.
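The suspected mechanism (adding and then removing a directory entry bumps the parent directory's mtime even though the directory ends up with the same contents) can be checked with a minimal Python sketch. It uses a scratch temporary directory as a stand-in for /run and an empty file named .containerenv as the mount point; it performs no mounts and is not how buildah itself is implemented.

```python
import os
import tempfile
import time

# A scratch directory stands in for /run (assumption: we can't touch the real one).
with tempfile.TemporaryDirectory() as run_dir:
    before = os.stat(run_dir).st_mtime_ns

    # Sleep so the change shows up even with coarse filesystem timestamps.
    time.sleep(0.1)

    # Create an empty file to act as a bind-mount target, then remove it,
    # mimicking the lifecycle assumed for /run/.containerenv during a RUN.
    mount_point = os.path.join(run_dir, ".containerenv")
    open(mount_point, "w").close()
    os.remove(mount_point)

    after = os.stat(run_dir).st_mtime_ns
    leftover = os.listdir(run_dir)

    # The directory is empty again, but its mtime has moved forward.
    print(leftover, after > before)
```

Because the directory's mtime is part of what gets recorded when the layer diff is computed, the net-zero create/delete still shows up as a change to /run.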

That also means these presumably do-nothing commands are creating layers, as can be seen in the output of podman image inspect on the resulting image: you get 5 distinct layers added. If you look at the first layer .tar (after the fedora base image), you'll see that it modifies /etc/hostname and /etc/resolv.conf along with /run, but all of the other layers contain only /run with an updated timestamp, which I consider unexpected and undesirable.
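Why a bare timestamp change is enough to produce a distinct layer can be sketched in Python: two layer tarballs whose only entry is the run directory, differing only in its recorded mtime, already have different SHA-256 digests. The helper names `layer_with_run_dir` and `describe` and the mtime values are hypothetical illustrations, not taken from buildah or from the layers in this report.

```python
import hashlib
import io
import tarfile


def layer_with_run_dir(mtime: int) -> bytes:
    """Build a tiny in-memory layer tar whose only entry is the run/ directory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("run")
        info.type = tarfile.DIRTYPE
        info.mode = 0o755
        info.mtime = mtime
        tar.addfile(info)
    return buf.getvalue()


def describe(layer: bytes) -> list:
    """List (name, mtime) for every entry in a layer tar."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as tar:
        return [(m.name.rstrip("/"), int(m.mtime)) for m in tar.getmembers()]


# Two layers that differ only in the mtime recorded for run/ (made-up values).
a = layer_with_run_dir(1_738_000_000)
b = layer_with_run_dir(1_738_000_001)

digest_a = hashlib.sha256(a).hexdigest()
digest_b = hashlib.sha256(b).hexdigest()

print(describe(a), describe(b))
print(digest_a == digest_b)
```

The same `describe` helper can be pointed at a real layer .tar exported from an image to list which paths it touches.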

In my particular use case I'm running RUN rm, so I do want to modify the container image, but /run also gets changed as a side effect, which I don't want.

As a workaround for my use case, something like this does the trick:

FROM fedora

RUN --mount=type=tmpfs,target=/run sleep 1
RUN --mount=type=tmpfs,target=/run stat /run
RUN --mount=type=tmpfs,target=/run sleep 1
RUN --mount=type=tmpfs,target=/run stat /run

and then /run won't get modified.

This does something else strange, though: each of those RUN commands still creates a new layer, and the layers all have the same ID: 5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef. That's the SHA-256 of 1024 NUL bytes, which is the tar end-of-archive marker (two zero-filled 512-byte blocks) with no entries before it, so effectively this creates 3 empty layers. That should probably also be fixed.
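The claim about the layer ID can be checked directly: hashing 1024 zero bytes reproduces the ID quoted in the issue text, and a tar reader finds no members in that stream. This is a minimal Python sketch using only the standard library.

```python
import hashlib
import io
import tarfile

# Two 512-byte zero blocks: the tar end-of-archive marker and nothing else.
empty_tar = b"\0" * 1024

digest = hashlib.sha256(empty_tar).hexdigest()
print(digest)

# A tar reader sees this stream as a valid archive with no entries.
with tarfile.open(fileobj=io.BytesIO(empty_tar), mode="r:") as tar:
    members = tar.getmembers()
print(members)
```

Note that Python's own `tarfile` pads the archives it writes to a 10240-byte record, so an empty archive written by it would not hash to this value; the digest is specific to the minimal 1024-byte form.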

allisonkarlitskaya added a commit to allisonkarlitskaya/composefs-rs that referenced this issue Jan 28, 2025
When we create the final image containing the kernel UKI we need to make
sure that it's exactly equivalent to the original image (so that it gets
the same fs-verity digest).  We do that by removing the only thing we
added: the `/composefs-meta` directory.

The most obvious way to do this would be `RUN rm -rf /composefs-meta`
and that's the first thing I tried, but this creates a `.containerenv`
file in `/run` to use as a mountpoint for the containerenv file (present
for the duration of the `RUN` command), which modifies the timestamp of
`/run` as a side-effect, producing a different image.  I worked around
that before by manually recording a whiteout by copying an empty file to
`/.wh.composefs-meta`.  I was surprised that this worked, but it seemed
to work, so I went with it.

While pairing with Timothée today we discovered that this doesn't work
on his system, probably due to using a different podman storage driver.

Let's take another workaround: we can mount a tmpfs as `/run` for the
duration of the operation in order to protect the underlying filesystem
from being modified.  This is a cleaner approach anyway.

See containers/buildah#5950

Signed-off-by: Allison Karlitskaya <allison.karlitskaya@redhat.com>
Co-Authored-By: Timothée Ravier <tim@siosm.fr>
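The earlier workaround in that commit relied on a whiteout: under the OCI image layer convention, an empty regular file named `.wh.<name>` in a layer marks `<name>` as deleted from the layers below. A minimal Python sketch of what such a layer looks like in tar form follows; `whiteout_layer` is a hypothetical helper, and, as the commit notes, how reliably a manually copied whiteout is honoured appears to depend on the storage driver.

```python
import io
import tarfile


def whiteout_layer(path: str) -> bytes:
    """Build a layer tar that deletes `path` using the OCI whiteout convention.

    The whiteout is an empty regular file named `.wh.<basename>` placed in
    the same directory as the entry it hides.
    """
    parent, _, base = path.strip("/").rpartition("/")
    name = f"{parent}/.wh.{base}" if parent else f".wh.{base}"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = 0  # whiteout files carry no data
        tar.addfile(info)
    return buf.getvalue()


layer = whiteout_layer("/composefs-meta")
with tarfile.open(fileobj=io.BytesIO(layer), mode="r:") as tar:
    names = tar.getnames()
print(names)
```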
@allisonkarlitskaya
Author

See #4242 about the mentioned files in /etc. That's definitely a different issue, but it's good to know that it's being tracked as well.

@allisonkarlitskaya
Author

I forgot to mention:

lis@x1:~$ podman --version
podman version 5.3.2
lis@x1:~$ buildah --version
buildah version 1.38.1 (image-spec 1.1.0, runtime-spec 1.2.0)
lis@x1:~$ cat ~/.config/containers/storage.conf 
[storage]
driver="btrfs"

travier added a commit to containers/composefs-rs that referenced this issue Jan 29, 2025