podman save does not have an option to produce the same output every time #14978

Open
graywolf-at-work opened this issue Jul 19, 2022 · 16 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@graywolf-at-work
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind feature

Description

podman save does not produce the same tarball every time it is executed. That
complicates reproducible builds.

+$ echo 'from scratch' | podman build -f - .
STEP 1/1: FROM scratch
COMMIT
--> 115e55eb5ca
115e55eb5ca5fc95deb6532fc83cebaac7ad89684719a2538776f17ee884c471
+$ podman save --format oci-archive -o a.tar 115e55eb5ca5fc95deb6532fc83cebaac7ad89684719a2538776f17ee884c471
Copying blob 5f70bf18a086 done
Copying config 115e55eb5c done
Writing manifest to image destination
Storing signatures
+$ podman save --format oci-archive -o b.tar 115e55eb5ca5fc95deb6532fc83cebaac7ad89684719a2538776f17ee884c471
Copying blob 5f70bf18a086 done
Copying config 115e55eb5c done
Writing manifest to image destination
Storing signatures
+$ sha256sum *.tar
3aa779c6c0aa936ee1edac3e49527477ebfafe4c2c6d76f020ff918cf906923a  a.tar
14f3a33901726efab7e953ef8ab04b71895b36e0c561b7ffb6416acd97943449  b.tar

Describe the results you received:

The two tarballs are not the same.

Describe the results you expected:

The two tarballs are the same.

Additional information you deem important (e.g. issue happens only occasionally):

My current workaround is this repackaging script:

#!/bin/sh
set -eu

tmpd=/tmp/repackage-tar.$$

cleanup() {
        rc=$?
        set +e

        # $v was never set; under `set -u` expanding it would abort the trap.
        rm -rf "$tmpd"

        exit "$rc"
}
trap cleanup EXIT

f=$1; shift

mkdir "$tmpd"
tar -tf "$f" | sort >"$tmpd/before"

if grep -q ^/ <"$tmpd/before"; then
        echo >&2 'Some records have absolute paths'
        exit 1
fi

mkdir "$tmpd/data"
tar -C "$tmpd/data" -xf "$f"
tar -C "$tmpd/data" --sort=name --mtime=@0 --owner=0 --group=0 --numeric-owner \
    --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
    -cf "$f" --no-recursion --verbatim-files-from -T "$tmpd/before"
tar -tf "$f" >"$tmpd/after"

diff -u "$tmpd/before" "$tmpd/after"

So there is a way around the problem, but it is clearly suboptimal that this
is necessary. The ideal solution would be something like a --reproducible
flag for podman save that always produces bit-for-bit identical tarballs for
a given image ID.

Output of podman version:

+$ podman version
Client:       Podman Engine
Version:      4.1.0
API Version:  4.1.0
Go Version:   go1.18.2
Git Commit:   73c021e0588393f18d2c54dec335ec9e960287a4
Built:        Thu May 12 02:00:55 2022
OS/Arch:      linux/amd64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

No

@openshift-ci openshift-ci bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 19, 2022
@graywolf-at-work graywolf-at-work changed the title podman save does not have an option produce the same output every time podman save does not have an option to produce the same output every time Jul 19, 2022
@mheon
Member

mheon commented Jul 21, 2022

@flouthoc PTAL.

@flouthoc
Collaborator

Thanks for reporting. At first glance this looks like an issue in c/image, specifically when we commit the archive.

@flouthoc
Collaborator

flouthoc commented Jul 22, 2022

@vrothberg @mtrmac Please correct me if I got this wrong, but I think the commit implementation of c/image/oci/NewImageDestination first unpacks the destination into a new temporary directory and then packs it into a tar again. That round trip can change the timestamps, which makes the final SHA different every time: https://github.com/containers/image/blob/main/oci/archive/oci_dest.go#L159

@vrothberg
Member

Yes, it is the timestamps that differ. When unpacking the tars, the contents are identical.
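
For anyone who wants to see this effect in isolation, here is a minimal sketch that does not involve podman at all. It assumes only tar, sha256sum and diff are available; the directory layout and file names are made up for illustration. The same tree is packed twice with different modification times, which changes the checksum, while the extracted contents still compare equal.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for the image contents.
mkdir "$work/src"
printf 'hello\n' >"$work/src/blob"

# Pack the same tree twice; only the modification times change in between.
touch -t 202207190000 "$work/src/blob" "$work/src"
tar -C "$work/src" -cf "$work/a.tar" .
touch -t 202207190001 "$work/src/blob" "$work/src"
tar -C "$work/src" -cf "$work/b.tar" .

sum_a=$(sha256sum <"$work/a.tar")
sum_b=$(sha256sum <"$work/b.tar")

# Unpack both archives so that the payloads can be compared directly.
mkdir "$work/a" "$work/b"
tar -C "$work/a" -xf "$work/a.tar"
tar -C "$work/b" -xf "$work/b.tar"

if [ "$sum_a" != "$sum_b" ]; then
        echo 'archive checksums differ'
fi
if diff -r "$work/a" "$work/b" >/dev/null; then
        echo 'extracted contents are identical'
fi
```

Listing both archives with tar -tvf should show the differing timestamps directly.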

@mtrmac
Collaborator

mtrmac commented Jul 22, 2022

Strictly speaking, I’m skeptical that much of the code makes any reproducibility guarantees. Notably json.Marshal says nothing on the topic.

That said, a c/image option to pass a specific time (similar to https://wiki.debian.org/ReproducibleBuilds/TimestampsInTarball ) seems reasonable. I don’t immediately have an opinion on what the default should be.

@rhatdan
Member

rhatdan commented Jul 23, 2022

Default should be EPOCH

@rhatdan
Member

rhatdan commented Jul 23, 2022

I should say the default should be the current time; if users want to override it, they can set it to EPOCH.

@graywolf-at-work
Contributor Author

While I'm fine with either way, I wonder if anyone actually cares about having "current" timestamps inside the .tar. So I don't see much reason not to default to 0.

@github-actions

github-actions bot commented Sep 1, 2022

A friendly reminder that this issue had no activity for 30 days.

@graywolf-at-work
Contributor Author

Thanks for the reminder I guess? How can I help?

@rhatdan
Member

rhatdan commented Sep 4, 2022

@mtrmac @vrothberg thoughts?

@github-actions

github-actions bot commented Oct 5, 2022

A friendly reminder that this issue had no activity for 30 days.

@graywolf-at-work
Contributor Author

Thank you for the friendly reminder.

@micah

micah commented Oct 6, 2023

The reproducible-builds community seems to be settling on the environment variable SOURCE_DATE_EPOCH: it is set to a fixed value, which is then used for all date-specific operations so that the same build always carries the same dates.
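
As a rough illustration of that convention outside of podman, here is a sketch that clamps every timestamp in a tarball to SOURCE_DATE_EPOCH. It assumes GNU tar (for --sort and --mtime); pack_reproducible is a hypothetical helper name, the epoch value is arbitrary, and none of this reflects how podman or c/image behave today.

```shell
#!/bin/sh
set -eu

# pack_reproducible DIR OUT (hypothetical helper, not a podman feature):
# archive DIR into OUT with sorted entries, root ownership, and every
# mtime set to SOURCE_DATE_EPOCH (falling back to 0 when it is unset).
pack_reproducible() {
        tar -C "$1" --sort=name --mtime="@${SOURCE_DATE_EPOCH:-0}" \
            --owner=0 --group=0 --numeric-owner \
            -cf "$2" .
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir "$work/src"
printf 'hello\n' >"$work/src/blob"

# Arbitrary fixed value chosen for this example.
SOURCE_DATE_EPOCH=1657843200
export SOURCE_DATE_EPOCH

pack_reproducible "$work/src" "$work/first.tar"
# Change the on-disk mtime; the archive header should not pick it up.
touch -t 200101010000 "$work/src/blob"
pack_reproducible "$work/src" "$work/second.tar"

first_sum=$(sha256sum <"$work/first.tar")
second_sum=$(sha256sum <"$work/second.tar")

if [ "$first_sum" = "$second_sum" ]; then
        echo 'archives are bit-for-bit identical'
fi
```

The fallback to 0 when the variable is unset is a choice made for this sketch only; a real tool might prefer to keep the current time as its default.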

This appears to be possible with BuildKit (https://medium.com/nttlabs/bit-for-bit-reproducible-builds-with-dockerfile-7cc2b9faed9f).

I have a similar problem: I can feed the exact same .tar to podman import and the resulting digests are different. However, I do notice that the "blob" digest stays the same while the "config" digest differs, for example:

$ cat build.tar | podman import - build:one
Getting image source signatures
Copying blob 82ce02f764d7 done  
Copying config f249c34e29 done  
Writing manifest to image destination
Storing signatures
sha256:f249c34e29498deea8af78d52cf863ab919550805014bc038f71a29845b64b30
 
$ cat build.tar | podman import - build:two
Getting image source signatures
Copying blob 82ce02f764d7 skipped: already exists  
Copying config 4e63c9cf3b done  
Writing manifest to image destination
Storing signatures
sha256:4e63c9cf3b1914f6dcf1358fb1aa141439aa03e8ce6c936fad989b8eb2a59648
$

As the diff of the podman image inspect output shows, the images are identical except for the following:

$ diff <(podman image inspect build:one) <(podman image inspect build:two)
3,4c3,4
<           "Id": "b903e497949c93e333afc64e2007e01ede8609a7a14fe34437357c0b8b888928",
<           "Digest": "sha256:fed51ffee7f2eddff0cc8f24b61351f9eb3af38d282147649f27a66869251e37",
---
>           "Id": "a0df15170474864af311c33e5401143c11bdb882aa127a8219b7ec3451f78da4",
>           "Digest": "sha256:85c66e5cd0f437c4e8fa94ddc9e28ee3cc9870b6a1a12408563c3c777be8b8f5",
6c6
<                "localhost/build:one"
---
>                "localhost/build:two"
9c9
<                "localhost/build@sha256:fed51ffee7f2eddff0cc8f24b61351f9eb3af38d282147649f27a66869251e37"
---
>                "localhost/build@sha256:85c66e5cd0f437c4e8fa94ddc9e28ee3cc9870b6a1a12408563c3c777be8b8f5"
13c13
<           "Created": "2023-10-06T09:08:42.604174109-04:00",
---
>           "Created": "2023-10-06T09:08:45.756187305-04:00",
40c40
<                     "created": "2023-10-06T09:08:42.604174109-04:00",
---
>                     "created": "2023-10-06T09:08:45.756187305-04:00",
46c46
<                "localhost/build:one"
---
>                "localhost/build:two"

I attempted to override the system time during the import by using libfaketime to set a specific time, but the resulting dates in the images were still the current dates:

$ cat build.tar | LD_PRELOAD=/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1 FAKETIME='1999-09-30 05:05:05' podman import - date:fake_one
Getting image source signatures
Copying blob 82ce02f764d7 skipped: already exists  
Copying config 37329e4256 done  
Writing manifest to image destination
Storing signatures
sha256:37329e425662bc9e321de373aaa9e51f8d218006c070c33ec96a027695f8b7da
$ cat build.tar | LD_PRELOAD=/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1 FAKETIME='1999-09-30 05:05:05' podman import - date:fake_two
Getting image source signatures
Copying blob 82ce02f764d7 skipped: already exists  
Copying config 0d3596e50b done  
Writing manifest to image destination
Storing signatures
sha256:0d3596e50be724770d982469636a5fe0b8984daf70afdfce056b1b67cdb85f54
$ diff <(podman image inspect date:fake_one) <(podman image inspect date:fake_two)
3,4c3,4
<           "Id": "0d3596e50be724770d982469636a5fe0b8984daf70afdfce056b1b67cdb85f54",
<           "Digest": "sha256:0ce4b5eb2e2316fa55f07b9369c230cdefa52d3615e2718efb9c451b72f88bea",
---
>           "Id": "37329e425662bc9e321de373aaa9e51f8d218006c070c33ec96a027695f8b7da",
>           "Digest": "sha256:82ab215d8051b867d296ba651d1c3e95a79a8a806fd96ae51552beeca1196c8a",
6c6
<                "localhost/date:fake_one"
---
>                "localhost/date:fake_two"
9c9
<                "localhost/date@sha256:0ce4b5eb2e2316fa55f07b9369c230cdefa52d3615e2718efb9c451b72f88bea"
---
>                "localhost/date@sha256:82ab215d8051b867d296ba651d1c3e95a79a8a806fd96ae51552beeca1196c8a"
13c13
<           "Created": "2023-10-06T09:23:37.179282787-04:00",
---
>           "Created": "2023-10-06T09:23:39.775335889-04:00",
40c40
<                     "created": "2023-10-06T09:23:37.179282787-04:00",
---
>                     "created": "2023-10-06T09:23:39.775335889-04:00",
46c46
<                "localhost/date:fake_one"
---
>                "localhost/date:fake_two"

@ConYel

ConYel commented Mar 8, 2024

Would it be possible to add an option such as podman save --timestamp=0, so that the blobs in the archive always get deterministic timestamps?
@rhatdan

@rhatdan
Member

rhatdan commented Mar 8, 2024

If you are interested in opening a PR I am sure it would be considered. podman build currently has an option like this, I believe.

podman build --help | grep timestamp
      --timestamp int                                set created timestamp to the specified epoch seconds to allow for deterministic builds, defaults to current time
