Closed
Description
To have deterministic / reproducible output, pex should produce byte-for-byte identical output given identical inputs.
There are a few common cases where this breaks down:
- unstable timestamps included in output archives (zip/tar files, generally)
  - This can be fixed by ensuring that pex uses hardcoded timestamps for the entries that it places in archives.
- unstable shas/timestamps intentionally included in metadata
  - Fixing things like this might involve adding an option to either disable including this info, or to stabilize it.
- unstable ordering of hash iteration between machines
  - Harder to hunt down, and harder to defend against. But fixing it involves using order-preserving or sorted structures.
- use of absolute paths, or paths that are host-specific
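To make the first and third cases concrete, here is a minimal sketch of writing a zip with hardcoded timestamps and sorted entry order, using only the standard library `zipfile` module. It is not how pex currently builds archives; `write_deterministic_zip` and `FIXED_DATE_TIME` are hypothetical names chosen for illustration, and the fixed permission bits are an assumption.

```python
import io
import zipfile

# Hypothetical constant: 1980-01-01 is the earliest date the zip format can store.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def write_deterministic_zip(entries):
    """Return zip bytes for `entries`, a dict mapping archive name -> bytes.

    Entries are written in sorted name order with a hardcoded timestamp, so
    the result does not depend on dict ordering or on the current time.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(entries):
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            # Assumption: pin fields that otherwise vary by host platform.
            info.create_system = 3  # Unix
            info.external_attr = 0o644 << 16  # fixed permission bits
            zf.writestr(info, entries[name])
    return buf.getvalue()


first = write_deterministic_zip({"b.txt": b"two", "a.txt": b"one"})
second = write_deterministic_zip({"a.txt": b"one", "b.txt": b"two"})
print(first == second)
```

Compressed bytes can still differ between zlib versions, so this only addresses the timestamp and ordering cases listed above.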
It is out of scope (for this ticket) to stabilize the input files to pex (i.e., adding lockfile support). So in cases where the network is involved, and results may arrive in a varying order, structures should be sorted before they are written to the output.
It's not clear which combination of these issues might be in play in pex, so it would be good to start by getting a reproducibility test/experiment harness in place that makes it easy to compare two pex outputs and identify the above issues.
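As a starting point for such a harness, here is a rough sketch that compares two zip-format outputs entry by entry and reports differences in entry set, entry order, timestamps, and content. The function names (`summarize`, `diff_archives`) are hypothetical, and it assumes both outputs can be opened with the standard library `zipfile` module.

```python
import hashlib
import zipfile


def summarize(archive):
    """Return [(name, date_time, sha256 of content)] in archive order."""
    with zipfile.ZipFile(archive) as zf:
        return [
            (info.filename, info.date_time, hashlib.sha256(zf.read(info)).hexdigest())
            for info in zf.infolist()
        ]


def diff_archives(first, second):
    """Return a list of human-readable differences between two zip archives.

    `first` and `second` may be paths or seekable file objects.
    """
    summary_a, summary_b = summarize(first), summarize(second)
    names_a = [name for name, _, _ in summary_a]
    names_b = [name for name, _, _ in summary_b]
    findings = []

    for name in sorted(set(names_a) - set(names_b)):
        findings.append(f"only in first: {name}")
    for name in sorted(set(names_b) - set(names_a)):
        findings.append(f"only in second: {name}")
    # Same entries but in a different sequence points at unstable iteration order.
    if sorted(names_a) == sorted(names_b) and names_a != names_b:
        findings.append("entry order differs")

    meta_a = {name: (stamp, digest) for name, stamp, digest in summary_a}
    meta_b = {name: (stamp, digest) for name, stamp, digest in summary_b}
    for name in sorted(set(meta_a) & set(meta_b)):
        (stamp_a, digest_a), (stamp_b, digest_b) = meta_a[name], meta_b[name]
        if stamp_a != stamp_b:
            findings.append(f"timestamp differs: {name}: {stamp_a} vs {stamp_b}")
        if digest_a != digest_b:
            findings.append(f"content differs: {name}")
    return findings
```

An empty list means the two archives agree on every property checked here, which is weaker than byte-for-byte equality. A "content differs" finding on a metadata file would be the cue to look for embedded shas, timestamps, or host-specific paths.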