Make pex output deterministic / reproducible #716

@stuhood

To have deterministic / reproducible output, pex should produce a byte-for-byte identical output given identical inputs.

There are a few common cases where this breaks down:

  1. unstable timestamps included in output archives (zip/tar files, generally)
    • This can be fixed by ensuring that pex uses hardcoded timestamps for the entries that it places in archives.
  2. unstable shas/timestamps intentionally included in metadata
    • Fixing things like this might involve adding an option to either disable including this info, or to stabilize it.
  3. unstable ordering of hash iteration between machines
    • Harder to hunt down, and harder to defend against, but fixing it involves using order-preserving or sorted structures.
  4. use of absolute paths, or paths that are host specific.
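
As a rough illustration of items 1 and 3, here is a minimal sketch of writing a zip with hardcoded timestamps and sorted entries. It is not pex's code: `write_deterministic_zip`, `FIXED_DATE_TIME`, and the chosen permission bits are hypothetical names and values picked for the example, and it only uses the standard library `zipfile` module.

```python
import io
import zipfile

# Assumption: any fixed value works; 1980-01-01 is the earliest date zip can store.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def write_deterministic_zip(entries):
    """Return zip bytes for a {archive_name: content_bytes} mapping.

    Hypothetical helper: entries are written in sorted order with a
    hardcoded timestamp so the result does not depend on dict ordering
    or on when the archive was built.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(entries):  # stable order, independent of insertion order
            info = zipfile.ZipInfo(filename=name, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed permissions (assumed value)
            info.create_system = 3  # pin the "created on" field instead of using the host OS
            zf.writestr(info, entries[name])
    return buf.getvalue()
```

Building the same mapping twice, in any insertion order, should give identical bytes on one machine. Compressed bytes can still vary across zlib versions, so cross-machine comparisons may need `ZIP_STORED` or a pinned toolchain.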

It is out of scope (for this ticket) to stabilize the input files to pex (i.e., adding lockfile support). So in cases where the network is involved, this ticket only covers runs in which the resolved inputs turn out to be identical.


It's not clear which combination of these issues might be in play in pex, so it would be good to start by getting a reproducibility test/experiment harness in place that makes it easy to compare two pex outputs and identify the above issues.
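
One possible starting point for such a harness, sketched under the assumption that the two outputs can be opened as zip archives: the hypothetical `diff_archives` function below reports entries that are missing, reordered, or that differ in timestamp, content checksum, or file attributes. It uses only the standard library and does not reflect any existing pex tooling.

```python
import zipfile


def diff_archives(a, b):
    """Return a list of human-readable differences between two zip archives.

    Hypothetical helper: `a` and `b` may be paths or binary file objects.
    """
    with zipfile.ZipFile(a) as za, zipfile.ZipFile(b) as zb:
        list_a, list_b = za.infolist(), zb.infolist()

    infos_a = {i.filename: i for i in list_a}
    infos_b = {i.filename: i for i in list_b}
    diffs = []

    # Entries present in only one of the two archives.
    for name in sorted(set(infos_a) - set(infos_b)):
        diffs.append("only in first: %s" % name)
    for name in sorted(set(infos_b) - set(infos_a)):
        diffs.append("only in second: %s" % name)

    # Same entries but written in a different order (e.g. unstable iteration).
    order_a = [i.filename for i in list_a]
    order_b = [i.filename for i in list_b]
    if set(order_a) == set(order_b) and order_a != order_b:
        diffs.append("entry order differs")

    # Per-entry metadata and content checks.
    for name in sorted(set(infos_a) & set(infos_b)):
        ia, ib = infos_a[name], infos_b[name]
        if ia.date_time != ib.date_time:
            diffs.append("timestamp differs: %s" % name)
        if ia.CRC != ib.CRC:
            diffs.append("content differs: %s" % name)
        if ia.external_attr != ib.external_attr:
            diffs.append("attributes differ: %s" % name)
    return diffs
```

An empty result does not prove the files are byte-for-byte identical, so a full harness would likely also compare whole-file hashes and use a report like this only to explain a mismatch.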
