MacOS builds are not reproducible #16028

Closed
aherrmann opened this issue Jun 14, 2023 · 5 comments
Labels
bug Observed behavior contradicts documented or intended behavior linking os-macos
@aherrmann
Contributor

Zig Version

0.11.0-dev.3395+1e7dcaa3a

Steps to Reproduce and Observed Behavior

Adjust the path to the Zig compiler and invoke the following script twice:

ZIG=# path to zig compiler

cat >main.zig <<EOF
const std = @import("std");

pub fn main() void {
    std.io.getStdOut().writeAll(
        "Hello World!\n",
    ) catch unreachable;
}
EOF

test() {
  local mode="$1"
  local target="$2"
  echo "$mode $target (PWD $PWD)"
  rm -f main main.old main.o
  "$ZIG" build-exe -target "$target" -O "$mode" main.zig
  md5sum main
  { llvm-objdump --syms main | grep -F -- "$PWD"; } && echo contains PWD || echo does not contain PWD
  mv main main.old
  rm -f main main.o
  "$ZIG" build-exe -target "$target" -O "$mode" main.zig
  md5sum main
  diffoscope main main.old || true
  echo "----------------------------------------"
}

test Debug x86_64-linux
test Debug x86_64-macos
test ReleaseFast x86_64-macos
test ReleaseSafe x86_64-macos
test ReleaseSmall x86_64-macos

The output I observe on Ubuntu 22.04 is the following:

Debug x86_64-linux (PWD /tmp/zig-reproducible-binary-macos)
d24be1f3e202412214295c85b3c5db81  main
does not contain PWD
d24be1f3e202412214295c85b3c5db81  main
----------------------------------------
Debug x86_64-macos (PWD /tmp/zig-reproducible-binary-macos)
519c273df69d7404ac7bafbf326db2b5  main
0000000000000000      d  *UND* /tmp/zig-reproducible-binary-macos
0000000064896444      d  *UND* /tmp/zig-reproducible-binary-macos/main.o
contains PWD
c6687a2eae53d2b9d73000752ed4a9fa  main
--- main
+++ main.old
├── llvm-readobj --symbols {}
│ @@ -8288,15 +8288,15 @@
│    Symbol {
│      Name: /tmp/zig-reproducible-binary-macos/main.o (51146)
│      Type: SymDebugTable (0x66)
│      Section:  (0x0)
│      RefType: ReferenceFlagUndefinedLazy (0x1)
│      Flags [ (0x0)
│      ]
│ -    Value: 0x64896445
│ +    Value: 0x64896444
│    }
│    Symbol {
│      Name:  (0)
│      Type: SymDebugTable (0x2E)
│      Section:  (0x1)
│      RefType: UndefinedNonLazy (0x0)
│      Flags [ (0x0)
├── x86_64
│┄ Format-specific differences are supported for this file format but no file-specific differences were detected; falling back to a binary diff. file(1) reports: Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE|HAS_TLV_DESCRIPTORS>
│ @@ -97,15 +97,15 @@
│  00000600: 0e00 0000 2000 0000 0c00 0000 2f75 7372  .... ......./usr
│  00000610: 2f6c 6962 2f64 796c 6400 0000 0000 0000  /lib/dyld.......
│  00000620: 2800 0080 1800 0000 c006 0000 0000 0000  (...............
│  00000630: 0000 0000 0000 0000 2a00 0000 1000 0000  ........*.......
│  00000640: 0000 0000 0000 0000 3200 0000 2000 0000  ........2... ...
│  00000650: 0100 0000 0007 0b00 0007 0b00 0100 0000  ................
│  00000660: 0500 0000 0000 0000 1b00 0000 1800 0000  ................
│ -00000670: 8266 5768 5140 385a 852d 3470 f4af 39a1  .fWhQ@8Z.-4p..9.
│ +00000670: 9778 c914 8de0 3de6 a58e dc05 c7bd 8c27  .x....=........'
│  00000680: 0c00 0000 3800 0000 1800 0000 0200 0000  ....8...........
│  00000690: 0564 0c05 0000 0100 2f75 7372 2f6c 6962  .d....../usr/lib
│  000006a0: 2f6c 6962 5379 7374 656d 2e42 2e64 796c  /libSystem.B.dyl
│  000006b0: 6962 0000 0000 0000 0000 0000 0000 0000  ib..............
│  000006c0: 5548 89e5 4881 ec10 0100 0089 bd44 ffff  UH..H........D..
│  000006d0: ff48 89b5 48ff ffff 4889 9550 ffff ff48  .H..H...H..P...H
│  000006e0: 8b05 1aa9 0800 488b 0048 8945 f889 bd5c  ......H..H.E...\
│ @@ -37783,15 +37783,15 @@
│  00093960: a05e 0700 0100 0000 d7b1 0000 0e01 0000  .^..............
│  00093970: f0e3 0700 0100 0000 02b2 0000 0e01 0000  ................
│  00093980: c0f7 0700 0100 0000 16b2 0000 0e01 0000  ................
│  00093990: 60f8 0700 0100 0000 2bb2 0000 0e0c 0000  `.......+.......
│  000939a0: 00d2 0800 0100 0000 a2c7 0000 6400 0000  ............d...
│  000939b0: 0000 0000 0000 0000 c5c7 0000 6400 0000  ............d...
│  000939c0: 0000 0000 0000 0000 cac7 0000 6600 0100  ............f...
│ -000939d0: 4564 8964 0000 0000 0000 0000 2e01 0000  Ed.d............
│ +000939d0: 4464 8964 0000 0000 0000 0000 2e01 0000  Dd.d............
│  000939e0: c006 0000 0100 0000 0100 0000 2401 0000  ............$...
│  000939f0: c006 0000 0100 0000 0000 0000 2400 0000  ............$...
│  00093a00: 2003 0000 0000 0000 0000 0000 4e01 0000   ...........N...
│  00093a10: 2003 0000 0000 0000 0000 0000 2e01 0000   ...............
│  00093a20: e009 0000 0100 0000 2c00 0000 2401 0000  ........,...$...
│  00093a30: e009 0000 0100 0000 0000 0000 2400 0000  ............$...
│  00093a40: a000 0000 0000 0000 0000 0000 4e01 0000  ............N...
----------------------------------------
ReleaseFast x86_64-macos (PWD /tmp/zig-reproducible-binary-macos)
949c5ce8bed677ce941dffebf6ffa09f  main
0000000000000000      d  *UND* /tmp/zig-reproducible-binary-macos
000000006489644c      d  *UND* /tmp/zig-reproducible-binary-macos/main.o
contains PWD
0bc5cd1419fb02de3cf29b3d607f3f12  main
--- main
+++ main.old
├── llvm-readobj --symbols {}
│ @@ -1040,15 +1040,15 @@
│    Symbol {
│      Name: /tmp/zig-reproducible-binary-macos/main.o (5753)
│      Type: SymDebugTable (0x66)
│      Section:  (0x0)
│      RefType: ReferenceFlagUndefinedLazy (0x1)
│      Flags [ (0x0)
│      ]
│ -    Value: 0x6489644F
│ +    Value: 0x6489644C
│    }
│    Symbol {
│      Name:  (0)
│      Type: SymDebugTable (0x2E)
│      Section:  (0x1)
│      RefType: UndefinedNonLazy (0x0)
│      Flags [ (0x0)
├── x86_64
│┄ Format-specific differences are supported for this file format but no file-specific differences were detected; falling back to a binary diff. file(1) reports: Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE|HAS_TLV_DESCRIPTORS>
│ @@ -7071,15 +7071,15 @@
│  0001b9e0: 9914 0000 0e0c 0000 08a1 0100 0100 0000  ................
│  0001b9f0: b014 0000 0e0c 0000 04a1 0100 0100 0000  ................
│  0001ba00: c314 0000 0e0c 0000 e0a0 0100 0100 0000  ................
│  0001ba10: cc14 0000 0e0c 0000 f0a0 0100 0100 0000  ................
│  0001ba20: d814 0000 0e0c 0000 00a1 0100 0100 0000  ................
│  0001ba30: 5116 0000 6400 0000 0000 0000 0000 0000  Q...d...........
│  0001ba40: 7416 0000 6400 0000 0000 0000 0000 0000  t...d...........
│ -0001ba50: 7916 0000 6600 0100 4f64 8964 0000 0000  y...f...Od.d....
│ +0001ba50: 7916 0000 6600 0100 4c64 8964 0000 0000  y...f...Ld.d....
│  0001ba60: 0000 0000 2e01 0000 c006 0000 0100 0000  ................
│  0001ba70: 0100 0000 2401 0000 c006 0000 0100 0000  ....$...........
│  0001ba80: 0000 0000 2400 0000 d000 0000 0000 0000  ....$...........
│  0001ba90: 0000 0000 4e01 0000 d000 0000 0000 0000  ....N...........
│  0001baa0: 0000 0000 2e01 0000 9007 0000 0100 0000  ................
│  0001bab0: 2c00 0000 2401 0000 9007 0000 0100 0000  ,...$...........
│  0001bac0: 0000 0000 2400 0000 1000 0000 0000 0000  ....$...........
----------------------------------------
ReleaseSafe x86_64-macos (PWD /tmp/zig-reproducible-binary-macos)
9abeea3d229ac296f088c9fa2d2ec45b  main
0000000000000000      d  *UND* /tmp/zig-reproducible-binary-macos
0000000064896457      d  *UND* /tmp/zig-reproducible-binary-macos/main.o
contains PWD
f6ae1efd9d425ddd9b02829fddef3367  main
--- main
+++ main.old
├── llvm-readobj --symbols {}
│ @@ -1311,15 +1311,15 @@
│    Symbol {
│      Name: /tmp/zig-reproducible-binary-macos/main.o (7219)
│      Type: SymDebugTable (0x66)
│      Section:  (0x0)
│      RefType: ReferenceFlagUndefinedLazy (0x1)
│      Flags [ (0x0)
│      ]
│ -    Value: 0x6489645E
│ +    Value: 0x64896457
│    }
│    Symbol {
│      Name:  (0)
│      Type: SymDebugTable (0x2E)
│      Section:  (0x1)
│      RefType: UndefinedNonLazy (0x0)
│      Flags [ (0x0)
├── x86_64
│┄ Format-specific differences are supported for this file format but no file-specific differences were detected; falling back to a binary diff. file(1) reports: Mach-O 64-bit x86_64 executable, flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE|HAS_TLV_DESCRIPTORS>
│ @@ -10694,15 +10694,15 @@
│  00029c50: e880 0200 0100 0000 1c1a 0000 0e0c 0000  ................
│  00029c60: f880 0200 0100 0000 281a 0000 0e0c 0000  ........(.......
│  00029c70: 0881 0200 0100 0000 391a 0000 0e0c 0000  ........9.......
│  00029c80: 0c81 0200 0100 0000 4c1a 0000 0e0c 0000  ........L.......
│  00029c90: 1081 0200 0100 0000 0b1c 0000 6400 0000  ............d...
│  00029ca0: 0000 0000 0000 0000 2e1c 0000 6400 0000  ............d...
│  00029cb0: 0000 0000 0000 0000 331c 0000 6600 0100  ........3...f...
│ -00029cc0: 5e64 8964 0000 0000 0000 0000 2e01 0000  ^d.d............
│ +00029cc0: 5764 8964 0000 0000 0000 0000 2e01 0000  Wd.d............
│  00029cd0: c006 0000 0100 0000 0100 0000 2401 0000  ............$...
│  00029ce0: c006 0000 0100 0000 0000 0000 2400 0000  ............$...
│  00029cf0: 4002 0000 0000 0000 0000 0000 4e01 0000  @...........N...
│  00029d00: 4002 0000 0000 0000 0000 0000 2e01 0000  @...............
│  00029d10: 0009 0000 0100 0000 2c00 0000 2401 0000  ........,...$...
│  00029d20: 0009 0000 0100 0000 0000 0000 2400 0000  ............$...
│  00029d30: 4000 0000 0000 0000 0000 0000 4e01 0000  @...........N...
----------------------------------------
ReleaseSmall x86_64-macos (PWD /tmp/zig-reproducible-binary-macos)
6afd08f5fc26a4c4befd9067c18f6bbb  main
does not contain PWD
6afd08f5fc26a4c4befd9067c18f6bbb  main
----------------------------------------

This indicates two issues:

  1. The hashes of the resulting binaries differ between invocations for the Debug, ReleaseFast, and ReleaseSafe build modes on MacOS.
  2. The path to the current working directory is included in the resulting binary, making the binary non-reproducible wherever this path is itself not reproducible, for example under build systems such as Bazel that create isolated working directories under non-reproducible prefixes (see aherrmann/rules_zig#79, "Zig binaries contain absolute paths").

(I found #13919 as a relevant issue. However, it is closed as fixed.)

Expected Behavior

  1. The produced binaries are reproducible.
  2. The produced binaries do not contain the absolute path to the current working directory or any other non-reproducible filepath.

(Side note: I see that the Debug build mode has no reproducibility requirement. This is problematic for build systems that use content-addressed storage for caching, such as Bazel.)

@aherrmann aherrmann added the bug Observed behavior contradicts documented or intended behavior label Jun 14, 2023
@kubkon
Member

kubkon commented Jun 28, 2023

Debug builds will never be fully reproducible due to the presence of debug symbol stabs, which include the mtime of the SOs and OSOs. Release builds with debug info stripped should be fully reproducible, however.
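
As an illustration of the mtime point, the differing `Value` fields of the `main.o` stab in the diffoscope output above can be decoded as Unix timestamps. The sketch below is not part of the original report; the helper name `stab_mtime_to_utc` is hypothetical, and it assumes GNU `date` (as on the Ubuntu host used for the report).

```shell
# Hypothetical helper: convert a stab value (as printed by llvm-readobj)
# into a UTC timestamp. Assumes GNU date; BSD/macOS date uses `date -u -r`.
stab_mtime_to_utc() {
  _seconds=$(printf '%d' "$1")   # accepts hex input such as 0x64896444
  date -u -d "@$_seconds" +%Y-%m-%dT%H:%M:%SZ
}

stab_mtime_to_utc 0x64896444   # main.old (first Debug build)  -> 2023-06-14T06:55:00Z
stab_mtime_to_utc 0x64896445   # main (second Debug build)     -> 2023-06-14T06:55:01Z
```

The two Debug builds were produced one second apart, and that difference alone is enough to change the symbol table and therefore the file hash.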

@kubkon
Member

kubkon commented Jan 30, 2024

I am closing this issue as not applicable, since we only guarantee build reproducibility for release builds with debug info stripped. We have appropriate CI checks to that effect too:

diff stage3/bin/zig stage4/bin/zig
and
diff stage3-release/bin/zig stage4-release/bin/zig
Feel free to reopen if you feel that it still poses an issue, though!
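
A check in the same spirit can be run locally. The following is only a sketch, not the project's CI script: the helper name `check_reproducible` is hypothetical, and the commented-out invocation assumes that `zig` is on `PATH` and that the installed Zig version accepts `-fstrip` (older versions may spell the flag differently).

```shell
# Hypothetical helper: run a build command twice and compare the named
# artifact byte for byte, similar to `diff stage3/bin/zig stage4/bin/zig`.
check_reproducible() {
  _artifact=$1
  shift
  "$@" || return 2                               # first build
  mv "$_artifact" "$_artifact.first" || return 2
  "$@" || return 2                               # second build
  if cmp -s "$_artifact" "$_artifact.first"; then
    echo "reproducible"
  else
    echo "NOT reproducible"
    return 1
  fi
}

# Example invocation (assumption: zig on PATH, -fstrip supported):
# check_reproducible main zig build-exe -target x86_64-macos -O ReleaseFast -fstrip main.zig
```

The first build is kept as `main.first`, so a failing check can be followed up with `diffoscope main main.first`.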

@kubkon kubkon closed this as completed Jan 30, 2024
@Vexu Vexu modified the milestones: 0.13.0, 0.12.0 Jan 30, 2024
@aherrmann
Contributor Author

@kubkon Thanks for clarifying! It's understandable that there is no desire to guarantee reproducibility for binaries that contain debug information. In the Bazel use-case it is still problematic (aherrmann/rules_zig#79) as Bazel performs builds in a temporary sandbox working directory and consequently the embedded debug source paths are non-deterministic, and Bazel's (distributed) caching is based on content hashing. So, non-deterministic outputs increase cache misses and prevent features like early cut-off.

If these source paths are the only source of nondeterminism, would it be feasible to generate relative debug source paths or let the user control the debug source path prefix? If so, would it be viable to extend the reproducibility guarantees to debug builds with the caveat that they are only reproducible if the debug source prefix is controlled?

I saw that #9361 already requests project relative debug paths. Relatedly, there is an issue about debugging as well, where the embedded paths pointing to the sandbox working directory are no longer valid when users attempt to debug the produced binary, see aherrmann/rules_zig#207.

@kubkon
Member

kubkon commented Feb 3, 2024

For MachO with traditional debug info stabs you get nondeterminism from both file paths and mtime values embedded in the symbol table of the final image. It may be a better option to simply always request a dSYM bundle, which would allow us to strip all debug info stabs from the binary while preserving debug info in a standalone file. Please note that emitting dSYM is still a todo.

@aherrmann
Contributor Author

Thanks! Makes sense. It sounds like separating debug symbols is the better option then. There's precedent for this in Bazel, so it should be feasible.
