
Reproducible builds regression in nightly #47135

Closed
kpcyrd opened this issue Jan 2, 2018 · 13 comments
Labels
P-medium Medium priority regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kpcyrd

kpcyrd commented Jan 2, 2018

hello,

I'm running a CI system with reprotest to ensure the binaries built from the project are reproducible and verifiable. This system started to fail between 2017-12-19 and 2017-12-26:

Build 193, 2017-12-19T00:56:25Z, e6446ad65d193e0155ac02d58f338f9136182267, https://travis-ci.org/kpcyrd/sniffglue/jobs/318377138
Build 196, 2017-12-26T01:20:33Z, e6446ad65d193e0155ac02d58f338f9136182267, https://travis-ci.org/kpcyrd/sniffglue/jobs/321613070

I assume this is a regression in Rust nightly: I can reproduce the test failure locally with a current nightly, and my tests pass when switching from nightly to stable (with some magic to access -Zremap-path-prefix-{from,to}).

My testsuite looks like this:

#!/bin/sh
set -xue

# tested with rustc 1.22.1 and cargo 0.23.0

# by default, the build folder is located in /tmp, which is a tmpfs. The target/ folder
# can become quite large, causing the build to fail if we don't have enough RAM.
export TMPDIR="$HOME/tmp/repro-test"
mkdir -p "$TMPDIR"

reprotest -vv --vary=-time,-domain_host --source-pattern 'Cargo.* src/' '
    RUSTC_BOOTSTRAP=1 CARGO_HOME="$PWD/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
        RUSTFLAGS="-Zremap-path-prefix-from=$HOME -Zremap-path-prefix-to=/remap-home -Zremap-path-prefix-from=$PWD -Zremap-path-prefix-to=/remap-pwd" \
        cargo build --release --verbose' \
    target/release/sniffglue

You can run this yourself using:

git clone https://github.com/kpcyrd/sniffglue.git
cd sniffglue
docker build -t reprotest-sniffglue -f docs/Dockerfile.reprotest .
docker run --privileged reprotest-sniffglue ci/reprotest.sh

The full diffoscope report is quite large; the gist of it looks like this:

INFO:reprotest:build successful, copying artifacts
INFO:reprotest:copying /root/tmp/repro-test/reprotest.QwImZ4/artifacts-experiment-1/ back from virtual server's /root/tmp/repro-test/tmp29t413le/experiment-1
INFO:reprotest:Running diffoscope: ['diffoscope', '--exclude-directory-metadata', '/root/tmp/repro-test/tmp29t413le/control', '/root/tmp/repro-test/tmp29t413le/experiment-1']
--- /root/tmp/repro-test/tmp29t413le/control
+++ /root/tmp/repro-test/tmp29t413le/experiment-1
├── source-root
│ ├── target
│ │ ├── release
│ │ │ ├── sniffglue
│ │ │ │ ├── readelf --wide --file-header {}
│ │ │ │ │ @@ -6,15 +6,15 @@
│ │ │ │ │    OS/ABI:                            UNIX - System V
│ │ │ │ │    ABI Version:                       0
│ │ │ │ │    Type:                              DYN (Shared object file)
│ │ │ │ │    Machine:                           Advanced Micro Devices X86-64
│ │ │ │ │    Version:                           0x1
│ │ │ │ │    Entry point address:               0x15410
│ │ │ │ │    Start of program headers:          64 (bytes into file)
│ │ │ │ │ -  Start of section headers:          7624168 (bytes into file)
│ │ │ │ │ +  Start of section headers:          7624176 (bytes into file)
│ │ │ │ │    Flags:                             0x0
│ │ │ │ │    Size of this header:               64 (bytes)
│ │ │ │ │    Size of program headers:           56 (bytes)
│ │ │ │ │    Number of program headers:         10
│ │ │ │ │    Size of section headers:           64 (bytes)
│ │ │ │ │    Number of section headers:         44
│ │ │ │ │    Section header string table index: 43
│ │ │ │ ├── readelf --wide --sections {}
│ │ │ │ │ @@ -1,8 +1,8 @@
│ │ │ │ │ -There are 44 section headers, starting at offset 0x7455e8:
│ │ │ │ │ +There are 44 section headers, starting at offset 0x7455f0:
│ │ │ │ │  
│ │ │ │ │  Section Headers:
│ │ │ │ │    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
│ │ │ │ │    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
│ │ │ │ │    [ 1] .interp           PROGBITS        0000000000000270 000270 00001c 00   A  0   0  1
│ │ │ │ │    [ 2] .note.ABI-tag     NOTE            000000000000028c 00028c 000020 00   A  0   0  4
│ │ │ │ │    [ 3] .note.gnu.build-id NOTE            00000000000002ac 0002ac 000024 00   A  0   0  4
│ │ │ │ │ @@ -40,14 +40,14 @@
│ │ │ │ │    [35] .debug_str        PROGBITS        0000000000000000 4704b8 0e5241 01  MS  0   0  1
│ │ │ │ │    [36] .debug_loc        PROGBITS        0000000000000000 5556f9 0c881e 00      0   0  1
│ │ │ │ │    [37] .debug_macinfo    PROGBITS        0000000000000000 61df17 000041 00      0   0  1
│ │ │ │ │    [38] .debug_pubtypes   PROGBITS        0000000000000000 61df58 02223b 00      0   0  1
│ │ │ │ │    [39] .debug_ranges     PROGBITS        0000000000000000 640193 07ef50 00      0   0  1
│ │ │ │ │    [40] .debug_macro      PROGBITS        0000000000000000 6bf0e3 013d65 00      0   0  1
│ │ │ │ │    [41] .symtab           SYMTAB          0000000000000000 6d2e48 02fee0 18     42 5529  8
│ │ │ │ │ -  [42] .strtab           STRTAB          0000000000000000 702d28 0426f8 00      0   0  1
│ │ │ │ │ -  [43] .shstrtab         STRTAB          0000000000000000 745420 0001c6 00      0   0  1
│ │ │ │ │ +  [42] .strtab           STRTAB          0000000000000000 702d28 042702 00      0   0  1
│ │ │ │ │ +  [43] .shstrtab         STRTAB          0000000000000000 74542a 0001c6 00      0   0  1
│ │ │ │ │  Key to Flags:
│ │ │ │ │    W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
│ │ │ │ │    L (link order), O (extra OS processing required), G (group), T (TLS),
│ │ │ │ │    C (compressed), x (unknown), o (OS specific), E (exclude),
│ │ │ │ │    l (large), p (processor specific)
│ │ │ │ ├── readelf --wide --symbols {}
│ │ │ │ │ @@ -5676,183 +5676,183 @@
│ │ │ │ │    5523: 000000000017d284   320 FUNC    LOCAL  DEFAULT   14 backtrace_dwarf_add
│ │ │ │ │    5524: 00000000001af930   104 FUNC    LOCAL  DEFAULT   14 je_malloc_tsd_boot1
│ │ │ │ │    5525: 0000000000197b50   132 FUNC    LOCAL  DEFAULT   14 je_chunk_boot
│ │ │ │ │    5526: 000000000019fc30    12 FUNC    LOCAL  DEFAULT   14 je_ctl_prefork
│ │ │ │ │    5527: 0000000000453958     0 OBJECT  LOCAL  DEFAULT   24 _DYNAMIC
│ │ │ │ │    5528: 0000000000177d0d    92 FUNC    LOCAL  DEFAULT   14 backtrace_release_view
│ │ │ │ │    5529: 000000000011c3f0  2038 FUNC    GLOBAL DEFAULT   14 _ZN5regex3dfa3Fsm12cached_state17hb554e8bfc5200e27E
│ │ │ │ │ -  5530: 00000000000bb960    30 FUNC    GLOBAL HIDDEN    14 _ZN4core3ptr13drop_in_place17hf44fe1997133c74dE.llvm.57E7137B
│ │ │ │ │ -  5531: 00000000000cb740    14 FUNC    GLOBAL DEFAULT   14 _ZN147_$LT$clap..args..arg_builder..option..OptBuilder$LT$$u27$n$C$$u20$$u27$e$GT$$u20$as$u20$clap..args..any_arg..AnyArg$LT$$u27$n$C$$u20$$u27$e$GT$$GT$8max_vals17h3d6e1bfec0acc71aE
│ │ │ │ │ -  5532: 000000000011a720   711 FUNC    GLOBAL HIDDEN    14 _ZN5regex8literals15LiteralSearcher3new17ha6ddbdce121a0edaE.llvm.FC380A3B
│ │ │ │ │ -  5533: 000000000012ad70   283 FUNC    GLOBAL HIDDEN    14 _ZN49_$LT$alloc..raw_vec..RawVec$LT$T$C$$u20$A$GT$$GT$7reserve17hda1b026b7e50ce87E
│ │ │ │ │ -  5534: 0000000000128f30    48 FUNC    GLOBAL HIDDEN    14 _ZN4core3ptr13drop_in_place17h277abafcc961cdd0E.llvm.DA02C37B
│ │ │ │ │ -  5535: 0000000000026130  1464 FUNC    GLOBAL HIDDEN    14 _ZN49_$LT$std..sync..mpsc..stream..Packet$LT$T$GT$$GT$4recv17hf720d50b94f350d7E
│ │ │ │ │ -  5536: 00000000000e2e50   433 FUNC    GLOBAL HIDDEN    14 _ZN65_$LT$clap..fmt..Format$LT$T$GT$$u20$as$u20$core..fmt..Display$GT$3fmt17hd3283ea0c9e52cdaE
│ │ │ │ │ -  5537: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND seccomp_load
│ │ │ │ │ -  5538: 000000000044b270    48 OBJECT  GLOBAL HIDDEN    23 vtable.o.llvm.FE355FBC
[...]

cc: @infinity0

@alexcrichton
Member

Thanks for the report! Do you perhaps have a more isolated reproduction of this? A naive attempt to reproduce it locally (running cargo build twice and seeing what changes) unfortunately didn't turn up the problem.
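For reference, the naive check described above can be sketched as a small shell function. This is a hypothetical helper (`check_twice` is not part of cargo or reprotest): it runs a build command twice, each time with `CARGO_TARGET_DIR` pointing at a fresh temporary directory, and compares one artifact byte for byte with `cmp`. Everything else (clock, user, source path, environment) stays the same between the two runs, which is presumably why such a check can pass while reprotest's variations still expose differences.

```sh
#!/bin/sh
# Hypothetical helper: build twice into fresh target directories and compare
# one artifact byte for byte.
# Usage: check_twice ARTIFACT_RELPATH BUILD_CMD [ARGS...]
# Prints "reproducible" (status 0) or "NOT reproducible" (status 1);
# returns 2 if a build fails.
check_twice() {
    artifact=$1
    shift
    d1=$(mktemp -d) || return 2
    d2=$(mktemp -d) || { rm -rf "$d1"; return 2; }
    # env passes CARGO_TARGET_DIR to the build command for this run only.
    if env CARGO_TARGET_DIR="$d1" "$@" && env CARGO_TARGET_DIR="$d2" "$@"; then
        if cmp -s "$d1/$artifact" "$d2/$artifact"; then
            echo "reproducible"
            rc=0
        else
            echo "NOT reproducible"
            rc=1
        fi
    else
        echo "build failed" >&2
        rc=2
    fi
    rm -rf "$d1" "$d2"
    return "$rc"
}
```

For sniffglue this would be something like `check_twice release/sniffglue cargo build --release`. Because the two builds use different target directory paths, a target path leaking into the binary could also show up as a difference.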

@infinity0
Contributor

If you run reprotest --auto-build it will hopefully tell you which variation (e.g. time / timezone / fileordering) is causing the non-determinism.

Can we get a longer listing of the readelf --wide --symbols {} section? From what you pasted it looks like some table is getting re-ordered.

tag #34902
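One quick way to test the "re-ordered table" theory is to compare just the symbol names from both binaries, once as-is and once sorted. The sketch below is a hypothetical helper (`classify_diff` is an illustrative name): it reports whether two listings are identical, identical up to line order, or genuinely different. The usage comment assumes the usual `readelf --wide --symbols` column layout, where the eighth field is the symbol name; keeping only the name drops the index and address columns, which change whenever entries move.

```sh
#!/bin/sh
# Hypothetical helper: classify how two text listings differ.
# Prints "identical", "reordered" (same lines, different order) or "different".
classify_diff() {
    if cmp -s "$1" "$2"; then
        echo identical
    elif [ "$(sort "$1")" = "$(sort "$2")" ]; then
        echo reordered
    else
        echo different
    fi
}

# Example usage (artifact paths as in the diffoscope command in this thread):
#   readelf --wide --symbols artifacts/control/source-root/target/release/sniffglue \
#       | awk '{ print $8 }' > control.syms
#   readelf --wide --symbols artifacts/experiment-1/source-root/target/release/sniffglue \
#       | awk '{ print $8 }' > experiment.syms
#   classify_diff control.syms experiment.syms
```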

@kpcyrd
Author

kpcyrd commented Jan 3, 2018

hey @alexcrichton and @infinity0, thanks for taking the time to look into this.

The project that I test covers a large number of edge cases (e.g. FFI), so I'll probably need some time to track this down to the specific edge case that is causing the problem.

I tried --auto-build --vary=-time,-domain_host, but it wasn't able to find a working configuration:

Not reproducible, even when fixing as much as reprotest knows how to. :(

I've attached a gist with both the diffoscope.out generated by reprotest and a diffoscope.json generated with

diffoscope --json artifacts/diffoscope.json artifacts/control/source-root/target/release/sniffglue artifacts/experiment-1/source-root/target/release/sniffglue

https://gist.github.com/kpcyrd/ac5c8a4d8837d18d5f7f5bc074b71924

This was generated using:

$ rustup run nightly -- rustc --version
rustc 1.24.0-nightly (b65f0bedd 2018-01-01)
$ cargo +nightly version
cargo 0.25.0-nightly (a88fbace4 2017-12-29)
$ 

If it helps, I can write a script that tests nightly-2017-12-19 through nightly-2017-12-26 until it finds the nightly that broke (setting up a full rustc bisect would probably take me a while).

@gsollazzo gsollazzo added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Feb 1, 2018
@nikomatsakis nikomatsakis added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 8, 2018
@nikomatsakis
Contributor

Nominating for prioritization in the @rust-lang/compiler meeting.

@nikomatsakis
Contributor

@kpcyrd

Hi! I'm trying to figure out what is happening here. In particular, I can't tell how reliably this can be reproduced etc.

> If it helps, I can write a script that tests nightly-2017-12-19 through nightly-2017-12-26 until it finds the nightly that broke (setting up a full rustc bisect would probably take me a while).

If it is possible to bisect over nightlies, that would be tremendously helpful.
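With a window of only eight nightlies, testing each one in turn (as kpcyrd later did) is perfectly workable; for a longer window, a binary search needs only about log2(n) test runs. Here is a minimal sketch under two assumptions: the first toolchain in the list is known good and the last is known bad, and once a nightly breaks reproducibility, every later one stays broken. `first_bad` and `repro_ok` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: binary search for the first failing toolchain.
# Usage: first_bad TEST_CMD TOOLCHAIN...
# TEST_CMD is called as `TEST_CMD TOOLCHAIN` and must succeed (exit 0) when
# that toolchain produces a reproducible build.
# Assumes the first toolchain is good, the last is bad, and that the failure
# persists in every later toolchain once it appears.
first_bad() {
    test_cmd=$1
    shift
    lo=1      # index of a known-good toolchain
    hi=$#     # index of a known-bad toolchain
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        # Send the test's own output to stderr so stdout carries only the answer.
        if "$test_cmd" "$candidate" >&2; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    eval "result=\${$hi}"
    printf '%s\n' "$result"
}

# Example usage with the patched ci/reprotest.sh from this thread, assuming
# reprotest exits non-zero when the two builds differ:
#   repro_ok() { ci/reprotest.sh "${1#nightly-2017-12-}"; }
#   first_bad repro_ok $(for d in $(seq 19 26); do echo "nightly-2017-12-$d"; done)
```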

@nikomatsakis
Contributor

triage: P-medium

It is our intention to support reproducible builds, but they are not fully supported; calling P-medium. It'd be great to fix though.

@rust-highfive rust-highfive added P-medium Medium priority and removed I-nominated labels Feb 15, 2018
@kpcyrd
Author

kpcyrd commented Feb 18, 2018

Sorry for my late reply. I've tested every nightly from 2017-12-19 to 2017-12-26:

nightly-2017-12-19: ✅ reproducible
nightly-2017-12-20: ✅ reproducible
nightly-2017-12-21: ✅ reproducible
nightly-2017-12-22: ✅ reproducible
nightly-2017-12-23: ✅ reproducible
nightly-2017-12-24: ✅ reproducible
nightly-2017-12-25: ✅ reproducible
nightly-2017-12-26: ❎ unreproducible

@kpcyrd
Author

kpcyrd commented Feb 18, 2018

This is how I set up my tests:

git clone https://github.com/kpcyrd/sniffglue.git
cd sniffglue
git branch repro e6446ad65d193e0155ac02d58f338f9136182267
git checkout repro

Apply this patch:

diff --git a/ci/reprotest.sh b/ci/reprotest.sh
index 5bdd378..ecc48b7 100755
--- a/ci/reprotest.sh
+++ b/ci/reprotest.sh
@@ -8,8 +8,10 @@ set -xue
 export TMPDIR="$HOME/tmp/repro-test"
 mkdir -p "$TMPDIR"
 
+rustup install "nightly-2017-12-$1"
+
 reprotest -vv --vary=-time,-domain_host --source-pattern 'Cargo.* src/' '
-    RUSTC_BOOTSTRAP=1 CARGO_HOME="$PWD/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
+    RUSTC_BOOTSTRAP=1 CARGO_HOME="'$HOME'/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
         RUSTFLAGS="-Zremap-path-prefix-from=$HOME -Zremap-path-prefix-to=/remap-home -Zremap-path-prefix-from=$PWD -Zremap-path-prefix-to=/remap-pwd" \
-        cargo build --release --verbose' \
+        rustup run nightly-2017-12-'$1' cargo build --release --verbose' \
     target/release/sniffglue

Build the test container and run the tests:

# build container (reprotest-sniffglue)
BUILD_MODE=reprotest ci/build.sh
# test nightlies
docker run --privileged reprotest-sniffglue sh -c '(for x in `seq 19 26`; do ci/reprotest.sh "$x"; done)' | tee repro-regression.log

I can reproduce it reliably this way.

@michaelwoerister
Member

Maybe this is because of multiple codegen units + ThinLTO? We enabled that by default during that time, didn't we, @alexcrichton? (#46910)

Does it also reproduce if you add -Ccodegen-units=1 to RUSTFLAGS?
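To try this in the reprotest setup above, -Ccodegen-units=1 just needs to be appended to the existing RUSTFLAGS. The helper below is a hypothetical convenience (`add_rustflag` is not a cargo feature) that appends a flag while keeping whatever is already set. Forcing a single codegen unit gives up parallel code generation, so release builds will likely get slower.

```sh
#!/bin/sh
# Hypothetical helper: append one flag to RUSTFLAGS, keeping existing flags.
# ${RUSTFLAGS:+...} inserts "$RUSTFLAGS " only when RUSTFLAGS is set and non-empty.
add_rustflag() {
    RUSTFLAGS="${RUSTFLAGS:+$RUSTFLAGS }$1"
    export RUSTFLAGS
}

# Example: keep the path remapping from the reprotest script and force a
# single codegen unit for every crate cargo compiles.
#   RUSTFLAGS="-Zremap-path-prefix-from=$HOME -Zremap-path-prefix-to=/remap-home"
#   add_rustflag -Ccodegen-units=1
#   cargo build --release --verbose
```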

@kpcyrd
Copy link
Author

kpcyrd commented Feb 19, 2018

@michaelwoerister The binary was reproducible with -Ccodegen-units=1 on both nightly-2017-12-26 and nightly-2018-02-17. Nice!

I'm by no means an expert on these features, but would it be possible to run codegen concurrently, wait until all units finish, and then sort the results before using them?
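The idea here (fan the work out in parallel, then merge the results in a fixed order rather than in completion order) can be illustrated with a generic shell sketch. This is not how rustc partitions or merges codegen units; `run_ordered` and `slow_echo` are hypothetical names. A fixed merge order only removes scheduling effects from the merge step; it cannot help if an individual job's output is itself nondeterministic.

```sh
#!/bin/sh
# Hypothetical helper: run "CMD ARG" for each ARG in parallel, then print the
# outputs in argument order, regardless of which job finished first.
# Usage: run_ordered CMD ARG...
run_ordered() {
    cmd=$1
    shift
    outdir=$(mktemp -d) || return 1
    i=0
    for arg in "$@"; do
        # Each job writes to a file named after its position, not its finish time.
        "$cmd" "$arg" > "$outdir/$i" &
        i=$((i + 1))
    done
    wait    # block until every background job has exited
    j=0
    while [ "$j" -lt "$i" ]; do
        cat "$outdir/$j"
        j=$((j + 1))
    done
    rm -rf "$outdir"
}
```

For example, with `slow_echo() { sleep "$1"; echo "done $1"; }`, running `run_ordered slow_echo 1 0` prints `done 1` before `done 0`, even though the second job finishes first.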

@alexcrichton
Member

I believe this is fixed in nightly now? I'm not really sure why though. I bisected the PR that fixed this to #47522, although nothing there looks related to reproducible builds.

#47467 seems the most likely, but if that's true then it may mean that the bug is still lurking and hidden rather than fixed.

@kpcyrd
Copy link
Author

kpcyrd commented Feb 20, 2018

@alexcrichton You're right, I forgot to re-test with a current nightly. I rebuilt with nightly a couple of times and everything worked nicely. I've re-enabled this test and started adding it to another project as well; I'll let you know if I notice anything again.

Thanks everybody!

@kpcyrd kpcyrd closed this as completed Feb 20, 2018
@michaelwoerister
Member

Maybe ThinLTO is not entirely deterministic. I wouldn't be surprised.

7 participants