debug_loc info is (slightly) non-deterministic #45397
Comments
cc @infinity0
On a non-LTO build, I get differences in
So interestingly, dwarfdump doesn't barf on the non-lto object file. The diff in dwarfdump output looks like:
There are a lot more of those, but if we ignore all the DW_AT_location differences, there is this:
Then further down:
Then at some point there's also:
which is actually in .debug_ranges, not .debug_loc. And
cc @tromey
@glandium Did you build in the same build path, or try giving edit: looks like from one of your traces that you did build in the same build path, so I suppose the differences are due to something else.
Same paths, same version of rust, same version of gcc, same version of everything. Essentially, I've been triggering the same Firefox CI build twice and compared the output.
Today, I also observe non-repeatability in the The example I was debugging happened to be https://crates.io/crates/miniz_oxide
This might be related to #89911; could you try with Rust 1.55 / LLVM 12 and see if it reproduces?
I tried with rustc-1.55 and clang/llvm-11.1.0 (which is what I had handy), and the result is perfectly deterministic and reproducible across 20 runs, with no differences related to debug info.
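A repeatability check like the one described above can be sketched as follows. This is only an illustration, assuming the artifacts from each run have been saved to separate files; the helper names (`sha256_of`, `group_by_digest`, `is_reproducible`) are hypothetical and not taken from any tool used in this thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each distinct digest to the list of files that produced it."""
    groups = {}
    for p in paths:
        groups.setdefault(sha256_of(Path(p)), []).append(str(p))
    return groups


def is_reproducible(paths) -> bool:
    """True when every file in `paths` has identical content."""
    return len(group_by_digest(paths)) <= 1
```

If `group_by_digest` returns more than one key, the lists show which runs agree with each other, which helps pick a pair of outputs to diff in detail.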
I ran the repeatability experiment on the
At least based on #90301, we can easily blame the LLVM update for some change.
I have a fix in #90301. It would be nice to verify that it also fixes this bug.
@fangism Your problem is likely fixed now (well, it will be in the next nightly build, otherwise you have to build rustc yourself), but please confirm. @glandium Your original report predates the bug in LLVM I fixed, so most likely you will still see a problem. Could you make a reproducible case please?
My original bug is probably long gone, as Rust hasn't been a problem for Firefox reproducibility for a while (although we also now do cross-language LTO, so the real compilation is handled by clang's LLVM).
So, should we close this bug then?
I've been comparing Firefox builds on Mozilla CI with and without sccache, and after having eliminated all the expected differences, there was one remaining in the resulting binary that ended up being caused by Rust code. That difference was in the build-id of libxul.so, as well as the checksum in the gnu_debuglink section. Both are influenced by the contents of debug sections. I repeated the comparison with 2 builds without sccache and got the same discrepancy.
Further analysis revealed that the root difference lies in the debug_loc data in gkrust-b23623c450cfcda2.0.o, which seems to be related to the _ZN5style10properties10LonghandId11parse_value17heed0466ee2fc256eE symbol (style::properties::LonghandId::parse_value). That function is generated by a Python script, but I validated that the generated source that produced the different object files was identical.
I can provide the two .o files I've been comparing, but they are each 200MB (or about 27MB when compressed with zstd), so I don't know where to put them.
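For anyone wanting to locate where two such object files differ without sharing them, a plain byte-level comparison is one option. The sketch below is an assumption about how that could be done, not the method used for this report; `differing_ranges` is a hypothetical helper name.

```python
def differing_ranges(a: bytes, b: bytes):
    """Return half-open [start, end) byte ranges where `a` and `b` differ.

    A length mismatch is reported as one final range covering the tail
    of the longer input.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges
```

The resulting offsets can then be matched against the section table of the object file to see which sections are affected. A pure-Python loop like this is slow on inputs of a few hundred megabytes, so it is better suited to extracted sections than to whole files.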
The "slightly" in the bug summary is because, compared to the size of those files, the differences are rather small. I'll additionally note that dwarfdump doesn't like those .o files and fails with:
I'll do another comparison run with LTO disabled, which hopefully would produce smaller .o files.
Cc: @michaelwoerister @luser @froydnj