Non-deterministic perf.rust-lang.org runs (e.g. syn-opt). #69060
This may have been happening a lot, if anyone has the time, going through the history of It's far noisier than it should be IMO, and non-determinism might play a significant role. |
It seems possible that the queries are run from codegen, but since they're queries, I believe the problem is within Rust code (to my knowledge we don't have callbacks or anything like that from LLVM). |
It might be a combination. LLVM is known to have trouble maintaining deterministic performance, in a similar vein to rustc hashing pointers, but potentially even affecting output for LLVM optimizations. I highly advise disabling at least userspace ASLR, and manually checking (by dumping IIRC @emberian managed to get the nondeterminism down to a bimodal (i.e. two possible profiles) situation for "irsy" (Is Rust Slim Yet), back in the day. |
Something suspicious when looking at the query view is one less of each It's possible all benchmarks are affected and |
@wesleywiser is pointing out this effect of one less of each This suggests to me this might be a very specific source of non-determinism in rustc or a proc macro. |
I've done some local experiments and I can't cause this on nightly, even by doing this:
rm -rf target/release
RUSTFLAGS=-Cmetadata=$N cargo rustc --release -- -Zself-profile
mkdir perf-$N
mv syn-*.{events,string_{data,index}} perf-$N
for several values of Next thing I'm going to do is build a local EDIT: nope, that doesn't trigger it either, but I can keep trying. |
Hm, FWIW perf does not use the nightly channel, but maybe you can try the above with pre-produced artifacts? (We can look at some PR changing unimportant things like README or triagebot.toml, if they exist given rollups). But building locally should work too, though you may need to disable ignore-git in config.toml. |
@Mark-Simulacrum pointed out that some paths may differ, but building two copies of |
Just realized I've been doing all my testing without |
I was able to reproduce this
|
This seems to repro as well:
mkdir perf-{1,2}
rm perf-{1,2}/*
rm -rf target/release
CARGO_INCREMENTAL=1 cargo +rust-3-stage1 rustc --release -- -Zself-profile=perf-1
rm -rf target/release
CARGO_INCREMENTAL=1 cargo +rust-4-stage1 rustc --release -- -Zself-profile=perf-2
Those builds only differ by an empty I assume that means every crate hash is different. |
Which queries are redone seems consistent for a given
So we clearly have some sort of dependency on that version, maybe via If we can't fix this, there's technically a potential workaround by making try builds lie about their git hash in |
Looks like the difference is that for that last pair of runs, the one with extra query counts uses:
This is troubling, since it suggests there's some non-determinism in specialization. To get query keys I've added I've used commands like these to extract the differing query keys:
jq -s 'map(map(select(.name == "def_span")) | map(.args.arg0) | sort) | (.[1]-.[0])' perf-{1,2}/chrome_profiler.json
that example outputs:
[
"alloc::vec::{{impl}}[30]",
"core::slice::{{impl}}[127]",
"core::slice::{{impl}}[128]"
]
I've then counted the |
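For anyone without jq at hand, the same comparison can be sketched in Python. This is only a rough equivalent of the jq one-liner above, and it assumes each chrome_profiler.json file is a top-level JSON array of event objects with a "name" field and an "args" object holding "arg0"; the function names are made up for illustration.

```python
import json
from pathlib import Path


def query_keys(profile_path, query_name):
    """Return the sorted arg0 values of every event called `query_name`.

    Assumes the file is a JSON array of event objects (an assumption
    inferred from the jq filter, not a documented format guarantee).
    """
    events = json.loads(Path(profile_path).read_text())
    return sorted(
        event.get("args", {}).get("arg0")
        for event in events
        if event.get("name") == query_name
    )


def extra_query_keys(base_path, other_path, query_name="def_span"):
    """Mimic jq's `.[1] - .[0]`: keys seen in `other` that never occur in `base`."""
    base = set(query_keys(base_path, query_name))
    return [key for key in query_keys(other_path, query_name) if key not in base]
```

Calling extra_query_keys("perf-1/chrome_profiler.json", "perf-2/chrome_profiler.json") would then list the def_span keys that only the second run executed, in the same spirit as the output above.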
This iterates over an
|
I guess the order this loop iterates in matters because of side-effects?
|
Sorting This is the
And
EDIT: see #69060 (comment). |
As I was afraid, there's explicit sorting upstream, and I'm not sure it helps: rust/src/librustc_trait_selection/traits/specialize/mod.rs Lines 308 to 318 in 2fbb075
|
EDIT: see #69060 (comment). |
I'm starting to believe this is the culprit: rust/src/librustc_metadata/rmeta/encoder.rs Lines 1454 to 1457 in b9ac291
The That is, I believe EDIT: it does seem that removing that sort (with no other changes) fixes the non-determinism for me. |
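To make the suspected mechanism concrete, here is a toy model in Python, not rustc's actual code. It assumes the sort key is a hash that changes whenever some build-specific input changes (modelled by a hypothetical `salt` string), and that downstream code walks the sorted list and stops at the first match, so the amount of work depends on the order.

```python
import hashlib


def salted_key(name: str, salt: str) -> str:
    """Hash `name` together with `salt` (a stand-in for anything build-dependent)."""
    return hashlib.sha256(f"{salt}:{name}".encode("utf-8")).hexdigest()


def salted_order(names, salt):
    """Sort names by their salted hash; the order is stable for a fixed salt."""
    return sorted(names, key=lambda n: salted_key(n, salt))


def items_visited(ordered, target):
    """Count how many items a first-match loop touches before finding `target`."""
    for visited, name in enumerate(ordered, start=1):
        if name == target:
            return visited
    return len(ordered)


if __name__ == "__main__":
    impls = [f"impl_{i}" for i in range(8)]   # hypothetical item names
    for salt in ("build-a", "build-b"):       # hypothetical build identifiers
        order = salted_order(impls, salt)
        print(salt, order, items_visited(order, "impl_3"))
```

Within one salt the order is fully reproducible, which would match the observation that the set of redone queries is consistent for a given build; across salts the order will generally differ, and with it the number of items the loop touches.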
@Mark-Simulacrum If we want to avoid changing the compiler at all, there might be a way to guarantee stability of
Unlike working around individual between-compiler-versions instabilities, this also has the advantage that anything working with Combine that with disabling ASLR, and the allocation + hashing behavior should be fully stable. |
From #70462 (comment):
So we'd need to investigate the codegen/LLVM non-determinism separately. And that will be harder since it's all timing differences, not a discrete quantity like query counts. @Mark-Simulacrum What might be useful is grabbing EDIT: I wonder if CGU partitioning can vary wildly and that's causing unpredictable codegen/LLVM performance. |
I had some spare time today and deployed the |
IIUC, the last 3 points horizontally (on both black and blue lines) are post-rust-lang/rustc-perf#645: |
We're not out of the woods yet, this perf diff has a bunch of query count differences (42 for the top ones, Nothing in d223029...ce1ab35 seems like it could explain that. |
So, given that the version number has a measurable impact on performance... (a) I guess this means we'll see a bump in these graphs on each version number bump, and (b) should we do some optimization to find the version number that makes the compiler go fastest? :P |
@RalfJung The version gets in via I don't think rust-lang/cargo#8073 lets any part of the version into So why do we get such wild non-determinism for Well, it's always in LLVM. Sure, there's some query counts that vary, but those are tiny and ultimately irrelevant. The other hint is that this happens in One theory is that However, I seem to recall from experimentation that it's not actually the case, or at least I couldn't find differences in CGU partitioning. Which leads to an even simpler explanation: CGU ordering alone impacts the total time spent in LLVM that much. So if you want to make rustc slightly faster, don't waste your time with the hashes. |
Maybe I should have made it clearer (I hoped the ":P" was enough but maybe not), but I was joking in (b) above. ;) |
https://perf.rust-lang.org/detailed-query.html?commit=e369f6b617bc5124ec5d02626dc1c821589e6eb3&base_commit=4d1241f5158ffd66730e094d8f199ed654ed52ae&benchmark=syn-opt&run_name=patched%20incremental:%20println
This perf run was expected to be a no-op, but appears to have resulted in slightly fewer queries being run (2 def_span and 2 metadata reads).
cc @eddyb