Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile time regression #43141

Closed
Boddlnagg opened this issue Jul 10, 2017 · 15 comments · Fixed by #43723
Closed

Compile time regression #43141

Boddlnagg opened this issue Jul 10, 2017 · 15 comments · Fixed by #43723
Assignees
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Boddlnagg
Copy link
Contributor

There was a major compile time (and memory consumption) regression when I compare building winrt-rust on Rust 1.18 stable and current 1.20 nightly. (I couldn't test on beta because error: the option Z is only accepted on the nightly compiler.)

rustc 1.18.0 (03fc9d6 2017-06-06): 137.31 secs, 2873 MB peak memory
Compiling winrt v0.2.1 (file:///D:/Code/Rust/winrt-rust)
warning: the option `Z` is unstable and should only be used on the nightly compiler, but it is currently accepted for backwards compatibility; this will soon change, see issue #31847 for more details

time: 2.145; rss: 449MB parsing
time: 0.000; rss: 449MB recursion limit
time: 0.000; rss: 449MB crate injection
time: 0.000; rss: 449MB plugin loading
time: 0.000; rss: 449MB plugin registration
time: 12.711; rss: 808MB        expansion
time: 0.000; rss: 808MB maybe building test harness
time: 0.129; rss: 808MB maybe creating a macro crate
time: 0.000; rss: 808MB checking for inline asm in case the target doesn't support it
time: 0.537; rss: 808MB early lint checks
time: 0.158; rss: 808MB AST validation
time: 1.477; rss: 935MB name resolution
time: 0.710; rss: 935MB complete gated feature checking
time: 2.102; rss: 1453MB        lowering ast -> hir
time: 0.642; rss: 1361MB        indexing hir
time: 0.142; rss: 1361MB        attribute checking
time: 0.088; rss: 948MB language item collection
time: 0.451; rss: 954MB lifetime resolution
time: 0.000; rss: 954MB looking for entry point
time: 0.004; rss: 954MB looking for plugin registrar
time: 0.858; rss: 1095MB        region resolution
time: 0.194; rss: 1095MB        loop checking
time: 0.102; rss: 1095MB        static item recursion checking
time: 1.601; rss: 1110MB        compute_incremental_hashes_map
time: 0.000; rss: 1110MB        load_dep_graph
time: 0.248; rss: 1110MB        stability index
time: 0.437; rss: 1123MB        stability checking
time: 1.905; rss: 1292MB        type collecting
time: 0.122; rss: 1293MB        variance inference
time: 0.236; rss: 1307MB        impl wf inference
time: 3.658; rss: 1328MB        coherence checking
time: 8.390; rss: 1419MB        wf checking
time: 5.613; rss: 1477MB        item-types checking
time: 48.962; rss: 2098MB       item-bodies checking
time: 3.430; rss: 2164MB        const checking
time: 2.696; rss: 2174MB        privacy checking
time: 0.700; rss: 2174MB        intrinsic checking
time: 0.257; rss: 2174MB        effect checking
time: 0.880; rss: 2175MB        match checking
time: 0.513; rss: 2175MB        liveness checking
time: 3.851; rss: 2589MB        MIR dump
time: 0.499; rss: 2645MB      SimplifyCfg
time: 1.689; rss: 2645MB      TypeckMir
time: 1.294; rss: 2648MB      QualifyAndPromoteConstants
time: 0.087; rss: 2648MB      SimplifyBranches
time: 0.344; rss: 2648MB      SimplifyCfg
time: 3.914; rss: 2648MB        MIR cleanup and validation
time: 5.023; rss: 2655MB        borrow checking
time: 0.471; rss: 2662MB        reachability checking
time: 1.121; rss: 2662MB        death checking
time: 0.000; rss: 2662MB        unused lib feature checking
time: 3.540; rss: 2662MB        lint checking
time: 0.000; rss: 2662MB        resolving dependency formats
time: 0.085; rss: 2662MB      NoLandingPads
time: 0.349; rss: 2662MB      SimplifyCfg
time: 0.579; rss: 2677MB      EraseRegions
time: 0.168; rss: 2677MB      AddCallGuards
time: 1.869; rss: 2679MB      ElaborateDrops
time: 0.084; rss: 2679MB      NoLandingPads
time: 0.388; rss: 2679MB      SimplifyCfg
time: 0.000; rss: 2679MB      Inline
time: 0.307; rss: 2679MB      InstCombine
time: 0.533; rss: 2679MB      Deaggregator
time: 0.100; rss: 2679MB      CopyPropagation
time: 0.357; rss: 2681MB      SimplifyLocals
time: 0.168; rss: 2681MB      AddCallGuards
time: 0.093; rss: 2681MB      PreTrans
time: 5.084; rss: 2681MB        MIR optimisations
time: 2.501; rss: 2802MB      write metadata
time: 0.991; rss: 2831MB      translation item collection
time: 0.115; rss: 2873MB      codegen unit partitioning
time: 0.134; rss: 2866MB      internalize symbols
time: 6.013; rss: 2836MB        translation
time: 0.000; rss: 2836MB        assert dep graph
time: 0.000; rss: 2836MB        serialize dep graph
time: 0.204; rss: 357MB       llvm function passes [0]
time: 1.705; rss: 361MB       llvm module passes [0]
time: 1.634; rss: 371MB       codegen passes [0]
time: 0.002; rss: 371MB       codegen passes [0]
time: 3.600; rss: 371MB LLVM passes
time: 0.000; rss: 371MB serialize work products
time: 0.639; rss: 358MB linking
 Finished release [optimized] target(s) in 137.31 secs
rustc 1.20.0-nightly (d84693b 2017-07-09): 262.64 secs, 3346 MB peak memory
Compiling winrt v0.2.1 (file:///D:/Code/Rust/winrt-rust)
time: 2.448; rss: 449MB parsing
time: 0.000; rss: 449MB recursion limit
time: 0.000; rss: 449MB crate injection
time: 0.000; rss: 449MB plugin loading
time: 0.000; rss: 449MB plugin registration
time: 12.485; rss: 782MB        expansion
time: 0.000; rss: 782MB maybe building test harness
time: 0.142; rss: 782MB maybe creating a macro crate
time: 0.593; rss: 782MB creating allocators
time: 0.000; rss: 782MB checking for inline asm in case the target doesn't support it
time: 0.602; rss: 782MB early lint checks
time: 0.162; rss: 782MB AST validation
time: 2.673; rss: 982MB name resolution
time: 0.825; rss: 982MB complete gated feature checking
time: 2.580; rss: 1538MB        lowering ast -> hir
time: 0.728; rss: 1391MB        indexing hir
time: 0.166; rss: 1391MB        attribute checking
time: 0.105; rss: 964MB language item collection
time: 0.536; rss: 971MB lifetime resolution
time: 0.000; rss: 971MB looking for entry point
time: 0.005; rss: 971MB looking for plugin registrar
time: 0.228; rss: 971MB loop checking
time: 0.124; rss: 971MB static item recursion checking
time: 1.963; rss: 986MB compute_incremental_hashes_map
time: 0.000; rss: 986MB load_dep_graph
time: 0.271; rss: 987MB stability index
time: 0.490; rss: 1000MB        stability checking
time: 2.389; rss: 1163MB        type collecting
time: 0.327; rss: 1178MB        impl wf inference
time: 5.498; rss: 1195MB        coherence checking
time: 0.042; rss: 1195MB        variance testing
time: 96.245; rss: 1580MB       wf checking
time: 6.401; rss: 1634MB        item-types checking
time: 52.552; rss: 2229MB       item-bodies checking
time: 9.095; rss: 2350MB        const checking
time: 4.262; rss: 2360MB        privacy checking
time: 0.352; rss: 2363MB        intrinsic checking
time: 0.364; rss: 2363MB        effect checking
time: 0.638; rss: 2363MB        match checking
time: 0.607; rss: 2363MB        liveness checking
time: 15.272; rss: 3014MB       borrow checking
time: 0.499; rss: 3022MB        reachability checking
time: 1.118; rss: 3023MB        death checking
time: 0.000; rss: 3023MB        unused lib feature checking
time: 6.384; rss: 3023MB        lint checking
time: 0.000; rss: 3023MB        resolving dependency formats
time: 21.937; rss: 3226MB     write metadata
time: 1.238; rss: 3254MB      translation item collection
time: 0.119; rss: 3309MB      codegen unit partitioning
time: 0.187; rss: 3346MB      internalize symbols
time: 26.257; rss: 3338MB       translation
time: 0.000; rss: 3338MB        assert dep graph
time: 0.000; rss: 3338MB        serialize dep graph
time: 0.003; rss: 430MB       codegen passes [0]
time: 0.226; rss: 432MB       llvm function passes [1]
time: 1.994; rss: 436MB       llvm module passes [1]
time: 1.853; rss: 445MB       codegen passes [1]
time: 4.135; rss: 445MB LLVM passes
time: 0.000; rss: 445MB serialize work products
time: 0.597; rss: 431MB linking
 Finished release [optimized] target(s) in 262.64 secs
 

winrt-rust certainly is a rather exotic crate in that it mainly consists of a > 200k LOC generated file that wraps Windows COM APIs, but something seems to be wrong here, especially in the wf checking pass, which went from 8.4 seconds to 96.2 seconds.

To reproduce the numbers, you can run cargo rustc --release --features all -- -Z time-passes (--features all is important, otherwise only part of the crate is being built).

@Mark-Simulacrum Mark-Simulacrum added I-compiletime Issue: Problems and improvements with respect to compile times. regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 10, 2017
@Mark-Simulacrum
Copy link
Member

cc @rust-lang/compiler -- Possibly just the standard regression currently caused by MIR borrowck and allocators being introduced, but somewhat seems different.

@Mark-Simulacrum Mark-Simulacrum added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jul 10, 2017
@eddyb
Copy link
Member

eddyb commented Jul 10, 2017

For WF checking I'd ask @nikomatsakis and @arielb1 specifically. Sounds like a new check was added, or an existing one became much slower.

@Boddlnagg
Copy link
Contributor Author

I just tested on beta (rustc 1.19.0-beta.3) without -Z time-passes (which is no longer allowed) and it took 246.59 secs. So the major regression seems to have been introduced between 1.18 and 1.19.

@eddyb eddyb added regression-from-stable-to-beta Performance or correctness regression from stable to beta. and removed regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. labels Jul 12, 2017
@nikomatsakis
Copy link
Contributor

@Boddlnagg any chance you can run this in some sort of other profiler? I presume it only builds on Windows, right?

@nikomatsakis
Copy link
Contributor

@Boddlnagg another awesome thing -- if you are able to do it -- would be bisecting with nightly releases to try and narrow this down a bit more. I don't have easy access to a windows machine, but I'll try to find someone who does also.

@Mark-Simulacrum
Copy link
Member

Bisection narrowed to 75b0568...14f30da61(commits between nightly-2017-05-16 and nightly-2017-05-21. It seems likely this means #41911 is at fault here, cc @michaelwoerister. No guarantees though. cargo check works to reproduce, so anyone should be able to test this.

@nikomatsakis
Copy link
Contributor

Ah, #41911 could certainly have done it, though the reason is not yet obvious.

@brson
Copy link
Contributor

brson commented Jul 13, 2017

Seems like this won't be fixed for 1.19, but it looks pretty bad.

@alexcrichton alexcrichton added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Jul 23, 2017
@Mark-Simulacrum
Copy link
Member

@rust-lang/compiler -- this looks like it needs a look from someone to investigate the cause here.

@nikomatsakis
Copy link
Contributor

Assigning to @arielb1 to do profiling before/after #41911.

@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 28, 2017
@arielb1
Copy link
Contributor

arielb1 commented Aug 7, 2017

So after one possibly-unneeded reboot into Windows, it turns out that... drumroll

#41911 broke is_relevant_trait_impl:
https://github.com/rust-lang/rust/pull/41911/files#diff-6bfb45b8d6ccab4d1e18e623dee893faR192

+    let relevant: Vec<DefId> = all_trait_impls
+        .non_blanket_impls
+        .iter()
+        .cloned()
+        .filter(|&impl_def_id| {
+            let impl_self_ty = tcx.type_of(impl_def_id);
+            let impl_simple_self_ty = fast_reject::simplify_type(tcx,
+                                                                 impl_self_ty,
+                                                                 false).unwrap();
+            impl_simple_self_ty == self_ty
+        })
+        .collect(); 

@eddyb
Copy link
Member

eddyb commented Aug 7, 2017

@michaelwoerister
Copy link
Member

@eddyb I thought you were talking about sharing a single blanket_impls vec between all results for the same trait back then.

@eddyb
Copy link
Member

eddyb commented Aug 7, 2017

@michaelwoerister Maybe I noticed one problem, but clearly not all of them.

@Mark-Simulacrum
Copy link
Member

@arielb1 Responding to your ping on IRC, yes, I was able to reproduce via cargo check (just had to pass --target the-relevant-windows-triple).

arielb1 added a commit to arielb1/rust that referenced this issue Aug 8, 2017
A change in rust-lang#41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes rust-lang#43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.
bors added a commit that referenced this issue Aug 8, 2017
make `for_all_relevant_impls` O(1) again

A change in #41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes #43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.

r? @eddyb
beta-nominating because regression
alexcrichton pushed a commit to alexcrichton/rust that referenced this issue Aug 23, 2017
A change in rust-lang#41911 had made `for_all_relevant_impls` do a linear scan over
all impls, instead of using an HashMap. Use an HashMap again to avoid
quadratic blowup when there is a large number of structs with impls.

I think this fixes rust-lang#43141 completely, but I want better measurements in
order to be sure. As a perf patch, please don't roll this up.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants