Speed up the build (i.e. Build faster) #489
Comments
More ideas:
|
Here's some profiles of Fresh debug build takes 15.8s: $ cargo rustc -- -Z time-passes -Z time-llvm-passes Compiling libc v0.2.21 Compiling gcc v0.3.45 Compiling lazy_static v0.2.6 Compiling untrusted v0.3.2 Compiling rand v0.3.15 Compiling num_cpus v1.3.0 Compiling deque v0.3.1 Compiling rayon v0.6.0 Compiling ring v0.7.3 (file:///home/cmr/proj/ring) time: 0.046; rss: 56MB parsing time: 0.000; rss: 56MB recursion limit time: 0.000; rss: 56MB crate injection time: 0.000; rss: 56MB plugin loading time: 0.000; rss: 56MB plugin registration time: 0.078; rss: 95MB expansion time: 0.000; rss: 95MB maybe building test harness time: 0.000; rss: 95MB maybe creating a macro crate time: 0.000; rss: 95MB checking for inline asm in case the target doesn't support it time: 0.003; rss: 95MB early lint checks time: 0.001; rss: 95MB AST validation time: 0.012; rss: 98MB name resolution time: 0.008; rss: 98MB complete gated feature checking time: 0.010; rss: 104MB lowering ast -> hir time: 0.003; rss: 104MB indexing hir time: 0.001; rss: 104MB attribute checking time: 0.004; rss: 107MB language item collection time: 0.002; rss: 107MB lifetime resolution time: 0.000; rss: 107MB looking for entry point time: 0.000; rss: 107MB looking for plugin registrar time: 0.003; rss: 107MB region resolution time: 0.001; rss: 107MB loop checking time: 0.000; rss: 107MB static item recursion checking time: 0.031; rss: 108MB compute_incremental_hashes_map time: 0.000; rss: 108MB load_dep_graph time: 0.001; rss: 108MB stability index time: 0.003; rss: 108MB stability checking time: 0.461; rss: 124MB type collecting time: 0.000; rss: 124MB variance inference time: 0.000; rss: 124MB impl wf inference time: 0.014; rss: 127MB coherence checking time: 0.021; rss: 127MB wf checking time: 0.045; rss: 127MB item-types checking time: 0.324; rss: 134MB item-bodies checking time: 0.022; rss: 136MB const checking time: 0.005; rss: 136MB privacy checking time: 0.002; rss: 136MB intrinsic checking time: 0.001; 
rss: 136MB effect checking time: 0.005; rss: 136MB match checking time: 0.003; rss: 136MB liveness checking time: 0.016; rss: 136MB rvalue checking time: 0.037; rss: 151MB MIR dump time: 0.005; rss: 151MB SimplifyCfg time: 0.008; rss: 151MB QualifyAndPromoteConstants time: 0.012; rss: 151MB TypeckMir time: 0.000; rss: 151MB SimplifyBranches time: 0.002; rss: 151MB SimplifyCfg time: 0.028; rss: 151MB MIR cleanup and validation time: 0.044; rss: 151MB borrow checking time: 0.000; rss: 151MB reachability checking time: 0.003; rss: 151MB death checking time: 0.000; rss: 151MB unused lib feature checking time: 0.041; rss: 151MB lint checking time: 0.000; rss: 151MB resolving dependency formats time: 0.000; rss: 151MB NoLandingPads time: 0.002; rss: 151MB SimplifyCfg time: 0.004; rss: 151MB EraseRegions time: 0.001; rss: 151MB AddCallGuards time: 0.015; rss: 154MB ElaborateDrops time: 0.000; rss: 154MB NoLandingPads time: 0.003; rss: 154MB SimplifyCfg time: 0.000; rss: 154MB Inline time: 0.003; rss: 154MB InstCombine time: 0.001; rss: 154MB Deaggregator time: 0.000; rss: 154MB CopyPropagation time: 0.003; rss: 154MB SimplifyLocals time: 0.001; rss: 154MB AddCallGuards time: 0.000; rss: 154MB PreTrans time: 0.033; rss: 154MB MIR optimisations time: 0.009; rss: 154MB write metadata time: 0.066; rss: 156MB translation item collection time: 0.013; rss: 156MB codegen unit partitioning time: 0.007; rss: 176MB internalize symbols time: 0.532; rss: 176MB translation time: 0.000; rss: 176MB assert dep graph time: 0.000; rss: 176MB serialize dep graph time: 0.046; rss: 142MB llvm function passes [0] time: 0.035; rss: 144MB llvm module passes [0] time: 0.941; rss: 150MB codegen passes [0] time: 0.000; rss: 149MB codegen passes [0] ===-------------------------------------------------------------------------=== Instruction Selection and Scheduling ===-------------------------------------------------------------------------=== Total Execution Time: 0.1400 seconds (0.1241 wall clock) 
It takes 3.3s to build just libring.rlib, and not the C code, and 9.3s to build both the rlib and the C code. It takes 0.41 seconds to build a trivial crate that takes the SHA512 of a line of stdin, after the deps (including ring) are already built: Compiling f v0.1.0 (file:///home/cmr/proj/ring/t/f) time: 0.000; rss: 48MB parsing time: 0.000; rss: 48MB recursion limit time: 0.000; rss: 48MB crate injection time: 0.000; rss: 48MB plugin loading time: 0.000; rss: 48MB plugin registration time: 0.022; rss: 84MB expansion time: 0.000; rss: 84MB maybe building test harness time: 0.000; rss: 84MB maybe creating a macro crate time: 0.000; rss: 84MB checking for inline asm in case the target doesn't support it time: 0.000; rss: 84MB early lint checks time: 0.000; rss: 84MB AST validation time: 0.005; rss: 84MB name resolution time: 0.000; rss: 84MB complete gated feature checking time: 0.000; rss: 84MB lowering ast -> hir time: 0.000; rss: 84MB indexing hir time: 0.000; rss: 84MB attribute checking time: 0.000; rss: 84MB language item collection time: 0.000; rss: 84MB lifetime resolution time: 0.000; rss: 84MB looking for entry point time: 0.000; rss: 84MB looking for plugin registrar time: 0.000; rss: 84MB region resolution time: 0.000; rss: 84MB loop checking time: 0.000; rss: 84MB static item recursion checking time: 0.000; rss: 87MB compute_incremental_hashes_map time: 0.000; rss: 87MB load_dep_graph time: 0.000; rss: 87MB stability index time: 0.000; rss: 87MB stability checking time: 0.000; rss: 87MB type collecting time: 0.000; rss: 87MB variance inference time: 0.000; rss: 87MB impl wf inference time: 0.000; rss: 87MB coherence checking time: 0.000; rss: 87MB wf checking time: 0.001; rss: 87MB item-types checking time: 0.011; rss: 101MB item-bodies checking time: 0.002; rss: 101MB const checking time: 0.000; rss: 101MB privacy checking time: 0.000; rss: 101MB intrinsic checking time: 0.000; rss: 101MB effect checking time: 0.000; rss: 101MB match checking time: 
0.000; rss: 101MB liveness checking time: 0.000; rss: 101MB rvalue checking time: 0.000; rss: 101MB MIR dump time: 0.000; rss: 101MB SimplifyCfg time: 0.000; rss: 101MB QualifyAndPromoteConstants time: 0.000; rss: 101MB TypeckMir time: 0.000; rss: 101MB SimplifyBranches time: 0.000; rss: 101MB SimplifyCfg time: 0.001; rss: 101MB MIR cleanup and validation time: 0.000; rss: 101MB borrow checking time: 0.000; rss: 101MB reachability checking time: 0.000; rss: 101MB death checking time: 0.000; rss: 101MB unused lib feature checking warning: unused result which must be used --> src/main.rs:7:5 | 7 | stdin().read_line(&mut s); | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | = note: #[warn(unused_must_use)] on by default Here's the size of the artifacts inside 60K add.o 8.0K aes-x86_64-elf.o 40K aes.o 8.0K aesni-gcm-x86_64-elf.o 8.0K aesni-x86_64-elf.o 68K bn.o 76K bn_test_convert.o 52K bn_test_new.o 8.0K bsaes-x86_64-elf.o 12K chacha-x86_64-elf.o 56K cmp.o 76K constant_time_test.o 60K convert.o 48K cpu-intel.o 44K crypto.o 160K curve25519.o 64K div.o 60K e_aes.o 48K ecp_nistz.o 4.0K ecp_nistz256-x86_64-elf.o 216K ecp_nistz256.o 60K exponentiation.o 52K gcd.o 68K gcm.o 60K generic.o 48K gfp_p256.o 84K gfp_p384.o 12K ghash-x86_64-elf.o 64K limbs.o 40K mem.o 60K montgomery.o 56K montgomery_inv.o 52K mul.o 12K p256-x86_64-asm-elf.o 12K poly1305-x86_64-elf.o 44K random.o 1.7M ring-15bf6c46f8e53abc.0.o 24K sha256-x86_64-elf.o 24K sha512-x86_64-elf.o 56K shift.o 96K sysrand.o 8.0K vpaes-x86_64-elf.o 16K x25519-asm-x86_64.o 48K x25519-x86_64.o 8.0K x86_64-mont-elf.o 12K x86_64-mont5-elf.o 716K ring-15bf6c46f8e53abc.0.bytecode.deflate 944K rust.metadata.bin In release mode, things aren't that much worse. Building ring by itself takes 15.6s to build, 21.78s with build deps. Rebuilding just libring.rlib takes just 4.64s. Building the C code takes most of the time. 
Here's the profile: Compiling ring v0.7.3 (file:///home/cmr/proj/ring) time: 0.045; rss: 56MB parsing time: 0.000; rss: 56MB recursion limit time: 0.000; rss: 56MB crate injection time: 0.000; rss: 56MB plugin loading time: 0.000; rss: 56MB plugin registration time: 0.077; rss: 94MB expansion time: 0.000; rss: 94MB maybe building test harness time: 0.001; rss: 94MB maybe creating a macro crate time: 0.000; rss: 94MB checking for inline asm in case the target doesn't support it time: 0.003; rss: 94MB early lint checks time: 0.001; rss: 94MB AST validation time: 0.011; rss: 99MB name resolution time: 0.008; rss: 99MB complete gated feature checking time: 0.009; rss: 101MB lowering ast -> hir time: 0.003; rss: 105MB indexing hir time: 0.001; rss: 105MB attribute checking time: 0.004; rss: 105MB language item collection time: 0.002; rss: 105MB lifetime resolution time: 0.000; rss: 105MB looking for entry point time: 0.000; rss: 105MB looking for plugin registrar time: 0.003; rss: 107MB region resolution time: 0.001; rss: 107MB loop checking time: 0.000; rss: 107MB static item recursion checking time: 0.014; rss: 107MB compute_incremental_hashes_map time: 0.000; rss: 107MB load_dep_graph time: 0.001; rss: 107MB stability index time: 0.003; rss: 107MB stability checking time: 0.464; rss: 123MB type collecting time: 0.000; rss: 123MB variance inference time: 0.000; rss: 123MB impl wf inference time: 0.014; rss: 126MB coherence checking time: 0.020; rss: 126MB wf checking time: 0.045; rss: 126MB item-types checking time: 0.322; rss: 133MB item-bodies checking time: 0.022; rss: 133MB const checking time: 0.005; rss: 133MB privacy checking time: 0.004; rss: 133MB intrinsic checking time: 0.001; rss: 133MB effect checking time: 0.006; rss: 133MB match checking time: 0.003; rss: 133MB liveness checking time: 0.016; rss: 133MB rvalue checking time: 0.036; rss: 150MB MIR dump time: 0.004; rss: 150MB SimplifyCfg time: 0.009; rss: 150MB QualifyAndPromoteConstants time: 0.013; rss: 
150MB TypeckMir time: 0.000; rss: 150MB SimplifyBranches time: 0.002; rss: 150MB SimplifyCfg time: 0.029; rss: 150MB MIR cleanup and validation time: 0.045; rss: 152MB borrow checking time: 0.000; rss: 152MB reachability checking time: 0.003; rss: 152MB death checking time: 0.000; rss: 152MB unused lib feature checking time: 0.041; rss: 152MB lint checking time: 0.000; rss: 152MB resolving dependency formats time: 0.000; rss: 152MB NoLandingPads time: 0.002; rss: 152MB SimplifyCfg time: 0.005; rss: 152MB EraseRegions time: 0.001; rss: 152MB AddCallGuards time: 0.014; rss: 152MB ElaborateDrops time: 0.000; rss: 152MB NoLandingPads time: 0.002; rss: 152MB SimplifyCfg time: 0.000; rss: 152MB Inline time: 0.002; rss: 152MB InstCombine time: 0.001; rss: 152MB Deaggregator time: 0.000; rss: 152MB CopyPropagation time: 0.003; rss: 152MB SimplifyLocals time: 0.001; rss: 152MB AddCallGuards time: 0.000; rss: 152MB PreTrans time: 0.031; rss: 152MB MIR optimisations time: 0.009; rss: 154MB write metadata time: 0.065; rss: 156MB translation item collection time: 0.013; rss: 156MB codegen unit partitioning time: 0.007; rss: 170MB internalize symbols time: 0.391; rss: 170MB translation time: 0.000; rss: 170MB assert dep graph time: 0.000; rss: 170MB serialize dep graph time: 0.164; rss: 136MB llvm function passes [0] time: 2.155; rss: 140MB llvm module passes [0] time: 0.562; rss: 143MB codegen passes [0] time: 0.001; rss: 143MB codegen passes [0] ===-------------------------------------------------------------------------=== Register Allocation ===-------------------------------------------------------------------------=== Total Execution Time: 0.0167 seconds (0.0174 wall clock) |
Note that moving out |
I believe it will make builds faster when editing the tests, because the lib won't be recompiled at all. To me this is a major benefit since "cargo test" is the primary way I build ring. I've done a bunch of work to remove the need for C++ at all, and to greatly reduce the amount of C code present. Unfortunately it's kind of a big project that's 75% done, hard to commit incrementally, and kind of stalled right now. But if the C/C++ stuff is what's making things slow, then this will help a lot when I can get time to finish it. |
That's a good benefit that I didn't think of.
I instrumented the build script to print how much time executing subcommands takes (not accounting for any parallelism or any work the build script itself does). Overall, it takes 4.462682 seconds in debug mode: the Perl takes 1.891556 seconds, the asm takes 0.3488179 seconds, and the C itself takes 2.168519 seconds. In release mode, the asm takes 0.3802813 seconds, and the C takes 4.583867 seconds. |
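For anyone who wants to repeat this kind of measurement, here is a minimal sketch of how subcommand timing could be added to a build script. It is not the instrumentation used above: the `Timings` struct, its `perl`/`asm`/`c` buckets, and the `run_timed` helper are hypothetical names chosen for illustration, and the sketch only measures wall-clock time around each child process, so it ignores parallelism in the same way the numbers above do.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::time::{Duration, Instant};

/// Accumulated wall-clock time per kind of subcommand.
/// (Hypothetical buckets; adjust to whatever the build script actually runs.)
#[derive(Debug, Default)]
struct Timings {
    perl: Duration,
    asm: Duration,
    c: Duration,
}

impl Timings {
    /// Sum of all buckets; this double-counts nothing but also ignores
    /// any overlap if the subcommands were run in parallel.
    fn total(&self) -> Duration {
        self.perl + self.asm + self.c
    }
}

/// Runs `cmd` to completion and adds the elapsed wall-clock time to `bucket`.
/// If the process cannot be spawned, the error is returned and `bucket`
/// is left unchanged.
fn run_timed(cmd: &mut Command, bucket: &mut Duration) -> io::Result<ExitStatus> {
    let start = Instant::now();
    let status = cmd.status()?;
    *bucket += start.elapsed();
    Ok(status)
}
```

In a build script's `main`, each compiler or Perl invocation would go through `run_timed` with the matching bucket, and the totals could be reported at the end with a `println!("cargo:warning=...")` line so that Cargo shows them in its output.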
cmr <notifications@github.com> wrote:
I instrumented the build script to print how much time executing
subcommands takes (not accounting for any parallelism or any work the build
script does). Overall, it takes 4.462682 seconds. The perl takes 1.891556
seconds, the asm takes 0.3488179 seconds, and the C itself takes 2.168519
seconds, in debug mode.
We should decide which use cases we're trying to optimize for.
I think Ted is mostly interested in the "time to build *ring* as a
dependency of another project" wall time. In that case, the Perl step is
skipped because it is precomputed.
I'm mostly interested in "cargo test --features=rsa_signing" build speed,
to help people contributing code, testing, etc. to *ring*.
I have to say, I personally think the build speed is quite bearable.
In any case, I did an experiment last night that shows we'll soon be able
to remove 10 files, including all the C++, which all adds up to about 2,800
lines of code. And I think not long after that we'll remove another ~10 C
files. So the natural progression looks positive as far as build speed is
concerned.
|
cc @arielb1 |
Now that rust-lang/rust#41469 is merged, you should also look at time-passes. |
BTW, if you don't need RSA then in ring 0.8.0 you'll be able to with |
Is there anybody unhappy with the build time now? We've made several changes that should be improvements, though we haven't attempted to measure everything again. Without new measurements this is unactionable. |
OK, I'm going to close this now, on the assumption everything is A-OK. |
In another bug, @luser suggested that we add a way to just build the digest API, without the rest, in the name of making the build faster: “Mostly faster compiling, yeah. Rust compilation is slow enough, pulling in another large dependency just to use one small bit of it makes the problem even worse.”
Since that time, the build system has been completely rewritten, mostly through @weiznich's awesome work. One thing we did was pregenerate the assembly language code from the PerlAsm scripts, so that all the Perl steps are skipped when building from crates.io, which may help.
But we should still try to make the build faster. This requires somebody to profile the build to find out what the bottlenecks are.
Wild guess ideas:
While we did spend significant effort ensuring the build is parallelized, we didn't make everything perfect in that respect. There is probably some low-hanging fruit regarding parallelism there.
When we're not building from Git (when ".git" doesn't exist), maybe we should skip dirty checking and go straight to compiling everything unconditionally. Presumably, Cargo does its own dirty checking to ensure each library is only built once, so our own dirty checking is superfluous in that case, as our build script would only be run when all files are dirty. (Dirty checking would still be essential for Git builds, of course.)
Now there is a ring-test library that contains some test code, which is built in addition to the ring-core library, which contains the C & asm code for the library proper. Ideally, the ring-test library shouldn't be linked into the ring library at all. If we changed constant_time_test and bn_test to be "integration" tests instead of "unit" tests, then we could use #[link] inside the integration test files to link libring-test.a only to the integration tests, and not to everything else. If this were done, then dependent crates that build ring from crates.io wouldn't need to link libring-test.a. Presumably, we would use the presence or absence of ".git" in build.rs to determine whether or not to build libring-test.a. However, I'm not sure that this is a good idea, because users can do "cargo test -p ring" from within a dependent crate to run the ring test suite, and this would break. Probably we need to wait for rust-lang/cargo#1581 ("Add support for test build scripts") instead.
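To make the ".git" idea from the list above concrete, here is a minimal sketch of how a build script could choose between dirty checking and an unconditional rebuild. This is an illustration under assumptions, not ring's actual build.rs: `is_git_checkout`, `RebuildPolicy`, and `rebuild_policy` are hypothetical names, and the sketch only decides the policy without performing any compilation.

```rust
use std::path::Path;

/// Returns true when `source_dir` looks like a Git checkout.
/// `exists()` is used rather than `is_dir()` because ".git" is a regular
/// file, not a directory, in worktrees and submodules.
fn is_git_checkout(source_dir: &Path) -> bool {
    source_dir.join(".git").exists()
}

/// What the build script should do with previously built object files.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum RebuildPolicy {
    /// Git build: compare timestamps and rebuild only what changed.
    DirtyCheck,
    /// Packaged build (e.g. from crates.io): compile everything.
    RebuildAll,
}

/// Picks the policy described in the list above: dirty checking only
/// matters when building from a Git checkout.
fn rebuild_policy(source_dir: &Path) -> RebuildPolicy {
    if is_git_checkout(source_dir) {
        RebuildPolicy::DirtyCheck
    } else {
        RebuildPolicy::RebuildAll
    }
}
```

A build script would typically pass the directory named by the `CARGO_MANIFEST_DIR` environment variable, which Cargo sets for build scripts. The same check could also gate whether libring-test.a gets built, subject to the "cargo test -p ring" concern raised above.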