Rollup of 6 pull requests #84788

Dylan-DPC-zz · 2021-05-01T12:27:33Z

Successful merges:

[aarch64] add target feature outline-atomics #83655 ([aarch64] add target feature outline-atomics)
Allow setting target_family to multiple values, and implement target_family="wasm" #84072 (Allow setting target_family to multiple values, and implement target_family="wasm")
Allow running x.py test --stage 2 src/tools/linkchecker with download-rustc = true #84471 (Allow running x.py test --stage 2 src/tools/linkchecker with download-rustc = true)
Unignore a couple of tests #84638 (Unignore a couple of tests)
[Arm64] use isb instruction instead of yield in spin loops #84725 ([Arm64] use isb instruction instead of yield in spin loops)
Update compiler-builtins to 0.1.42 to get fix for outlined atomics #84764 (Update compiler-builtins to 0.1.41 to get fix for outlined atomics)

Failed merges:

r? @ghost
@rustbot modify labels: rollup

This enables us to set more generic labels shared between targets. For example `target_family="wasm"` across all targets that are conceptually "wasm". See rust-lang/reference#1006

…true` Previously, the LD_LIBRARY_PATH for the linkchecker looked like `build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib`, because the linkchecker depends on the master copy of the standard library. This is true, but doesn't include the library path for the compiler libraries: ``` /home/joshua/src/rust/rust/build/x86_64-unknown-linux-gnu/stage1-tools-bin/error_index_generator: error while loading shared libraries: libLLVM-12-rust-1.53.0-nightly.so: cannot open shared object file: No such file or directory ``` That file is in `build/x86_64-unknown-linux-gnu/stage1/lib/libLLVM-12-rust-1.53.0-nightly.so`, which wasn't included in the dynamic path. This adds `build/x86_64-unknown-linux-gnu/stage1/lib` to the dynamic path for the linkchecker.

On arm64 we have seen on several databases that ISB (instruction synchronization barrier) is better to use than yield in a spin loop. The yield instruction is a nop. The isb instruction puts the processor to sleep for some short time. isb is a good equivalent to the pause instruction on x86. Below is an experiment that shows the effects of yield and isb on Arm64 and the time of a pause instruction on x86 Intel processors. The micro-benchmarks use https://github.com/google/benchmark.git $ cat a.cc static void BM_scalar_increment(benchmark::State& state) { int i = 0; for (auto _ : state) benchmark::DoNotOptimize(i++); } BENCHMARK(BM_scalar_increment); static void BM_yield(benchmark::State& state) { for (auto _ : state) asm volatile("yield"::); } BENCHMARK(BM_yield); static void BM_isb(benchmark::State& state) { for (auto _ : state) asm volatile("isb"::); } BENCHMARK(BM_isb); BENCHMARK_MAIN(); $ g++ -o run a.cc -O2 -lbenchmark -lpthread $ ./run -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- AWS Graviton2 (Neoverse-N1) processor: BM_scalar_increment 0.485 ns 0.485 ns 1000000000 BM_yield 0.400 ns 0.400 ns 1000000000 BM_isb 13.2 ns 13.2 ns 52993304 AWS Graviton (A-72) processor: BM_scalar_increment 0.897 ns 0.874 ns 801558633 BM_yield 0.877 ns 0.875 ns 800002377 BM_isb 13.0 ns 12.7 ns 55169412 Apple Arm64 M1 processor: BM_scalar_increment 0.315 ns 0.315 ns 1000000000 BM_yield 0.313 ns 0.313 ns 1000000000 BM_isb 9.06 ns 9.06 ns 77259282 static void BM_pause(benchmark::State& state) { for (auto _ : state) asm volatile("pause"::); } BENCHMARK(BM_pause); Intel Skylake processor: BM_scalar_increment 0.295 ns 0.295 ns 1000000000 BM_pause 41.7 ns 41.7 ns 16780553 Tested on Graviton2 aarch64-linux with `./x.py test`.

This should fix linking of other C code (and soon Rust-generated code) on aarch64 musl.

Enable outline-atomics by default as enabled in clang by the following commit https://reviews.llvm.org/rGc5e7e649d537067dec7111f3de1430d0fc8a4d11 Performance improves by several orders of magnitude when using the LSE instructions instead of the ARMv8.0 compatible load/store exclusive instructions. Tested on Graviton2 aarch64-linux with x.py build && x.py install && x.py test

…htriplett [aarch64] add target feature outline-atomics Enable outline-atomics by default as enabled in clang by the following commit https://reviews.llvm.org/rGc5e7e649d537067dec7111f3de1430d0fc8a4d11 Performance improves by several orders of magnitude when using the LSE instructions instead of the ARMv8.0 compatible load/store exclusive instructions. Tested on Graviton2 aarch64-linux with x.py build && x.py install && x.py test

… r=petrochenkov Allow setting `target_family` to multiple values, and implement `target_family="wasm"` As per the conclusion in [this thread](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Are.20we.20comfortable.20with.20adding.20an.20insta-stable.20cfg.28wasm.29.3F/near/233158441), this implements an ability to specify any number of `target_family` values, allowing for more flexible generic groups, or "families", to be created than just the OS-based unix/windows dichotomy. cc rust-lang/reference#1006

…acrum Allow running `x.py test --stage 2 src/tools/linkchecker` with `download-rustc = true` Previously, the LD_LIBRARY_PATH for the linkchecker looked like `build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib`, because the linkchecker depends on the master copy of the standard library. This is true, but doesn't include the library path for the compiler libraries: ``` /home/joshua/src/rust/rust/build/x86_64-unknown-linux-gnu/stage1-tools-bin/error_index_generator: error while loading shared libraries: libLLVM-12-rust-1.53.0-nightly.so: cannot open shared object file: No such file or directory ``` That file is in `build/x86_64-unknown-linux-gnu/stage1/lib/libLLVM-12-rust-1.53.0-nightly.so`, which wasn't included in the dynamic path. This adds `build/x86_64-unknown-linux-gnu/stage1/lib` to the dynamic path for the linkchecker.

…ulacrum Unignore a couple of tests

[Arm64] use isb instruction instead of yield in spin loops On arm64 we have seen on several databases that ISB (instruction synchronization barrier) is better to use than yield in a spin loop. The yield instruction is a nop. The isb instruction puts the processor to sleep for some short time. isb is a good equivalent to the pause instruction on x86. Below is an experiment that shows the effects of yield and isb on Arm64 and the time of a pause instruction on x86 Intel processors. The micro-benchmarks use https://github.com/google/benchmark.git ``` $ cat a.cc static void BM_scalar_increment(benchmark::State& state) { int i = 0; for (auto _ : state) benchmark::DoNotOptimize(i++); } BENCHMARK(BM_scalar_increment); static void BM_yield(benchmark::State& state) { for (auto _ : state) asm volatile("yield"::); } BENCHMARK(BM_yield); static void BM_isb(benchmark::State& state) { for (auto _ : state) asm volatile("isb"::); } BENCHMARK(BM_isb); BENCHMARK_MAIN(); $ g++ -o run a.cc -O2 -lbenchmark -lpthread $ ./run -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- AWS Graviton2 (Neoverse-N1) processor: BM_scalar_increment 0.485 ns 0.485 ns 1000000000 BM_yield 0.400 ns 0.400 ns 1000000000 BM_isb 13.2 ns 13.2 ns 52993304 AWS Graviton (A-72) processor: BM_scalar_increment 0.897 ns 0.874 ns 801558633 BM_yield 0.877 ns 0.875 ns 800002377 BM_isb 13.0 ns 12.7 ns 55169412 Apple Arm64 M1 processor: BM_scalar_increment 0.315 ns 0.315 ns 1000000000 BM_yield 0.313 ns 0.313 ns 1000000000 BM_isb 9.06 ns 9.06 ns 77259282 ``` ``` static void BM_pause(benchmark::State& state) { for (auto _ : state) asm volatile("pause"::); } BENCHMARK(BM_pause); Intel Skylake processor: BM_scalar_increment 0.295 ns 0.295 ns 1000000000 BM_pause 41.7 ns 41.7 ns 16780553 ``` Tested on Graviton2 aarch64-linux with `./x.py test`.

…ns, r=Amanieu Update compiler-builtins to 0.1.41 to get fix for outlined atomics This should fix linking of other C code (and soon Rust-generated code) on aarch64 musl.

nagisa and others added 14 commits April 11, 2021 01:18

Allow setting target_family to multiple values

4afea69

This enables us to set more generic labels shared between targets. For example `target_family="wasm"` across all targets that are conceptually "wasm". See rust-lang/reference#1006

Set target_family="wasm" for wasm targets

dfe3c3c

unignore a couple of tests

cf46fb1

fix test

6697b0d

Update compiler-builtins to 0.1.41 to get fix for outlined atomics

49e67c3

This should fix linking of other C code (and soon Rust-generated code) on aarch64 musl.

Rollup merge of rust-lang#84638 - mark-i-m:unignore-tests, r=Mark-Sim…

c1fff0d

…ulacrum Unignore a couple of tests

Rollup merge of rust-lang#84764 - joshtriplett:update-compiler-builti…

15a03a0

…ns, r=Amanieu Update compiler-builtins to 0.1.41 to get fix for outlined atomics This should fix linking of other C code (and soon Rust-generated code) on aarch64 musl.

rustbot added the rollup A PR which is a rollup label May 1, 2021

Dylan-DPC-zz closed this May 1, 2021

Dylan-DPC-zz deleted the rollup-qevgi10 branch May 1, 2021 12:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rollup of 6 pull requests #84788

Rollup of 6 pull requests #84788

Uh oh!

Dylan-DPC-zz commented May 1, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Rollup of 6 pull requests #84788

Rollup of 6 pull requests #84788

Uh oh!

Conversation

Dylan-DPC-zz commented May 1, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants