[Arm64] use isb instruction instead of yield in spin loops #84725

sebpop · 2021-04-29T23:09:33Z

On arm64 we have seen on several databases that ISB (instruction synchronization
barrier) is better to use than yield in a spin loop. The yield instruction is a
nop. The isb instruction puts the processor to sleep for some short time. isb
is a good equivalent to the pause instruction on x86.

Below is an experiment that shows the effects of yield and isb on Arm64 and the
time of a pause instruction on x86 Intel processors. The micro-benchmarks use
https://github.com/google/benchmark.git

$ cat a.cc
static void BM_scalar_increment(benchmark::State& state) {
  int i = 0;
  for (auto _ : state)
    benchmark::DoNotOptimize(i++);
}
BENCHMARK(BM_scalar_increment);
static void BM_yield(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("yield"::);
}
BENCHMARK(BM_yield);
static void BM_isb(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("isb"::);
}
BENCHMARK(BM_isb);
BENCHMARK_MAIN();

$ g++ -o run a.cc -O2 -lbenchmark -lpthread
$ ./run

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------

AWS Graviton2 (Neoverse-N1) processor:
BM_scalar_increment      0.485 ns        0.485 ns   1000000000
BM_yield                 0.400 ns        0.400 ns   1000000000
BM_isb                    13.2 ns         13.2 ns     52993304

AWS Graviton (A-72) processor:
BM_scalar_increment      0.897 ns        0.874 ns    801558633
BM_yield                 0.877 ns        0.875 ns    800002377
BM_isb                    13.0 ns         12.7 ns     55169412

Apple Arm64 M1 processor:
BM_scalar_increment      0.315 ns        0.315 ns   1000000000
BM_yield                 0.313 ns        0.313 ns   1000000000
BM_isb                    9.06 ns         9.06 ns     77259282

static void BM_pause(benchmark::State& state) {
  for (auto _ : state)
    asm volatile("pause"::);
}
BENCHMARK(BM_pause);

Intel Skylake processor:
BM_scalar_increment      0.295 ns        0.295 ns   1000000000
BM_pause                  41.7 ns         41.7 ns     16780553

Tested on Graviton2 aarch64-linux with ./x.py test.

On arm64 we have seen on several databases that ISB (instruction synchronization barrier) is better to use than yield in a spin loop. The yield instruction is a nop. The isb instruction puts the processor to sleep for some short time. isb is a good equivalent to the pause instruction on x86. Below is an experiment that shows the effects of yield and isb on Arm64 and the time of a pause instruction on x86 Intel processors. The micro-benchmarks use https://github.com/google/benchmark.git $ cat a.cc static void BM_scalar_increment(benchmark::State& state) { int i = 0; for (auto _ : state) benchmark::DoNotOptimize(i++); } BENCHMARK(BM_scalar_increment); static void BM_yield(benchmark::State& state) { for (auto _ : state) asm volatile("yield"::); } BENCHMARK(BM_yield); static void BM_isb(benchmark::State& state) { for (auto _ : state) asm volatile("isb"::); } BENCHMARK(BM_isb); BENCHMARK_MAIN(); $ g++ -o run a.cc -O2 -lbenchmark -lpthread $ ./run -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- AWS Graviton2 (Neoverse-N1) processor: BM_scalar_increment 0.485 ns 0.485 ns 1000000000 BM_yield 0.400 ns 0.400 ns 1000000000 BM_isb 13.2 ns 13.2 ns 52993304 AWS Graviton (A-72) processor: BM_scalar_increment 0.897 ns 0.874 ns 801558633 BM_yield 0.877 ns 0.875 ns 800002377 BM_isb 13.0 ns 12.7 ns 55169412 Apple Arm64 M1 processor: BM_scalar_increment 0.315 ns 0.315 ns 1000000000 BM_yield 0.313 ns 0.313 ns 1000000000 BM_isb 9.06 ns 9.06 ns 77259282 static void BM_pause(benchmark::State& state) { for (auto _ : state) asm volatile("pause"::); } BENCHMARK(BM_pause); Intel Skylake processor: BM_scalar_increment 0.295 ns 0.295 ns 1000000000 BM_pause 41.7 ns 41.7 ns 16780553 Tested on Graviton2 aarch64-linux with `./x.py test`.

rust-highfive · 2021-04-29T23:09:36Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @joshtriplett (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

sebpop · 2021-04-29T23:28:04Z

Here is the patch that implements the equivalent of pause instruction for Arm64 in MySQL mysql/mysql-server@c4e5505 and the associated bug report https://bugs.mysql.com/bug.php?id=100664
The patch was committed to mysql and is available in mysql 8.0.23.

joshtriplett · 2021-04-30T02:25:22Z

This seems reasonable to me. r=me if CI passes.

workingjubilee · 2021-04-30T03:15:30Z

@bors r=joshtriplett

bors · 2021-04-30T03:15:31Z

📌 Commit c064b65 has been approved by joshtriplett

[Arm64] use isb instruction instead of yield in spin loops On arm64 we have seen on several databases that ISB (instruction synchronization barrier) is better to use than yield in a spin loop. The yield instruction is a nop. The isb instruction puts the processor to sleep for some short time. isb is a good equivalent to the pause instruction on x86. Below is an experiment that shows the effects of yield and isb on Arm64 and the time of a pause instruction on x86 Intel processors. The micro-benchmarks use https://github.com/google/benchmark.git ``` $ cat a.cc static void BM_scalar_increment(benchmark::State& state) { int i = 0; for (auto _ : state) benchmark::DoNotOptimize(i++); } BENCHMARK(BM_scalar_increment); static void BM_yield(benchmark::State& state) { for (auto _ : state) asm volatile("yield"::); } BENCHMARK(BM_yield); static void BM_isb(benchmark::State& state) { for (auto _ : state) asm volatile("isb"::); } BENCHMARK(BM_isb); BENCHMARK_MAIN(); $ g++ -o run a.cc -O2 -lbenchmark -lpthread $ ./run -------------------------------------------------------------- Benchmark Time CPU Iterations -------------------------------------------------------------- AWS Graviton2 (Neoverse-N1) processor: BM_scalar_increment 0.485 ns 0.485 ns 1000000000 BM_yield 0.400 ns 0.400 ns 1000000000 BM_isb 13.2 ns 13.2 ns 52993304 AWS Graviton (A-72) processor: BM_scalar_increment 0.897 ns 0.874 ns 801558633 BM_yield 0.877 ns 0.875 ns 800002377 BM_isb 13.0 ns 12.7 ns 55169412 Apple Arm64 M1 processor: BM_scalar_increment 0.315 ns 0.315 ns 1000000000 BM_yield 0.313 ns 0.313 ns 1000000000 BM_isb 9.06 ns 9.06 ns 77259282 ``` ``` static void BM_pause(benchmark::State& state) { for (auto _ : state) asm volatile("pause"::); } BENCHMARK(BM_pause); Intel Skylake processor: BM_scalar_increment 0.295 ns 0.295 ns 1000000000 BM_pause 41.7 ns 41.7 ns 16780553 ``` Tested on Graviton2 aarch64-linux with `./x.py test`.

bors · 2021-05-02T04:54:35Z

⌛ Testing commit c064b65 with merge e244e84...

bors · 2021-05-02T07:09:33Z

☀️ Test successful - checks-actions
Approved by: joshtriplett
Pushing e244e84 to master...

Unfortunately, C still lacks a standard function for pause (x86, sparc) or yeild (arm) instructions, for use in spin lock or CAS loops. BIND has its own based on vendor intrinsics or inline asm. Previously, it was buried in the `isc_rwlock` implementation. This commit renames `isc_rwlock_pause()` to `isc_pause()` and moves it into <isc/pause.h>. This commit also fixes the configure script so that it detects ARM yield support on systems that identify as `aarch*` instead of `arm*`. On 64-bit ARM systems we now use the ISB (instruction synchronization barrier) instruction in preference to yield. The ISB instruction pauses the CPU for longer, several nanoseconds, which is more like the x86 pause instruction. There are more details in a Rust pull request, which also refers to MySQL making the same change: rust-lang/rust#84725

rust-highfive assigned joshtriplett Apr 29, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 29, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 30, 2021

Dylan-DPC-zz mentioned this pull request Apr 30, 2021

Rollup of 4 pull requests #84739

Closed

GuillaumeGomez mentioned this pull request Apr 30, 2021

Rollup of 7 pull requests #84757

Closed

Dylan-DPC-zz mentioned this pull request May 1, 2021

Rollup of 6 pull requests #84788

Closed

Dylan-DPC-zz mentioned this pull request May 1, 2021

Rollup of 6 pull requests #84791

Closed

Dylan-DPC-zz mentioned this pull request May 1, 2021

Rollup of 6 pull requests #84804

Closed

bors added the merged-by-bors This PR was explicitly merged by bors. label May 2, 2021

bors merged commit e244e84 into rust-lang:master May 2, 2021

rustbot added this to the 1.54.0 milestone May 2, 2021

ghost mentioned this pull request May 3, 2021

fix checking os_family rust-lang/miri#1786

Merged

AWSjswinney mentioned this pull request Jan 10, 2022

use isb instead of yield for spin locks on arm jemalloc/jemalloc#2205

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Arm64] use isb instruction instead of yield in spin loops #84725

[Arm64] use isb instruction instead of yield in spin loops #84725

Uh oh!

sebpop commented Apr 29, 2021

Uh oh!

rust-highfive commented Apr 29, 2021

Uh oh!

sebpop commented Apr 29, 2021

Uh oh!

joshtriplett commented Apr 30, 2021

Uh oh!

workingjubilee commented Apr 30, 2021

Uh oh!

bors commented Apr 30, 2021

Uh oh!

bors commented May 2, 2021

Uh oh!

bors commented May 2, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

[Arm64] use isb instruction instead of yield in spin loops #84725

[Arm64] use isb instruction instead of yield in spin loops #84725

Uh oh!

Conversation

sebpop commented Apr 29, 2021

Uh oh!

rust-highfive commented Apr 29, 2021

Uh oh!

sebpop commented Apr 29, 2021

Uh oh!

joshtriplett commented Apr 30, 2021

Uh oh!

workingjubilee commented Apr 30, 2021

Uh oh!

bors commented Apr 30, 2021

Uh oh!

bors commented May 2, 2021

Uh oh!

bors commented May 2, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants