Use isb for normal cpu pause on aarch64 #49481

Merged: 3 commits into JuliaLang:master on May 3, 2023

Conversation

@gbaraldi (Member) commented Apr 24, 2023

Following rust-lang/rust@c064b65, it seems that isb is probably best for this. wfe is still good for some specific spins we do, because we explicitly call jl_cpu_wake on them; but in most cases we just call jl_cpu_pause, which can technically take unbounded time on AArch64. In practice it seems to be about a microsecond on the M1, which is far too long for some things.

In order to keep the nice pause/wake combo, I added jl_cpu_suspend (open to bikeshedding on the name) to be the actual long wait that needs to be woken up, while jl_cpu_pause now behaves more like x86 pause, taking a couple of nanoseconds to execute.
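
For illustration only, here is a rough sketch of what these primitives could look like on AArch64, assuming GCC/Clang-style inline assembly. The isb/wfe split follows the description above; implementing jl_cpu_wake with sev, and the x86 fallback, are assumptions of this sketch rather than quotes of the actual diff:

/* Sketch of the pause/suspend/wake primitives (AArch64, GCC/Clang inline asm). */
#if defined(__aarch64__)
/* Cheap pause for spin loops: isb stalls the pipeline for a few
   nanoseconds, similar in spirit to the x86 pause instruction. */
static inline void jl_cpu_pause(void)
{
    __asm__ volatile ("isb" ::: "memory");
}

/* Long wait that must be paired with an explicit wake: wfe may park the
   core until an event arrives, so it can take an unbounded amount of time. */
static inline void jl_cpu_suspend(void)
{
    __asm__ volatile ("wfe" ::: "memory");
}

/* Wake cores waiting in wfe; sev is the usual counterpart of wfe
   (an assumption about how jl_cpu_wake is implemented, not a quote). */
static inline void jl_cpu_wake(void)
{
    __asm__ volatile ("sev" ::: "memory");
}
#elif defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
static inline void jl_cpu_pause(void)   { _mm_pause(); }
static inline void jl_cpu_suspend(void) { _mm_pause(); } /* no wfe analogue on x86 */
static inline void jl_cpu_wake(void)    { }              /* nothing to signal on x86 */
#endif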

@gbaraldi (Member, Author) commented Apr 24, 2023

Julia 1.9:

julia> @benchmark ccall(:jl_cpu_pause, Cvoid, ())
BenchmarkTools.Trial: 3769 samples with 1000 evaluations.
 Range (min … max):  369.250 ns … 4.283 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       1.333 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.326 μs ± 79.711 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                                           ▂█▄
  ▃▁▁▁▃▃▁▃▁▁▁▃▁▃▁▄▄▄▁▁▃▄▃▃▁▄▁▁▁▄▃▃▁▁▃▅▄▄▄▃▁▃▅▄▄▅▄▄▄▅▆▆▆▅▇▇████ █
  369 ns        Histogram: log(frequency) by time      1.36 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

This PR:

julia> @benchmark ccall(:jl_cpu_pause, Cvoid, ())
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
 Range (min … max):  16.449 ns … 56.696 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     16.575 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   16.742 ns ±  1.451 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▄▇ █▆ ▅▄              ▁ ▂                                  ▂
  ▆██▁██▁██▁█▅▁▃▃▁▆▄▁▃▃▁▇█▁██▁▇▇▄▆▆▇▁▆▆▁▆▆▁▆▆▁▆█▁▇▇▁▇▆▁▅▅▁▅▄▃ █
  16.4 ns      Histogram: log(frequency) by time      18.1 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

For reference, on a 12th gen Intel:

@benchmark ccall(:jl_cpu_pause, Cvoid, ())
BenchmarkTools.Trial: 10000 samples with 992 evaluations.
 Range (min … max):  36.617 ns … 46.337 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     36.830 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.402 ns ±  0.923 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █  ▅                     ▇  ▄                               ▁
  █▅▃█▅▂▃▄▂▃▃▄▅▃▃▄▅▆▆▆▆▆▆▅▅█▆▅█▆▄▄▆▆▅▄▆▅▅▅▆▅▆▅▅▅▅▃▃▅▅▄█▆▇▅▅▄▅ █
  36.6 ns      Histogram: log(frequency) by time      40.3 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

@gbaraldi requested review from vtjnash and vchuravy on April 24, 2023 at 15:08
@gbaraldi (Member, Author)

.NET does some fancier things for this (dotnet/coreclr#13670), but that can probably be a separate PR.

@d-netto (Member) commented Apr 29, 2023

If we're changing jl_cpu_pause, then the exponential backoff algorithm from #48600 might need some adjustments as well.
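
For illustration, a hypothetical backoff loop (not the code from #48600) that combines the now-cheap jl_cpu_pause with the new jl_cpu_suspend might look like the sketch below; spin_until_set and the 1 << 10 threshold are made up here, and any real thresholds would need retuning since jl_cpu_pause dropped from roughly 1 μs to roughly 16 ns per the benchmarks above:

#include <stdatomic.h>

/* Local stand-ins for the primitives discussed in this PR (AArch64 only). */
static inline void jl_cpu_pause(void)   { __asm__ volatile ("isb" ::: "memory"); }
static inline void jl_cpu_suspend(void) { __asm__ volatile ("wfe" ::: "memory"); }

/* Hypothetical exponential-backoff wait: busy-wait with the cheap pause,
   doubling the spin count each round, then fall back to the long suspend
   (which another thread would end via jl_cpu_wake / an event). */
static void spin_until_set(atomic_int *flag)
{
    int spins = 1;
    while (!atomic_load_explicit(flag, memory_order_acquire)) {
        for (int i = 0; i < spins; i++)
            jl_cpu_pause();        /* ~16 ns per call after this PR */
        if (spins < (1 << 10))
            spins <<= 1;           /* exponential backoff */
        else
            jl_cpu_suspend();      /* park until an event/wake arrives */
    }
}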

@gbaraldi (Member, Author)

Maybe; this behaves a lot more like the x86 one, so it should be a bit easier to share code.

@vchuravy requested a review from yuyichao on April 29, 2023 at 17:06
@vchuravy (Member)

This makes sense from my side, and the fact that other languages like Rust have done the same is encouraging.

I am not familiar enough with AArch64 myself, so I am hoping @yuyichao can share some advice.

@vchuravy (Member) commented May 3, 2023

Also cc: @kpamnany

@kpamnany (Contributor) commented May 3, 2023

I'm no expert on ARM architecture, but this seems like a good thing.

@vchuravy merged commit a35db92 into JuliaLang:master on May 3, 2023