Use isb for normal cpu pause on aarch64 #49481
1.9:
julia> @benchmark ccall(:jl_cpu_pause, Cvoid, ())
BenchmarkTools.Trial: 3769 samples with 1000 evaluations.
Range (min … max): 369.250 ns … 4.283 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.333 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.326 μs ± 79.711 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂█▄
▃▁▁▁▃▃▁▃▁▁▁▃▁▃▁▄▄▄▁▁▃▄▃▃▁▄▁▁▁▄▃▃▁▁▃▅▄▄▄▃▁▃▅▄▄▅▄▄▄▅▆▆▆▅▇▇████ █
369 ns Histogram: log(frequency) by time 1.36 μs <
Memory estimate: 0 bytes, allocs estimate: 0.

PR:
julia> @benchmark ccall(:jl_cpu_pause, Cvoid, ())
BenchmarkTools.Trial: 10000 samples with 998 evaluations.
Range (min … max): 16.449 ns … 56.696 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 16.575 ns ┊ GC (median): 0.00%
Time (mean ± σ): 16.742 ns ± 1.451 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▇ █▆ ▅▄ ▁ ▂ ▂
▆██▁██▁██▁█▅▁▃▃▁▆▄▁▃▃▁▇█▁██▁▇▇▄▆▆▇▁▆▆▁▆▆▁▆▆▁▆█▁▇▇▁▇▆▁▅▅▁▅▄▃ █
16.4 ns Histogram: log(frequency) by time 18.1 ns <
Memory estimate: 0 bytes, allocs estimate: 0.

For reference, on a 12th-gen Intel:
@benchmark ccall(:jl_cpu_pause, Cvoid, ())
BenchmarkTools.Trial: 10000 samples with 992 evaluations.
Range (min … max): 36.617 ns … 46.337 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 36.830 ns ┊ GC (median): 0.00%
Time (mean ± σ): 37.402 ns ± 0.923 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▅ ▇ ▄ ▁
█▅▃█▅▂▃▄▂▃▃▄▅▃▃▄▅▆▆▆▆▆▆▅▅█▆▅█▆▄▄▆▆▅▄▆▅▅▅▆▅▆▅▅▅▅▃▃▅▅▄█▆▇▅▅▄▅ █
36.6 ns Histogram: log(frequency) by time 40.3 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
.NET does some fancier things for this, see dotnet/coreclr#13670. But that can probably be a separate PR.
If we're changing
Maybe. This behaves a lot more like the x86 one, so it's probably a bit easier to share code.
This makes sense from my side, and the fact that other languages like Rust have done that is encouraging. I am not necessarily familiar enough with AArch64, so I am hoping @yuyichao can share some advice.
Also cc: @kpamnany
I'm no expert on AMD architecture, but this seems like a good thing.
Following rust-lang/rust@c064b65, it seems that isb is probably best for this. wfe is still good for some specific spins we do because we explicitly call jl_cpu_wake on them, but in most cases we just call jl_cpu_pause, which technically can take unbounded time on aarch64, though it seems to be about a microsecond on the M1, which is far too long for some things.

In order to keep the nice pause/wake combo I added jl_cpu_suspend (open to bikeshedding) to be the actual long wait that needs to be woken up, while pause now behaves more like x86 pause, taking a couple of nanoseconds to execute.
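
For readers who want to see the split described above in code, here is a minimal sketch in C. It is not the actual diff from this PR: the `sketch_*` names are hypothetical stand-ins for jl_cpu_pause, jl_cpu_suspend and jl_cpu_wake, and mapping wake to `sev` is an assumption about how a wfe wait would typically be woken on aarch64.

```c
/* Illustrative only: hypothetical helpers, not the Julia runtime's macros. */

/* Short, bounded pause for spin loops (the role jl_cpu_pause takes in this PR). */
static inline void sketch_cpu_pause(void)
{
#if defined(__aarch64__)
    __asm__ volatile("isb" ::: "memory");   /* brief pipeline flush */
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("pause" ::: "memory"); /* x86 spin-wait hint */
#else
    __asm__ volatile("" ::: "memory");      /* compiler barrier only */
#endif
}

/* Potentially long wait that expects an explicit wake (the jl_cpu_suspend role). */
static inline void sketch_cpu_suspend(void)
{
#if defined(__aarch64__)
    __asm__ volatile("wfe" ::: "memory");   /* wait for event */
#else
    sketch_cpu_pause();                     /* no wfe equivalent assumed here */
#endif
}

/* Wake cores waiting in sketch_cpu_suspend (assumed to map to sev on aarch64). */
static inline void sketch_cpu_wake(void)
{
#if defined(__aarch64__)
    __asm__ volatile("sev" ::: "memory");   /* send event */
#endif
}

/* Spin with the short pause until *flag is nonzero or max_spins is reached.
   Returns the number of pause iterations performed. */
static inline unsigned sketch_spin_until_set(const volatile int *flag,
                                             unsigned max_spins)
{
    unsigned spins = 0;
    while (!*flag && spins < max_spins) {
        sketch_cpu_pause();
        spins++;
    }
    return spins;
}
```

A spin loop that nobody explicitly wakes would use the short pause, as in `sketch_spin_until_set`, while only code paths paired with a wake call would use the suspend variant.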