x86_64: Use `lock or` instead of `mfence`
This is based on the code LLVM generates for a 64-bit atomic SeqCst store on x86_32 using SSE: https://godbolt.org/z/9sKEr8YWc

`lock or` is equivalent to `mfence` here but is 10-35% faster, at least in simple cases on Coffee Lake.

Below are microbenchmark results on an Intel Core i7-9750H (Coffee Lake) with the `ORDERING` constant set to `SeqCst`.

```
bench_portable_atomic_arch/u128_store
                        time:   [11.610 ns 11.670 ns 11.738 ns]
                        change: [-36.119% -35.236% -34.348%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

bench_portable_atomic_arch/u128_concurrent_load_store
                        time:   [202.30 µs 203.54 µs 205.24 µs]
                        change: [-32.313% -31.167% -29.845%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

bench_portable_atomic_arch/u128_concurrent_store
                        time:   [395.74 µs 397.37 µs 398.98 µs]
                        change: [-18.517% -17.560% -16.582%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  12 (12.00%) high severe

bench_portable_atomic_arch/u128_concurrent_store_swap
                        time:   [791.21 µs 793.43 µs 795.69 µs]
                        change: [-10.682% -10.197% -9.6789%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
```
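A minimal sketch of the `lock or` idiom in Rust, assuming an x86_64 target. The helper names `full_fence` and `store_seqcst` are hypothetical and this is not portable-atomic's actual code; for brevity it uses a 64-bit `AtomicU64` store rather than the 128-bit SSE store benchmarked above. A locked `or` of 0 into the top of the stack leaves memory unchanged, but like any `lock`-prefixed instruction it does not complete until the store buffer has drained. That is the property a SeqCst store needs from its trailing fence (preventing store-load reordering), and it is why the instruction can stand in for `mfence` here. This is a hardware-level illustration: the Rust memory model does not formally treat "Release store + fence" as identical to a SeqCst store.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: a full (mfence-strength) fence for ordinary memory.
#[cfg(target_arch = "x86_64")]
#[inline]
pub fn full_fence() {
    // SAFETY: `lock or` with an immediate 0 atomically rewrites the 4 bytes at
    // the top of the stack with their current value, so no memory contents
    // change. It clobbers flags, which `asm!` assumes by default.
    unsafe {
        std::arch::asm!("lock or dword ptr [rsp], 0");
    }
}

/// Fallback for other architectures (assumption: not part of the original change).
#[cfg(not(target_arch = "x86_64"))]
#[inline]
pub fn full_fence() {
    std::sync::atomic::fence(Ordering::SeqCst);
}

/// Hypothetical helper: a SeqCst-style store built as "plain store + full fence",
/// mirroring the SSE store followed by a fence described above.
pub fn store_seqcst(a: &AtomicU64, v: u64) {
    // On x86_64 a Release store compiles to a plain `mov`.
    a.store(v, Ordering::Release);
    // The trailing fence keeps the store from being reordered with later
    // loads, the one reordering x86's memory model otherwise permits.
    full_fence();
}
```

On targets other than x86_64, the sketch falls back to `std::sync::atomic::fence(Ordering::SeqCst)`, so it compiles everywhere while only exercising `lock or` where the commit applies.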