Added optimized x86 atomic_fence for gcc-compatible compilers.

On x86 (32 and 64-bit) any lock-prefixed instruction provides sequential consistency guarantees for atomic operations and is more efficient than mfence.

We are choosing a "lock not" on a dummy byte on the stack for the following reasons:

- The "not" instruction does not affect flags or clobber any registers. The memory operand is presumably accessible through esp/rsp.
- The dummy byte variable is at the top of the stack, which is likely hot in cache.
- The dummy variable does not alias any other data on the stack, which means the "lock not" instruction won't introduce any false data dependencies with prior or following instructions.

In order to avoid various sanitizers and valgrind complaining, we have to initialize the dummy variable to zero prior to the operation.

Additionally, for memory orders weaker than seq_cst there is no need for any special instructions, and we only need a compiler fence. For the relaxed memory order we don't need even that.
uxlfoundation · Nov 25, 2021 · b7d647f
1 parent fb8ae3b
Showing 1 changed file with 12 additions and 0 deletions.
diff --git a/include/oneapi/tbb/detail/_machine.h b/include/oneapi/tbb/detail/_machine.h
@@ -84,6 +84,17 @@ using std::this_thread::yield;
 #endif
 
 static inline void atomic_fence(std::memory_order order) {
+#if defined(__GNUC__) && (__TBB_x86_64 || __TBB_x86_32)
+    if (order == std::memory_order_seq_cst)
+    {
+        unsigned char dummy = 0u;
+        __asm__ __volatile__ ("lock; notb %0" : "+m" (dummy) :: "memory");
+    }
+    else if (order != std::memory_order_relaxed)
+    {
+        __asm__ __volatile__ ("" ::: "memory");
+    }
+#else
 #if _MSC_VER && (__TBB_x86_64 || __TBB_x86_32)
     if (order == std::memory_order_seq_cst ||
         order == std::memory_order_acq_rel ||
@@ -95,6 +106,7 @@ static inline void atomic_fence(std::memory_order order) {
     }
 #endif /*_MSC_VER && (__TBB_x86_64 || __TBB_x86_32)*/
     std::atomic_thread_fence(order);
+#endif
 }
 
 //--------------------------------------------------------------------------------------------------
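
For readers who want to try the idea outside of TBB, the following is a minimal standalone sketch of the fence described in the commit message, paired with a store-buffering litmus test. It is an illustration under stated assumptions, not the library's code: `sketch_fence` and `count_forbidden_outcomes` are hypothetical names, and the compiler-provided `__x86_64__` / `__i386__` macros stand in for TBB's internal `__TBB_x86_64` / `__TBB_x86_32`. On other compilers or architectures the sketch simply falls back to `std::atomic_thread_fence`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the patched atomic_fence; not TBB's actual code.
static inline void sketch_fence(std::memory_order order) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (order == std::memory_order_seq_cst) {
        // Zero-initialized so sanitizers and valgrind see a defined value.
        unsigned char dummy = 0u;
        // A lock-prefixed read-modify-write acts as a full barrier on x86.
        // "+m" marks dummy as read and written; "memory" stops the compiler
        // from moving memory accesses across the statement.
        __asm__ __volatile__("lock; notb %0" : "+m"(dummy) : : "memory");
    } else if (order != std::memory_order_relaxed) {
        // Orders weaker than seq_cst only need a compiler barrier on x86.
        __asm__ __volatile__("" : : : "memory");
    }
    // memory_order_relaxed: nothing to do.
#else
    // Assumption: on any other target, defer to the standard fence.
    std::atomic_thread_fence(order);
#endif
}

// Runs the store-buffering litmus test `iterations` times and returns how
// often both threads read 0, the outcome a seq_cst fence must rule out.
static int count_forbidden_outcomes(int iterations) {
    assert(iterations >= 0);
    int forbidden = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            sketch_fence(std::memory_order_seq_cst);  // store, then load
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            sketch_fence(std::memory_order_seq_cst);  // store, then load
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) ++forbidden;
    }
    return forbidden;
}
```

Calling `count_forbidden_outcomes(200)` should return 0 with the seq_cst fence in place. This is only a smoke test: thread start-up cost means the two threads rarely overlap, so a zero count cannot prove the fence correct; it can only expose a broken one.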