Added optimized x86 atomic_fence for gcc-compatible compilers.
On x86 (32 and 64-bit) any lock-prefixed instruction provides sequential
consistency guarantees for atomic operations and is more efficient than
mfence.

We are choosing a "lock not" on a dummy byte on the stack for the following
reasons:

 - The "not" instruction does not affect flags or clobber any registers.
   The memory operand is presumably accessible through esp/rsp.
 - The dummy byte variable is at the top of the stack, which is likely
   hot in cache.
 - The dummy variable does not alias any other data on the stack, which
   means the "lock not" instruction won't introduce any false data
   dependencies with prior or following instructions.

To keep various sanitizers and Valgrind from complaining, we initialize the
dummy variable to zero prior to the operation.

Additionally, for memory orders weaker than seq_cst no special instruction is
needed; a compiler-only fence suffices. For the relaxed memory order we don't
even need that.
Lastique committed Nov 25, 2021
1 parent fb8ae3b commit b7d647f
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions include/oneapi/tbb/detail/_machine.h
@@ -84,6 +84,17 @@ using std::this_thread::yield;
#endif

static inline void atomic_fence(std::memory_order order) {
#if defined(__GNUC__) && (__TBB_x86_64 || __TBB_x86_32)
if (order == std::memory_order_seq_cst)
{
unsigned char dummy = 0u;
__asm__ __volatile__ ("lock; notb %0" : "+m" (dummy) :: "memory");
}
else if (order != std::memory_order_relaxed)
{
__asm__ __volatile__ ("" ::: "memory");
}
#else
#if _MSC_VER && (__TBB_x86_64 || __TBB_x86_32)
if (order == std::memory_order_seq_cst ||
order == std::memory_order_acq_rel ||
@@ -95,6 +106,7 @@ static inline void atomic_fence(std::memory_order order) {
}
#endif /*_MSC_VER && (__TBB_x86_64 || __TBB_x86_32)*/
std::atomic_thread_fence(order);
#endif
}

//--------------------------------------------------------------------------------------------------
