New spinlocks in C++ w/tests #2266
Conversation
We should move this to its own header, but besides that it looks excellent. I'm assuming all the tests are passing. If you prefer to merge as is, go ahead, but we'll then have to work on breaking apart the smp_utils later.
EXPECT(val == 2);
}

CASE("multithreaded - extended")
This test is beautiful 😍
I would have liked to see one with 128-256 threads, just to make sure we max out the contention with the maximum cores qemu can provide, but we can add that later.
This runs as a unit test so I didn't want it to take too long or fail on less powerful machines. If I remove the locks I get the wrong value, so it seems to be enough to relatively reliably detect if the lock is broken (at least on my machine).
It could be interesting to run multiple cores in the integration test, but then we need SMP to work again first! :)
api/smp_utils (Outdated)
// requires a write.
while (lock_.exchange(true, std::memory_order_acquire)) {
  while (lock_.load(std::memory_order_relaxed)) {
    _mm_pause();
Pause intrinsic - nice, I didn't know about this
_mm_pause() is a purely x86 intrinsic, and won't work on ARM, where you have __yield() instead. It's basically a hint to the internal CPU scheduler, to aid hyperthreading.
Ah, yes - then I guess we still have to include an architecture check here.
I'm thinking about ways to implement this check. To avoid platform-specific checks outside the architecture-specific include files, perhaps we could introduce an os::Arch::cpu_relax for each platform? Here: https://github.com/includeos/IncludeOS/blob/master/api/arch/x86_64.hpp#L19
We could then implement it as _mm_pause for x86 and either skip the aarch64 implementation for now (so it would give a build error later) or just add it as an untested call to __yield.
That makes sense to me if the semantics of yield and pause are similar enough that they can be used in the same place in the same algorithm.
On the other hand - we don't actively support ARM yet, and this lock should work just fine on ARM without the intrinsic, so for me it's fine to do this later as well.
// Optimised for uncontended locks as it starts with exchange, which
// requires a write.
while (lock_.exchange(true, std::memory_order_acquire)) {
  while (lock_.load(std::memory_order_relaxed)) {
Ok, so this can be memory_order_relaxed because if it returns false (the lock is available), we still get the strong guarantee that all concurrent writes to the lock completed, and that the lock is still available, from the subsequent exchange in the outer loop. Looks excellent - and it will likely get away with far fewer memory fences under heavy contention.
I think we should put this in its own file - it will remove one dependency (the arch include) and move us towards more granular headers: api/spinlock.hpp, or, if you prefer to move more stuff, make smp a directory and move this to api/smp/spinlock.hpp.
TTAS spinlock implementation using C++ atomics. Implements C++11 named requirements BasicLockable, Lockable and TimedLockable. Adds os::Arch::cpu_relax - currently only implemented for x86. Removes scoped_spinlock as we can now use std::lock_guard instead. Updates all spinlock_t users to use the new class.
Thanks for the reviews. I've updated to use
TTAS spinlock implementation using C++ atomics. Implements C++11 named requirements BasicLockable, Lockable and TimedLockable.

Removes scoped_spinlock as we can now use std::lock_guard instead. Updates all spinlock_t users to use the new class.

Includes unit test (w/multithreading) and integration test to check that the delays work.

This also fixes the alignment issue seen in #2251 (comment), as we're no longer aligning a typedef inside a class. I didn't add back alignas to the class as I'm not sure if it speeds anything up, and it is Intel-only. I suggest we consider adding it back if we have a concrete use case we can benchmark.

I also removed the empty lock/unlock stubs that were added when SMP=off, so now spinlocks always behave the same. If this leads to deadlocks anywhere we can fix it in the relevant code. This didn't happen in any of the tests I ran.