
New spinlocks in C++ w/tests #2266

Merged: 4 commits, Aug 26, 2024
Conversation

@MagnusS (Member) commented Aug 24, 2024

TTAS spinlock implementation using C++ atomics. Implements C++11 named requirements BasicLockable, Lockable and TimedLockable.

Removes scoped_spinlock as we can now use std::lock_guard instead.

Updates all spinlock_t users to use the new class.

Includes unit test (w/multithreading) and integration test to check that the delays work.

This also fixes the alignment issue seen in #2251 (comment), as we're no longer aligning a typedef inside a class. I didn't add alignas back to the class since I'm not sure it speeds anything up, and it's Intel-only. I suggest we consider adding it back if we have a concrete use case we can benchmark.

I also removed the empty lock/unlock stubs that were added when SMP=off so now spinlocks always behave the same. If this leads to deadlocks anywhere we can fix it in the relevant code. This didn't happen in any of the tests I ran.

@alfreb (Contributor) left a comment:

We should move this to its own header, but besides that it looks excellent. I'm assuming all the tests are passing. If you prefer to merge as is, go ahead, but we'll then have to work on breaking apart the smp_utils later.

EXPECT(val == 2);
}

CASE("multithreaded - extended")
Contributor:

This test is beautiful 😍
I would have liked to see one with 128-256 threads, just to make sure we max out contention with the maximum cores qemu can provide, but we can add that later.

Member Author:

This runs as a unit test so I didn't want it to take too long or fail on less powerful machines. If I remove the locks I get the wrong value, so it seems to be enough to relatively reliably detect if the lock is broken (at least on my machine).

It could be interesting to run multiple cores in the integration test, but then we need SMP to work again first! :)

api/smp_utils (Outdated)
// Optimised for uncontended locks as it starts with exchange, which
// requires a write.
while (lock_.exchange(true, std::memory_order_acquire)) {
  while (lock_.load(std::memory_order_relaxed)) {
    _mm_pause();
Contributor:

Pause intrinsic - nice, I didn't know about this

Member:

_mm_pause() is a purely x86 intrinsic and won't work on ARM, where you have __yield() instead. It's basically a hint to the CPU's internal scheduler, to aid hyperthreading.

Member Author:

Ah, yes - then I guess we still have to include an architecture check here.

Member Author:

I'm thinking about ways to implement this check. To try to avoid platform-specific checks outside the architecture specific include files, perhaps we could introduce an os::Arch::cpu_relax for each platform? Here: https://github.com/includeos/IncludeOS/blob/master/api/arch/x86_64.hpp#L19

We could then implement it as _mm_pause for x86 and skip the aarch64 implementation for now (so it would give a build error later) or just add it as an untested call to __yield.

What do you think @alfreb @fwsGonzo ?

@alfreb (Contributor), Aug 25, 2024:

That makes sense to me if the semantics of yield and pause are similar enough that they can be used in the same place in the same algorithm.

Contributor:

On the other hand - we don't actively support arm yet, and this lock should work just fine on arm as well without the intrinsic, so for me it's fine to do this later as well.

// Optimised for uncontended locks as it starts with exchange, which
// requires a write.
while (lock_.exchange(true, std::memory_order_acquire)) {
while (lock_.load(std::memory_order_relaxed)) {
Contributor:

Ok, so this can be memory_order_relaxed because if it returns false (the lock is available), the subsequent exchange in the outer loop still gives us the strong guarantee that all concurrent writes to the lock completed and that the lock is still available. Looks excellent, and it will likely get away with far fewer memory fences under heavy contention.

Contributor:

I think we should put this in its own file: it will remove one dependency (the arch include) and move us towards more granular headers. api/spinlock.hpp, or, if you prefer to move more stuff, make smp a directory and move this to api/smp/spinlock.hpp.

TTAS spinlock implementation using C++ atomics. Implements C++11 named
requirements BasicLockable, Lockable and TimedLockable.

Adds os::Arch::cpu_relax - currently only implemented for x86.

Removes scoped_spinlock as we can now use std::lock_guard instead.

Updates all spinlock_t users to use the new class.
@MagnusS (Member Author) commented Aug 26, 2024

Thanks for the reviews. I've updated to use os::Arch::cpu_relax instead of _mm_pause directly, which should allow us to easily add support for other architectures later. I also added a timeout for unittests and an EXPECT case I missed. I'll merge this for now -- then we can split it in .hpp files later -- and also decide what to do with minimal_barrier_t.

@MagnusS MagnusS merged commit 41e3170 into includeos:v0.16.0-release Aug 26, 2024
@MagnusS MagnusS deleted the spinlocks branch August 26, 2024 09:41