async_std::sync::Mutex is too heavy #362
Perhaps we could reuse … I'd accept a PR that switches from …
I did a few experiments, and I think I have an idea that can reduce … Here's a brief list of my plan. If you think it looks alright, I'll proceed to implementation and benchmark the performance differences.

Step 1: Replace Slab with a doubly linked list
Step 2: …
Step 3: …

After all steps, the size of …
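A minimal sketch of the kind of data layout Step 1 describes, using hypothetical names (`WaiterNode`, `WaiterList`) that are not taken from the actual PR:

```rust
use std::ptr::NonNull;
use std::task::Waker;

// Hypothetical layout (names invented for illustration): each waiting
// task gets one heap-allocated node with a stable address, linked into
// a doubly linked list that is protected by the mutex's internal lock.
struct WaiterNode {
    waker: Option<Waker>,
    prev: Option<NonNull<WaiterNode>>,
    next: Option<NonNull<WaiterNode>>,
}

struct WaiterList {
    head: Option<NonNull<WaiterNode>>,
    tail: Option<NonNull<WaiterNode>>,
}

impl WaiterList {
    // Append a waiter in O(1); because the node address is stable, the
    // waiting future can also unlink itself in O(1) when it is woken
    // or dropped (e.g. on cancellation).
    unsafe fn push_back(&mut self, mut node: NonNull<WaiterNode>) {
        node.as_mut().prev = self.tail;
        node.as_mut().next = None;
        match self.tail {
            Some(mut tail) => tail.as_mut().next = Some(node),
            None => self.head = Some(node),
        }
        self.tail = Some(node);
    }
}
```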
Do we allocate nodes for the linked list on the heap? If so, I am worried that these allocations will be slower than using the slab. Another possibility is to lazily allocate the slab on the heap only when it is actually needed (i.e. there is contention on the async mutex). We could even put the address of the slab into …
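A rough sketch of that "allocate only on contention" idea; the names and the plain `Vec` of wakers are placeholders, and real code would also have to pack lock/contention bits into the same state word:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::Waker;

// state == 0 means no waiter storage has been allocated yet; otherwise
// it holds the address of a heap-allocated waiter table. Only the
// allocation step is shown here.
struct LazyWaiters {
    state: AtomicUsize,
}

impl LazyWaiters {
    const fn new() -> Self {
        LazyWaiters { state: AtomicUsize::new(0) }
    }

    fn get_or_alloc(&self) -> *mut Vec<Waker> {
        let cur = self.state.load(Ordering::Acquire);
        if cur != 0 {
            return cur as *mut Vec<Waker>;
        }
        // First time there is contention: allocate the waiter storage.
        let fresh = Box::into_raw(Box::new(Vec::<Waker>::new()));
        match self.state.compare_exchange(
            0,
            fresh as usize,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => fresh,
            // Another thread won the race; free our allocation and use theirs.
            Err(existing) => {
                unsafe { drop(Box::from_raw(fresh)) };
                existing as *mut Vec<Waker>
            }
        }
    }
}
```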
The linked list nodes are allocated on the heap, for the benefit of a stable address. A linked list has the extra benefit of guaranteed … From my initial experiments, performance actually goes up slightly by using linked lists, despite the allocation.
Interesting! Okay, I'm convinced :) On second thought, there must be tricks for optimizing allocations in the typical case, so I'm not as worried anymore... I'd be happy to review a PR that replaces … Also, it would be great to have your experiments as microbenchmarks using …
If you just use futures_intrusive::sync::Mutex you get your parking_lot-backed small mutex without any need for heap allocations - and even without the need to write the code yourself :-)
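For reference, usage looks roughly like this; the `new(value, is_fair)` constructor shape and guard behaviour are from memory of the futures-intrusive docs, so treat this as approximate:

```rust
use futures_intrusive::sync::Mutex;

async fn bump(counter: &Mutex<u32>) {
    // Awaiting lock() yields a guard that derefs to the protected value.
    let mut guard = counter.lock().await;
    *guard += 1;
}

fn main() {
    // The second constructor argument selects fair (FIFO) wake-up order.
    let counter = Mutex::new(0u32, true);
    futures::executor::block_on(bump(&counter));
}
```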
Btw: I would be super careful with pure spinlocks in userspace. If the OS schedules away the thread that holds the lock, you will start wasting a ton of resources and enter the wonderful world of priority inversion. IMHO they are a tool for kernels and embedded applications. For normal userspace applications one should fall back to a kernel wait primitive after some spins, or just avoid them in general.
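A tiny sketch of that spin-then-back-off pattern for a plain (non-async) lock; in a real implementation the fallback would park on an OS primitive rather than just yield:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Try a bounded number of spins, then stop burning CPU and let the
// scheduler run the thread that actually holds the lock.
fn lock(flag: &AtomicBool) {
    let mut spins = 0u32;
    while flag
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        spins += 1;
        if spins < 100 {
            std::hint::spin_loop();
        } else {
            thread::yield_now();
        }
    }
}

fn unlock(flag: &AtomicBool) {
    flag.store(false, Ordering::Release);
}
```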
I looked at the code, and I think the basic idea is the same. However, using that would pull in a large number of dependencies, which this project tries to avoid. My PR also does better than the code you refer to, so I don't see the point in switching to that.
Yes, it's literally a HUGE amount of dependencies 😀
That's true, and my benchmarks show that naively changing everything to a spinlock causes a 4x performance degradation. However, after changing from Slab to a linked list, we only spend a very small amount of time in the critical region, and the benchmark shows similar (and slightly better) performance compared with the current state. FYI, crossbeam's spinlock also has a backoff policy that yields to other threads if it cannot acquire the lock within a few spins.
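That backoff behaviour looks roughly like this with crossbeam-utils (a sketch, not the actual async-std code):

```rust
use crossbeam_utils::Backoff;
use std::sync::atomic::{AtomicBool, Ordering};

// Acquire a spinlock-style flag, yielding to other threads once the
// exponential backoff has grown past a few spins.
fn spin_lock(flag: &AtomicBool) {
    let backoff = Backoff::new();
    while flag
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        // snooze() busy-spins at first and switches to thread::yield_now()
        // after a number of steps.
        backoff.snooze();
    }
}
```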
If you look at #252, you will notice that parking_lot was explicitly removed as a dependency from this project.
Reducing deps is all well and good, but it doesn't make much sense to end up writing and maintaining equivalent code here when the potential dependency is of good quality and maintained.
For reference, these are our current …

This is …
This is …
This is …

There is some variance to these numbers between consecutive runs, though.
@stjepang Are the sources of these benchmarks available somewhere? Are they the ones from #370? If yes, then I think the "contention" benchmarks only do part of what the name says. They only measure contention within a single thread, so cross-thread synchronization never has an impact. In that kind of benchmark, likely …

Edit: Oh, now I see they are here.
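A sketch of what a cross-thread contention measurement could look like instead (task and iteration counts are arbitrary, and the timing/criterion wiring is omitted):

```rust
use async_std::sync::Mutex;
use async_std::task;
use std::sync::Arc;

// Spawn several tasks on async-std's multi-threaded executor and have
// them all hammer one Mutex, so cross-thread wake-ups actually happen.
fn main() {
    task::block_on(async {
        let mutex = Arc::new(Mutex::new(0u64));
        let mut handles = Vec::new();
        for _ in 0..8 {
            let mutex = Arc::clone(&mutex);
            handles.push(task::spawn(async move {
                for _ in 0..10_000 {
                    *mutex.lock().await += 1;
                }
            }));
        }
        for handle in handles {
            handle.await;
        }
        assert_eq!(*mutex.lock().await, 8 * 10_000);
    });
}
```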
If I'm reading the code correctly, it looks like locking Tokio's … The idea is that the future returned by …
@stjepang Can you add in the benchmarks using your …
I've noticed that async-std uses std::sync::Mutex in quite a few places. It's well known that parking_lot offers much better performance and, more noticeably, space savings compared to the std mutex, thus enabling very fine-grained locks.

The current async_std::sync::Mutex and a few other places in async_std seem to be using std::sync::Mutex, which has quite a few allocations and is very heavy-weight. Is there any existing discussion about the performance and code-bloat impact of switching over to parking_lot?

There is also a possibility that we could use ideas similar to parking_lot's to create a very cheap and light-weight (i.e. only a few bytes, no extra allocation) async Mutex that is optimised for the non-contended case.
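As a rough illustration of the kind of fast path meant here, assuming a single word of state and no allocation in the uncontended case (the slow path that registers a Waker and queues the task is omitted, and all names are invented):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const LOCKED: usize = 1;

// Sketch of the uncontended fast path: a single word of state and one
// compare-and-swap to take the lock, with no heap allocation.
struct CheapAsyncMutexState {
    state: AtomicUsize,
}

impl CheapAsyncMutexState {
    const fn new() -> Self {
        CheapAsyncMutexState { state: AtomicUsize::new(0) }
    }

    // Returns true if the lock was acquired without ever touching the
    // waiter machinery.
    fn try_lock_fast(&self) -> bool {
        self.state
            .compare_exchange(0, LOCKED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn unlock_fast(&self) {
        self.state.store(0, Ordering::Release);
    }
}
```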