Alternative lockless Mutex implementation (2nd try) #11520
Conversation
Couldn't we provide something like …
Maybe as a macro; anyway, it doesn't help with the user incorrectly calling it recursively unless you ban all recursive locking (which is limiting, since it prevents composition). That said, I'm not sure whether we should detect recursion, or even have the option of allowing recursion, and it's mostly independent of the mutex implementation algorithm.
This looks promising, thanks for implementing this! I'll take a look at this tomorrow.
In general, I'd prefer a cleaner history, so could you replace the original mutex commit with your mutex commit? Dealing with the fallout will probably be similar.
@@ -0,0 +1,111 @@
/* Copyright (c) 2010-2011 Dmitry Vyukov. All rights reserved.
This license is the wrong license if you've re-implemented it.
I'm certainly no lock-free expert; this queue looks correct to me, but I don't consider myself qualified to officially vouch for it. Do you know of any external implementations of this flavor of queue that I could look at as well? Any literature/papers would also be a good reference point.
Well, it has been implemented by drastically modifying the previous version, so whether it's a derived work or not could be debatable.
The conditions say that the copyright notice must not be removed if it is a derived work, so the simplest course of action is to leave it as is, and add an extra copyright line if desired.
If this is implemented from memory, I'm much more uncomfortable with it than with using the previous version. Why did you re-implement the queue? Was it to remove the `Thread::yield_now()` in the inconsistent case?
// have or are about to take the OS mutex using a blocking lock call.
//
// It is now as fair as the OS mutex allows, even when mixing green and native tasks,
// since native threads will queue like green tasks, if any green task is queued.
I'd like to see a lot more explanation here. There's still no mention of the general protocol, or of the general idea behind blocking native tasks on the queue. This also doesn't justify why it's ok to block native tasks on the queue.
Otherwise either native tasks will always have priority over green tasks, or green tasks will always have priority over native tasks, depending on whether unlock prefers to unlock or to dequeue when `lockers > 1 && queue_size != 0`.
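A minimal sketch of that unlock decision, with all names hypothetical (the PR's actual state machine packs both counters into a single atomic word and updates them together):

```rust
// Hypothetical names for illustration; not the PR's code.
enum UnlockAction {
    ReleaseOsMutex, // let a native locker blocked on the OS mutex win
    WakeQueuedTask, // hand the lock to the head of the FIFO queue
}

/// `lockers` counts tasks holding or contending for the lock;
/// `queue_size` counts tasks parked in the internal queue.
fn on_unlock(lockers: usize, queue_size: usize) -> Option<UnlockAction> {
    if lockers <= 1 {
        None // no contention, nothing to hand off
    } else if queue_size != 0 {
        // Someone queued first, so wake in FIFO order; combined with
        // making native lockers queue whenever the queue is non-empty,
        // neither class can systematically jump ahead of the other.
        Some(UnlockAction::WakeQueuedTask)
    } else {
        Some(UnlockAction::ReleaseOsMutex)
    }
}
```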
First of all, `&mut` guarantees that it is not aliased AT ALL; that's the whole point of it! If aliased `&mut` references shared between tasks are allowed in general, then memory safety and freedom from data races are both broken. It's also incorrect that each thread will have only one `&mut` to the Mutex. Say you have a MutexArc implemented using Mutex and make a clone of it. Now you lock the first one, which returns the guard object, which must ultimately contain a LockGuard and thus an `&mut` to the Mutex. Then you recursively attempt to lock the clone: at that point the recursive lock operation is using an `&mut` that is the same as the one in the LockGuard of the first lock. This breaks Rust semantics.
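To make the scenario concrete, here is a minimal sketch in today's Rust, with `std::sync::{Arc, Mutex}` standing in for the era's MutexArc (the types are stand-ins, not the PR's code):

```rust
use std::sync::{Arc, Mutex};

// Both handles point at one Mutex, so if `lock` took `&mut self`, the
// guard below plus a second lock attempt through the clone would mean
// two live `&mut` references to the same Mutex at the same time --
// exactly what `&mut` is supposed to rule out.
fn main() {
    let first = Arc::new(Mutex::new(0u32));
    let second = Arc::clone(&first); // a second path to the same Mutex

    let guard = first.lock().unwrap();
    // A recursive `second.lock()` here would deadlock at runtime, but
    // it is only expressible at all because `lock` takes `&self`.
    drop(guard);
    drop(second);
}
```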
The Mutex is safe to use (except for destroy, which is unsafe and maybe shouldn't be there in the first place). Say you have tasks that should each make three beeps on the computer speaker at a distinctive frequency, by calling a hypothetical `speaker::beep(frequency: uint)` three times. You don't want the beeps to interleave, so you put a Mutex in an Arc and use that to make sure they don't. This is a perfectly valid and memory-safe use of Mutex that doesn't need unsafe code.
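A minimal sketch of that usage in today's Rust, with std's `Arc`/`Mutex` standing in for this PR's types and a stub for the hypothetical `beep`:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stub for the hypothetical speaker::beep(frequency) named above.
fn beep(frequency: u32) {
    println!("beep at {} Hz", frequency);
}

fn main() {
    let speaker_lock = Arc::new(Mutex::new(()));
    let handles: Vec<_> = (0..3u32)
        .map(|i| {
            let lock = Arc::clone(&speaker_lock);
            thread::spawn(move || {
                // Hold the lock across all three beeps so another
                // task's beeps cannot interleave with ours.
                let _guard = lock.lock().unwrap();
                for _ in 0..3 {
                    beep(440 * (i + 1));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```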
No, because if you are locking the Mutex as part of a MutexArc, for instance, then the MutexArc is clonable, so you can do the recursive locking by locking the other clone. And if you have a "naked" Mutex to which no other paths can be created, then you can't have multiple tasks accessing it simultaneously, which makes it pointless to use one.
Cell and Atomic (let's assume the latter exists) are both mechanisms for mutating aliased memory, and the only difference is that Cell either must not be allowed, or will result in undefined values, if the aliases are shared between tasks. Cell is thus strictly more restrictive in its pre-conditions than Atomic, so it makes no sense for Cell to require a self pointer that provides fewer guarantees than Atomic's. And of course, it breaks the non-aliasing property of `&mut` as already argued. That said, the fix to the mutability of self is mostly separate from the rest, but it conflicts with the rest of the code in a non-trivial way (because lock as implemented no longer borrow-checks with `&mut self`), so I think introducing Mutex with the correct `&self` pointers is simpler than doing it in two commits.
Correct. That is not the only point of it. For atomics we are using it to indicate "this is being mutated".
Correct. That is why it's unsafe to create multiple …
The point that you're missing is that it's unsafe to clone it.
So does sharing the memory of the mutex itself, but that's what must be done.
Correct. It must be unsafely shared. This is not a first-class primitive for everyone everywhere to use. This is a primitive to build other concurrency primitives. For example, a …
Again, cloning a MutexArc is unsafe in terms of the usage of the mutex. If you want to remove …
I think having multiple `&mut` aliases is not merely unsafe, but needs to be defined as resulting in undefined behavior if they are actually used; thus unsafe code must not do so, just like unsafe code must not transmute random integers or zero to `&T` or `~T` and access those pointers, and so on. The alternative, where multiple `&mut` aliases are not undefined behavior, means that Rust programs will be slower, because some optimizations cannot be applied (specifically, telling LLVM via TBAA or another mechanism that there cannot be any aliases, so that memory accesses can be moved with no restrictions and loads can be freely merged or duplicated). Anyway, it's pretty easy to drop the self-mutability changes if desired.
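For a concrete instance of the optimization argument, consider this small illustrative sketch: under the no-alias guarantee for `&mut`, the compiler may fold the second load of `*x` into the first, which is sound only if `x` and `y` can never overlap.

```rust
// If x and y were allowed to alias, reusing the first load of *x
// would give the wrong answer after the `*y = 2` store.
fn sum_twice(x: &mut i32, y: &mut i32) -> i32 {
    *y = 1;
    let a = *x;
    *y = 2;
    let b = *x; // may be optimized to reuse `a`, since x cannot alias y
    a + b
}
```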
Unfortunately, I'm not sure if I'll have time to do further work on this myself. I believe the current code in the pull request should be an acceptable design and mergeable, except for potential bugs, cosmetic issues, and the fact that things like the `&mut` change and the Mutex/Cond split could perhaps be split into separate commits (at the cost of more "churn" in the series as things are written and then changed by subsequent commits).

We might want to switch to 64-bit atomics on 32-bit and perhaps 128-bit atomics on 64-bit, but that can be done later, since the task limit won't be hit anyway without non-default stack sizes so small they make it impossible to call C functions.

Another thing to do is to replace critical sections and events with SRW locks and condition variables on Windows Vista+: both of those structures are pointer-sized and statically initializable, just like the atomic uints in the current code, so it's trivial to switch at runtime between the current code on Windows XP and reinterpreting those fields as SRW locks and condition variables on Windows Vista and later.

Finally, an rwlock is probably needed as well, and it should be possible to implement one by extending the algorithm in the code to use a (queue_size, readers, writers) tuple and putting the read mode in the queued nodes (64/128-bit atomics will let the triple fit). Native rwlocks don't always support a truly fair FIFO mode (at least the one in Linux glibc NPTL lets either all readers or all writers get ahead), so it might be a good idea to let the user specify what they prefer, or whether they want the rwlock to be FIFO, and avoid blocking native tasks on the native rwlock at all if the user selects FIFO.
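As one illustration of that packing idea, here is a hedged sketch of a (queue_size, readers, writers) triple in a single 64-bit atomic; the field names, widths, and fast-path policy are assumptions for illustration, not what the rwlock would necessarily do:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative bit layout: three 21-bit fields in one 64-bit word.
const WRITERS_SHIFT: u32 = 0;
const READERS_SHIFT: u32 = 21;
const QUEUE_SHIFT: u32 = 42;
const FIELD_MASK: u64 = (1 << 21) - 1;

struct RwState(AtomicU64);

impl RwState {
    fn unpack(v: u64) -> (u64, u64, u64) {
        (
            (v >> QUEUE_SHIFT) & FIELD_MASK,   // queue_size
            (v >> READERS_SHIFT) & FIELD_MASK, // readers
            (v >> WRITERS_SHIFT) & FIELD_MASK, // writers
        )
    }

    // Fast path for a read lock: succeed only if there are no writers
    // and nothing queued (preserving FIFO); otherwise the caller queues.
    fn try_read(&self) -> bool {
        let mut cur = self.0.load(Ordering::Relaxed);
        loop {
            let (queue, _readers, writers) = Self::unpack(cur);
            if writers != 0 || queue != 0 {
                return false;
            }
            let next = cur + (1 << READERS_SHIFT); // readers += 1
            match self.0.compare_exchange_weak(
                cur, next, Ordering::Acquire, Ordering::Relaxed,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // retry with observed value
            }
        }
    }
}
```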
Would you like me to take it from here?
Yes, if you'd like, thanks. I opened issue #11583 for mutability in atomics (an issue rather than a pull request, since the code is not complete: it doesn't fix the actual intrinsics).
Thanks again for all your work!
Closing for now; I'll reopen once I've gotten through all the rebasings.
I got around to benchmarking this code, and here are the results I got. The caveat here is that the green numbers aren't too significant: we still need to seriously tune the green schedulers for workloads like this, because the numbers are all over the place. The numbers provided here are all run with … In this table, all figures are runtimes, so lower is better. All tests were run in an Arch Linux VM with 4 cores.
My conclusion from this is that doing pretty much any atomic operation other than … With this in mind, I'm going to abandon this implementation and go back to the original version. Once the green scheduler is tuned, we can revisit to see what the impact is for green threads in finer detail.
@alexcrichton Thanks for the measurements. That's a little disappointing, since I liked the elegance of this solution.
Added an optimization to the cmpxchg loops and benchmarks, see https://github.com/bill-myers/rust/tree/mutex-lockless for the updated code. |
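The exact change is in the linked branch, but a common shape for this kind of cmpxchg-loop optimization is to reuse the value a failed compare-exchange already returns instead of issuing a fresh load; a generic sketch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// On CAS failure, the Err value is the atomic's current contents, so
// the loop can retry without touching memory again.
fn increment(counter: &AtomicUsize) -> usize {
    let mut cur = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange_weak(
            cur,
            cur + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(prev) => return prev,
            Err(actual) => cur = actual,
        }
    }
}
```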
This is yet another attempt to create a satisfying implementation of Mutex (first in #11462, second in #11509), created since I believe I have managed to devise a working lockless algorithm.
I think the algorithm is correct, but please check; if it is, then this has none of the drawbacks of the previous pulls.
This code is based on Alex Crichton's code, but replaces the "held" `AtomicBool` variable with a more sophisticated `MutexState` struct consisting of an atomic (lockers, queue_size) pair.
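A minimal sketch of the packed-pair idea, assuming a half-word bit split and these names for illustration (the actual layout may differ):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Low half of one atomic word counts lockers, high half counts queued
// tasks, so both can be read and updated with a single CAS.
const HALF: u32 = usize::BITS / 2;
const LOW_MASK: usize = (1 << HALF) - 1;

struct MutexState(AtomicUsize);

impl MutexState {
    const fn new() -> Self {
        MutexState(AtomicUsize::new(0))
    }

    // Uncontended fast path: lockers goes 0 -> 1 with nothing queued.
    // On failure the caller takes the slow path (queue or OS mutex).
    fn try_lock_fast(&self) -> bool {
        self.0
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn lockers(v: usize) -> usize {
        v & LOW_MASK
    }

    fn queue_size(v: usize) -> usize {
        v >> HALF
    }
}
```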
Also, we replace the intrusive MPSC queue implementation with one using list reversal, to remove the need to yield there and the need to store a stub node in the Mutex.
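The reversal trick itself is standard and easy to sketch: producers push onto an atomic singly-linked LIFO, and the single consumer detaches the entire list with one swap, then reverses it to recover FIFO order, so no stub node is needed and the consumer never observes the transient inconsistency that would force it to yield. A simplified, self-contained sketch (the PR's real queue is intrusive and coordinated with `MutexState`; this version allocates nodes for clarity):

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

struct Queue<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> Queue<T> {
    fn new() -> Self {
        Queue { head: AtomicPtr::new(ptr::null_mut()) }
    }

    // Many producers may call this concurrently (Treiber-style push).
    fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        let mut cur = self.head.load(Ordering::Relaxed);
        loop {
            unsafe { (*node).next = cur };
            match self.head.compare_exchange_weak(
                cur, node, Ordering::Release, Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(actual) => cur = actual,
            }
        }
    }

    // Single consumer: detach everything, then reverse into FIFO order.
    fn pop_all_fifo(&self) -> Vec<T> {
        let mut cur = self.head.swap(ptr::null_mut(), Ordering::Acquire);
        let mut items = Vec::new();
        while !cur.is_null() {
            let node = unsafe { Box::from_raw(cur) };
            cur = node.next;
            items.push(node.value);
        }
        items.reverse(); // LIFO -> FIFO
        items
    }
}
```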
Features:
Non-features:
Non-issues:
The only known issue is that as implemented now this only works with up to 2^16 tasks on 32-bit and up to 2^32 tasks on 64-bit, since it needs to store two variables with a task count in a single atomic variable.
This shouldn't be a problem for two reasons:
Currently Rust uses 2MB stacks by default, so programs that don't specify the stack size manually already cannot trip this limitation.
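For concreteness: at 2 MB per stack, reaching the 2^16-task limit on 32-bit would require 2^16 × 2 MB = 128 GB of stack address space, far beyond the 4 GB a 32-bit process can address; likewise, 2^32 tasks on 64-bit would require 8 PB, far beyond the 256 TB of a typical 48-bit virtual address space.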