-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid memory ordering in thread park / unpark #53366
Comments
I think that particular What really matters is that we have read a |
Could you explain why you think this is true? The park / unpark API specifically needs to provide acq / rel semantics to the user of the API. When a call to Specifically, that line in Line 1057 in 6e5e63a
That would |
So the point is specifically the case where this CAS fails because the value is already match self.inner.state.compare_exchange(EMPTY, NOTIFIED, SeqCst, SeqCst) {
Ok(_) => return, // no one was waiting
Err(NOTIFIED) => return, // already unparked <--
Err(PARKED) => {} // gotta go wake someone up
_ => panic!("inconsistent state in unpark"),
} That is interesting indeed. However, notice that if that CAS failed, it performed just a read. So adding an extra read in However, it is possible that here |
Looking at the formulation in http://plv.mpi-sws.org/scfix/paper.pdf, SC accesses have some extra requirements for their ordering relative to each other but do not seem give extra synchronization otherwise. However, I am no longer sure what you think the problem is. The line you pointed at, if it actually performs a write, will write the However, the line I pointed at above is somewhat interesting -- and your comments in the other discussion sounds like that is what you are worried about:
The only situation I can imagine where there is a difference is something like the following:
Now thread 0 never synchronized with thread 2. However, notice that thread 0 is also never waiting for thread 2 -- so, it is equally likely that the execution proceeds as follows:
So, there is some kind of race here when a thread wakes up while a redundant unpark happens -- depending on the outcome of the race, the thread has a token remaining (state But anyway no amount of additional ordering can fix this, this is a more fundamental point in the algorithm: How to handle redundant unparks? If these are allowed, we inherently have a race because whether an unpark is redundant or not is racy. And as long as a redundantly unparking thread does not perform any writes (or barriers), there is no way it could |
Correct me if I'm wrong, but if there a 2 threads doing the unparking, then there's no way the parked thread can know which thread unparked it. So, regardless if every call to unpark synchronized with parked thread, there still needs to be another form of synchronization to tell which thread which thread called unpark. That is, the synchonization here is only needed when there is one unparker, which it currently works fine for? |
There's still a potential race. If Thread 2 reads NOTIFIED, it knows that Thread 0 will unpark "in the future"; however, without sequential consistency, that doesn't mean Thread 2's prior writes will necessarily be visible to Thread 0 when it does so. If Thread 0 then goes to sleep again, it may never see Thread 2's writes unless someone else happens to unpark it. However, as far as I know, using SeqCst here is incorrect, as it doesn't guarantee any ordering with respect to other accesses that are not SeqCst. It does work if the other accesses happen to be SeqCst: that is, if Thread 2 does a SeqCst write to some other variable before calling unpark, and Thread 0 does a corresponding SeqCst read after returning from park, then the use of SeqCst for the compare-exchange should guarantee that Thread 0 reads what Thread 2 wrote. This is something that wouldn't work with just AcqRel. But the desired semantics for park/unpark are stronger. When Thread 2 calls unpark, it's supposed to guarantee that Thread 0 will 'eventually' return from park and at that point be able to see Thread 2's prior writes, even with a relaxed or acquire load. That guarantee is upheld in the 'successful CAS from EMPTY to NOTIFIED' case, but not the 'unsuccessful CAS because it was NOTIFIED already' case. |
Ok, I see the desire but I'm having trouble understanding how that could be useful. Do you have a concrete example where it's needed? |
For example, Thread 2 pushes something onto a lock-free queue, then unparks Thread 0; Thread 0 is running an infinite loop that alternates between calling If the memory model is weak enough, I think it should be needed for almost any use of park/unpark. Even when a mutex is involved, it seems that mutexes don't necessarily guarantee sequential consistency (!) … So, for example, even if Thread 2 locks the mutex to write some data, and Thread 0 locks the mutex before reading it, in theory Thread 0 could acquire the mutex before Thread 2 and not see the data, even if Thread 2, after unlocking the mutex, then observes Thread 0 being in NOTIFIED state and thus supposedly not having locked it yet! |
But the queue would have to be designed so that writer synchronizes with the reader, otherwise you would have a race when Thread 0 was unparked by Thread 1 and then starts reading the queue while Thread 2 is writing to it? |
I agree on the analysis of the problem. Not sure about the desired API. ;) However, this can only be a problem in cases where a thread calls |
Of course you still need the lock free queue to be synchronized. The issue is that the |
@carllerche So what is an example of a data structure that speculatively calls |
I don't understand your question. Are you asking if, given two threads A, B, where A may be blocked in |
Essentially, yes. And also I am asking if you agree with my analysis above that this is the only case where the issue we are discussing here surfaces. |
@RalfJung that behavior is exactly the roll of park / unpark. That is why |
It doesn't help that the documentation for
First of all, it's not a spinlock; it properly blocks the current thread in the OS rather than spinning. So it would be better to say "mutex". But it doesn't really have the semantics of any kind of lock. It's closest to an auto-resetting event, except that it allows for spurious wakeups. It's also similar to a condition variable, and indeed the implementation uses a condition variable. However, like an event but unlike a condition variable, it guarantees that if the target thread is currently awake (not parked) when someone unparks it, the thread will immediately return from its next park. For the record, just as an explanation of how events can be used like condition variables, as a "please wake up and check this other data" signal: This avoids the need for a mutex. A condition variable must be combined with an accompanying mutex to be used correctly: if the thread is already awake, signaling it will have no effect, but it could be awake yet just about to go back to sleep, having already checked the other data – perhaps even having already started executing, e.g., |
@carllerche I see. It would still be interesting to have a concrete simplified example, I find it hard to imagine in the abstract. Anyway, next question: Assuming thread A "speculatively" wakes up thread B, i.e. A doesn't actually know if B is parked. How can anything ever exploit the fact that this synchronizes? There is an inherent race here, and it is always possible that A is already completely unparked before A gets to do anything. In that case, no synchronization can possibly happen. I believe this is a possibility in every case where the missing ordering you are talking about could be relevant -- and hence I believe that adding a stronger ordering here cannot actually ever reliably incur stronger synchronization. IOW, you are saying the problem is something like the following situation:
The issue, if I understand it correctly, is a missing happens-before between A and B in this case. However, in every such situation, we always also have the following possible behavior:
In this case, clearly there is no happens-before between A and B. So if a missing happens-before is a problem in the first situation, then the algorithm is buggy even if we fix this problem, because the second situation remains a possibility. Therefore, this is not even actually a problem. I do not see any way in which adding a happens-before in the first situation could possibly fix anything. If you are still convinced that there is a problem, I think it would be extremely helpful for this discussion if you could sketch a concrete user of park/unpark, and describe how adding this synchronization edge you are asking for helps that user. The discussion in the abstract is clearly not leading us anywhere.
Certainly not "all"; structures like It would be interesting to see what Java does here in terms of memory orderings. But so far, I am unconvinced that the missing happens-before edge here can ever be the cause of a problem. |
@RalfJung If Thread B is already 'completely unparked' when Thread A calls |
To clarify – the failure case is not generally that Thread B reads some sort of inconsistent data, but rather that it doesn't notice Thread A's writes at all, assumes the work isn't done yet, and goes back to sleep waiting for it; but Thread A has already called edit: I guess it doesn’t technically meet the definition of a deadlock; I just mean that it gets stuck :) |
Thanks, I see the problem now, so something like use std::sync::atomic::{AtomicBool, Ordering};
use std::thread::{current, spawn, park};
static FLAG: AtomicBool = AtomicBool::new(false);
fn main() {
let thread_0 = current();
let _thread_1 = spawn(move || {
FLAG.store(true, Ordering::Relaxed);
thread_0.unpark();
});
let thread_0 = current();
let _thread_2 = spawn(move || {
thread_0.unpark();
});
loop {
park();
if FLAG.load(Ordering::Relaxed) {
return;
}
}
} should be guaranteed to exit, but it currently isn't? Would this work, or am I forgetting something?
unpark:
|
@comex Actually I spotted a bug, I think it should be
|
Yep, that looks right.
I'm actually not sure! In general, the redundant unpark case will look something like: (1) Thread 2 unparks Thread 1, swaps from (If the swap in (2) happens earlier, it's the same scenario but swapping "Thread 2" and "Thread 3". If it happens later, then the state stays Any individual atomic variable is guaranteed to have a total order of modifications. And clearly, the write in (2) must happen before the write in (3). Thread 1 is guaranteed to see Thread 3's writes if the write in (2) happens before the read in (3). So the only problem is if the write in (2) occurs in between the read in (3) and the write in (3). Since (3) is doing an atomic swap, one might expect it to be impossible that anything could come between the read and the write. However, I am not completely sure whether that's the case when the write doesn't actually modify the value (because the variable already had that value). I need to do some spec spelunking. Separately, your original version has a bug where |
On second thought (and after reviewing cppreference's summary of modification order a bit more), I'm pretty sure it is sound with respect to that issue. On the other hand, it seems a bit suboptimal that your revised version has park do two CASes in a row. It might be better to start with the existing implementation and make I suppose this extra swap could be a swap-to-same-value like in your example. If you want to be paranoid about the issue I was worried about, there could be multiple different values that correspond to NOTIFIED, and it could swap from one to the other. However, this is probably unnecessary. |
Looking at the revised version of park now I realise it's basically just the original version with the swap added like @stjepang did in his PR, the only difference being locking the mutex later. So, assuming the usual case of unpark not being called at the same time as park, there's 1 CAS if notified and 2 when empty (swap never called). If you removed the early out you have 2 CASes when notified (the explicit CAS and the swap) and 1 when empty but I guess it's better not to optimise for this case because it's going to sleep anyway. |
And doing an extra CAS before wait which is only needed because the mutex is locked later. So @stjepang's version of park is probably best. |
It looks like the troublesome store was added just recently in #51290 to avoid spurious wakeups, but those are supposed to be allowed anyway. |
I've created a PR to fix this here and I would appreciate a code review if anyone has the chance, thanks. |
I think the example above was broken already before that PR. Namely, if the second spawned thread does |
Fix `thread` `park`/`unpark` synchronization Previously the code below would not be guaranteed to exit when the second unpark took the `return, // already unparked` path because there was no write to synchronize with a read in `park`. EDIT: doesn't actually require third thread ``` use std::sync::atomic::{AtomicBool, Ordering}; use std::thread::{current, spawn, park}; static FLAG: AtomicBool = AtomicBool::new(false); fn main() { let thread_0 = current(); spawn(move || { thread_0.unpark(); FLAG.store(true, Ordering::Relaxed); thread_0.unpark(); }); while !FLAG.load(Ordering::Relaxed) { park(); } } ``` I have some other ideas on how to improve the performance of `park` and `unpark` using fences, avoiding any atomic RMW when the state is already `NOTIFIED`, and also how to avoid calling `notify_one` without the mutex locked. But I need to write some micro benchmarks first, so I'll submit those changes at a later date if they prove to be faster. Fixes #53366 I hope.
I'm pretty sure that the memory orderings uses in the thread park / unpark are incorrect. They happen to work on x86 due to the generated ASM being stronger than the spec, but it will doubtfully work on other architectures.
There are a few locations where a
store
is used withSeqCst
assuming that this would "acquire", but this is not the case. For example hereI discovered this issue when a PR was provided to copy the Rust code into Tokio.
See the discussion here
The text was updated successfully, but these errors were encountered: