-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor sync::Once #65719
Refactor sync::Once #65719
Conversation
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override) |
Thinking a bit more about this, I don't think the commit 'Don't mutate waiter nodes' is the smartest move. Let me work on it a little more before anyone takes a look to review (will probably take me a day). |
Added another commit. In Waiter use interior mutability and use no raw pointerThe existing code first created a mutable reference to a Two options we can choose from instead:
This commit goes the route with interior mutability, and minimizing the use of raw pointers. cc @RalfJung You may be interested in what I think are two cases of unsoundness in
|
From what I can see, this is not only a refactoring, but also a fix for UB (aliasing of a mutable reference), very good catch!
I have updated my own implementation of |
Yes, that is possible. I choose to also switch it back to
You probably also considered the case where there are no waiting threads and the pointer part is 0. If I understand it right you propose that both RUNNING (0x2) and any pattern not matching 0, 1, 2, or 3 correspond to RUNNING? That could work, but I don't yet see an advantage to warrant the subtlety.
Then we mostly agree. What did you think about the argument that |
Your trick to use Also using the names |
I would also argue that using a raw let mut curr: *const Waiter = ...;
while !curr.is_null() {
let thread = unsafe {
let next = (*curr).next;
let thread = (*curr).thread.take().unwrap();
(*curr).signaled.store(true, Release);
thread
}
thread.unpark();
} looks cleaner to me, alternatively you could also create a temporary reference inside the unsafe block, of course.
Sorry, I wrote this from memory and have confused some things.
I believe the memory model does not permit the reordering of acquire loads among themselves, as it would violate the |
I am not married to the idea, but wanted to give it a try. Not smart of me, as I also wanted this PR to be mostly uncontroversial. I do believe the lifetimes work out, but will just as easily revert that part if other reviewers feel the same way. |
I was pointed by @matklad to https://internals.rust-lang.org/t/idea-replace-seqcst-with-seqcst-acq-rel-acqrel/11028/3. Edit: reading that post again, and the one right after it, I also do believe Acquire should be enough. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
@oliver-giersch I have reverted the part about using normal references, now things are back to raw pointers as at the start of this PR. And I have changed the stuff about how two loads might need to be SeqCst. |
Looks great to me, a rather modest refactoring that fixes at least one clear instance of UB, I'd think this would be a no-brainer if not for the relaxed orderings. |
Because it has to acquire the nodes from the waiting threads before walking the queue to wake them up. |
Yeah I believe you are correct. It's good to these exercises from time to time. |
Ping from triage. |
r? @Amanieu |
(moved to #65796) |
I have moved stuff around so there is no more mutation to |
I can't at a glance judge the functional correctness of this, but I'd personally still prefer raw pointers of interior mutability, but well, I won't fight the one who does the actual work here. ;) |
There is something at https://github.com/rust-lang/rust/pull/65719/files#diff-80754b8db8699947d7b2a43a9cc17dedR169, but I'll add more.
Thank you! I don't mind which one is used, but I have some trouble picturing how the code should like like using raw pointers :-) |
There is one more thing I was playing with, but didn't include until now. For the The acquire is currently necessary because the Now what if we wrap Is that a correct conclusion, that we would not need to acquire data if we don't read it anymore? Even if it lives on our stack etc.? Is it true that using a |
(Stable) Rust doesn't support "consume", and for a good reason -- nobody knows how to properly specify it. Let's just stick with the sane and well-understood release/acquire orderings. |
I don't want to suggest using it here. But I was curious about if we wouldn't need to do an acquire if we don't do a read from the non-atomic fields of |
That sentence contains too many negations for me to be able to parse it.^^ |
Let's forget about it for now. It was something I was wondering about, but not planning for this PR. I have amended the last commit to keep the comments in
|
The code LGTM. There's just a few formatting issues that a quick rustfmt can fix. |
@Amanieu Thank you for reviewing. I have to admit this is my first time running rustfmt. |
@bors r+ |
📌 Commit b05e200 has been approved by |
…nieu Refactor sync::Once `std::sync::Once` contains some tricky code to park and unpark waiting threads. [once_cell](https://github.com/matklad/once_cell) has very similar code copied from here. I tried to add more comments and refactor the code to make it more readable (at least in my opinion). My PR to `once_cell` was rejected, because it is an advantage to remain close to the implementation in std, and because I made a mess of the atomic orderings. So now a PR here, with similar changes to `std::sync::Once`! The initial goal was to see if there is some way to detect reentrant initialization instead of deadlocking. No luck there yet, but you first have to understand and document the complexities of the existing code :smile:. *Maybe not this entire PR will be acceptable, but I hope at least some of the commits can be useful.* Individual commits: #### Rename state to state_and_queue Just a more accurate description, although a bit wordy. It helped me on a first read through the code, where before `state` was use to encode a pointer in to nodes of a linked list. #### Simplify loop conditions in RUNNING and add comments In the relevant loop there are two things to be careful about: - make sure not to enqueue the current thread only while still RUNNING, otherwise we will never be woken up (the status may have changed while trying to enqueue this thread). - pick up if another thread just replaced the head of the linked list. Because the first check was part of the condition of the while loop, the rest of the parking code also had to live in that loop. It took me a while to get the subtlety here, and it should now be clearer. Also call out that we really have to wait until signaled, otherwise we leave a dangling reference. #### Don't mutate waiter nodes Previously while waking up other threads the managing thread would `take()` out the `Thread` struct and use that to unpark the other thread. It is just as easy to clone it, just 24 bytes. This way `Waiter.thread` does not need an `Option`, `Waiter.next` does not need to be a mutable pointer, and there is less data that needs to be synchronised by later atomic operations. #### Turn Finish into WaiterQueue In my opinion these changes make it just a bit more clear what is going on with the thread parking stuff. #### Move thread parking to a seperate function Maybe controversial, but with this last commit all the thread parking stuff has a reasonably clean seperation from the state changes in `Once`. This is arguably the trickier part of `Once`, compared to the loop in `call_inner`. It may make it easier to reuse parts of this code (see rust-lang/rfcs#2788 (comment)). Not sure if that ever becomes a reality though. #### Reduce the amount of comments in call_inner With the changes from the previous commits, the code pretty much speaks for itself, and the amount of comments is hurting readability a bit. #### Use more precise atomic orderings Now the hard one. This is the one change that is not anything but a pure refactor or change of comments. I have a dislike for using `SeqCst` everywhere, because it hides what the atomics are supposed to do. the rationale was: > This cold path uses SeqCst consistently because the performance difference really does not matter there, and SeqCst minimizes the chances of something going wrong. But in my opinion, having the proper orderings and some explanation helps to understand what is going on. My rationale for the used orderings (also included as comment): When running `Once` we deal with multiple atomics: `Once.state_and_queue` and an unknown number of `Waiter.signaled`. * `state_and_queue` is used (1) as a state flag, (2) for synchronizing the data that is the result of the `Once`, and (3) for synchronizing `Waiter` nodes. - At the end of the `call_inner` function we have to make sure the result of the `Once` is acquired. So every load which can be the only one to load COMPLETED must have at least Acquire ordering, which means all three of them. - `WaiterQueue::Drop` is the only place that may store COMPLETED, and must do so with Release ordering to make result available. - `wait` inserts `Waiter` nodes as a pointer in `state_and_queue`, and needs to make the nodes available with Release ordering. The load in its `compare_and_swap` can be Relaxed because it only has to compare the atomic, not to read other data. - `WaiterQueue::Drop` must see the `Waiter` nodes, so it must load `state_and_queue` with Acquire ordering. - There is just one store where `state_and_queue` is used only as a state flag, without having to synchronize data: switching the state from INCOMPLETE to RUNNING in `call_inner`. This store can be Relaxed, but the read has to be Acquire because of the requirements mentioned above. * `Waiter.signaled` is both used as a flag, and to protect a field with interior mutability in `Waiter`. `Waiter.thread` is changed in `WaiterQueue::Drop` which then sets `signaled` with Release ordering. After `wait` loads `signaled` with Acquire and sees it is true, it needs to see the changes to drop the `Waiter` struct correctly. * There is one place where the two atomics `Once.state_and_queue` and `Waiter.signaled` come together, and might be reordered by the compiler or processor. Because both use Aquire ordering such a reordering is not allowed, so no need for SeqCst. cc @matklad
Rollup of 7 pull requests Successful merges: - #65719 (Refactor sync::Once) - #65831 (Don't cast directly from &[T; N] to *const T) - #66048 (Correct error in documentation for Ipv4Addr method) - #66058 (Correct deprecated `is_global` IPv6 documentation) - #66216 ([mir-opt] Handle return place in ConstProp and improve SimplifyLocals pass) - #66217 (invalid_value lint: use diagnostic items) - #66235 (rustc_metadata: don't let LLVM confuse rmeta blobs for COFF object files.) Failed merges: r? @ghost
std::sync::Once
contains some tricky code to park and unpark waiting threads. once_cell has very similar code copied from here. I tried to add more comments and refactor the code to make it more readable (at least in my opinion). My PR toonce_cell
was rejected, because it is an advantage to remain close to the implementation in std, and because I made a mess of the atomic orderings. So now a PR here, with similar changes tostd::sync::Once
!The initial goal was to see if there is some way to detect reentrant initialization instead of deadlocking. No luck there yet, but you first have to understand and document the complexities of the existing code 😄.
Maybe not this entire PR will be acceptable, but I hope at least some of the commits can be useful.
Individual commits:
Rename state to state_and_queue
Just a more accurate description, although a bit wordy. It helped me on a first read through the code, where before
state
was use to encode a pointer in to nodes of a linked list.Simplify loop conditions in RUNNING and add comments
In the relevant loop there are two things to be careful about:
Because the first check was part of the condition of the while loop, the rest of the parking code also had to live in that loop. It took me a while to get the subtlety here, and it should now be clearer.
Also call out that we really have to wait until signaled, otherwise we leave a dangling reference.
Don't mutate waiter nodes
Previously while waking up other threads the managing thread would
take()
out theThread
struct and use that to unpark the other thread. It is just as easy to clone it, just 24 bytes. This wayWaiter.thread
does not need anOption
,Waiter.next
does not need to be a mutable pointer, and there is less data that needs to be synchronised by later atomic operations.Turn Finish into WaiterQueue
In my opinion these changes make it just a bit more clear what is going on with the thread parking stuff.
Move thread parking to a seperate function
Maybe controversial, but with this last commit all the thread parking stuff has a reasonably clean seperation from the state changes in
Once
. This is arguably the trickier part ofOnce
, compared to the loop incall_inner
. It may make it easier to reuse parts of this code (see rust-lang/rfcs#2788 (comment)). Not sure if that ever becomes a reality though.Reduce the amount of comments in call_inner
With the changes from the previous commits, the code pretty much speaks for itself, and the amount of comments is hurting readability a bit.
Use more precise atomic orderings
Now the hard one. This is the one change that is not anything but a pure refactor or change of comments.
I have a dislike for using
SeqCst
everywhere, because it hides what the atomics are supposed to do. the rationale was:But in my opinion, having the proper orderings and some explanation helps to understand what is going on. My rationale for the used orderings (also included as comment):
When running
Once
we deal with multiple atomics:Once.state_and_queue
and an unknown number ofWaiter.signaled
.state_and_queue
is used (1) as a state flag, (2) for synchronizing the data that is the result of theOnce
, and (3) for synchronizingWaiter
nodes.call_inner
function we have to make sure the result of theOnce
is acquired. So every load which can be the only one to load COMPLETED must have at least Acquire ordering, which means all three of them.WaiterQueue::Drop
is the only place that may store COMPLETED, and must do so with Release ordering to make result available.wait
insertsWaiter
nodes as a pointer instate_and_queue
, and needs to make the nodes available with Release ordering. The load in itscompare_and_swap
can be Relaxed because it only has to compare the atomic, not to read other data.WaiterQueue::Drop
must see theWaiter
nodes, so it must loadstate_and_queue
with Acquire ordering.state_and_queue
is used only as a state flag, without having to synchronize data: switching the state from INCOMPLETE to RUNNING incall_inner
. This store can be Relaxed, but the read has to be Acquire because of the requirements mentioned above.Waiter.signaled
is both used as a flag, and to protect a field with interior mutability inWaiter
.Waiter.thread
is changed inWaiterQueue::Drop
which then setssignaled
with Release ordering. Afterwait
loadssignaled
with Acquire and sees it is true, it needs to see the changes to drop theWaiter
struct correctly.Once.state_and_queue
andWaiter.signaled
come together, and might be reordered by the compiler or processor. Because both use Aquire ordering such a reordering is not allowed, so no need for SeqCst.cc @matklad