Add a generic CAS loop to std::sync::Atomic* #48658
r? @kennytm (rust_highfive has picked a reviewer for you, use r? to override) |
src/libcore/sync/atomic.rs
Outdated
/// Examples: | ||
/// | ||
/// ```rust | ||
/// use std::sync::atömic::{AtomicIsize, Ordering}; |
atömic?
This is a heavy metal atom 😆
I swear it was a typo.
src/libcore/sync/atomic.rs
Outdated
@@ -1268,6 +1268,70 @@ macro_rules! atomic_int { | |||
} | |||
} | |||
|
|||
/// fetch the value, apply a function to it and return the result |
Capitalize the F, and also use the singular form.
Fetches the value, applies a function to it and returns the result.
src/libcore/sync/atomic.rs
Outdated
/// Note: This may call the function multiple times if the value has been | ||
/// changed from other threads in the meantime. | ||
/// | ||
/// Examples: |
Examples should be a section header, followed by a summary of what the example does.
/// # Examples
///
/// Compute the maximum of a slice, where each update step is done atomically.
src/libcore/sync/atomic.rs
Outdated
let mut prev = self.get(Ordering::Relaxed); | ||
loop { | ||
let next = f(prev); | ||
match compare_exchange_weak(self, prev, next, Ordering::SeqCst, Ordering::Relaxed) { |
Is there any chance the user wants to use ordering other than SeqCst/Relaxed? (Similar question for the self.get(Relaxed) above.)
Definitely yes. Current implementation seems to be always right, however it might end up doing more loops than strictly necessary if some stricter ordering was used for get … I think. And SeqCst isn’t always necessary for CEW, but once you go for a weaker ordering there, it is important to be able to specify a stronger one for get and failure cases.
@nagisa Benchmarks in other languages show that using the most relaxed ordering and allowing for more loops achieves better throughput. However, as Rust is not those other languages, I'll see if I can come up with some benchmarks for it.
@llogiq it is difficult to generalise the benchmarks like this. The article you’ve linked to is benchmarking on x86, which will have different performance characteristics compared to an architecture where the memory model is weaker. This is exactly why we provide means to specify the orderings everywhere.
Furthermore, I couldn’t manage to convince LLVM to generate as good a code with manual CAS loop compared to just calling the Here’s the code for the
dmb ish
.LBB1_1:
ldrexd r4, r5, [r0]
subs r1, r2, r4
sbcs r1, r3, r5
mov r1, #0
movwlt r1, #1
cmp r1, #0
mov r1, r3
moveq r4, r2
movne r1, r5
mov r5, r1
strexd r1, r4, r5, [r0]
cmp r1, #0
bne .LBB1_1
dmb ish
and something that gets generated when implementing the same-ish functionality using the CAS loop:
push {r4, r5, r6, r7, r8, r9, r10, r11, lr}
ldrd r8, r9, [r0]
ldrexd r10, r11, [r0]
eor r1, r11, r9
eor r7, r10, r8
orr r5, r7, r1
subs r7, r2, r8
sbcs r7, r3, r9
mov r1, #0
mov r7, #0
movwlo r7, #1
cmp r7, #0
mov r7, r3
movne r7, r9
mov r6, r2
movne r6, r8
cmp r5, #0
bne .LBB2_4
dmb ish
.LBB2_2:
strexd r5, r6, r7, [r0]
cmp r5, #0
beq .LBB2_6
ldrexd r10, r11, [r0]
eor r5, r10, r8
eor r4, r11, r9
orrs r5, r5, r4
beq .LBB2_2
.LBB2_4:
clrex
cmp r1, #0
beq .LBB2_8
.LBB2_5:
pop {r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB2_6:
dmb ish
mov r1, #1
cmp r1, #0
beq .LBB2_8
b .LBB2_5
.LBB2_7:
mov r10, r6
mov r11, r7
cmp r1, #0
bne .LBB2_5
.LBB2_8:
ldrexd r6, r7, [r0]
mov r9, r3
eor r1, r6, r10
eor r5, r7, r11
orr r1, r1, r5
subs r5, r2, r10
sbcs r5, r3, r11
mov r5, #0
movwlo r5, #1
cmp r5, #0
movne r9, r11
mov r8, r2
movne r8, r10
cmp r1, #0
bne .LBB2_12
dmb ish
.LBB2_10:
strexd r1, r8, r9, [r0]
cmp r1, #0
beq .LBB2_13
ldrexd r6, r7, [r0]
eor r1, r6, r10
eor r4, r7, r11
orrs r1, r1, r4
beq .LBB2_10
.LBB2_12:
mov r1, #0
clrex
b .LBB2_7
.LBB2_13:
dmb ish
mov r1, #1
b .LBB2_7
So it seems to me that it would be worthwhile to add a binding for such a function anyway.
src/libcore/sync/atomic.rs
Outdated
} | ||
} | ||
|
||
/// Fetches the value, applies a function to it and returns the prvious value |
s/prvious/previous
Ping from triage @llogiq! Could you finish this PR so it can be merged? |
I'm a bit unsure how to progress:
|
I think we should add two ordering arguments -- one for the CAS success case and the other for the failure case (also used for initial Also cc @Amanieu, who may have opinions about this stuff. |
At the very least one should be able to choose an ordering other than SeqCst. There are three orderings involved here (load, compare_exchange_weak's success + failure). If one can prove that any one can be always Relaxed or any two ordering must be equivalent, then we don't need to expose that ordering to the API. |
@kennytm The ordering for the initial load and for the CAS failure are the same: in both cases this is used to load a value for the next loop iteration. For the API itself, I think something like this would be better: fn transaction<F: FnMut($int_type) -> Option<$int_type>>(&self, f: F, success: Ordering, failure: Ordering) -> bool; The closure can return |
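To make the two-ordering proposal concrete, here is a minimal sketch of such a loop written against the stable `AtomicIsize` API. The helper name `update_with` and its `Result` return type are assumptions for illustration, not the signature proposed in this thread. The `failure` ordering is used both for the initial load and for the failed exchange, as described above, so it must not be `Release` or `AcqRel`.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

/// Hypothetical helper (not the PR's code): a CAS loop with caller-chosen orderings.
/// `failure` is used for the initial load and for a failed exchange,
/// so it must be `Relaxed`, `Acquire` or `SeqCst`.
/// Returns `Ok(previous)` if a new value was stored, or `Err(current)` if `f` declined.
fn update_with<F>(
    atomic: &AtomicIsize,
    success: Ordering,
    failure: Ordering,
    mut f: F,
) -> Result<isize, isize>
where
    F: FnMut(isize) -> Option<isize>,
{
    let mut prev = atomic.load(failure);
    // Stop as soon as the closure returns `None`.
    while let Some(next) = f(prev) {
        match atomic.compare_exchange_weak(prev, next, success, failure) {
            Ok(old) => return Ok(old),
            // Another thread changed the value, or the weak exchange failed
            // spuriously: retry with the value that was actually observed.
            Err(observed) => prev = observed,
        }
    }
    Err(prev)
}
```

Calling `update_with(&a, Ordering::SeqCst, Ordering::Relaxed, |x| Some(x * 2))` would reproduce the orderings hard-coded in the first draft, while leaving callers free to pick weaker or stronger ones.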
I'll also leave this as an example of prior work (in C++). |
@llogiq I'm not convinced that we need both |
@Amanieu in that case I think it'd be prudent to have an example of extracting the new value from the closure. How would you do that? Setting a mutable binding outside? We also should look at the generated assembly for that case. |
Yes, basically something like this:
let mut new = 0;
ATOMIC.fetch_and_update(|x| {new = x + 1; new}, Ordering::Relaxed, Ordering::Relaxed);
You need to replace (As a side note, I'm still not a big fan of the |
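For reference, the same "mutable binding outside the closure" trick can be sketched with `fetch_update` as it exists in today's standard library, where the orderings come first and the closure returns an `Option`. That signature differs from the draft discussed here, and the helper name `increment_and_get` is made up for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: increments `counter` and returns the value that was stored.
fn increment_and_get(counter: &AtomicUsize) -> usize {
    let mut new = 0;
    // `fetch_update` itself returns the previous value. The closure always
    // returns `Some`, so the call cannot end in `Err` here.
    let _prev = counter.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |x| {
        // May run several times; `new` ends up holding the value from the
        // last call, which is the one that was actually stored.
        new = x.wrapping_add(1);
        Some(new)
    });
    new
}
```

This works because the closure bound is `FnMut`, so it may write to the captured `new` on every attempt.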
I'll change the name when I remove |
I strongly oppose to the name |
I disagree that |
After some consideration, I'm OK with the name I went through a few of the CAS loops in my various crates, and the majority of them are of this form:
loop {
if some_condition {
return false;
}
match atomic.compare_exchange_weak(
oldval,
newval,
Ordering::Relaxed,
Ordering::Relaxed,
) {
Ok(_) => return true,
Err(x) => oldval = x,
}
}
I therefore think that we should make the closure return an |
So the fndecl would be |
@llogiq That's correct. I'm somewhat torn on the return type of |
So you'd return either |
Yes |
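A loop of the "bail out on some condition" shape shown above can be sketched with a closure that returns an `Option`. This uses `fetch_update` as it is available in current std, and the permit-counting scenario with the name `try_acquire` is purely illustrative.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical example: take one permit unless `limit` permits are already in use.
/// Returns `true` if the counter was incremented, `false` if the closure declined.
fn try_acquire(in_use: &AtomicUsize, limit: usize) -> bool {
    in_use
        .fetch_update(Ordering::Acquire, Ordering::Relaxed, |n| {
            // Returning `None` plays the role of `return false` in the manual loop.
            if n < limit { Some(n + 1) } else { None }
        })
        .is_ok()
}
```

The early exit and the retry on a failed exchange are both handled inside the method, so the caller only sees whether an update happened.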
3589b35
to
b87c545
Compare
Rebased and squashed. @kennytm r? |
@llogiq The |
It appears I lost it during the rebase. I'll re-add it. |
Done. Let's see if Travis likes it. |
dde148c
to
2bc4859
Compare
Besides fixing some typos, I had to remove the random number generator, as it doesn't work with types below 32 bits. |
Now with the |
src/libcore/sync/atomic.rs
Outdated
|
||
Note: This may call the function multiple times if the value has been changed from other threads in | ||
the meantime, as long as the function returns `Some(_)`. It will however only apply the function | ||
once to the value stored within. |
It will however only apply the function once to the value stored within.
I don't understand what this means. Could you clarify this in the code?
I wanted to say that the function will be applied only once per observed value. I've changed the text accordingly.
Since you are using compare_exchange_weak, it could actually be applied more than once even if the value has not changed. For example on ARM, a compare_exchange_weak may spuriously fail due to an interrupt.
That is true. Ok, I've changed the text once more to say that the function will only have been applied once to the stored value. Otherwise people may fear they need to make the function idempotent.
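The distinction can be illustrated with a small experiment: count how often the closure runs under contention and compare that with the number of updates that actually land. This sketch assumes the `fetch_update` method from current std and scoped threads (`std::thread::scope`); the function name `contended_increments` is made up for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical experiment: returns (final counter value, closure invocations).
fn contended_increments(threads: usize, per_thread: usize) -> (usize, usize) {
    let counter = AtomicUsize::new(0);
    let calls = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    let _ = counter.fetch_update(Ordering::Relaxed, Ordering::Relaxed, |x| {
                        // Counts every invocation, including ones whose result is
                        // discarded because the exchange failed.
                        calls.fetch_add(1, Ordering::Relaxed);
                        Some(x + 1)
                    });
                }
            });
        }
    });
    (counter.into_inner(), calls.into_inner())
}
```

The final counter value always equals `threads * per_thread`, because each successful exchange stores exactly one closure result. The invocation count may be higher, which is why the closure should avoid side effects that must happen exactly once.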
src/libcore/sync/atomic.rs
Outdated
assert!(max_foo == 42); | ||
```"), | ||
#[inline] | ||
#[unstable(feature = "no_more_cas", |
I suggest we pick a different feature name for fetch_{max, min} (e.g. atomic_min_max). We may have different stabilization schedules for fetch_{min, max} and fetch_update.
Done.
This adds a new method to all numeric `Atomic*` types with a safe compare-and-set loop, so users will no longer need to write their own, except in *very* strange circumstances. This solves rust-lang#48384 with `x.fetch_max(_)`/`x.fetch_min(_)`. It also relates to rust-lang#48655 (which I misuse as tracking issue for now). *note* This *might* need a crater run because the functions could clash with third party extension traits.
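As a rough usage sketch of `fetch_max`, assuming the method as it is available in current std, the "maximum of a slice" example mentioned in review could look like this. The function name `atomic_max` is hypothetical.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

/// Hypothetical example: compute the maximum of a slice, where each
/// update step is done atomically. Returns `isize::MIN` for an empty slice.
fn atomic_max(values: &[isize]) -> isize {
    let max = AtomicIsize::new(isize::MIN);
    for &v in values {
        // Stores `v` if it is larger than the current value; the previous
        // value is returned and ignored here.
        let _ = max.fetch_max(v, Ordering::Relaxed);
    }
    max.into_inner()
}
```

A single thread is used here only to keep the sketch short; the same `fetch_max` call can be issued from several threads sharing the atomic.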
Now we have the split feature gate and a better wording for the comments. Travis also seems to like it. r? |
@bors r+ |
📌 Commit 0f5e419 has been approved by |
Add a generic CAS loop to std::sync::Atomic* This adds two new methods to both `AtomicIsize` and `AtomicUsize` with optimized safe compare-and-set loops, so users will no longer need to write their own, except in *very* strange circumstances. `update_and_fetch` will apply the function and return its result, whereas `fetch_and_update` will apply the function and return the previous value. This solves rust-lang#48384 with `x.update_and_fetch(|x| x.max(y))`. It also relates to rust-lang#48655 (which I misuse as tracking issue for now).. *note* This *might* need a crater run because the functions could clash with third party extension traits.
Rollup of 9 pull requests
Successful merges:
- #48658 (Add a generic CAS loop to std::sync::Atomic*)
- #49253 (Take the original extra-filename passed to a crate into account when resolving it as a dependency)
- #49345 (RFC 2008: Finishing Touches)
- #49432 (Flush executables to disk after linkage)
- #49496 (Add more vec![... ; n] optimizations)
- #49563 (add a dist builder to build rust-std components for the THUMB targets)
- #49654 (Host compiler documentation: Include private items)
- #49667 (Add more features to rust_2018_preview)
- #49674 (ci: Remove x86_64-gnu-incremental builder)
Failed merges: