-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(deps): Upgrade to Rust 1.50.0 #6428
Conversation
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
I believe this is the equivalent compare_exchange based on the table in the docs: Original Success Failure Relaxed Relaxed Relaxed Acquire Acquire Acquire Release Release Relaxed AcqRel AcqRel Acquire SeqCst SeqCst SeqCst https://doc.rust-lang.org/std/sync/atomic/struct.AtomicUsize.html#migrating-to-compare_exchange-and-compare_exchange_weak Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
@@ -69,13 +69,13 @@ fn gauge_add(gauge: &AtomicUsize, add: isize, emitter: impl Fn(usize)) { | |||
emitter(new_value); | |||
// Try to update gauge to new value and releasing writes to gauge metric in the process. | |||
// Otherwise acquire new writes to gauge metric. | |||
let latest = gauge.compare_and_swap(value, new_value, Ordering::AcqRel); | |||
if value == latest { | |||
value = match gauge.compare_exchange(value, new_value, Ordering::AcqRel, Ordering::Acquire) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bruceg I'd appreciate your 👀 here. compare_and_swap
was deprecated. I've swapped it for the equivalent compare_exchange
call, but there is also a compare_exchange_weak
call that can purportedly generate more efficient assembly. The difference is:
Unlike AtomicUsize::compare_exchange, this function is allowed to spuriously fail even when the comparison succeeds, which can result in more efficient code on some platforms. The return value is a result indicating whether the new value was written and containing the previous value.
The nuance being that it might sometimes fail even if it could have succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compare_exchange_weak
will play much better on ARM and we should aim to use it. compare_exchange
emits instructions that require strong ordering of memory. A strong ordering means atomic operations are implicit acquire/release and probably Total Sorted Order (every CPU agrees about the order memory operations took place) where weak ordering allows CPUs to disagree, which is cool because some of them can shut off if need be or do other stuff. (ARM is weak, data-dependent which means every CPU eventually agrees on the results of operations and A->B
dereference is guaranteed that A
is at least as fresh as B
. When systems don't support data-dependency things get bonkers.)
For x86 with implicit strong order we'll never see additional fail loops, if we were an omniscient observer of the system. On ARM it's possible that we will because the CPU our instructions run on is lagging getting updates from other CPUs but it's an acceptable trade-off except in specialized circumstances.
I also suggest using ordering semantics AcqRel
for success here, like you've done, and Relaxed
for failure. On ARM this'll mean more potential spins but overall more efficient results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead and made myself #6477 to tackle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed response! I think what you said makes sense and agree with the AcqRel
for success and Relaxed
for failure. I'll leave that to your PR since this one maintains the current state.
On ARM it's possible that we will because the CPU our instructions run on is lagging getting updates from other CPUs but it's an acceptable trade-off except in specialized circumstances.
To deepen my understanding, under what circumstances would it be unacceptable?
I did note that the Rust PR for deprecating compare_and_swap
only introduced one compare_exchange_weak
:
The PR comment seems to indicate that they were mostly just going for satisfying the deprecation rather than with an eye to improving performance though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed response! I think what you said makes sense and agree with the AcqRel for success and Relaxed for failure. I'll leave that to your PR since this one maintains the current state.
Cool. I'm good with that.
On ARM it's possible that we will because the CPU our instructions run on is lagging getting updates from other CPUs but it's an acceptable trade-off except in specialized circumstances.
To deepen my understanding, under what circumstances would it be unacceptable?
Solid question. Consider a situation where you execute some expensive function per loop, maybe it has a non-trivial side-effect or is expensive in CPU terms but you can't migrate it out of the loop. You might be better off paying the higher synchronization cost here, depending on what your goals are.
The PR comment seems to indicate that they were mostly just going for satisfying the deprecation rather than with an eye to improving performance though.
Quite right. Huh. Well, I was going off my own background understanding; ARM'll get better instructions emitted if we use compare_exchange_weak
. If you take this program
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
pub fn mul(val: &Arc<AtomicU64>, factor: u64) {
let mut old = val.load(Ordering::Relaxed);
loop {
let new = old * factor;
match val.compare_exchange_weak(old, new, Ordering::AcqRel, Ordering::Relaxed) {
Ok(_) => break,
Err(x) => old = x,
}
}
}
pub fn main() {
let val = Arc::new(AtomicU64::new(4));
mul(&val, 8);
println!("{:?}", val);
}
and generate x86 you'll see a lock cmpxchg
instruction pair whether you use compare_exchange_weak
or compare_exchange
. On ARM when I compile for arm-unknown-linux-gnueabihf
at opt-level 3 the _weak
variant of the program clobbers less registers, has fewer exclusive loads and does less conditional work. I actually find opt-level 3 hard to read -- I think the compiler noticed all my values are powers of 2 and did clever things from that -- and fortunately when I call opt-level 0 it embeds the assembly of compare_exchange_weak
and compare_exchange
. The strong version has more callee-save registers than I understand and calls further into helper functions that aren't inlined. The compiler source calls out to intrinsics, which makes sense, but without digging in further and brushing up on ARM -- which I need to do eventually -- I think the best I can offer is the strong variant emits more instructions to force coordination with the other CPUs. An ARM expert could pin point exactly where in the assembly for the above this happens, and if someone out there reading this knows I'd love to get taught. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid question. Consider a situation where you execute some expensive function per loop, maybe it has a non-trivial side-effect or is expensive in CPU terms but you can't migrate it out of the loop. You might be better off paying the higher synchronization cost here, depending on what your goals are.
👍 Yeah that makes sense as a trade-off. Thanks again for the thorough explanation!
@@ -23,6 +23,7 @@ impl SplitFn { | |||
} | |||
|
|||
impl Function for SplitFn { | |||
#[allow(clippy::collapsible_match)] // I expect this file to be going away shortly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JeanMertz @FungusHumungus is this accurate? It seems like src/mapping
is largely vestigial at this point. Do you think we'd drop it as part of #6353 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, feel free to ignore this. I'll drop it once #6353 makes it into master (later today, as the CI is expected to be green within the hour).
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
c21f421
to
653f30f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me! I'd say I'm in favor of keeping the unnecessary_wraps
changes. It seems good to simplify the functions and not imply some kind of conditional that doesn't actually exist. Wrapping the return value at the call site that needs it to be wrapped seems slightly more intentional.
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. I echo the support for unnecessary_wraps
. My comment here is not blocking and I'd be happy to see a different PR that audits our codebase's atomics. Something I could take on or pair up with you to do.
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
type Item = Spanned<'input, usize>; | ||
type Item = SpannedResult<'input, usize>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these changes needed for the 1.50 upgrade, or did they too sneak in from a VRL merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These ones are intentional. Clippy was complaining about unnecessary_wraps
so I renamed the existing Spanned
type SpannedResult
and added a new Spanned
type that represents the Ok
value of SpannedResult
so I could easily refer to Spanned
in the function return types where it was complaining about the unnecessary wrapping. For example: https://github.com/timberio/vector/pull/6428/files/f279014f63a058abe29e0afb8dbc1f1dffc2fb32#diff-3eaef579e2d938c6926fa17614f80aab7455adeeffd8907567b34fee18a07e94R575-R582
I tried to break up the commits to make the changes easier to follow.
The bulk of the changes were from the new
clippy::unnecessary_wraps
rule. I was a bit on the fence about that rule, but I've come around to the idea that it's better to push wrapping up to the callers to make it more clear that a function only every returns one of theOption
orResult
enum values. EDIT I'm back on the fence on this lint again. I think it hurts readibility in many cases. I'm curious to get other opinions though. I separated the commits so hopefully it is easy to back out just that lint.I also ignored some of the lints on the remap parsing code as I expect that to go away, but I'll have to confirm.