RFC: Add a replace_with method to Option #2490
It would be ideal to implement a proper optimization instead of introducing this. I initially attempted to dig into the optimizations, but soon realized that I don't yet have enough expertise to do that. If someone can mentor me, I would eagerly dive into the topic! UPDATE: I moved my findings into a StackOverflow question. I would love to learn more details about optimizations, so your help is appreciated!
The proposed implementation does not look panic-safe to me, i.e., in case |
@diwic Well, initialization with |
I would rewrite this as `*self = f(mem::replace(self, None));`, which is safe and depends directly on zero unsafe operations.
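The one-liner above can be sketched as a free function to show why it stays panic-safe: the slot already holds `None` by the time the closure runs, so a panic inside `f` leaves a valid value behind. The free-function form and the name `replace_with` as used here are assumptions for illustration, not the RFC's final API.

```rust
use std::mem;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical free-function form of the proposed method (illustration only).
fn replace_with<T, F>(slot: &mut Option<T>, f: F)
where
    F: FnOnce(Option<T>) -> Option<T>,
{
    // `mem::replace` leaves `None` in `slot` before `f` runs, so `slot`
    // is always a valid `Option<T>`, even if `f` panics.
    *slot = f(mem::replace(slot, None));
}

fn main() {
    let mut counter = Some(1);
    replace_with(&mut counter, |old| old.map(|n| n + 1));
    assert_eq!(counter, Some(2));

    // If the closure panics, the slot is simply left as `None`.
    let mut slot = Some(String::from("value"));
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        replace_with(&mut slot, |_| panic!("closure failed"));
    }));
    assert!(result.is_err());
    assert_eq!(slot, None);
}
```

The cost of this safety is one extra store of `None` before the closure call, which is what the rest of the thread tries to get the optimizer to remove.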
@Centril and |
@diwic is right. I initially started with the following implementation:

```rust
let mut new_value = f(self.take());
mem::swap(self, &mut new_value);
// Since self was None after take(), new_value holds None here after swap(),
// so we can forget about it.
mem::forget(new_value);
```

It is as performant as the proposed implementation but has one extra |
@frol So I primarily see the motivation for |
@Centril To be honest, I would prefer fixing the optimizer to handle |
The problem is not in With

```asm
mov rax, qword ptr [rdi]
mov rcx, rax
shr rcx, 32
xor edx, edx
test eax, eax
setne dl
add ecx, 1
mov dword ptr [rdi], edx
mov dword ptr [rdi + 4], ecx
ret
```

With a different

```asm
xor eax, eax
cmp dword ptr [rdi], 0
setne al
mov dword ptr [rdi], eax
add dword ptr [rdi + 4], 1
ret
```

Which is pretty good, other than the usual rust-lang/rust#49420 (comment). I suspect

```llvm
start:
  %1 = bitcast { i32, i32 }* %0 to i64*
  %2 = load i64, i64* %1, align 1, !alias.scope !0, !noalias !9
```

That TL/DR: Edit: PR for |
mem::swap the obvious way for types smaller than the SIMD optimization's block size

LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't. Found in the `replace_with` RFC discussion.

Examples of the improvements:

<details>
<summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary>

```rust
type Demo = [u16; 3];
pub fn swap_demo(x: &mut Demo, y: &mut Demo) {
    std::mem::swap(x, y);
}
```

nightly:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
	.seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE
	sub rsp, 32
	.seh_stackalloc 32
	.seh_endprologue
	movzx eax, word ptr [rcx + 4]
	mov word ptr [rsp + 4], ax
	mov eax, dword ptr [rcx]
	mov dword ptr [rsp], eax
	movzx eax, word ptr [rdx + 4]
	mov word ptr [rcx + 4], ax
	mov eax, dword ptr [rdx]
	mov dword ptr [rcx], eax
	movzx eax, word ptr [rsp + 4]
	mov word ptr [rdx + 4], ax
	mov eax, dword ptr [rsp]
	mov dword ptr [rdx], eax
	add rsp, 32
	ret
	.seh_handlerdata
	.section .text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE
	.seh_endproc
```

this PR:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
	mov r8d, dword ptr [rcx]
	movzx r9d, word ptr [rcx + 4]
	movzx eax, word ptr [rdx + 4]
	mov word ptr [rcx + 4], ax
	mov eax, dword ptr [rdx]
	mov dword ptr [rcx], eax
	mov word ptr [rdx + 4], r9w
	mov dword ptr [rdx], r8d
	ret
```

</details>

<details>
<summary>`replace_with` optimizes down much better</summary>

Inspired by rust-lang/rfcs#2490,

```rust
fn replace_with<T, F>(x: &mut Option<T>, f: F)
    where F: FnOnce(Option<T>) -> Option<T>
{
    *x = f(x.take());
}

pub fn inc_opt(mut x: &mut Option<i32>) {
    replace_with(&mut x, |i| i.map(|j| j + 1));
}
```

Rust 1.26.0:
```asm
_ZN4blah7inc_opt17heb0acb64c51777cfE:
	mov rax, qword ptr [rcx]
	movabs r8, 4294967296
	add r8, rax
	shl rax, 32
	movabs rdx, -4294967296
	and rdx, r8
	xor r8d, r8d
	test rax, rax
	cmove rdx, rax
	setne r8b
	or rdx, r8
	mov qword ptr [rcx], rdx
	ret
```

Nightly (better thanks to ScalarPair, maybe?):
```asm
_ZN4blah7inc_opt17h66df690be0b5899dE:
	mov r8, qword ptr [rcx]
	mov rdx, r8
	shr rdx, 32
	xor eax, eax
	test r8d, r8d
	setne al
	add edx, 1
	mov dword ptr [rcx], eax
	mov dword ptr [rcx + 4], edx
	ret
```

This PR:
```asm
_ZN4blah7inc_opt17h1426dc215ecbdb19E:
	xor eax, eax
	cmp dword ptr [rcx], 0
	setne al
	mov dword ptr [rcx], eax
	add dword ptr [rcx + 4], 1
	ret
```

Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (rust-lang#49420 (comment)).

</details>
@scottmcm FYI, I have tried the latest Rust nightly (6a1c0637c 2018-07-23), which includes the patch from PR rust-lang/rust#52051 and even though I see that your example snippet has been improved with the patch, there is no improvement for my use-case. Given that the proposed optimization was concluded to be irrelevant for the implementation, and the fact that the proposed method basically duplicates the already existing way to do this operation in an obvious way ( |
UPDATE: There was a relevant RFC about the ideas I describe below, so feel free to ignore my message.

I have just had a conversation where this

```rust
#[derive(Debug)]
struct Bar;

#[derive(Debug)]
enum Foo {
    A(Bar),
    B(Bar),
}

#[derive(Debug)]
struct Baz {
    foo: Foo,
}

impl Baz {
    fn switch_variant_unsafe(&mut self) {
        let mut foo_temp: Foo = unsafe { ::std::mem::uninitialized() };
        ::std::mem::swap(&mut self.foo, &mut foo_temp);
        self.foo = match foo_temp {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        }
    }

    fn switch_variant_safe(&mut self) {
        // Intentionally does not compile: this moves `bar` out of `self.foo`,
        // which is only borrowed through `&mut self`.
        self.foo = match self.foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        };
    }
}

fn main() {
    let mut baz = Baz { foo: Foo::A(Bar) };
    baz.foo = match baz.foo {
        Foo::A(bar) => Foo::B(bar),
        Foo::B(bar) => Foo::A(bar),
    };
    dbg!(&baz);
    baz.switch_variant_unsafe();
    dbg!(&baz);
    baz.switch_variant_safe();
    dbg!(&baz);
}
```

As is, you get a compilation error:
There is also a similar question on SO. Here is the helper that I came up with (based on this RFC):

```rust
fn replace_with<T, F>(dest: &mut T, mut f: F)
where
    F: FnMut(T) -> T,
{
    let mut old_value = unsafe { std::mem::uninitialized() };
    std::mem::swap(dest, &mut old_value); // dest is "uninitialized" (in fact, it is not touched in release mode)
    let mut new_value = f(old_value);
    std::mem::swap(dest, &mut new_value); // dest holds new_value, and new_value is "uninitialized"
    std::mem::forget(new_value); // since it is "uninitialized", we forget about it
}
```

, and then we can implement

```rust
fn switch_variant_safe(&mut self) {
    replace_with(&mut self.foo, |foo| match foo {
        Foo::A(bar) => Foo::B(bar),
        Foo::B(bar) => Foo::A(bar),
    })
}
```

The generated assembly for As to the unsoundness concerns raised in #2490 (comment), in release mode,

```asm
example::replace_with:
        mov al, byte ptr [rdi]
        not al
        and al, 1
        mov byte ptr [rdi], al
        ret
```

in fact, it gets automatically inlined unless I put

```asm
mov al, byte ptr [rsp + 7]
not al
and al, 1
```

Thus, there is no unsoundness in the release mode. Yet, in debug mode, an explicit uninitialized value indeed gets assigned to the

Another way to implement enum variant "toggle" is to pass ownership to

```rust
fn switch_variant_owned(mut self) -> Self {
    self.foo = match self.foo {
        Foo::A(bar) => Foo::B(bar),
        Foo::B(bar) => Foo::A(bar),
    };
    self
}
```

, but it requires API changes all the way down to the method (i.e. you have to use this API style all the way through your codebase if it is a low-level method). Sidenote: while the generated assembly is mostly the same (there are some variable rearrangements), one interesting optimization gets applied when

I believe there is a need for safe and sound

P.S. This more generic

```rust
let mut some_option: Option<i32> = Some(123);
some_option.replace_with(|old_value| consume_option_i32_and_produce_option_i32(old_value));
```

I would write

```rust
let mut some_option: Option<i32> = Some(123);
std::mem::replace(&mut some_option, |old_value| consume_option_i32_and_produce_option_i32(old_value));
```
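For the variant-toggle use case in this comment, a fully safe version is possible today whenever a cheap placeholder value exists. The following is only a sketch under that assumption: it parks `Foo::A(Bar)` in the field while the old value is moved out, which works here because `Bar` is a unit struct. The method name `switch_variant` is made up for the example, and the `PartialEq` derives are added only so the result can be asserted.

```rust
use std::mem;

#[derive(Debug, PartialEq)]
struct Bar;

#[derive(Debug, PartialEq)]
enum Foo {
    A(Bar),
    B(Bar),
}

#[derive(Debug)]
struct Baz {
    foo: Foo,
}

impl Baz {
    /// Toggles the variant without `unsafe` (hypothetical name).
    fn switch_variant(&mut self) {
        // Park a placeholder in `self.foo` while the old value is moved out.
        // Assumption: `Foo::A(Bar)` is cheap to build because `Bar` is a unit struct.
        let old = mem::replace(&mut self.foo, Foo::A(Bar));
        self.foo = match old {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        };
    }
}

fn main() {
    let mut baz = Baz { foo: Foo::A(Bar) };
    baz.switch_variant();
    assert_eq!(baz.foo, Foo::B(Bar));
    baz.switch_variant();
    assert_eq!(baz.foo, Foo::A(Bar));
}
```

If the `match` panicked, `self.foo` would still hold the placeholder, so no uninitialized value is ever observable. The limitation is that a placeholder must be constructible, which is not true for every type.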
@frol Since it's unclear whether or not you're already aware of this: I believe that specific use case was a major part of the discussion around #1736. For those who aren't familiar with that discussion: The big sticking point that led to its closure was that apparently the only way to make a |
@Ixrec I was not aware of it. Thank you for pointing me in the right direction!
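For readers who want to see what a sound generic helper could look like: one approach known from third-party crates such as `take_mut` is to abort the process if the closure panics, so that the moved-out destination can never be observed. The sketch below assumes that this is the kind of soundness concern being discussed. The names `AbortOnUnwind` and `replace_with_or_abort` are hypothetical and not part of `std`.

```rust
use std::{mem, process, ptr};

/// Aborts the process when dropped. It is only ever dropped if `f` unwinds.
struct AbortOnUnwind;

impl Drop for AbortOnUnwind {
    fn drop(&mut self) {
        process::abort();
    }
}

/// Hypothetical helper (not a std API): replaces `*dest` with `f(old_value)`.
fn replace_with_or_abort<T, F>(dest: &mut T, f: F)
where
    F: FnOnce(T) -> T,
{
    let guard = AbortOnUnwind;
    unsafe {
        // Move the value out, leaving `*dest` logically uninitialized.
        let old = ptr::read(dest);
        // If `f` panics here, `guard` is dropped during unwinding and the
        // process aborts before anyone can observe the moved-out `*dest`.
        let new = f(old);
        // Write the result back without dropping the stale bits in `*dest`.
        ptr::write(dest, new);
    }
    // Success path: disarm the guard so it never runs its destructor.
    mem::forget(guard);
}

#[derive(Debug, PartialEq)]
enum Foo {
    A(String),
    B(String),
}

fn main() {
    let mut foo = Foo::A(String::from("payload"));
    replace_with_or_abort(&mut foo, |old| match old {
        Foo::A(s) => Foo::B(s),
        Foo::B(s) => Foo::A(s),
    });
    assert_eq!(foo, Foo::B(String::from("payload")));
}
```

The trade-off is that a panic in the closure takes down the whole process instead of unwinding, which is a heavy price for a general-purpose library function.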
Add the method `Option::replace_with` to the core library.

This RFC proposes the addition of `Option::replace_with` to complement `Option::replace` (RFC #2296) and `Option::take` methods. It replaces the actual value in the option with the value returned from a closure given as a parameter, while the old value is passed into the closure.

Rendered