Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add a replace_with method to Option #2490

Closed
wants to merge 1 commit into from

Conversation

frol
Copy link

@frol frol commented Jun 29, 2018

Add the method Option::replace_with to the core library.

This RFC proposes the addition of Option::replace_with to compliment Option::replace (RFC #2296) and Option::take methods. It replaces the actual value in the option with the value returned from a closure given as a parameter, while the old value is passed into the closure.

Rendered

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jun 29, 2018
@frol
Copy link
Author

frol commented Jun 29, 2018

It would be ideal to implement a proper optimization instead of introducing this Option::replace_with method, so the naive implementation ideally should produce the optimized code without a need to use special methods like replace_with.

I have initially attempted to dig into the optimizations, but soon I realized that I don't have enough expertise yet to do that. If someone can mentor me, I would eagerly dive into the topic!

UPDATE: I moved my findings into a StackOverflow question.

I would love to learn more details about optimizations, so your help is appreciated!

@frol frol force-pushed the option-replace-with branch from 31a6959 to e43de83 Compare June 29, 2018 13:55
@diwic
Copy link

diwic commented Jun 29, 2018

    let mut old_value = unsafe { mem::uninitialized() };
    mem::swap(self, &mut old_value);
    let mut new_value = f(old_value);
    mem::swap(self, &mut new_value);
    mem::forget(new_value);

The proposed implementation does not look panic safe to me, i e, in case f panics, then self will be left uninitialized, which is unsound (and will likely lead to segfaults later on). I guess it could work if old_value was initialized to None instead - but would that give the same performance?

@frol
Copy link
Author

frol commented Jun 29, 2018

@diwic Well, initialization with None only adds a single mov qword ptr [r14 + 8], 0 instruction, that does not hurt the performance at all (yet I opted to avoid it using uninitialized, though I admit that I have no experience with the drawbacks uninitialized may introduce).

@Centril
Copy link
Contributor

Centril commented Jun 30, 2018

I would rewrite this as:

    *self = f(mem::replace(self, None));

which is safe and depends directly on zero unsafe operations.

@diwic
Copy link

diwic commented Jun 30, 2018

@Centril and *self = f(self.take()) is even shorter, but since that does not optimize well enough, I'd assume your version won't either...?

@Centril
Copy link
Contributor

Centril commented Jun 30, 2018

@diwic No idea; I'll leave it up to @frol to benchmark this :)

*self = with(self.take()) looks great!

@frol
Copy link
Author

frol commented Jun 30, 2018

@diwic is right. If *self = f(self.take()) would work, there would be no need in this Option::replace_with at all as it is mentioned in the drawbacks section.

I initially started with the following implementation:

        let mut new_value = f(self.take());
        mem::swap(self, &mut new_value);
        // Since self was None after take(), new_value holds None here after swap(),
        // so we can forget about it.
        mem::forget(new_value);

It is as performant as the proposed implementation but has one extra mov qword ptr [r14 + 8], 0 instruction, which I decided to avoid as well. Thus, even when I call swap for uninitialized new_value, it compiles down to just using the value of self and does not really replace it with uninitialized garbage in exchange, i.e. we may end up with double free in case of panic, but I am unsure about that due to lack of knowledge.

@Centril
Copy link
Contributor

Centril commented Jun 30, 2018

@frol So I primarily see the motivation for option.replace_with(|opt| ...) in that it is clearer wrt. intent even if it can be written as *option = (|opt| ...)(option.take()); which is itself short.

@frol
Copy link
Author

frol commented Jun 30, 2018

@Centril To be honest, I would prefer fixing optimizer to handle = efficiently instead of this extra method, but I need someone to help me with the starting point if possible. I tried to dive into it myself, but I stuck with understanding whether this optimization belongs to MIR or LLVM or anything else...

@scottmcm
Copy link
Member

scottmcm commented Jul 3, 2018

The problem is not in replace_with, it's in take somewhere, probably actually in swap: https://play.rust-lang.org/?gist=4e1c4da9cefb3f17a187a7581e2da938&version=nightly&mode=release

With std's take:

	mov	rax, qword ptr [rdi]
	mov	rcx, rax
	shr	rcx, 32
	xor	edx, edx
	test	eax, eax
	setne	dl
	add	ecx, 1
	mov	dword ptr [rdi], edx
	mov	dword ptr [rdi + 4], ecx
	ret

With a different take:

	xor	eax, eax
	cmp	dword ptr [rdi], 0
	setne	al
	mov	dword ptr [rdi], eax
	add	dword ptr [rdi + 4], 1
	ret

Which is pretty good, other than the usual rust-lang/rust#49420 (comment).

I suspect swap is the culprit here because looking at the LLVM-IR for the former:

start:
  %1 = bitcast { i32, i32 }* %0 to i64*
  %2 = load i64, i64* %1, align 1, !alias.scope !0, !noalias !9

That align 1 is coming from the fact that the current swap implementation always swaps via *mut u8, which I suspect is what's making LLVM unwilling to optimize further here.

TL/DR: replace_with should have the obvious all-safe-code implementation, and any problems in the code that emits should be fixed elsewhere.

Edit: PR for swap that improves take: rust-lang/rust#52051

kennytm added a commit to kennytm/rust that referenced this pull request Jul 21, 2018
mem::swap the obvious way for types smaller than the SIMD optimization's block size

LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't.  Found in the `replace_with` RFC discussion.

Examples of the improvements:
<details>
 <summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary>

```rust
type Demo = [u16; 3];
pub fn swap_demo(x: &mut Demo, y: &mut Demo) {
    std::mem::swap(x, y);
}
```

nightly:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
.seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE
	sub	rsp, 32
	.seh_stackalloc 32
	.seh_endprologue
	movzx	eax, word ptr [rcx + 4]
	mov	word ptr [rsp + 4], ax
	mov	eax, dword ptr [rcx]
	mov	dword ptr [rsp], eax
	movzx	eax, word ptr [rdx + 4]
	mov	word ptr [rcx + 4], ax
	mov	eax, dword ptr [rdx]
	mov	dword ptr [rcx], eax
	movzx	eax, word ptr [rsp + 4]
	mov	word ptr [rdx + 4], ax
	mov	eax, dword ptr [rsp]
	mov	dword ptr [rdx], eax
	add	rsp, 32
	ret
	.seh_handlerdata
	.section	.text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE
	.seh_endproc
```

this PR:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
	mov	r8d, dword ptr [rcx]
	movzx	r9d, word ptr [rcx + 4]
	movzx	eax, word ptr [rdx + 4]
	mov	word ptr [rcx + 4], ax
	mov	eax, dword ptr [rdx]
	mov	dword ptr [rcx], eax
	mov	word ptr [rdx + 4], r9w
	mov	dword ptr [rdx], r8d
	ret
```
</details>

<details>
 <summary>`replace_with` optimizes down much better</summary>

Inspired by rust-lang/rfcs#2490,

```rust
fn replace_with<T, F>(x: &mut Option<T>, f: F)
    where F: FnOnce(Option<T>) -> Option<T>
{
    *x = f(x.take());
}

pub fn inc_opt(mut x: &mut Option<i32>) {
    replace_with(&mut x, |i| i.map(|j| j + 1));
}
```

Rust 1.26.0:
```asm
_ZN4blah7inc_opt17heb0acb64c51777cfE:
	mov	rax, qword ptr [rcx]
	movabs	r8, 4294967296
	add	r8, rax
	shl	rax, 32
	movabs	rdx, -4294967296
	and	rdx, r8
	xor	r8d, r8d
	test	rax, rax
	cmove	rdx, rax
	setne	r8b
	or	rdx, r8
	mov	qword ptr [rcx], rdx
	ret
```

Nightly (better thanks to ScalarPair, maybe?):
```asm
_ZN4blah7inc_opt17h66df690be0b5899dE:
	mov	r8, qword ptr [rcx]
	mov	rdx, r8
	shr	rdx, 32
	xor	eax, eax
	test	r8d, r8d
	setne	al
	add	edx, 1
	mov	dword ptr [rcx], eax
	mov	dword ptr [rcx + 4], edx
	ret
```

This PR:
```asm
_ZN4blah7inc_opt17h1426dc215ecbdb19E:
	xor	eax, eax
	cmp	dword ptr [rcx], 0
	setne	al
	mov	dword ptr [rcx], eax
	add	dword ptr [rcx + 4], 1
	ret
```

Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (rust-lang#49420 (comment)).
</details>
kennytm added a commit to kennytm/rust that referenced this pull request Jul 22, 2018
mem::swap the obvious way for types smaller than the SIMD optimization's block size

LLVM isn't able to remove the alloca for the unaligned block in the post-SIMD tail in some cases, so doing this helps SRoA work in cases where it currently doesn't.  Found in the `replace_with` RFC discussion.

Examples of the improvements:
<details>
 <summary>swapping `[u16; 3]` takes 1/3 fewer instructions and no stackalloc</summary>

```rust
type Demo = [u16; 3];
pub fn swap_demo(x: &mut Demo, y: &mut Demo) {
    std::mem::swap(x, y);
}
```

nightly:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
.seh_proc _ZN4blah9swap_demo17ha1732a9b71393a7eE
	sub	rsp, 32
	.seh_stackalloc 32
	.seh_endprologue
	movzx	eax, word ptr [rcx + 4]
	mov	word ptr [rsp + 4], ax
	mov	eax, dword ptr [rcx]
	mov	dword ptr [rsp], eax
	movzx	eax, word ptr [rdx + 4]
	mov	word ptr [rcx + 4], ax
	mov	eax, dword ptr [rdx]
	mov	dword ptr [rcx], eax
	movzx	eax, word ptr [rsp + 4]
	mov	word ptr [rdx + 4], ax
	mov	eax, dword ptr [rsp]
	mov	dword ptr [rdx], eax
	add	rsp, 32
	ret
	.seh_handlerdata
	.section	.text,"xr",one_only,_ZN4blah9swap_demo17ha1732a9b71393a7eE
	.seh_endproc
```

this PR:
```asm
_ZN4blah9swap_demo17ha1732a9b71393a7eE:
	mov	r8d, dword ptr [rcx]
	movzx	r9d, word ptr [rcx + 4]
	movzx	eax, word ptr [rdx + 4]
	mov	word ptr [rcx + 4], ax
	mov	eax, dword ptr [rdx]
	mov	dword ptr [rcx], eax
	mov	word ptr [rdx + 4], r9w
	mov	dword ptr [rdx], r8d
	ret
```
</details>

<details>
 <summary>`replace_with` optimizes down much better</summary>

Inspired by rust-lang/rfcs#2490,

```rust
fn replace_with<T, F>(x: &mut Option<T>, f: F)
    where F: FnOnce(Option<T>) -> Option<T>
{
    *x = f(x.take());
}

pub fn inc_opt(mut x: &mut Option<i32>) {
    replace_with(&mut x, |i| i.map(|j| j + 1));
}
```

Rust 1.26.0:
```asm
_ZN4blah7inc_opt17heb0acb64c51777cfE:
	mov	rax, qword ptr [rcx]
	movabs	r8, 4294967296
	add	r8, rax
	shl	rax, 32
	movabs	rdx, -4294967296
	and	rdx, r8
	xor	r8d, r8d
	test	rax, rax
	cmove	rdx, rax
	setne	r8b
	or	rdx, r8
	mov	qword ptr [rcx], rdx
	ret
```

Nightly (better thanks to ScalarPair, maybe?):
```asm
_ZN4blah7inc_opt17h66df690be0b5899dE:
	mov	r8, qword ptr [rcx]
	mov	rdx, r8
	shr	rdx, 32
	xor	eax, eax
	test	r8d, r8d
	setne	al
	add	edx, 1
	mov	dword ptr [rcx], eax
	mov	dword ptr [rcx + 4], edx
	ret
```

This PR:
```asm
_ZN4blah7inc_opt17h1426dc215ecbdb19E:
	xor	eax, eax
	cmp	dword ptr [rcx], 0
	setne	al
	mov	dword ptr [rcx], eax
	add	dword ptr [rcx + 4], 1
	ret
```

Where that add is beautiful -- using an addressing mode to not even need to explicitly go through a register -- and the remaining imperfection is well-known (rust-lang#49420 (comment)).
</details>
@frol
Copy link
Author

frol commented Jul 24, 2018

@scottmcm FYI, I have tried the latest Rust nightly (6a1c0637c 2018-07-23), which includes the patch from PR rust-lang/rust#52051 and even though I see that your example snippet has been improved with the patch, there is no improvement for my use-case.

Given that the proposed optimization was concluded to be irrelevant for the implementation, and the fact that the proposed method basically duplicates the already existing way to do this operation in an obvious way (a = f(a.take())), I am closing this issue and will dig further into the optimizations space.

@frol frol closed this Jul 24, 2018
@frol
Copy link
Author

frol commented Jan 31, 2019

UPDATE: There was a relevant RFC about the ideas I describe below, so feel free to ignore my message.


I have just had a conversation where this replace_with idea had bubbled up in yet another context, that is how to "toggle" enum variants when there is a non-Copy value inside:

#[derive(Debug)]
struct Bar;

#[derive(Debug)]
enum Foo {
	A(Bar),
	B(Bar),
}

#[derive(Debug)]
struct Baz {
	foo: Foo,
}

impl Baz {
    fn switch_variant_unsafe(&mut self) {
        let mut foo_temp: Foo = unsafe { ::std::mem::uninitialized() };
        ::std::mem::swap(&mut self.foo, &mut foo_temp);
        self.foo = match foo_temp {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        }
    }
    
    fn switch_variant_safe(&mut self) {
        self.foo = match self.foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        })
    }
}

fn main() {
    let mut baz = Baz { foo: Foo::A(Bar) };
    baz.foo = match baz.foo {
        Foo::A(bar) => Foo::B(bar),
        Foo::B(bar) => Foo::A(bar),
    };
    dbg!(&baz);
    baz.switch_variant_unsafe();
    dbg!(&baz);
    baz.switch_variant_safe();
    dbg!(&baz);
}

Playground

As is, you get a compilation error:

error[E0507]: cannot move out of borrowed content
  --> src/main.rs:26:26
   |
26 |         self.foo = match self.foo {
   |                          ^^^^^^^^
   |                          |
   |                          cannot move out of borrowed content
   |                          help: consider borrowing here: `&self.foo`
27 |             Foo::A(bar) => Foo::B(bar),
   |                    --- data moved here
28 |             Foo::B(bar) => Foo::A(bar),
   |                    --- ...and here
   |
note: move occurs because these variables have types that don't implement the `Copy` trait
  --> src/main.rs:27:20
   |
27 |             Foo::A(bar) => Foo::B(bar),
   |                    ^^^
28 |             Foo::B(bar) => Foo::A(bar),
   |                    ^^^

There is also a similar question on SO.

Here is the helper that I came up with (based on this RFC):

fn replace_with<T, F>(dest: &mut T, mut f: F) 
where
    F: FnMut(T) -> T,
{
    let mut old_value = unsafe { std::mem::uninitialized() };
    std::mem::swap(dest, &mut old_value);  // dest is "uninitialized" (in fact, it is not touched in release mode)
    let mut new_value = f(old_value);
    std::mem::swap(dest, &mut new_value);   // dest holds new_value, and new_value is "uninitialized"
    std::mem::forget(new_value);  // since it is "uninitialized", we forget about it
}

, and then we can implement switch_variant_safe as:

    fn switch_variant_safe(&mut self) {
        replace_with(&mut self.foo, |foo| match foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        })
    }

The generated assembly for switch_variant_unsafe and switch_variant_safe (with replace_with helper) is identical in release mode.

As to the unsoundness concerns raised in #2490 (comment), in release mode, replace_with gets compiled to:

example::replace_with:
        mov     al, byte ptr [rdi]
        not     al
        and     al, 1
        mov     byte ptr [rdi], al
        ret

in fact, it gets automatically inlined unless I put #inline(never):

        mov     al, byte ptr [rsp + 7]
        not     al
        and     al, 1

Thus, there is no unsoundness in the release mode. Yet, in debug mode, there is indeed an explicit uninitialized value gets assigned to the self.foo. I have tried to implement replace_with using raw pointers, but failed at the point that f needs to take ownership ("cannot move out of dereference of raw pointer"). I have also been pointed to std::panic::UnwindSafe, yet I haven't figured out what is the proper use of it.

Another way to implement enum variant "toggle" is to pass ownership to self and return Self:

    fn switch_variant_owned(mut self) -> Self {
        self.foo = match self.foo {
            Foo::A(bar) => Foo::B(bar),
            Foo::B(bar) => Foo::A(bar),
        };
        self
    }

, but it requires the API changes all the way down to the method (i.e. you have to use this API style all the way through your codebase if it is a low-level method).

Sidenote, while the generated assembly is mostly the same (there are some variables rearrangements), there is one interesting optimization gets applied when switch_variant_owned is implemented, that is xor al, 1 instead of not al; and al, 1.

I believe, there is a need for safe and sound std::mem::replace_with. What do you think?


P.S. This more generic std::mem::replace_with can easily resolve the need for Option::replace_with covered in this RFC. Instead of the proposed:

let mut some_option: Option<i32> = Some(123);

some_option.replace_with(|old_value| consume_option_i32_and_produce_option_i32(old_value));

I would write

let mut some_option: Option<i32> = Some(123);

std::mem::replace(&mut some_option, |old_value| consume_option_i32_and_produce_option_i32(old_value));

@Ixrec
Copy link
Contributor

Ixrec commented Jan 31, 2019

@frol Since it's unclear whether or not you're already aware of this: I believe that specific use case was a major part of the discussion around #1736

For those who aren't familiar with that discussion: The big sticking point that led to its closure was that apparently the only way to make a replace_with method panic-safe is to have it abort() if the closure ever panics, which is very ungreat.

@frol
Copy link
Author

frol commented Jan 31, 2019

@Ixrec I was not aware of it. Thank you for pointing out in the right direction!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants