Inline some Cursor calls for slices #60656

petertodd · 2019-05-09T02:55:46Z

(Partially) brings back #33921

While writing some serialization code, I noticed that writes to slices produce much, much worse code than you'd expect, even with optimizations turned on. For example, you'd expect something like this to be zero cost:

use std::io::{Cursor, Write};

pub fn serialize((a, b): (u64, u64)) -> [u8;8+8] {
    let mut r = [0u8;16];
    {
        let mut w = Cursor::new(&mut r[..]);

        w.write(&a.to_le_bytes()).unwrap();
        w.write(&b.to_le_bytes()).unwrap();
    }
    r
}

...but it compiles down to dozens of instructions (https://rust.godbolt.org/z/bdwDzb) because the slice_write() calls aren't inlined, which in turn means the unwrap() calls can't be optimized away, and so on.
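To make the mechanism concrete, here is a minimal sketch of a helper shaped like the one the paragraph above refers to. It is an illustration under assumptions, not the std source: the name slice_write_sketch is hypothetical. A non-generic function like this is typically only inlined across crates when it carries #[inline] (or when LTO is on), and only after inlining can the optimizer see that a write into a large-enough slice always returns Ok, which lets the caller's unwrap() disappear.

```rust
use std::cmp;
use std::io::{self, Write};

/// Hypothetical stand-in for the non-generic helper behind
/// `Cursor<&mut [u8]>::write`. This is a sketch, NOT the std source.
#[inline]
pub fn slice_write_sketch(pos: &mut u64, slice: &mut [u8], buf: &[u8]) -> io::Result<usize> {
    // Clamp the position so the slicing below can never panic.
    let start = cmp::min(*pos, slice.len() as u64) as usize;
    // `impl Write for &mut [u8]` copies as many bytes as fit and
    // reports how many were copied.
    let written = (&mut slice[start..]).write(buf)?;
    // Advance the cursor position by the number of bytes copied.
    *pos += written as u64;
    Ok(written)
}
```

Without the attribute, a caller in another crate would normally see only an opaque call returning io::Result, so it has to keep the error path alive.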

To be clear, this pull request isn't sufficient by itself: if we want to go down that path, we also need to add #[inline] attributes to the default implementations of functions like write_all() in the Write trait, or implement them separately in the Cursor impls. But I figured I'd start a conversation about what tradeoffs we're expecting here.
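As a side note on why write_all() matters here, the following sketch contrasts it with write() on a slice cursor that is too small. The function name short_write_demo and the 4-byte buffer are assumptions made for illustration; the behavior shown is that of the standard Write trait.

```rust
use std::io::{Cursor, ErrorKind, Write};

/// Writes an 8-byte payload into a 4-byte slice twice: once with
/// `write`, once with `write_all`. Returns the byte count accepted by
/// `write` and the error kind reported by `write_all`.
pub fn short_write_demo() -> (usize, ErrorKind) {
    let payload = 0x0102_0304_0506_0708u64.to_le_bytes();

    // `write` copies what fits and reports a short write as Ok(n).
    let mut small = [0u8; 4];
    let mut w = Cursor::new(&mut small[..]);
    let accepted = w.write(&payload).unwrap();

    // `write_all` keeps calling `write` and turns Ok(0) into an error.
    let mut small2 = [0u8; 4];
    let mut w2 = Cursor::new(&mut small2[..]);
    let kind = w2.write_all(&payload).unwrap_err().kind();

    (accepted, kind)
}
```

In the serialize example the buffer is exactly large enough, so write() and write_all() behave the same there; in general write_all() is the call that guarantees the whole buffer was written.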


rust-highfive · 2019-05-09T02:55:57Z

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

sfackler · 2019-05-09T03:44:01Z

These inlines seem plausible to me - what does the assembly look like after this change?

petertodd · 2019-05-09T08:55:13Z

For that specific example:

$ cargo asm foo::serialize
foo::serialize:
 sub     rsp, 16
 mov     rax, rdi
 mov     qword ptr [rsp], rsi
 mov     qword ptr [rsp + 8], rdx
 mov     rcx, qword ptr [rsp]
 mov     qword ptr [rdi], rcx
 mov     rcx, qword ptr [rsp + 8]
 mov     qword ptr [rdi + 8], rcx
 add     rsp, 16
 ret

vs before:

$ cargo asm foo::serialize
foo::serialize:
 push    r15
 push    r14
 push    rbx
 sub     rsp, 96
 mov     r15, rdx
 mov     rbx, rdi
 xorps   xmm0, xmm0
 movaps  xmmword ptr [rsp + 80], xmm0
 lea     rdx, [rsp + 80]
 mov     qword ptr [rsp + 56], rdx
 lea     r14, [rsp + 72]
 mov     eax, 16
 movq    xmm0, rax
 movdqu  xmmword ptr [rsp + 64], xmm0
 mov     qword ptr [rsp + 8], rsi
 lea     rdi, [rsp + 32]
 lea     r8, [rsp + 8]
 mov     ecx, 16
 mov     r9d, 8
 mov     rsi, r14
 call    qword ptr [rip + _ZN3std2io6cursor11slice_write17h3193265db7206f0bE@GOTPCREL]
 cmp     qword ptr [rsp + 32], 1
 je      .LBB5_3
 mov     qword ptr [rsp + 8], r15
 mov     rdx, qword ptr [rsp + 56]
 mov     rcx, qword ptr [rsp + 64]
 lea     rdi, [rsp + 32]
 lea     r8, [rsp + 8]
 mov     r9d, 8
 mov     rsi, r14
 call    qword ptr [rip + _ZN3std2io6cursor11slice_write17h3193265db7206f0bE@GOTPCREL]
 cmp     qword ptr [rsp + 32], 1
 je      .LBB5_3
 movaps  xmm0, xmmword ptr [rsp + 80]
 movups  xmmword ptr [rbx], xmm0
 mov     rax, rbx
 add     rsp, 96
 pop     rbx
 pop     r14
 pop     r15
 ret
.LBB5_3:
 movups  xmm0, xmmword ptr [rsp + 40]
 movaps  xmmword ptr [rsp + 16], xmm0
 lea     rdi, [rsp + 16]
 call    core::result::unwrap_failed
 ud2

As I said, if we're going to merge this, we should decide which categories of functions should be inlined in general, and I should go through the full set for Read, Seek, etc.

sfackler · 2019-05-09T14:30:21Z

Ok awesome, that looks good.

The general policy is to inline functions in cases where we see concrete improvements from doing so.

@bors r+

bors · 2019-05-09T14:30:24Z

📌 Commit b9c4301 has been approved by sfackler


Rollup of 5 pull requests

Successful merges:

- #60601 (Add a `cast` method to raw pointers.)
- #60638 (pin: make the to-module link more visible)
- #60647 (cleanup: Remove `DefIndexAddressSpace`)
- #60656 (Inline some Cursor calls for slices)
- #60657 (Stabilize and re-export core::array in std)

Failed merges:

r? @ghost

petertodd · 2019-05-09T22:15:25Z

Thanks!

I reviewed the rest of the slice cursor API, and with this change it all appears to optimize fine because of how it's written in terms of generic implementations. So unless I find something else, we're done with respect to slices.
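For readers who want to exercise the read side of the slice cursor API themselves, here is a small sketch of the inverse of the serialize example from the PR description. The function name deserialize is an assumption for illustration, and nothing is claimed here about the code it compiles to; check that with cargo asm or a similar tool.

```rust
use std::io::{Cursor, Read};

/// Reads two little-endian u64 values back out of a 16-byte buffer,
/// mirroring the `serialize` example from the PR description.
pub fn deserialize(bytes: &[u8; 16]) -> (u64, u64) {
    let mut r = Cursor::new(&bytes[..]);
    let mut word = [0u8; 8];

    // The buffer holds exactly 16 bytes, so both reads succeed.
    r.read_exact(&mut word).unwrap();
    let a = u64::from_le_bytes(word);
    r.read_exact(&mut word).unwrap();
    let b = u64::from_le_bytes(word);

    (a, b)
}
```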

Vec has the same issue. But there the optimization story is more complex, as you'd have to pre-allocate and so on for the optimizations to really kick in, at which point using slices probably makes more sense. Not a problem for my code at least. :)
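For comparison, this is roughly what the pre-allocating Vec variant mentioned above could look like. It is a sketch only: the name serialize_to_vec is hypothetical, and no claim is made about how well it optimizes.

```rust
use std::io::{Cursor, Write};

/// Vec-backed variant of the `serialize` example. The capacity is
/// reserved up front so the two writes do not need to reallocate.
pub fn serialize_to_vec((a, b): (u64, u64)) -> Vec<u8> {
    let mut w = Cursor::new(Vec::<u8>::with_capacity(16));

    // Writes to a Vec-backed cursor grow the Vec as needed.
    w.write_all(&a.to_le_bytes()).unwrap();
    w.write_all(&b.to_le_bytes()).unwrap();

    // Hand the underlying Vec back to the caller.
    w.into_inner()
}
```

The slice version avoids the heap entirely, which fits the remark that slices probably make more sense once the size is known in advance.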

Inline some Cursor calls for slices

b9c4301

(Partially) brings back rust-lang#33921

rust-highfive assigned sfackler May 9, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 9, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 9, 2019

Centril mentioned this pull request May 9, 2019

Rollup of 5 pull requests #60672

Merged

bors merged commit b9c4301 into rust-lang:master May 9, 2019

petertodd deleted the 2019-inline-cursor-over-slice branch May 9, 2019 22:15
