Let compiler assume, that `Layout::align()` is always a power of two #75281

TimDiekmann · 2020-08-08T06:39:38Z

A perf-run may be useful. I don't expect a great speed up there though, as Layouts created with Layout::new() are known at compile time and this is always (almost?) used in std.

Fixes #75264

rust-highfive · 2020-08-08T06:39:41Z

r? @LukasKalbertodt

(rust_highfive has picked a reviewer for you, use r? to override)

LukasKalbertodt · 2020-08-08T14:17:33Z

Good idea. Let's just try.

@bors try @rust-timer queue

rust-timer · 2020-08-08T14:17:34Z

Awaiting bors try build completion

bors · 2020-08-08T14:17:44Z

⌛ Trying commit 82cbd18 with merge c6f0e225a4165554133f9072026caea2e61861ff...

bors · 2020-08-08T14:58:16Z

☀️ Try build successful - checks-actions, checks-azure
Build commit: c6f0e225a4165554133f9072026caea2e61861ff (c6f0e225a4165554133f9072026caea2e61861ff)

rust-timer · 2020-08-08T14:58:17Z

Queued c6f0e225a4165554133f9072026caea2e61861ff with parent c989ac1, future comparison URL.

rust-timer · 2020-08-08T16:14:49Z

Finished benchmarking try commit (c6f0e225a4165554133f9072026caea2e61861ff): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never

LukasKalbertodt · 2020-08-08T16:28:28Z

A couple of cases have regressed by at most a few percent. I would almost say the differences are all just noise. But I am not sure, I don't really have experience with interpreting these perf results. But yeah, apart from the debug_assert, I can't possibly imagine how your change makes stuff slower...

Should we ping some people who can help interpret those results?

TimDiekmann · 2020-08-08T16:32:19Z

I could remove the debug-assertion and do another perf-run. Actually it just double checking if Layout::from_size_align_unchecked wasn't used wrong so maybe there should be an assertion instead. Or we'll leave it out altogether.

@Amanieu You may want to take a look at this?

I don't really have experience with interpreting these perf results

Me neither, I always hope everything is green 😄

Amanieu · 2020-08-08T17:00:22Z

I don't think we gain anything from this: in the standard library we are careful to always use from_size_align_unchecked when dealing with alignment that are known to be a power of 2 to avoid the extra check. And at this point I don't think anyone is using Layout outside of the standard library.

Mark-Simulacrum · 2020-08-08T17:15:27Z

The debug assert shouldn't matter (std in perf doesn't have them on), but the assume and check is likely causing more code to be generated or inclined - e.g. query counts here changed quite a bit: https://perf.rust-lang.org/detailed-query.html?commit=c6f0e225a4165554133f9072026caea2e61861ff&base_commit=c989ac132a81503ceaaa1054da0a3a8c5c305ef0&benchmark=issue-46449-opt&run_name=full

TimDiekmann · 2020-08-09T11:38:04Z

And at this point I don't think anyone is using Layout outside of the standard library.

I don't think this is a true. Layout is stable and anyone implementing GlobalAlloc deals with it. Additionally the allocator-API is using this type. It might be true, that often this path can be optimized away as a layout is used, which is known at compile time using Layout::new<T>.

the assume and check is likely causing more code to be generated or inclined

I also think this is the main reason, why there are a very few small regressions. Is there a change, that intrinsics::assume will work better once it's const?

LukasKalbertodt · 2020-08-10T23:05:59Z

And at this point I don't think anyone is using Layout outside of the standard library.

I do use it in a crate of mine (and already for quite some time). However, I do not call align().

I am not quite sure how to proceed with this PR :/
Should we ping more people to better judge if the regressions are real or noise? Or do we just close as a change only motivated by performance didn't show any performance advantages? Do we even have practical examples where we would expect a performance improvement? What exactly does the knowledge of an integer being power-of-2 give us (except eliding checks that check exactly that)? And maybe this all does not matter anyway because it is always next to allocation code which takes a lot more time anyway?

Amanieu · 2020-08-10T23:07:32Z

I'm personally tending towards closing this PR. I don't think we should be complicating the code with optimizations that don't actually improve performance in practice.

TimDiekmann · 2020-08-11T05:11:42Z

What exactly does the knowledge of an integer being power-of-2 give us?

It's very fast to align a pointer to a power of two or round it down to the previous power of two using bit operations. size & !(align - 1) removes the last bits and will never wrap, as align is always > 0 (0 is not a power of two), (ptr + align - 1) & !(align - 1) aligns up a pointer to align and only ptr + align has to be checked for wrapping.

However, this optimiziation is only useful if you have to check usize::is_power_of_two and if you have check it at runtime. In std, most of the code uses Layout::new (or mem::size_of/align_of), which takes T as parameter, so the compiler implies this check. As soon as you use Layout::from_size_align at runtime, the check has to be made. As @Amanieu pointed out, std uses the unsafe counterpart Layout::from_size_align_unchecked whenever possible to bypasses this check (and the check to avoid wrapping size + align mentioned above).

And maybe this all does not matter anyway because it is always next to allocation code which takes a lot more time anyway?

While this is true for most allocation code, there a still allocations in the hot path using primitive, very fast allocators like stack allocators (e.g. bumpalo or Region). For the latter, the allocation code is this, I thing bumpalo's allocator is similar. Note, that there are only two branches:

fn alloc(layout) -> NonNull<[u8]> {
    let new_ptr = self.ptr.checked_sub(layout.size()).ok_or(AllocErr)?;
    let aligned_ptr = new_ptr & !(layout.align() - 1);

    if unlikely(aligned_ptr < self.start()) {
        return Err(AllocErr);
    }

    let ptr = unsafe { NonNull::new_unchecked(ptr as *mut u8) };
    let memory = NonNull::slice_from_raw_parts(ptr, self.ptr - aligned_ptr);
    self.ptr = aligned_ptr;
    Ok(memory)
}

Amanieu · 2020-09-18T14:55:47Z

Considering the disappointing perf results, I'm going to close this.

Let compiler assume, that Layout::align() is always a power of two

82cbd18

rust-highfive assigned LukasKalbertodt Aug 8, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 8, 2020

crlf0710 added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2020

Dylan-DPC-zz added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 18, 2020

Amanieu closed this Sep 18, 2020

TimDiekmann mentioned this pull request Feb 10, 2021

[RFC] Add allocators for zero-copy conversion from Box<T> into Rc<T> #80273

Closed

Let compiler assume, that Layout::align() is always a power of two #75281

Let compiler assume, that Layout::align() is always a power of two #75281

Uh oh!

Conversation

TimDiekmann commented Aug 8, 2020

Uh oh!

rust-highfive commented Aug 8, 2020

Uh oh!

LukasKalbertodt commented Aug 8, 2020

Uh oh!

rust-timer commented Aug 8, 2020

Uh oh!

bors commented Aug 8, 2020

Uh oh!

bors commented Aug 8, 2020

Uh oh!

rust-timer commented Aug 8, 2020

Uh oh!

rust-timer commented Aug 8, 2020

Uh oh!

LukasKalbertodt commented Aug 8, 2020

Uh oh!

TimDiekmann commented Aug 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Amanieu commented Aug 8, 2020

Uh oh!

Mark-Simulacrum commented Aug 8, 2020

Uh oh!

TimDiekmann commented Aug 9, 2020

Uh oh!

LukasKalbertodt commented Aug 10, 2020

Uh oh!

Amanieu commented Aug 10, 2020

Uh oh!

TimDiekmann commented Aug 11, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Amanieu commented Sep 18, 2020

Uh oh!

Uh oh!

Let compiler assume, that `Layout::align()` is always a power of two #75281

Let compiler assume, that `Layout::align()` is always a power of two #75281

TimDiekmann commented Aug 8, 2020 •

edited

Loading

TimDiekmann commented Aug 11, 2020 •

edited

Loading