Inline mem::size_of & mem::align_of #80631

tmiasko · 2021-01-02T22:38:44Z

Opened for perf results.

cc @bjorn3
r? @ghost

bjorn3 · 2021-01-02T23:10:44Z

@bors try @rust-timer queue

rust-timer · 2021-01-02T23:10:45Z

Awaiting bors try build completion.

bors · 2021-01-02T23:10:55Z

⌛ Trying commit b1c44d3450aeef671b36faa1cdb9f5ebd78cb74b with merge d0efbf33ce34c88893bff82178722695e0031b9c...

bors · 2021-01-03T00:11:17Z

☀️ Try build successful - checks-actions
Build commit: d0efbf33ce34c88893bff82178722695e0031b9c (d0efbf33ce34c88893bff82178722695e0031b9c)

rust-timer · 2021-01-03T00:11:18Z

Queued d0efbf33ce34c88893bff82178722695e0031b9c with parent fde6927, future comparison URL.

@rustbot label: +S-waiting-on-perf

rust-timer · 2021-01-03T03:15:00Z

Finished benchmarking try commit (d0efbf33ce34c88893bff82178722695e0031b9c): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

tmiasko · 2021-01-03T12:00:04Z

library/core/src/alloc/layout.rs

        // SAFETY: we pass along the prerequisites of these functions to the caller
-        let (size, align) = unsafe { (mem::size_of_val_raw(t), mem::align_of_val_raw(t)) };
+        let size = size_of_val(t);
+        let align = align_of_val(t);


There is an inconsistency between the safety of the size_of_val_raw function and that of the intrinsic used to implement it.
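For readers unfamiliar with these functions, here is a minimal sketch of the safe, stable counterparts (size_of_val and align_of_val) and how they relate to Layout::for_value. The helper name size_and_align_of_val is hypothetical and not part of this PR; the unstable *_raw variants are deliberately not used.

```rust
use std::alloc::Layout;
use std::mem::{align_of_val, size_of_val};

/// Hypothetical helper (not from the PR): returns `(size, align)` of the
/// value behind a reference using only the safe, stable functions.
fn size_and_align_of_val<T: ?Sized>(t: &T) -> (usize, usize) {
    (size_of_val(t), align_of_val(t))
}

fn main() {
    // An unsized slice of three u16 values: 3 * 2 bytes, aligned like u16.
    let slice: &[u16] = &[1, 2, 3];
    let layout = Layout::for_value(slice);

    // `Layout::for_value` reports the same size and alignment.
    assert_eq!(size_and_align_of_val(slice), (layout.size(), layout.align()));

    // A str is a sequence of bytes, so its alignment is 1.
    assert_eq!(size_and_align_of_val("abc"), (3, 1));
}
```

Because the argument is a reference, the value is known to be valid, which is why these functions can be safe while the raw-pointer variants cannot.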

bugadani · 2021-01-03T12:02:50Z

Just out of curiosity, looking at the webrender results, how can this PR change the number of executions for certain queries?

tmiasko · 2021-01-03T13:12:47Z

The recurring reduction by 4 corresponds to cases where we avoided creating any monomorphizations of a given item. size_align would be one such item because it was removed, but I would expect the same to have happened for some of the wrapper functions in the mem module.

The resolve_instance reduction by 721 corresponds to the reduction in the number of call terminators inside codegen items. When the lowering pass was first enabled (it lowers a size_of intrinsic call into a simple MIR statement), this number dropped by 328. This PR removes one layer of indirection when calling both size & align, which is roughly twice that.
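As a rough illustration of where those calls come from, here is a sketch using a hypothetical generic helper, size_and_align_of, which is not part of the PR. Assuming no MIR inlining, each type it is instantiated with gets its own monomorphized copy, and each copy contains a call to the mem::size_of wrapper and one to the mem::align_of wrapper.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical generic helper (not from the PR). Every distinct `T` used
/// with it yields a separate monomorphized instance, each of which calls
/// the `size_of::<T>` and `align_of::<T>` wrapper functions.
fn size_and_align_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    // Three instantiations, hence three instances of the helper.
    assert_eq!(size_and_align_of::<u8>(), (1, 1));
    assert_eq!(size_and_align_of::<u32>(), (4, 4));
    assert_eq!(size_and_align_of::<[u16; 3]>(), (6, 2));
}
```

The observable results are the same whether the wrappers or the intrinsics are called; the difference discussed here is only in how much work the compiler does to get there.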

tmiasko · 2021-01-04T00:09:13Z

This approach avoids the perf regression encountered before. I would be interested in potentially landing this, although I am not sure what the general opinion is about using intrinsics directly like that.

the8472 · 2021-01-04T04:42:25Z

library/alloc/src/collections/vec_deque/mod.rs

@@ -58,7 +59,7 @@ mod tests;
 const INITIAL_CAPACITY: usize = 7; // 2^3 - 1
 const MINIMUM_CAPACITY: usize = 1; // 2 - 1

-const MAXIMUM_ZST_CAPACITY: usize = 1 << (core::mem::size_of::<usize>() * 8 - 1); // Largest possible power of two
+const MAXIMUM_ZST_CAPACITY: usize = 1 << (size_of::<usize>() * 8 - 1); // Largest possible power of two


This could be updated to use usize::BITS
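A minimal sketch of that suggestion, assuming a toolchain where usize::BITS is available. The constant names OLD_MAXIMUM_ZST_CAPACITY and NEW_MAXIMUM_ZST_CAPACITY are made up here to compare the two spellings side by side.

```rust
use std::mem::size_of;

// The spelling from the diff: bit width computed from the byte size.
const OLD_MAXIMUM_ZST_CAPACITY: usize = 1 << (size_of::<usize>() * 8 - 1);

// The suggested spelling: `usize::BITS` is the bit width as a `u32`.
const NEW_MAXIMUM_ZST_CAPACITY: usize = 1 << (usize::BITS - 1);

fn main() {
    // Both expressions set only the highest bit of a usize.
    assert_eq!(OLD_MAXIMUM_ZST_CAPACITY, NEW_MAXIMUM_ZST_CAPACITY);
    assert!(NEW_MAXIMUM_ZST_CAPACITY.is_power_of_two());
}
```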

oli-obk · 2021-01-04T17:49:43Z

I don't feel comfortable with this extensive direct use of intrinsics. If we could turn intrinsics into regular function pointers, then we could do something like we did for transmute and re-export the intrinsic in mem. Since they have a different ABI, doing that without compatibility between rust and rust-intrinsic ABI would be a breaking change.

Maybe we could take a different approach and do it like

rust/library/core/src/ptr/mod.rs

Lines 177 to 185 in ab5b9ae

    #[lang = "drop_in_place"]
    #[allow(unconditional_recursion)]
    pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
        // Code here does not matter - this is replaced by the
        // real drop glue by the compiler.

        // SAFETY: see comment above
        unsafe { drop_in_place(to_drop) }
    }

and not have intrinsics for these at all (so they don't end up with a rust-intrinsic ABI), allowing us to take pointers to them, while still essentially treating them as intrinsics.

Note: I don't know whether your approach, the ABI compat approach, or the lang item approach is something we want, so maybe let's open a Zulip discussion showing that there are perf improvements to be had by doing something here. Then we can MCP whatever we find consensus on, but I think we need more input from the compiler team here, as this kind of change for perf may apply to more than just size_of and align_of.
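To make the function-pointer concern concrete, here is a small sketch showing that the existing wrapper functions and drop_in_place can be coerced to ordinary function pointers today, which is the property a re-exported intrinsic would need to preserve. The helper names size_via_pointer and drop_via_pointer are hypothetical.

```rust
use std::mem::{self, ManuallyDrop};
use std::ptr;

/// Hypothetical helper: calls `mem::size_of::<u32>` through a plain
/// function pointer, which works because `size_of` is a regular function.
fn size_via_pointer() -> usize {
    let f: fn() -> usize = mem::size_of::<u32>;
    f()
}

/// Hypothetical helper: drops a `String` through a pointer to
/// `ptr::drop_in_place`, which is also a regular (lang item) function.
fn drop_via_pointer() {
    let d: unsafe fn(*mut String) = ptr::drop_in_place::<String>;
    // `ManuallyDrop` prevents the automatic drop at the end of the scope.
    let mut s = ManuallyDrop::new(String::from("hello"));
    // SAFETY: `s` is initialized, is dropped exactly once here, and is
    // never used again afterwards.
    unsafe { d(&mut *s as *mut String) };
}

fn main() {
    assert_eq!(size_via_pointer(), 4);
    drop_via_pointer();
}
```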

bjorn3 · 2021-01-04T17:58:21Z

drop_in_place is codegened as a real function with real MIR. This real MIR just depends on the generic param unlike regular functions. Intrinsics are codegened inline at the caller site.

oli-obk · 2021-01-04T18:04:49Z

Hmm right, that is a bit different. So that scheme is too different, I guess. While we could make MIR building automatically generate the appropriate MIR statements at all call sites of specific lang items, that is something completely new that we haven't done so far.

tmiasko · 2021-01-04T19:05:22Z

If we were comfortable with using intrinsics directly, this seems like a nice win, but otherwise I wouldn't consider it to merit extra compiler work, except for continued efforts towards enabling MIR inlining by default.

The area that does need compiler work is the intrinsic wrappers in stdarch, where overhead in the range of hundreds of basic blocks is typical, in some cases going up to thousands of unnecessary basic blocks.

oli-obk · 2021-01-04T19:09:35Z

> The area that does need compiler work is the intrinsic wrappers in stdarch, where overhead in the range of hundreds of basic blocks is typical, in some cases going up to thousands of unnecessary basic blocks.

Do you think the problem is that we build these blocks at all, or could we just run an early mir opt to inline all intrinsic wrappers?

tmiasko · 2021-01-04T19:22:45Z

> Do you think the problem is that we build these blocks at all, or could we just run an early mir opt to inline all intrinsic wrappers?

For details regarding the stdarch situation, see rust-lang/stdarch#248.

oli-obk · 2021-01-04T22:09:07Z

Oof. That's a whole different situation imo. The stdarch situation is about large function bodies. Here we just have a trivial body forwarding to an intrinsic, so we could just as well have no body and make the function the intrinsic. Not sure how well that goes with them actually being intrinsics, but similar to the Box::new -> box optimization suggested on Zulip, we could make mem::size_of a lang item and do some special magic on calls to it.

tmiasko · 2021-01-04T22:45:19Z

Anyway, since there are reservations about using intrinsics directly, I think that answers the question about this proposal.

(I feel you might have misinterpreted my tangential comment about stdarch. I wasn't commenting on similarities, quite the contrary. I think the wrappers in stdarch would benefit from extra compiler work to make those functions trivial, unlike the wrappers here, which I don't think would.)

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 3, 2021

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 3, 2021

tmiasko commented Jan 3, 2021

Inline mem::size_of & mem::align_of

744bdf5

tmiasko force-pushed the inline-size-align branch from b1c44d3 to 744bdf5 Compare January 3, 2021 16:29

the8472 reviewed Jan 4, 2021

tmiasko closed this Jan 4, 2021

tmiasko deleted the inline-size-align branch January 4, 2021 22:45
