Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline mem::size_of & mem::align_of #80631

Closed
wants to merge 1 commit into from

Conversation

tmiasko
Copy link
Contributor

@tmiasko tmiasko commented Jan 2, 2021

Opened for perf results.

cc @bjorn3
r? @ghost

@bjorn3
Copy link
Member

bjorn3 commented Jan 2, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@bors
Copy link
Contributor

bors commented Jan 2, 2021

⌛ Trying commit b1c44d3450aeef671b36faa1cdb9f5ebd78cb74b with merge d0efbf33ce34c88893bff82178722695e0031b9c...

@bors
Copy link
Contributor

bors commented Jan 3, 2021

☀️ Try build successful - checks-actions
Build commit: d0efbf33ce34c88893bff82178722695e0031b9c (d0efbf33ce34c88893bff82178722695e0031b9c)

@rust-timer
Copy link
Collaborator

Queued d0efbf33ce34c88893bff82178722695e0031b9c with parent fde6927, future comparison URL.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 3, 2021
@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (d0efbf33ce34c88893bff82178722695e0031b9c): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 3, 2021
Comment on lines 166 to +168
// SAFETY: we pass along the prerequisites of these functions to the caller
let (size, align) = unsafe { (mem::size_of_val_raw(t), mem::align_of_val_raw(t)) };
let size = size_of_val(t);
let align = align_of_val(t);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is inconsistency in safety of size_of_val_raw function and intrinsic used to implement it.

@bugadani
Copy link
Contributor

bugadani commented Jan 3, 2021

Just out of curiousity, looking at webrenderer results, how can this PR change the number of executions for certain queries?

@tmiasko
Copy link
Contributor Author

tmiasko commented Jan 3, 2021

The recurring reduction by 4 corresponds to the situation where we avoided creating any monomorphiziations of given item: size_align would be one because it is removed, but I would expect that is also have happened for some of wrapper functions in mem module.

The resolve_instance reduction by 721 corresponds to the reduction in the number of calls terminators inside codegen items. When lowering pass was first enabled (it lowers size_of intrinsic call into a simple MIR statement) this number dropped by 328. This PR removes one layer of indirection when calling both size & align, which is roughly twice that.

@tmiasko tmiasko force-pushed the inline-size-align branch from b1c44d3 to 744bdf5 Compare January 3, 2021 16:29
@tmiasko
Copy link
Contributor Author

tmiasko commented Jan 4, 2021

This approach avoids the perf regression encountered before. I would be interested in potentially landing this, although I am not sure what are general opinions about using intrinsics directly like that.

@@ -58,7 +59,7 @@ mod tests;
const INITIAL_CAPACITY: usize = 7; // 2^3 - 1
const MINIMUM_CAPACITY: usize = 1; // 2 - 1

const MAXIMUM_ZST_CAPACITY: usize = 1 << (core::mem::size_of::<usize>() * 8 - 1); // Largest possible power of two
const MAXIMUM_ZST_CAPACITY: usize = 1 << (size_of::<usize>() * 8 - 1); // Largest possible power of two
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be updated to use usize::BITS

@oli-obk
Copy link
Contributor

oli-obk commented Jan 4, 2021

I don't feel comfortable with this extensive direct use of intrinsics. If we could turn intrinsics into regular function pointers, then we could do something like we did for transmute and re-export the intrinsic in mem. Since they have a different ABI, doing that without compatibility between rust and rust-intrinsic ABI would be a breaking change.

Maybe we could take a different approach and do it like

#[lang = "drop_in_place"]
#[allow(unconditional_recursion)]
pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
// Code here does not matter - this is replaced by the
// real drop glue by the compiler.
// SAFETY: see comment above
unsafe { drop_in_place(to_drop) }
}
and not have intrinsics for these at all (so they don't end up with a rust-intrinsic ABI), allowing us to take pointers to them, while still essentially treating them as intrinsics.

Note: I don't know if your approach, the ABI compat approach or the lang item approach are something we want, so maybe let's open a zulip discussion showing that there are perf improvements to be had by doing something here. Then we can MCP whatever we find consensus on, but I think we need more input by the compiler team here, as this kind of change for perf may apply to more than just size_of and align_of

@bjorn3
Copy link
Member

bjorn3 commented Jan 4, 2021

drop_in_place is codegened as a real function with real MIR. This real MIR just depends on the generic param unlike regular functions. Intrinsics are codegened inline at the caller site.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 4, 2021

Hmm right, that is a bit different. So that scheme is too different I guess. While we could make mir building automatically generate the appropriate MIR statements at all call sites to specific lang items, that is something completely new that we haven't done so far.

@tmiasko
Copy link
Contributor Author

tmiasko commented Jan 4, 2021

If we were comfortable with using intrinsics directly this seems like nice win, but otherwise I wouldn't consider it to merit an extra compiler work, except for continued efforts towards enabling MIR inlining by default.

The area that does need compiler work are intrinsic wrappers in stdarch where overhead in the range of hundreds basic blocks is typical, in some cases going up to thousands of unnecessary basic blocks.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 4, 2021

The area that does need compiler work are intrinsic wrappers in stdarch where overhead in the range of hundreds basic blocks is typical, in some cases going up to thousands of unnecessary basic blocks.

Do you think the problem is that we build these blocks at all, or could we just run an early mir opt to inline all intrinsic wrappers?

@tmiasko
Copy link
Contributor Author

tmiasko commented Jan 4, 2021

Do you think the problem is that we build these blocks at all, or could we just run an early mir opt to inline all intrinsic wrappers?

For the details regarding stdarch situation see rust-lang/stdarch#248.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 4, 2021

oof. that's a whole different situation imo. The stdarch situation is about large function bodies. Here we just have a trivial body forwarding to an intrinsic, so we could just as well have no body and make the function the intrinsic. Not sure how well that goes with them actually being intrinsics, but similar to the Box::new -> box optimization suggested on zulip, we could make mem::size_of a lang item and do some special magic on calls to it.

@tmiasko
Copy link
Contributor Author

tmiasko commented Jan 4, 2021

Anyway, since there are reservation about using intrinsics directly I think that answers question about this proposal.

(I feel you might have misinterpreted my tangential comment about stdarch. I wasn't commenting about similarities, quite the contrary. I think the wrappers in stdarch would benefit for extra compiler work to make those functions trivial, unlike the wrappers here which I don't think would).

@tmiasko tmiasko closed this Jan 4, 2021
@tmiasko tmiasko deleted the inline-size-align branch January 4, 2021 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants