Inline mem::size_of & mem::align_of #80631
@bors try @rust-timer queue
Awaiting bors try build completion.
⌛ Trying commit b1c44d3450aeef671b36faa1cdb9f5ebd78cb74b with merge d0efbf33ce34c88893bff82178722695e0031b9c...
☀️ Try build successful - checks-actions
Queued d0efbf33ce34c88893bff82178722695e0031b9c with parent fde6927, future comparison URL. @rustbot label: +S-waiting-on-perf
Finished benchmarking try commit (d0efbf33ce34c88893bff82178722695e0031b9c): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never
-        // SAFETY: we pass along the prerequisites of these functions to the caller
-        let (size, align) = unsafe { (mem::size_of_val_raw(t), mem::align_of_val_raw(t)) };
+        let size = size_of_val(t);
+        let align = align_of_val(t);
There is an inconsistency between the safety of the size_of_val_raw function and that of the intrinsic used to implement it.
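For readers following the safety point, here is a minimal sketch of the safe, reference-taking counterparts that are available on stable Rust. It does not reproduce the PR's code: the helper name `size_and_align` is hypothetical, and the raw-pointer variants discussed above (`size_of_val_raw`, `align_of_val_raw`) are deliberately left out because they are `unsafe` and, as far as I know, still unstable.

```rust
use std::fmt::Debug;
use std::mem::{align_of_val, size_of_val};

/// Hypothetical helper (not part of the PR): returns `(size, align)` of the
/// value behind a reference. Safe, because a reference always points at a
/// valid value with valid metadata, which a raw pointer does not guarantee.
fn size_and_align<T: ?Sized>(value: &T) -> (usize, usize) {
    (size_of_val(value), align_of_val(value))
}

fn main() {
    // Unsized slice: three u16 elements, two bytes each.
    let slice: &[u16] = &[1, 2, 3];
    println!("slice: {:?}", size_and_align(slice));

    // str: size is the length in bytes, alignment is 1.
    println!("str: {:?}", size_and_align("hello"));

    // Trait object: size and alignment are read from the vtable.
    let obj: &dyn Debug = &0u64;
    println!("dyn Debug over u64: {:?}", size_and_align(obj));
}
```

With a raw pointer the caller would have to uphold those validity requirements manually, which is why the `_raw` functions are marked `unsafe`.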
Just out of curiosity, looking at the webrender results, how can this PR change the number of executions for certain queries?
The recurring reduction by 4 corresponds to the situation where we avoided creating any monomorphizations of a given item: The
b1c44d3 to 744bdf5
This approach avoids the perf regression encountered before. I would be interested in potentially landing this, although I am not sure what the general opinions are about using intrinsics directly like that.
@@ -58,7 +59,7 @@ mod tests;
 const INITIAL_CAPACITY: usize = 7; // 2^3 - 1
 const MINIMUM_CAPACITY: usize = 1; // 2 - 1

-const MAXIMUM_ZST_CAPACITY: usize = 1 << (core::mem::size_of::<usize>() * 8 - 1); // Largest possible power of two
+const MAXIMUM_ZST_CAPACITY: usize = 1 << (size_of::<usize>() * 8 - 1); // Largest possible power of two
This could be updated to use usize::BITS
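A small sketch of what that suggestion would look like, assuming a toolchain where `usize::BITS` is available. The constant names `VIA_SIZE_OF` and `VIA_BITS` are hypothetical and exist only to compare the two spellings; only the first expression appears in the diff.

```rust
use std::mem::size_of;

// Spelling from the diff: bit width derived from the byte size.
const VIA_SIZE_OF: usize = 1usize << (size_of::<usize>() * 8 - 1);

// Suggested spelling: `usize::BITS` is the bit width of `usize` as a `u32`.
const VIA_BITS: usize = 1usize << (usize::BITS - 1);

fn main() {
    // Both expressions set only the highest bit, i.e. the largest power of
    // two representable in a `usize`.
    assert_eq!(VIA_SIZE_OF, VIA_BITS);
    assert!(VIA_BITS.is_power_of_two());
    println!("{:#x}", VIA_BITS);
}
```

The `usize::BITS` form avoids the call to `size_of` entirely, so it would sidestep the question of which `size_of` is being called in this constant.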
I don't feel comfortable with this extensive direct use of intrinsics. If we could turn intrinsics into regular function pointers, then we could do something like we did for Maybe we could take a different approach and do it like rust/library/core/src/ptr/mod.rs Lines 177 to 185 in ab5b9ae
rust-intrinsic ABI), allowing us to take pointers to them, while still essentially treating them as intrinsics.
Note: I don't know if your approach, the ABI compat approach or the lang item approach are something we want, so maybe let's open a Zulip discussion showing that there are perf improvements to be had by doing something here. Then we can MCP whatever we find consensus on, but I think we need more input from the compiler team here, as this kind of change for perf may apply to more than just
Hmm right, that is a bit different. So that scheme is too different I guess. While we could make MIR building automatically generate the appropriate MIR statements at all call sites to specific lang items, that is something completely new that we haven't done so far.
If we were comfortable with using intrinsics directly this seems like a nice win, but otherwise I wouldn't consider it to merit extra compiler work, except for continued efforts towards enabling MIR inlining by default. The area that does need compiler work is the intrinsic wrappers in stdarch, where overhead in the range of hundreds of basic blocks is typical, in some cases going up to thousands of unnecessary basic blocks.
Do you think the problem is that we build these blocks at all, or could we just run an early MIR opt to inline all intrinsic wrappers?
For the details regarding the stdarch situation see rust-lang/stdarch#248.
oof. that's a whole different situation imo. The
Anyway, since there are reservations about using intrinsics directly, I think that answers the question about this proposal. (I feel you might have misinterpreted my tangential comment about stdarch. I wasn't commenting about similarities, quite the contrary. I think the wrappers in stdarch would benefit from extra compiler work to make those functions trivial, unlike the wrappers here, which I don't think would.)
Opened for perf results.
cc @bjorn3
r? @ghost