-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make some trivial functions #[inline(always)]
#105262
Conversation
r? @thomcc (rustbot has picked a reviewer for you, use r? to override) |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit 2281b6aa29437c0ea7d5a8d84ed3269972b054f0 with merge 60f5d77084c24f92ebfb9d217e70cb687029f648... |
Why isn't #[inline] enough there? |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (60f5d77084c24f92ebfb9d217e70cb687029f648): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
|
These functions are not all equally trivial, and |
Splitting the PR seems to be a good idea. |
@@ -185,7 +185,7 @@ mod impls { | |||
#[stable(feature = "rust1", since = "1.0.0")] | |||
#[rustc_const_unstable(feature = "const_clone", issue = "91805")] | |||
impl const Clone for $t { | |||
#[inline] | |||
#[inline(always)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably does less than you'd expect, because of
fn combine_primitive_clone( |
…scottmcm Make integer-to-integer `From` impls `#[inline(always)]` Splited from rust-lang#105262
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so super sorry for the delay, I've been unfortunately busy.
Anyway. This is a mixed bag. The biggest advice I'd have would be:
- Keep in mind that our inlining (and thus its cost model) works bottom-up not top-down (even though thinking about it as top-down is much easier), so
inline(always)
on a trivial wrapper around a plausibly-large function is bad. - A lot of the operations have much more involved internals than you might expect, which can have a big impact, sadly.
[T]::len()
isn't a field access, and many unsafe ops come with some debug assertions and const eval machinery in certain cases, which may impact the compile time (or may not).
Anyway, here's my summary of what I'd like to see:
Ones that can stay
- A lot of these are identities or clearly trivial conversions. Those are clearly fine. This includes
addr
,expose_addr
, all the literal identites, the pointer derefs, casts, etc.
Ones that should be split into their own (ideally separate?) PRs
-
For the slice/str/ptr
len
s, this ends up going through a big chain like{NonNull::<[T]>,str,<[T]>}::len
=><*const [T]>::len
=>ptr::metadata::<[T]>
and so on. All these would probably need to be inlined for it to help. You should probably put this and the otherlen()
PRs in a different PR just forlen()
and relevant metadata functions. -
Most of the pointer arithmetic (especially
byte_sub
) does considerably more. It'd be a bug for it to be not inlined in a release build, but could have some big compile impact, since they're used all over the place.
Ones that should (IMO) be removed
- The blanket
Into::into()
I think it'd be a mistake given our inline cost model. So, that one's a no (this one has come up before too). - Same for
copy
andcopy_to_nonoverlapping
-- also seem to be bad choices.
If you're wondering about the paranoia here around compile times -- |
@rustbot author Please push this back into my queue when you've taken care of that with |
⌛ Testing commit 00e7b54 with merge 18c1d50e807c30ebfd4bb58035f85f7a7529e5df... |
💔 Test failed - checks-actions |
I am not sure what is is going on here
|
Uhhhhhhhh no idea. It does say "timeout" in it, so.... @bors retry |
Make integer-to-integer `From` impls `#[inline(always)]` Splited from rust-lang/rust#105262
☀️ Test successful - checks-actions |
Finished benchmarking commit (f058493): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. |
The job Click to see the possible cause of the failure (guessed by this bot)
|
… r=Mark-Simulacrum Make pointer `sub` and `wrapping_sub` methods `#[inline(always)]` Splitted from rust-lang#105262
OTOH, not inlining trivial functions means that the binary size goes through the roof in debug builds which is very troublesome when a trivial program (slightly more complex than hello world) can't fit into the flash memory of an MCU (say 16kB).
For reference, here's the "--release" build:
It is a quite a footgun that the default configuration of rustc often produces binaries which will either not even link or if they do frequently fail to run correctly due to being so slow that certain timing requirements (like for USB) cannot be achieved. IMNSHO trivial operations on intrinsic types should always be inlined and optimized "no questions asked" when code generation is enabled, despite adding some additional compile time. There's zero value in emitting functions which no-one is ever going to set a breakpoint on in a compiler and where the call is more expensive than the operation itself. NB: I'm one of the persons behind |
Make some trivial functions `#[inline(always)]` This is some kind of follow-up of PRs like rust-lang#85218, rust-lang#84061, rust-lang#87150. Functions that do very basic operations are made `#[inline(always)]` to avoid pessimizing them in debug builds when compared to using built-in operations directly.
Make integer-to-integer `From` impls `#[inline(always)]` Splited from rust-lang/rust#105262
…imulacrum Make pointer `sub` and `wrapping_sub` methods `#[inline(always)]` Splitted from rust-lang/rust#105262
This is some kind of follow-up of PRs like #85218, #84061, #87150. Functions that do very basic operations are made
#[inline(always)]
to avoid pessimizing them in debug builds when compared to using built-in operations directly.