
Add #[inline] to functions which were missing it, and #[track_caller] to ones with runtime panics from user input #347

Merged (2 commits, May 21, 2023)

Conversation

thomcc
Member

@thomcc thomcc commented May 7, 2023

Rationale for inlining is basically twofold. First: even though #[inline] doesn't help the LLVM inliner for generics, it apparently can help the MIR inliner (and the lack of it has been commented on before by @saethlin, at least).

Second, and perhaps more controversially, this library depends on inlining in a way that basically no other part of the stdlib does -- failing to inline most of these functions kind of defeats the point of the design we've chosen.

Concretely, the simd_foo intrinsics this library will codegen to take advantage of whatever vector operations are allowed in the function where they finally get codegenned (e.g. after all inlining occurs).

This means that if we define a high-level wrapper for some simd_foo intrinsic and the wrapper gets called from a downstream user's #[target_feature(enable="foo")] function (for a relevant feature), then failing to inline that wrapper into the call-site squanders the benefit we gained by using the simd_foo intrinsic design in the first place.
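A minimal sketch of that wrapper pattern (the function names here are hypothetical stand-ins, not this crate's real API):

```rust
// Hypothetical sketch of the wrapper pattern described above; the
// names are illustrative, not std::simd's real API. The point is that
// `#[inline]` lets the body reach the caller's codegen context, where
// any `#[target_feature(enable = "...")]` on the caller would apply.
#[inline]
pub fn simd_like_add(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // In the real crate this would lower to a `simd_add`-style
    // intrinsic; a scalar loop stands in here.
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

// A downstream caller; with a real `#[target_feature(enable = "avx2")]`
// attribute here, inlining `simd_like_add` into this body is what lets
// the operation use the wider instructions.
fn caller() -> [f32; 4] {
    simd_like_add([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
}

fn main() {
    assert_eq!(caller(), [5.0f32; 4]);
}
```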

Arguably, this justifies use of #[inline(always)]. I'm in favor of that (it could also help core::simd-using code perform better in unoptimized builds than core::arch-using code, which is absurdly slow without optimizations). That said, I didn't bother, since it can be done later and might be better handled as extra tuning on the MIR inliner. Not to mention, we support SIMD ops on unreasonably huge vectors, and it seems completely unreasonable to use #[inline(always)] on functions over Simd<f64, 4096> or whatever.

I also added a clippy warning for missing #[inline] on public functions (although private functions that are called from public functions should ideally get #[inline] too in most cases).
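For reference, the lint in question can be opted into at the crate root; `clippy::missing_inline_in_public_items` is a real restriction-group lint, and a minimal sketch looks like:

```rust
// Sketch: enabling the clippy lint mentioned above at the crate root.
// `missing_inline_in_public_items` is a real clippy restriction lint;
// plain rustc accepts (and ignores) `clippy::` tool-lint attributes.
#![warn(clippy::missing_inline_in_public_items)]

/// Public API item: with the lint enabled, clippy would flag this
/// function if the `#[inline]` attribute were removed.
#[inline]
pub fn double(x: u32) -> u32 {
    x * 2
}

fn main() {
    assert_eq!(double(21), 42);
}
```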


The rationale for track_caller should be obvious (much easier for users to track down the issue). While track_caller can have a very small perf impact, it does not matter here (the overhead it adds is one extra pointer argument passed into the function, but we expect these to be inlined, so that argument doesn't need to be passed).
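A small sketch of the #[track_caller] effect (the function names are hypothetical, not this crate's API): the panic location reported is the caller's line, not the assertion inside the library function.

```rust
use std::panic::{catch_unwind, Location};

// Sketch: with #[track_caller], a panic triggered by bad user input
// is reported at the *caller's* source location, which is what makes
// the issue easy to track down.
#[track_caller]
fn check_lane(idx: usize, lanes: usize) -> usize {
    assert!(idx < lanes, "lane {idx} out of bounds ({lanes} lanes)");
    idx
}

#[track_caller]
fn where_am_i() -> &'static Location<'static> {
    // Thanks to #[track_caller], this returns the caller's location.
    Location::caller()
}

fn main() {
    assert_eq!(check_lane(1, 4), 1);
    // If this panicked uncaught, the reported location would be this
    // line, not the assert! inside check_lane.
    assert!(catch_unwind(|| check_lane(7, 4)).is_err());
    println!("called from {}", where_am_i());
}
```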

@saethlin
Member

saethlin commented May 7, 2023

FYI since rust-lang/rust#109247, MIR inlining is permitted without #[inline], but the attribute still doubles(!) the inline cost threshold. The MIR inliner also now ignores the size of locals (rust-lang/rust#110705), so Simd<f64, 4096> does not become a bad inlining candidate on account of its size. It will still be a bad inlining candidate if its size results in a lot of statements.

@thomcc
Member Author

thomcc commented May 7, 2023

Yeah that makes sense. Even if it's not a hard block for the MIR inliner now, these are things that really really should be inlined.

@calebzulawski
Member

I think this is uncontroversial and mostly fixes oversights. That lint is helpful.

@andy-thomason

Very sensible. It is a shame that rust does not inline methods by default like C++ does.

@saethlin
Member

saethlin commented May 7, 2023

It is a shame that rust does not inline methods by default like C++ does.

By default, Rust has exactly the same inlining behaviors as C++ does. Removing indirect calls from final codegen is not what is being discussed here.

The desirable effect here is from the MIR inliner, which runs before rustc hands off codegen to LLVM, because the LLVM optimizer (which was designed for C/C++ and written in C++) seems to de-optimize unless we do this.

@andy-thomason

By default, Rust has exactly the same inlining behaviors as C++ does.

Experienced C++ users will generally build the entire project as a single compilation unit, which makes significant improvements to build time and code performance by enabling IR reductions early in the lowering process. For example, I did a little demonstration for the LLVM group where I built LLVM from scratch nearly a thousand times faster than the default build, using fewer than ten CUs. The resulting code was also faster and smaller. With modern computers, you could build LLVM in a single CU.

If we do a default build with Rust (i.e. large numbers of CUs), we get the same slow compilation time and loss of performance, from both the lack of inlining and the inability of the compiler to do global optimisation by introspection.

Of course, inline semantics are more than just a hint to the compiler. In C++ they promote functions to their own sections and enable the linker to remove duplicates.

I know that calling for all functions to be inlinable will fall on deaf ears, but it would be a simple experiment to try. You never know, you might like the result.

@bjorn3
Member

bjorn3 commented May 9, 2023

In C++ I think it is more that building as a single unit avoids parsing and typechecking headers again for every source file; rustc already doesn't do this. On the backend side, rustc in fact intentionally splits a single crate into multiple codegen units (CGUs) to improve compilation time through parallelism.

Cross-crate inlining without #[inline] is supported too, through LTO, but this is disabled by default because it is much slower. There is fat LTO, which effectively merges all codegen units of all crates into a single module and then optimizes it as a whole; this is really slow. And there is thin LTO, where only a summary is extracted from each module before optimizing each codegen unit in parallel using the combination of all summaries. This is much faster, but still slower than not doing any LTO.

For release builds we default to thin local LTO, a variant of thin LTO where only the codegen units within a single crate are taken into account. In the past we used a single codegen unit in release mode, but thin local LTO is much faster while barely regressing performance.
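The LTO variants described above map onto standard Cargo profile settings; a minimal sketch of selecting between them (these keys are real Cargo options, shown here as an illustrative excerpt):

```toml
# Cargo.toml excerpt: choosing among the LTO modes discussed above.
[profile.release]
# Default behavior (thin local LTO within the crate): lto = false
# Thin LTO across all crates (summaries combined, CGUs optimized
# in parallel): lto = "thin"
# Fat LTO: merge all codegen units of all crates into one module,
# the slowest option, occasionally producing the best code:
lto = "fat"
codegen-units = 1
```

With `lto = "fat"` and `codegen-units = 1`, the whole dependency graph is optimized as one module, trading compile time for the most inlining opportunity.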

Of course inline semantics are more than just a hint to the compiler. In C++ they promote functions to their own sections and enable the linker to remove duplicates.

In Rust we put every function in its own section and tell LLVM that it is allowed to merge functions. Unless you enable ICF in the linker, this doesn't allow merging identical functions at link time, but LLVM does merge functions within a codegen unit, and the linker removes all unused functions because we use --gc-sections.

@calebzulawski calebzulawski merged commit cd3c67b into rust-lang:master May 21, 2023