
Add #[inline] to functions which were missing it, and #[track_caller] to ones with runtime panics from user input #347

Merged (2 commits, May 21, 2023)

Conversation

thomcc
Member

@thomcc thomcc commented May 7, 2023

Rationale for inlining is basically twofold. First: even though #[inline] doesn't help the LLVM inliner for generics, it apparently can help the MIR inliner (and the lack of it has been commented on before by @saethlin, at least).

Second, and perhaps more controversially, this library depends on inlining in a way that basically no other part of the stdlib does -- failing to inline most of these functions kind of defeats the point of the design we've chosen.

Concretely, the simd_foo intrinsics this library will codegen to take advantage of whatever vector operations are allowed in the function where they finally get codegenned (e.g. after all inlining occurs).

This means that if we define a high-level wrapper for some simd_foo intrinsic and the wrapper gets called from a downstream user's #[target_feature(enable="foo")] function (for a relevant feature), then failing to inline that wrapper into the call-site squanders the benefit we gained by using the simd_foo intrinsic design in the first place.
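A minimal sketch of that wrapper pattern (the function names here are hypothetical stand-ins, not this crate's real API):

```rust
// Hypothetical sketch of the wrapper pattern described above; the
// names are illustrative, not std::simd's real API. The point is that
// `#[inline]` lets the body reach the caller's codegen context, where
// any `#[target_feature(enable = "...")]` on the caller would apply.
#[inline]
pub fn simd_like_add(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    // In the real crate this would lower to a `simd_add`-style
    // intrinsic; a scalar loop stands in here.
    let mut out = [0.0f32; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

// A downstream caller; with a real `#[target_feature(enable = "avx2")]`
// attribute here, inlining `simd_like_add` into this body is what lets
// the operation use the wider instructions.
fn caller() -> [f32; 4] {
    simd_like_add([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])
}

fn main() {
    assert_eq!(caller(), [5.0f32; 4]);
}
```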

Arguably, this justifies use of #[inline(always)]. I'm in favor of that (it could also help core::simd-using code perform better in unoptimized builds than core::arch-using code, which is absurdly slow without optimizations). That said, I didn't bother, since it can be done later and might be better handled as extra tuning on the MIR inliner. Not to mention, we support SIMD ops on unreasonably huge vectors, and it seems completely unreasonable to use #[inline(always)] on functions over Simd<f64, 4096> or whatever.

I also added a clippy warning for missing #[inline] on public functions (although private functions that are called from public functions should ideally get #[inline] too in most cases).
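For reference, the lint in question can be opted into at the crate root; `clippy::missing_inline_in_public_items` is a real restriction-group lint, and a minimal sketch looks like:

```rust
// Sketch: enabling the clippy lint mentioned above at the crate root.
// `missing_inline_in_public_items` is a real clippy restriction lint;
// plain rustc accepts (and ignores) `clippy::` tool-lint attributes.
#![warn(clippy::missing_inline_in_public_items)]

/// Public API item: with the lint enabled, clippy would flag this
/// function if the `#[inline]` attribute were removed.
#[inline]
pub fn double(x: u32) -> u32 {
    x * 2
}

fn main() {
    assert_eq!(double(21), 42);
}
```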


The rationale for track_caller should be obvious (much easier for users to track down the issue). While track_caller can have a very small perf impact, it does not matter here (the overhead it adds is one extra pointer argument passed into the function, but we expect these to be inlined, so that argument doesn't need to be passed).
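A small sketch of the #[track_caller] effect (the function names are hypothetical, not this crate's API): the panic location reported is the caller's line, not the assertion inside the library function.

```rust
use std::panic::{catch_unwind, Location};

// Sketch: with #[track_caller], a panic triggered by bad user input
// is reported at the *caller's* source location, which is what makes
// the issue easy to track down.
#[track_caller]
fn check_lane(idx: usize, lanes: usize) -> usize {
    assert!(idx < lanes, "lane {idx} out of bounds ({lanes} lanes)");
    idx
}

#[track_caller]
fn where_am_i() -> &'static Location<'static> {
    // Thanks to #[track_caller], this returns the caller's location.
    Location::caller()
}

fn main() {
    assert_eq!(check_lane(1, 4), 1);
    // If this panicked uncaught, the reported location would be this
    // line, not the assert! inside check_lane.
    assert!(catch_unwind(|| check_lane(7, 4)).is_err());
    println!("called from {}", where_am_i());
}
```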

@saethlin
Member

saethlin commented May 7, 2023

FYI since rust-lang/rust#109247, MIR inlining is permitted without #[inline], but the attribute still doubles(!) the inline cost threshold. The MIR inliner also now ignores the size of locals (rust-lang/rust#110705), so Simd<f64, 4096> does not become a bad inlining candidate on account of its size. It will still be a bad inlining candidate if its size results in a lot of statements.

@thomcc
Member Author

thomcc commented May 7, 2023

Yeah that makes sense. Even if it's not a hard block for the MIR inliner now, these are things that really really should be inlined.

@calebzulawski
Member

I think this is uncontroversial and mostly fixes oversights. That lint is helpful.

@andy-thomason

Very sensible. It is a shame that rust does not inline methods by default like C++ does.

@saethlin
Member

saethlin commented May 7, 2023

It is a shame that rust does not inline methods by default like C++ does.

By default, Rust has exactly the same inlining behaviors as C++ does. Removing indirect calls from final codegen is not what is being discussed here.

The desirable effect here is from the MIR inliner, which runs before rustc hands off codegen to LLVM, because the LLVM optimizer (which was designed for C/C++ and written in C++) seems to de-optimize unless we do this.

@andy-thomason

By default, Rust has exactly the same inlining behaviors as C++ does.

Experienced C++ users will generally build the entire project as a single compilation unit, which makes significant improvements to build time and code performance by enabling IR reductions early in the lowering process. For example, I did a little demonstration for the LLVM group where I built LLVM from scratch nearly a thousand times faster than the default build, using fewer than ten CUs. The resulting code was also faster and smaller. With modern computers, you could build LLVM in a single CU.

If we do a default build with Rust (i.e. large numbers of CUs), we get the same slow compilation time and loss of performance, from both the lack of inlining and the inability of the compiler to do global optimisation by introspection.

Of course, inline semantics are more than just a hint to the compiler. In C++ they promote functions to their own sections and enable the linker to remove duplicates.

I know that calling for all functions to be inlinable will fall on deaf ears, but it would be a simple experiment to try. You never know, you might like the result.

@bjorn3
Member

bjorn3 commented May 9, 2023

In C++ I think it is more that building as a single unit avoids parsing and typechecking headers again for every source file; rustc already doesn't do this. On the backend side, rustc in fact intentionally splits a single crate into multiple codegen units (CGUs) to improve compilation time through parallelism.

Cross-crate inlining without #[inline] is supported too, through LTO, but this is disabled by default because it is much slower. There is fat LTO, which effectively merges all codegen units of all crates into a single module and then optimizes it as a whole; this is really slow. And there is thin LTO, where only a summary is extracted from each module before optimizing each codegen unit in parallel using the combination of all summaries. This is much faster, but still slower than not doing any LTO.

For release builds we default to thin local LTO, a variant of thin LTO where only the codegen units within a single crate are taken into account. In the past we used a single codegen unit in release mode, but thin local LTO is much faster while barely regressing performance.
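The LTO variants described above map onto standard Cargo profile settings; a minimal sketch of selecting between them (these keys are real Cargo options, shown here as an illustrative excerpt):

```toml
# Cargo.toml excerpt: choosing among the LTO modes discussed above.
[profile.release]
# Default behavior (thin local LTO within the crate): lto = false
# Thin LTO across all crates (summaries combined, CGUs optimized
# in parallel): lto = "thin"
# Fat LTO: merge all codegen units of all crates into one module,
# the slowest option, occasionally producing the best code:
lto = "fat"
codegen-units = 1
```

With `lto = "fat"` and `codegen-units = 1`, the whole dependency graph is optimized as one module, trading compile time for the most inlining opportunity.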

Of course inline semantics are more than just a hint to the compiler. In C++ they promote functions to their own sections and enable the linker to remove duplicates.

In Rust we put every function in its own section and tell LLVM that it is allowed to merge functions. Unless you enable ICF in the linker, this doesn't allow merging identical functions at link time, but LLVM does merge functions within a codegen unit, and the linker removes all unused functions because we use --gc-sections.

@calebzulawski calebzulawski merged commit cd3c67b into rust-lang:master May 21, 2023