Skip to content

Inline smaller callees first #130696

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
151 changes: 89 additions & 62 deletions compiler/rustc_mir_transform/src/inline.rs
Original file line number Diff line number Diff line change
@@ -1,11 +1,12 @@
//! Inlining pass for MIR functions.

use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::iter;
use std::ops::{Range, RangeFrom};

use rustc_attr::InlineAttr;
use rustc_hir::def::DefKind;
use rustc_hir::def_id::DefId;
use rustc_index::bit_set::BitSet;
use rustc_index::Idx;
use rustc_middle::bug;
Expand All @@ -30,7 +31,7 @@ use crate::validate::validate_types;

pub(crate) mod cycle;

const TOP_DOWN_DEPTH_LIMIT: usize = 5;
const MAX_INLINED_CALLEES_PER_BODY: usize = 9;

// Made public so that `mir_drops_elaborated_and_const_checked` can be overridden
// by custom rustc drivers, running all the steps by themselves. See #114628.
Expand Down Expand Up @@ -102,46 +103,80 @@ fn inline<'tcx>(tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) -> bool {
tcx,
param_env,
codegen_fn_attrs,
history: Vec::new(),
changed: false,
queue: BinaryHeap::new(),
caller_is_inline_forwarder: matches!(
codegen_fn_attrs.inline,
InlineAttr::Hint | InlineAttr::Always
) && body_is_forwarder(body),
};
let blocks = START_BLOCK..body.basic_blocks.next_index();
this.process_blocks(body, blocks);
this.changed
let initial_blocks = START_BLOCK..body.basic_blocks.next_index();
this.enqueue_new_candidates(body, initial_blocks);

let mut changed = false;
let mut remaining_cost = tcx.sess.opts.unstable_opts.inline_mir_total_threshold.unwrap_or(300);
let mut remaining_count = MAX_INLINED_CALLEES_PER_BODY;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this go by some size heuristic? if the callee is absolutely tiny (e.g. just another call) then I'd expect the cost of inlining to be ~zero (replacing a call with another call), which means we can inline an unlimited amount of those cases.

Copy link
Member

@the8472 the8472 Sep 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, looking at the testcases I guess the scopes make this non-zero cost?
Are those debug-only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a heterogeneous recursion example in the tests that will go ≈infinitely if there's no numeric limit, so there needs to be a limit more than just the total cost to keep that from excessively exploding.

I can definitely experiment with upping the count limit and have the normal case be hitting the cost limit, though, since as you say we almost always want to inline the ≈free things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could also try some tricks like adding synthetic cost the deeper the inlining, or something, so that it ends up hitting the cost limits anyway. Might be better than the count limit anyway since it'll encourage more breadth instead of depth...

while remaining_count > 0
&& let Some(candidate) = this.queue.pop()
{
let Some(c) = remaining_cost.checked_sub(candidate.cost) else {
debug!(
"next-cheapest candidate has cost {}, but only {} left",
candidate.cost, remaining_cost,
);
break;
};
remaining_cost = c;
remaining_count -= 1;

if let Ok(new_blocks) = this.try_inline_candidate(body, candidate) {
changed = true;
this.enqueue_new_candidates(body, new_blocks);
}
}

changed
}

struct Inliner<'tcx> {
tcx: TyCtxt<'tcx>,
param_env: ParamEnv<'tcx>,
/// Caller codegen attributes.
codegen_fn_attrs: &'tcx CodegenFnAttrs,
/// Stack of inlined instances.
/// We only check the `DefId` and not the args because we want to
/// avoid inlining cases of polymorphic recursion.
/// The number of `DefId`s is finite, so checking history is enough
/// to ensure that we do not loop endlessly while inlining.
history: Vec<DefId>,
/// Indicates that the caller body has been modified.
changed: bool,
/// The currently-known inlining candidates.
queue: BinaryHeap<InlineCandidate<'tcx>>,
/// Indicates that the caller is #[inline] and just calls another function,
/// and thus we can inline less into it as it'll be inlined itself.
caller_is_inline_forwarder: bool,
}

struct InlineCandidate<'tcx> {
cost: usize,
callsite: CallSite<'tcx>,
callee_body: &'tcx Body<'tcx>,
destination_ty: Ty<'tcx>,
}

/// Smallest-cost is "max" for use in `BinaryHeap`, with ties broken
/// by preferring the one in the smaller-numbered `BasicBlock`.
impl Ord for InlineCandidate<'_> {
fn cmp(&self, other: &Self) -> Ordering {
(other.cost, other.callsite.block).cmp(&(self.cost, self.callsite.block))
}
}
impl PartialOrd for InlineCandidate<'_> {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
impl Eq for InlineCandidate<'_> {}
impl PartialEq for InlineCandidate<'_> {
fn eq(&self, other: &Self) -> bool {
self.cmp(other) == Ordering::Equal
}
}

impl<'tcx> Inliner<'tcx> {
fn process_blocks(&mut self, caller_body: &mut Body<'tcx>, blocks: Range<BasicBlock>) {
// How many callsites in this body are we allowed to inline? We need to limit this in order
// to prevent super-linear growth in MIR size
let inline_limit = match self.history.len() {
0 => usize::MAX,
1..=TOP_DOWN_DEPTH_LIMIT => 1,
_ => return,
};
let mut inlined_count = 0;
fn enqueue_new_candidates(&mut self, caller_body: &Body<'tcx>, blocks: Range<BasicBlock>) {
for bb in blocks {
let bb_data = &caller_body[bb];
if bb_data.is_cleanup {
Expand All @@ -152,44 +187,23 @@ impl<'tcx> Inliner<'tcx> {
continue;
};

let span = trace_span!("process_blocks", %callsite.callee, ?bb);
let _guard = span.enter();

match self.try_inlining(caller_body, &callsite) {
Err(reason) => {
debug!("not-inlined {} [{}]", callsite.callee, reason);
}
Ok(new_blocks) => {
debug!("inlined {}", callsite.callee);
self.changed = true;

self.history.push(callsite.callee.def_id());
self.process_blocks(caller_body, new_blocks);
self.history.pop();

inlined_count += 1;
if inlined_count == inline_limit {
debug!("inline count reached");
return;
}
}
}
let Ok(candidate) = self.validate_inlining_candidate(caller_body, callsite) else {
continue;
};
self.queue.push(candidate);
}
}

/// Attempts to inline a callsite into the caller body. When successful returns basic blocks
/// containing the inlined body. Otherwise returns an error describing why inlining didn't take
/// place.
fn try_inlining(
fn validate_inlining_candidate(
&self,
caller_body: &mut Body<'tcx>,
callsite: &CallSite<'tcx>,
) -> Result<std::ops::Range<BasicBlock>, &'static str> {
caller_body: &Body<'tcx>,
callsite: CallSite<'tcx>,
) -> Result<InlineCandidate<'tcx>, &'static str> {
self.check_mir_is_available(caller_body, callsite.callee)?;

let callee_attrs = self.tcx.codegen_fn_attrs(callsite.callee.def_id());
let cross_crate_inlinable = self.tcx.cross_crate_inlinable(callsite.callee.def_id());
self.check_codegen_attributes(callsite, callee_attrs, cross_crate_inlinable)?;
self.check_codegen_attributes(&callsite, callee_attrs, cross_crate_inlinable)?;

// Intrinsic fallback bodies are automatically made cross-crate inlineable,
// but at this stage we don't know whether codegen knows the intrinsic,
Expand All @@ -210,7 +224,20 @@ impl<'tcx> Inliner<'tcx> {
}

let callee_body = try_instance_mir(self.tcx, callsite.callee.def)?;
self.check_mir_body(callsite, callee_body, callee_attrs, cross_crate_inlinable)?;
let cost =
self.check_mir_body(&callsite, callee_body, callee_attrs, cross_crate_inlinable)?;
Ok(InlineCandidate { cost, callsite, callee_body, destination_ty })
}

/// Attempts to inline a callsite into the caller body. When successful returns basic blocks
/// containing the inlined body. Otherwise returns an error describing why inlining didn't take
/// place.
fn try_inline_candidate(
&self,
caller_body: &mut Body<'tcx>,
candidate: InlineCandidate<'tcx>,
) -> Result<std::ops::Range<BasicBlock>, &'static str> {
let InlineCandidate { callsite, callee_body, destination_ty, .. } = candidate;

if !self.tcx.consider_optimizing(|| {
format!("Inline {:?} into {:?}", callsite.callee, caller_body.source)
Expand Down Expand Up @@ -249,6 +276,10 @@ impl<'tcx> Inliner<'tcx> {
trace!(?output_type, ?destination_ty);
return Err("failed to normalize return type");
}

let terminator = caller_body[callsite.block].terminator.as_ref().unwrap();
let TerminatorKind::Call { args, .. } = &terminator.kind else { bug!() };

if callsite.fn_sig.abi() == Abi::RustCall {
// FIXME: Don't inline user-written `extern "rust-call"` functions,
// since this is generally perf-negative on rustc, and we hope that
Expand Down Expand Up @@ -294,7 +325,7 @@ impl<'tcx> Inliner<'tcx> {
}

let old_blocks = caller_body.basic_blocks.next_index();
self.inline_call(caller_body, callsite, callee_body);
self.inline_call(caller_body, &callsite, callee_body);
let new_blocks = old_blocks..caller_body.basic_blocks.next_index();

Ok(new_blocks)
Expand Down Expand Up @@ -396,10 +427,6 @@ impl<'tcx> Inliner<'tcx> {
return None;
}

if self.history.contains(&callee.def_id()) {
return None;
}

let fn_sig = self.tcx.fn_sig(def_id).instantiate(self.tcx, args);

// Additionally, check that the body that we're inlining actually agrees
Expand Down Expand Up @@ -493,7 +520,7 @@ impl<'tcx> Inliner<'tcx> {
callee_body: &Body<'tcx>,
callee_attrs: &CodegenFnAttrs,
cross_crate_inlinable: bool,
) -> Result<(), &'static str> {
) -> Result<usize, &'static str> {
let tcx = self.tcx;

if let Some(_) = callee_body.tainted_by_errors {
Expand Down Expand Up @@ -573,7 +600,7 @@ impl<'tcx> Inliner<'tcx> {
let cost = checker.cost();
if cost <= threshold {
debug!("INLINING {:?} [cost={} <= threshold={}]", callsite, cost, threshold);
Ok(())
Ok(cost)
} else {
debug!("NOT inlining {:?} [cost={} > threshold={}]", callsite, cost, threshold);
Err("cost above threshold")
Expand Down
3 changes: 3 additions & 0 deletions compiler/rustc_session/src/options.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1807,6 +1807,9 @@ options! {
(default: preserve for debuginfo != None, otherwise remove)"),
inline_mir_threshold: Option<usize> = (None, parse_opt_number, [TRACKED],
"a default MIR inlining threshold (default: 50)"),
inline_mir_total_threshold: Option<usize> = (None, parse_opt_number, [TRACKED],
"maximum total cost of functions inlined into one caller (default: 300), \
to limit eventual MIR size in functions which make many calls"),
input_stats: bool = (false, parse_bool, [UNTRACKED],
"gather statistics about the input (default: no)"),
instrument_mcount: bool = (false, parse_bool, [TRACKED],
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,11 @@
let mut _3: u32;
scope 1 {
debug un => _1;
scope 3 (inlined std::mem::drop::<u32>) {
scope 2 (inlined std::mem::drop::<u32>) {
debug _x => _3;
}
}
scope 2 (inlined val) {
scope 3 (inlined val) {
}

bb0: {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,11 @@
let mut _3: u32;
scope 1 {
debug un => _1;
scope 3 (inlined std::mem::drop::<u32>) {
scope 2 (inlined std::mem::drop::<u32>) {
debug _x => _3;
}
}
scope 2 (inlined val) {
scope 3 (inlined val) {
}

bb0: {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,12 @@
scope 3 {
debug precision => _8;
let _8: usize;
scope 5 (inlined Formatter::<'_>::precision) {
scope 4 (inlined Formatter::<'_>::precision) {
}
}
}
}
scope 4 (inlined Formatter::<'_>::sign_plus) {
scope 5 (inlined Formatter::<'_>::sign_plus) {
let mut _20: u32;
let mut _21: u32;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,12 @@
scope 3 {
debug precision => _8;
let _8: usize;
scope 5 (inlined Formatter::<'_>::precision) {
scope 4 (inlined Formatter::<'_>::precision) {
}
}
}
}
scope 4 (inlined Formatter::<'_>::sign_plus) {
scope 5 (inlined Formatter::<'_>::sign_plus) {
let mut _20: u32;
let mut _21: u32;
}
Expand Down
Loading
Loading