Open
Labels
A-MIR: Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html
A-codegen: Area: Code generation
A-mir-opt: Area: MIR optimizations
C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
T-opsem: Relevant to the opsem team
Description
When creating multiple instances of a small struct, each instance is allocated separately on the stack, even when their lifetimes are known never to overlap.
Example: the following code generates two alloca instructions that are not optimized away by LLVM:
(Godbolt)
pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}
#[inline(never)]
pub fn use_w(w: WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}
#[inline(never)]
pub fn peek_w(w: &WithOffset<&[u8; 16]>) {
    std::hint::black_box(w);
}
pub fn offsets(buf: [u8; 16]) {
    let w = WithOffset {
        data: &buf,
        offset: 0,
    };
    peek_w(&w);
    use_w(w);
    let w2 = WithOffset {
        data: &buf,
        offset: 1,
    };
    peek_w(&w2);
    use_w(w2);
}
LLVM IR:
; playground::offsets
; Function Attrs: noinline nounwind
define internal fastcc void @playground::offsets(ptr noalias nocapture noundef nonnull readonly align 1 dereferenceable(16) %buf) unnamed_addr #0 {
start:
  %w2 = alloca [16 x i8], align 8
  %w = alloca [16 x i8], align 8
  store ptr %buf, ptr %w, align 8
  %0 = getelementptr inbounds nuw i8, ptr %w, i64 8
  store i64 0, ptr %0, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 0) #88
  store ptr %buf, ptr %w2, align 8
  %1 = getelementptr inbounds nuw i8, ptr %w2, i64 8
  store i64 1, ptr %1, align 8
; call playground::peek_w
  call fastcc void @playground::peek_w(ptr noalias noundef readonly align 8 dereferenceable(16) %w2) #88
; call playground::use_w
  call fastcc void @playground::use_w(ptr noalias noundef readonly align 1 dereferenceable(16) %buf, i64 noundef 1) #88
  ret void
}
It seems like a call to @llvm.lifetime.{start,end}.p0 is missing. If we instead use:
pub fn closures(buf: [u8; 16]) {
    (|| {
        let w = WithOffset {
            data: &buf,
            offset: 0,
        };
        peek_w(&w);
        use_w(w);
    })();
    (|| {
        let w2 = WithOffset {
            data: &buf,
            offset: 1,
        };
        peek_w(&w2);
        use_w(w2);
    })();
}
We do get the lifetime markers, and the second alloca is optimized away (see the Godbolt link).
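Besides reading the IR, slot reuse can also be probed at runtime. The following is a minimal sketch, not part of the original report: `slot_addr`, `sequential`, and `via_closures` are hypothetical names, and `buf` is passed by reference here to keep the probe small. It returns the stack address of each struct so the two shapes can be compared. The printed result depends on the compiler version and optimization level, so it should be treated as a hint to be confirmed against the IR, not as proof.

```rust
use std::hint::black_box;

pub struct WithOffset<T> {
    pub data: T,
    pub offset: usize,
}

// Hypothetical helper (not in the issue): reports the address of the struct
// it is handed, so the caller can compare stack slots.
#[inline(never)]
fn slot_addr(w: &WithOffset<&[u8; 16]>) -> usize {
    let p: *const WithOffset<&[u8; 16]> = black_box(w);
    p as usize
}

#[inline(never)]
fn use_w(w: WithOffset<&[u8; 16]>) {
    black_box(w);
}

// Same shape as `offsets` in the issue: both structs share one scope.
fn sequential(buf: &[u8; 16]) -> (usize, usize) {
    let w = WithOffset { data: buf, offset: 0 };
    let a = slot_addr(&w);
    use_w(w);
    let w2 = WithOffset { data: buf, offset: 1 };
    let b = slot_addr(&w2);
    use_w(w2);
    (a, b)
}

// Same shape as `closures` in the issue: each struct lives in its own closure.
fn via_closures(buf: &[u8; 16]) -> (usize, usize) {
    let a = (|| {
        let w = WithOffset { data: buf, offset: 0 };
        let addr = slot_addr(&w);
        use_w(w);
        addr
    })();
    let b = (|| {
        let w2 = WithOffset { data: buf, offset: 1 };
        let addr = slot_addr(&w2);
        use_w(w2);
        addr
    })();
    (a, b)
}

fn main() {
    let buf = [0u8; 16];
    let (a, b) = sequential(&buf);
    println!("sequential: same slot = {}", a == b);
    let (c, d) = via_closures(&buf);
    println!("closures:   same slot = {}", c == d);
}
```

Build with optimizations (for example `rustc -O`) before comparing the two lines of output; in a debug build the slots are unlikely to be merged in either shape.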
I encountered this while working on memorysafety/rav1d#1402, where this missed optimization results in over 100 bytes of extra stack allocation in a specific function, which slows down the entire binary by ~0.5%.
This might also be related to #138544.