perf(codegen): Eliminate `size_of_val == 0` for DSTs with Non-zero-sized Prefix via NUW and Assume by TKanX · Pull Request #152843 · rust-lang/rust

TKanX · 2026-02-19T11:40:00Z

Summary:

Problem:

size_of_val(p) == 0 is not optimized away for DSTs that have a statically known non-zero-sized prefix:

pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

pub fn demo(p: &Foo<dyn std::fmt::Debug>) -> bool {
    std::mem::size_of_val(p) == 0  // always false, but LLVM can't prove it
}

Foo has a 12-byte prefix, so its total size is always ≥ 12. Yet the comparison persists as a runtime computation in LLVM IR. This matters because Box<dyn T> drop emits this exact check to guard the deallocation call — for types with a guaranteed non-zero prefix, the branch should vanish but doesn't.

The slice-tail variant Foo<[i32]> was already optimized correctly; Foo<dyn Trait> and Foo<[u8]> were not.
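To make the claim concrete, here is a minimal runnable sketch of the situation described above. The helper names (`is_zero_sized_dyn`, `is_zero_sized_bytes`, `would_deallocate`) are hypothetical and exist only for illustration; `would_deallocate` is a simplified stand-in for the size check a `Box` drop performs, not the actual library code.

```rust
use std::alloc::Layout;
use std::fmt::Debug;
use std::mem::size_of_val;

/// Same shape as the `Foo` in the PR description: a sized 12-byte prefix
/// followed by a possibly unsized tail.
pub struct Foo<T: ?Sized>(pub [u32; 3], pub T);

/// The comparison from the PR description, for a trait-object tail.
pub fn is_zero_sized_dyn(p: &Foo<dyn Debug>) -> bool {
    size_of_val(p) == 0
}

/// The same comparison for a byte-slice tail.
pub fn is_zero_sized_bytes(p: &Foo<[u8]>) -> bool {
    size_of_val(p) == 0
}

/// Hypothetical stand-in for the guard evaluated before deallocating:
/// memory is only freed when the value's layout has a non-zero size.
pub fn would_deallocate(p: &Foo<dyn Debug>) -> bool {
    Layout::for_value(p).size() != 0
}

fn main() {
    // Tails that are themselves zero-sized: the smallest possible `Foo`s.
    let unit_tail = Foo([0u32; 3], ());
    let empty_tail = Foo([0u32; 3], [0u8; 0]);

    // Unsizing coercions to the DST forms discussed in the PR.
    let as_dyn: &Foo<dyn Debug> = &unit_tail;
    let as_bytes: &Foo<[u8]> = &empty_tail;

    println!("size_of_val(Foo<dyn Debug>) = {}", size_of_val(as_dyn));
    println!("size_of_val(Foo<[u8]>)      = {}", size_of_val(as_bytes));

    // Even with an empty tail, the prefix keeps the size non-zero.
    assert!(!is_zero_sized_dyn(as_dyn));
    assert!(!is_zero_sized_bytes(as_bytes));
    assert!(would_deallocate(as_dyn));
}
```

The runtime result is never in question here; the PR is about letting LLVM reach the same conclusion at compile time so that the comparison and the branch it guards disappear.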

Root Cause:

In size_and_align_of_dst (the ADT/Tuple branch), the size computation is:

full_size = (offset + unsized_size + (align-1)) & -align

LLVM cannot prove full_size > 0 because:

offset + unsized_size used plain add — no overflow flags, so LLVM cannot conclude the result is ≥ offset.
(x + addend) & -align — LLVM has no fold to prove that alignment rounding never reduces the value below x.

Solution:

Two changes:

add nuw nsw on offset + unsized_size: the sum is bounded by the rounded size, which is ≤ isize::MAX, so neither signed nor unsigned overflow is possible. This tells LLVM that unrounded_size ≥ offset.
assume(full_size ≥ unrounded_size): round_up(x, a) ≥ x holds for any power-of-two a as long as the rounding itself does not wrap, which the size ≤ isize::MAX bound guarantees. This tells LLVM that full_size ≥ unrounded_size ≥ offset, so if offset > 0 the chain proves full_size > 0.
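The two facts can be checked on a small model. The sketch below is an illustration only, not rustc code: it uses `u8` in place of the target's pointer-sized integer so the search is exhaustive, and the function names are hypothetical. It shows that with plain wrapping arithmetic the computed size can reach zero in two different ways (the addition wraps, or the rounding wraps), and that rounding never shrinks the value when no wrap occurs.

```rust
/// `(x + (align - 1)) & -align` using wrapping arithmetic, i.e. what a
/// plain `add` without overflow flags is allowed to do.
fn round_up_wrapping(x: u8, align: u8) -> u8 {
    assert!(align.is_power_of_two());
    x.wrapping_add(align - 1) & align.wrapping_neg()
}

/// The full size formula from the root-cause section, again with wrapping adds.
fn full_size_wrapping(offset: u8, unsized_size: u8, align: u8) -> u8 {
    round_up_wrapping(offset.wrapping_add(unsized_size), align)
}

/// Exhaustively checks `round_up(x, a) >= x` for every `x` and every
/// power-of-two `a`, restricted to inputs where `x + (a - 1)` does not wrap.
fn round_up_never_shrinks_without_overflow() -> bool {
    (0..=u8::MAX).all(|x| {
        (0..8u32).all(|shift| {
            let align = 1u8 << shift;
            match x.checked_add(align - 1) {
                Some(_) => round_up_wrapping(x, align) >= x,
                None => true, // wrapping case: excluded by the size bound
            }
        })
    })
}

fn main() {
    // Without "no unsigned wrap", a non-zero offset does not help:
    // 12 + 244 wraps to 0 in u8, and rounding 0 up gives 0.
    println!("wrapped add:   {}", full_size_wrapping(12, 244, 4));

    // Even if the add is fine, the rounding step alone can wrap:
    // 254 + 3 wraps to 1, and masking with -4 gives 0.
    println!("wrapped round: {}", round_up_wrapping(254, 4));

    // When nothing wraps, rounding up never decreases the value.
    println!("monotone: {}", round_up_never_shrinks_without_overflow());
}
```

This is why the PR needs both pieces: the overflow flags rule out the first failure, and the assume hands LLVM the conclusion of the second argument directly instead of relying on it to rediscover it.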

LLVM IR Comparison:

Foo<dyn Debug> — before (godbolt):

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  %0 = getelementptr inbounds nuw i8, ptr %p.1, i64 8
  %1 = load i64, ptr %0, align 8, !range !3, !invariant.load !4
  %2 = getelementptr inbounds nuw i8, ptr %p.1, i64 16
  %3 = load i64, ptr %2, align 8, !range !5, !invariant.load !4
  %4 = tail call i64 @llvm.umax.i64(i64 %3, i64 4)
  %5 = add nuw i64 %1, 11
  %6 = add i64 %5, %4
  %7 = sub i64 0, %4
  %8 = and i64 %6, %7
  %_0 = icmp eq i64 %8, 0
  ret i1 %_0
}

Foo<dyn Debug> — after:

define noundef zeroext i1 @demo(ptr %p.0, ptr %p.1) {
start:
  ret i1 false
}

Foo<[u8]> — before:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  %0 = add i64 %p.1, 15
  %_0 = icmp ult i64 %0, 4
  ret i1 %_0
}

Foo<[u8]> — after:

define noundef zeroext i1 @demo_lessalignedslice(ptr %p.0, i64 %p.1) {
start:
  ret i1 false
}

Changes:

compiler/rustc_codegen_ssa/src/size_of_val.rs: add → unchecked_suadd (NUW+NSW) on offset + unsized_size; add assume(full_size ≥ unrounded_size).
tests/codegen-llvm/dst-size-of-val-not-zst.rs: new codegen test verifying size_of_val == 0 folds to ret i1 false for Foo<dyn Debug>, Foo<[u8]>, and Foo<[i32]>.

Fixes #152788.

TKanX · 2026-02-20T19:33:41Z

@rustbot label +A-LLVM +A-codegen +C-optimization +T-compiler

fmease · 2026-02-21T22:14:41Z

r? codegen

rustbot · 2026-02-22T00:16:08Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

rustbot · 2026-02-22T05:32:48Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

TKanX · 2026-02-22T05:34:22Z

@rustbot ready

scottmcm · 2026-02-22T19:00:51Z

compiler/rustc_codegen_ssa/src/size_of_val.rs

+            // Alignment rounding can only increase the size, never decrease it:
+            // `round_up(x, a) >= x` for power-of-two `a`. With the `nuw` on the
+            // addition above, LLVM can therefore deduce
+            // `full_size >= unrounded_size >= offset`, which proves `full_size > 0`
+            // for types with a non-zero-sized prefix (#152788).
+            let size_ge = bx.icmp(IntPredicate::IntUGE, full_size, unrounded_size);
+            bx.assume(size_ge);


Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

@scottmcm

> Can you elaborate on which things you tried and why this is the best one? Was it not enough to say that the alignment is a power-of-two? Or...

Tried nuw-only (unchecked_uadd) first. That gives LLVM unrounded >= offset > 0 but it stops at the rounding — LLVM can't prove (x + a-1) & -a >= x. Also checked whether feeding ctpop(align) == 1 would help, but there's no fold for "round-up is monotonic when alignment is pow2" in InstCombine/ValueTracking. So the assume tells LLVM the conclusion directly.

nsw (making it unchecked_suadd) is because unrounded ≤ rounded ≤ isize::MAX. Same reasoning as your #152867.

> I ask because most of the text in the OP is just useless LLM slop, and the updates to the tests make me suspicious.

Sorry about the OP — English isn't my native language, I overwrite when trying to be precise. Will clean it up.

For the tests: CHECK-NOT: icmp broke because assume itself emits an icmp. The !range checks on the first two functions were dropped because the assume keeps the size computation alive, so there's now a size load before the alignment load — FileCheck hits the wrong one. Range metadata is still verified in align_load_from_align_of_val below. RANGE_META → ALIGN_RANGE since it only covers alignment loads now. Range value {1, 0} → {1, 0x20000001} is Align::max_for_target (same change as #152929).

Happy to close this if you'd rather land it as part of #152867.

Landing this separately is great -- I opened the issue because this particular bit about what LLVM can prove is different enough from the point of layout_of_val that it's better to have the changes separated. (That's why I pulled out #152929 too 🙂 )

Hmm, yeah, I experimented a bit (https://llvm.godbolt.org/z/haGYz7aax), and even with lots of annotations on everything plus an assume, LLVM is still not able to understand what's happening properly.

(Also it's so annoying to see add nsw i64 %4, -1 since that used to be sub nuw nsw i64 %4, 1 but LLVM just insists on throwing that information away.)

dianqk · 2026-02-22T19:06:58Z

r? scottmcm

scottmcm · 2026-02-22T21:02:43Z

tests/codegen-llvm/dst-vtable-align-nonzero.rs

-    // CHECK: load [[USIZE:i[0-9]+]], {{.+}} !range [[RANGE_META:![0-9]+]]
+    // CHECK: load [[USIZE:i[0-9]+]]
    // CHECK-NOT: llvm.umax
-    // CHECK-NOT: icmp
    // CHECK-NOT: select
    // CHECK: ret


So the problem here is that if this was testing for "not icmp", just removing that check means this test is (potentially) no longer testing what it was trying to test before.

If there's an icmp now, probably what you want instead is something like

    // CHECK-NOT: llvm.umax
    // CHECK-NOT: icmp
    // CHECK-NOT: select
    // CHECK: [[DOES_NOT_SHRINK:%.+]] = icmp ... something here ...
    // CHECK-NEXT: call void @llvm.assume(i1 [[DOES_NOT_SHRINK]])
    // CHECK-NOT: llvm.umax
    // CHECK-NOT: icmp
    // CHECK-NOT: select

so that the test is that the only icmp is the expected one that's used for the assume.

Similarly, why remove the !range check? It's not being optimized out, is it? (If it is, that's also interesting.)

Checked the emitted IR — the assume (and the entire size computation) gets DCE'd in these two functions at -O3, since they only need alignment for the field projection. So there's no extra icmp at all, and the alignment load is still the first one with !range. Restored the original patterns as-is; the file is now unchanged from main.

TKanX · 2026-02-22T21:54:57Z

@rustbot ready

scottmcm · 2026-02-22T21:55:22Z

Ah, great, that other file just not changing at all any more is excellent. Diffs that aren't there are my favourite things, as a reviewer 🙂

This probably isn't instantiated enough for an assume to be a perf problem, but checking just in case
@bors try @rust-timer queue

TKanX · 2026-02-22T22:00:57Z

> Ah, great, that other file just not changing at all any more is excellent. Diffs that aren't there are my favourite things, as a reviewer 🙂
>
> This probably isn't instantiated enough for an assume to be a perf problem, but checking just in case @bors try @rust-timer queue

The assume path is cold enough I wasn't worried, but data's data.

TKanX · 2026-02-22T22:29:31Z

@scottmcm You were right about the sandwich — I only tested on x86_64, where LLVM DCE'd the assume entirely, but aarch64 keeps it alive. Working on it now.

scottmcm · 2026-02-22T23:50:53Z

Hmm, why would aarch64 do anything different here? The codegen-llvm tests are running only the middle-end of llvm, not the backend, so it shouldn't matter...

rust-bors · 2026-02-23T00:10:21Z

☀️ Try build successful (CI)
Build commit: 3cf407e (3cf407e8b04f1a796bf7b9360afd7972896f340d, parent: 1500f0f47f5fe8ddcd6528f6c6c031b210b4eac5)

TKanX · 2026-02-23T00:14:10Z

> Hmm, why would aarch64 do anything different here? The codegen-llvm tests are running only the middle-end of llvm, not the backend, so it shouldn't matter...

You're right — the architecture has nothing to do with it. I was testing locally against LLVM 22, which DCEs the assume entirely. I verified this just now: same unoptimized IR through opt -O3 with both target triples produces identical output on LLVM 22. The x86_64-gnu-llvm-20 job was cancelled (not passed), so it would have failed the same way.

@scottmcm

rust-timer · 2026-02-23T00:50:11Z

Finished benchmarking commit (3cf407e): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -0.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.1%	[2.1%, 2.1%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-2.2%	[-2.5%, -1.9%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.8%	[-2.5%, 2.1%]	3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 483.037s -> 479.541s (-0.72%)
Artifact size: 397.95 MiB -> 397.91 MiB (-0.01%)

rustbot assigned fmease Feb 19, 2026

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 19, 2026

rustbot added A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such labels Feb 20, 2026

rustbot assigned dianqk and unassigned fmease Feb 21, 2026

scottmcm requested changes Feb 22, 2026

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2026

TKanX added 2 commits February 21, 2026 21:31

perf(codegen): Use nuw nsw and assume to eliminate `size_of_val =…

12a18ee

…= 0` for non-ZST DSTs

test(codegen): Add regression test for size_of_val == 0 elimination…

8339cfe

… on non-ZST DSTs

TKanX force-pushed the bugfix/152788-codegen-dst-size-nuw-assume branch from a9ec27f to 8339cfe Compare February 22, 2026 05:32

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2026

TKanX requested a review from scottmcm February 22, 2026 05:34

scottmcm reviewed Feb 22, 2026

rustbot assigned scottmcm and unassigned dianqk Feb 22, 2026

scottmcm reviewed Feb 22, 2026

scottmcm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 22, 2026

TKanX and others added 2 commits February 22, 2026 13:42

test(codegen): Restore original CHECK patterns in dst-vtable-align-no…

689cd64

…nzero Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>

test(codegen): Revert unnecessary ALIGN_RANGE rename in dst-vtable-al…

3e6f372

…ign-nonzero Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 22, 2026

TKanX requested a review from scottmcm February 22, 2026 21:55

rust-bors bot pushed a commit that referenced this pull request Feb 22, 2026

Auto merge of #152843 - TKanX:bugfix/152788-codegen-dst-size-nuw-assu…

3cf407e

…me, r=<try> perf(codegen): Eliminate `size_of_val == 0` for DSTs with Non-zero-sized Prefix via NUW and Assume

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 22, 2026

test(codegen): Fix dst-vtable-align-nonzero patterns for aarch64

45b1d74

Co-authored-by: Scott McMurray <scottmcm@users.noreply.github.com>

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 23, 2026
