[Experiment] Clone blocks with bounds checks #112595

EgorBo · 2025-02-15T10:41:03Z

For an arbitrary block such as:

arr[i + 1] = x;
arr[i + 3] = y;
arr[i + 5] = z;
arr[i + 6] = w;

To remove the bounds checks, the JIT can clone the whole block into fast and slow paths guarded by "block cloning conditions":

if (i >= 0 && i < arr.Length - 6)
{
    // fast path
    arr[i + 1] = x; // no bounds check
    arr[i + 3] = y; // no bounds check
    arr[i + 5] = z; // no bounds check
    arr[i + 6] = w; // no bounds check
}
else
{
    // slow path
    arr[i + 1] = x;
    arr[i + 3] = y;
    arr[i + 5] = z;
    arr[i + 6] = w;
}

This works not only for stores but for any expression with bounds checks, e.g. arr[1] + arr[2] + arr[3].
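The cloning idea can be sketched by hand in C++ (the language the JIT itself is written in). This is a minimal illustration under stated assumptions, not the PR's implementation: fastPathOk and storeFour are hypothetical names, std::vector stands in for a managed array, and std::vector::at throwing std::out_of_range stands in for CORINFO_HELP_RNGCHKFAIL. The guard checks the smallest and largest offsets of the group in 64-bit arithmetic, which is slightly less strict than the i >= 0 in the example but covers exactly the same accesses.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// True when every index i + c, for c in [minOff, maxOff], lies in [0, len).
// 64-bit math keeps i + maxOff from overflowing.
bool fastPathOk(int32_t i, int32_t len, int32_t minOff, int32_t maxOff) {
    assert(minOff <= maxOff);
    const int64_t lo = int64_t(i) + minOff;
    const int64_t hi = int64_t(i) + maxOff;
    return lo >= 0 && hi < len;
}

// Hand-cloned version of the four stores (offsets 1, 3, 5, 6).
void storeFour(std::vector<int32_t>& arr, int32_t i,
               int32_t x, int32_t y, int32_t z, int32_t w) {
    if (fastPathOk(i, static_cast<int32_t>(arr.size()), 1, 6)) {
        // Fast path: all four indices are proven in range, no per-access checks.
        arr[i + 1] = x;
        arr[i + 3] = y;
        arr[i + 5] = z;
        arr[i + 6] = w;
    } else {
        // Slow path: the original code, each access checked in program order.
        // vector::at throws std::out_of_range, standing in for the range-check failure.
        arr.at(i + 1) = x;
        arr.at(i + 3) = y;
        arr.at(i + 5) = z;
        arr.at(i + 6) = w;
    }
}
```

The slow path keeps the original semantics: if only the last store is out of range, the first three still happen before the exception, exactly as in the unoptimized code. The constant-index case arr[1] + arr[2] + arr[3] fits the same guard with i = 0, minOff = 1 and maxOff = 3, i.e. length > 3.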

Codegen example:

void Test(int[] arr)
{
    arr[0] = 10;
    arr[1] = 11;
    arr[2] = 12;
    arr[3] = 13;
}

Current codegen (4 bounds checks):

; Assembly listing for method Proga:Test(int[]):this (FullOpts)
    stp     fp, lr, [sp, #-0x10]!
    mov     fp, sp
    ldr     w0, [x1, #0x08]

    ;; arr[0] = 10
    cbz     w0, G_M65333_IG04
    mov     w2, #10
    str     w2, [x1, #0x10]

    ;; arr[1] = 11
    cmp     w0, #1
    bls     G_M65333_IG04
    mov     w2, #11
    str     w2, [x1, #0x14]

    ;; arr[2] = 12
    cmp     w0, #2
    bls     G_M65333_IG04
    mov     w2, #12
    str     w2, [x1, #0x18]

    ;; arr[3] = 13
    cmp     w0, #3
    bls     G_M65333_IG04
    mov     w0, #13
    str     w0, [x1, #0x1C]

    ldp     fp, lr, [sp], #0x10
    ret     lr
G_M65333_IG04:
    bl      CORINFO_HELP_RNGCHKFAIL
    brk     #0
; Total bytes of code 88

New codegen (single SIMD store in the fast path!):

; Assembly listing for method Proga:Test(int[]) (FullOpts)
    stp     fp, lr, [sp, #-0x10]!
    mov     fp, sp
    ldr     w1, [x0, #0x08]
    sxtw    w2, w1
    cmp     w1, #3
    ble     G_M50025_IG05
    ldr     q16, [@RWD00]
    str     q16, [x0, #0x10] ;; <-- single Vector128 store!
G_M50025_IG04:
    ldp     fp, lr, [sp], #0x10
    ret     lr

    ;; Slow path (cold block):
G_M50025_IG05:
    cbz     w2, G_M50025_IG06
    mov     w2, #10
    str     w2, [x0, #0x10]
    cmp     w1, #1
    bls     G_M50025_IG06
    mov     w2, #11
    str     w2, [x0, #0x14]
    cmp     w1, #2
    bls     G_M50025_IG06
    mov     w2, #12
    str     w2, [x0, #0x18]
    cmp     w1, #3
    bls     G_M50025_IG06
    mov     w1, #13
    str     w1, [x0, #0x1C]
    b       G_M50025_IG04
G_M50025_IG06:
    bl      CORINFO_HELP_RNGCHKFAIL
    brk     #0
RWD00  	dq	0000000B0000000Ah, 0000000D0000000Ch
; Total bytes of code 112
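The RWD00 constant packs the four int32 values 10, 11, 12 and 13 into two 64-bit words, with the lower-indexed element in the low half (arm64 runs little-endian here). The C++ sketch below reproduces both dq values and mimics the fast path as one 16-byte copy guarded by length > 3. fastPathStore and packLanes are hypothetical helpers, and a raw int32_t buffer is assumed to stand in for the array payload at [x0, #0x10].

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack two 32-bit lanes into one 64-bit word: lane0 in the low 32 bits.
constexpr uint64_t packLanes(uint32_t lane0, uint32_t lane1) {
    return (uint64_t(lane1) << 32) | lane0;
}

// RWD00 from the listing: dq 0000000B0000000Ah, 0000000D0000000Ch
constexpr uint64_t kRwdLo = packLanes(10, 11);
constexpr uint64_t kRwdHi = packLanes(12, 13);
static_assert(kRwdLo == 0x0000000B0000000AULL, "lanes 0..1");
static_assert(kRwdHi == 0x0000000D0000000CULL, "lanes 2..3");

// Hypothetical analogue of the fast path. Returns false where the JIT
// would branch to the cold slow path instead.
bool fastPathStore(int32_t* data, int32_t length) {
    assert(data != nullptr);
    if (length <= 3) {
        return false; // corresponds to `cmp w1, #3` + `ble G_M50025_IG05`
    }
    static const int32_t kValues[4] = {10, 11, 12, 13};
    // One 16-byte copy; the JIT emits this as a single `str q16`.
    std::memcpy(data, kValues, sizeof(kValues));
    return true;
}
```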

EgorBo · 2025-02-16T14:13:17Z

This PR also sort of closes #109983

For that case we don't actually have to clone the blocks, but it's better than nothing 🙂

EgorBo · 2025-02-16T18:17:57Z

@EgorBot -amd -arm --filter benchmarkLU

EgorBo · 2025-02-17T09:32:27Z

Actually, this should be good as is (as a start). Diffs: obviously a size regression, but mostly a clean PerfScore improvement.

I also ran a random dotnet/performance benchmark, since it appeared in the diffs.

The PR scans all GT_BOUNDS_CHECK nodes in a block and groups them by "base index VN + length VN" in a hash table (roughly Dictionary<(baseVN, lenVN), List<BoundsCheck>>). It then takes the group with the most bounds checks, calculates its Statement range, splits the block at the very first bounds check in the group (so gtSplitTree spills the length and the index of that first bounds check if they have side effects), and clones the rest. Finally, it repeats the process if there are more groups.
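The grouping step can be sketched as follows. This is a hypothetical, simplified model rather than the code in rangecheckcloning.cpp: BoundsCheck, Group and planGroups are invented names, value numbers are plain integers, and std::map replaces the JIT's hash table. It only computes which groups to process (largest first) and each group's statement range; the actual block splitting and cloning, the complexity limits (the commits mention gtComplexityExceeds) and overlapping groups (handled later in #112660) are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one GT_BOUNDS_CHECK node in a block.
struct BoundsCheck {
    int      stmt;   // index of the statement containing the check
    uint32_t baseVN; // value number of the index base (i in arr[i + c])
    uint32_t lenVN;  // value number of the length (arr.Length)
    int32_t  offset; // constant c in i + c
};

struct Group {
    std::pair<uint32_t, uint32_t> key; // (baseVN, lenVN)
    std::vector<BoundsCheck> checks;   // in the order they were scanned
    int firstStmt;                     // where the block would be split
    int lastStmt;                      // end of the statement range to clone
};

// Group checks by (baseVN, lenVN) and order groups largest first,
// mimicking "take the group with the most bounds checks, then repeat".
std::vector<Group> planGroups(const std::vector<BoundsCheck>& checks) {
    std::map<std::pair<uint32_t, uint32_t>, std::vector<BoundsCheck>> byKey;
    for (const BoundsCheck& bc : checks) {
        assert(bc.stmt >= 0);
        byKey[{bc.baseVN, bc.lenVN}].push_back(bc);
    }

    std::vector<Group> groups;
    for (const auto& [key, list] : byKey) {
        int first = list.front().stmt;
        int last = list.front().stmt;
        for (const BoundsCheck& bc : list) {
            first = std::min(first, bc.stmt);
            last = std::max(last, bc.stmt);
        }
        groups.push_back({key, list, first, last});
    }

    // Stable sort keeps map order among equally sized groups.
    std::stable_sort(groups.begin(), groups.end(),
                     [](const Group& a, const Group& b) {
                         return a.checks.size() > b.checks.size();
                     });
    return groups;
}
```

For the arr[i + 1] / arr[i + 3] / arr[i + 5] / arr[i + 6] example, all four checks share the same (baseVN, lenVN) key, so they form one group whose range spans all four statements.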

I decided to run this optimization after the range check phase, so it only handles the bounds checks that no earlier phase was able to remove. One unfortunate phase-ordering issue is that CSE sometimes CSEs the index tree (because it is used both by the bounds check and by the actual array access), so when my algorithm drops the bounds check node, we're left with a redundant local.

This is the main source of the regressions; it would be nice to have a late forward-substitution pass to clean these up. JitOptRepeat also solves this issue, so perhaps my phase could request an extra iteration of it in the future (not today, as JitOptRepeat has issues).

PTAL @jakobbotsch @AndyAyersMS @dotnet/jit-contrib

src/coreclr/jit/rangecheckcloning.cpp

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

src/coreclr/jit/rangecheckcloning.cpp

jakobbotsch

This LGTM now. Cool opt 🙂

EgorBo added 3 commits February 14, 2025 11:19

Initial impl

ec14916

Merge branch 'main' of https://github.com/dotnet/runtime into clone-blocks-BCE

d0d3572

Initial impl

d8c8b60

dotnet-policy-service bot assigned EgorBo Feb 15, 2025

Merge branch 'clone-blocks-BCE' of github.com:EgorBo/runtime-1 into clone-blocks-BCE

61a800d

build-analysis bot mentioned this pull request Feb 15, 2025

System.Numerics.Tensors.Tests.ConvertTests.ConvertChecked failing with System.OverflowException #112286

Closed

fix perfscore regressions

7b78418

EgorBo added 7 commits February 16, 2025 12:46

move to main loop

5f1e3f7

clean up

3a67d86

fix assert

3f8526e

move to a separate file

ea5dce5

clean up

8d87c71

Clean up

de285cb

Clean up

dc37af2

EgorBo added 4 commits February 16, 2025 16:24

add comments

cf0a264

add comments

abfb580

Add debug diagnostics

591bcc3

clean up

6943f24

EgorBot mentioned this pull request Feb 16, 2025

EgorBot for EgorBo in #112595 EgorBot/runtime-utils#298

Open

clean up

52b9d88

EgorBot mentioned this pull request Feb 16, 2025

EgorBot for EgorBo in #112595 EgorBot/runtime-utils#299

Open

clean up

cbd3799

This was referenced Feb 17, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

EgorBo marked this pull request as ready for review February 17, 2025 09:32

jakobbotsch reviewed Feb 17, 2025

src/coreclr/jit/rangecheckcloning.cpp (outdated)

jakobbotsch reviewed Feb 17, 2025

src/coreclr/jit/rangecheckcloning.cpp

jakobbotsch reviewed Feb 17, 2025

src/coreclr/jit/rangecheckcloning.cpp (outdated)

jakobbotsch reviewed Feb 17, 2025

src/coreclr/jit/rangecheckcloning.cpp (outdated)

EgorBo and others added 5 commits February 17, 2025 13:53

Apply suggestions from code review

f222014

Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>

Address feedback

f50dee4

Add PreOrderVisit

93e071a

FB

492043e

check complexity

b079680

jakobbotsch reviewed Feb 17, 2025

src/coreclr/jit/rangecheckcloning.cpp (outdated)

EgorBo added 2 commits February 17, 2025 19:02

Add a private copy of optRemoveRangeCheck

36578a1

oops

5b7ce55

EgorBo mentioned this pull request Feb 17, 2025

Remove bounds checks for a[i+c1] followed by a[i+c2] #112532

Closed

EgorBo added 2 commits February 17, 2025 23:16

Merge branch 'main' into clone-blocks-BCE

ed64fa6

Use gtComplexityExceeds

8f867f6

jakobbotsch reviewed Feb 18, 2025

src/coreclr/jit/rangecheckcloning.cpp (outdated)

jakobbotsch reviewed Feb 18, 2025

src/coreclr/jit/rangecheckcloning.cpp (outdated)

Address feedback

ff26909

jakobbotsch approved these changes Feb 18, 2025

Clean up

70de4e7

This was referenced Feb 18, 2025

[android] Android.Device_Emulator.JIT.Test failing on emulators with CoreCLR #112633

Open

Android emulator not booting completely on Helix queue dotnet/dnceng#1448

Open

EgorBo merged commit 5486a26 into dotnet:main Feb 18, 2025
110 of 112 checks passed

EgorBo deleted the clone-blocks-BCE branch February 18, 2025 17:37

EgorBo mentioned this pull request Feb 18, 2025

Handle overlapped groups of bounds checks #112660

Merged

LoopedBard3 mentioned this pull request Feb 25, 2025

[Perf] Linux/x64: MDArray2 Regression on 2/18/2025 9:18:32 PM +00:00 #112915

Open

EgorBo mentioned this pull request Feb 26, 2025

JIT: Redundant bounds check is not eliminated #109677

Open
