Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall #117583

EgorBo · 2025-07-13T14:58:16Z

Inline write barriers with covariance checks (when the inliner decides that it is profitable). This allows the JIT to remove redundant write barriers (previously possible only in early phases) and redundant range checks (previously hidden inside the helper call; now the JIT can fold them).
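For context on the covariance check mentioned above: reference-type arrays in .NET are covariant, so every store of a reference into an array has to verify at run time that the value is compatible with the array's actual element type. The following is a minimal sketch of that behavior; the class and method names are illustrative and do not come from the PR.

```csharp
using System;

public static class CovarianceDemo
{
    // Returns true if the store succeeded, false if the runtime's
    // covariance check rejected it with ArrayTypeMismatchException.
    public static bool TryStore(object[] array, int index, object value)
    {
        try
        {
            array[index] = value; // a reference store into an array: the check happens here
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            return false;
        }
    }

    public static void Main()
    {
        object[] strings = new string[4];               // legal: arrays are covariant
        Console.WriteLine(TryStore(strings, 0, "ok"));  // prints True (string into string[])
        Console.WriteLine(TryStore(strings, 1, 42));    // prints False (boxed int into string[])
    }
}
```

This is the same shape as the `_strings` field in the benchmark below: a `string[]` viewed through an `object[]` reference, which is why each store needs the check unless the JIT can prove it redundant.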

Related: #9159

Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    static object[] _strings = new string[4];
    static object _x = "";

    [Benchmark]
    public void AssignString()
    {
        var arr = _strings;
        if (arr.Length >= 4)
        {
            // We now can remove write barriers and redundant range checks if StelemRef gets inlined
            arr[0] = "";
            arr[1] = "";
            arr[2] = "";
            arr[3] = "";
        }
    }

    [Benchmark]
    public void SwapElements()
    {
        var arr = _strings;
        (arr[1], arr[0]) = (arr[0], arr[1]);
    }

    [Benchmark]
    public void SingleAssignmentCns()
    {
        _strings[0] = "";
    }

    [Benchmark]
    public void SingleAssignmentVar()
    {
        _strings[0] = _x;
    }
}

Linux-AMD (Genoa):

Method	Toolchain	Mean	Error	Ratio
AssignString	Main	7.5633 ns	0.0022 ns	1.00
AssignString	PR	1.3160 ns	0.0033 ns	0.17

SwapElements	Main	2.9979 ns	0.0008 ns	1.00
SwapElements	PR	1.0471 ns	0.0012 ns	0.35

SingleAssignmentCns	Main	1.9075 ns	0.0005 ns	1.00
SingleAssignmentCns	PR	0.2723 ns	0.0002 ns	0.14

SingleAssignmentVar	Main	1.9080 ns	0.0003 ns	1.00
SingleAssignmentVar	PR	1.6356 ns	0.0005 ns	0.86

Linux-ARM64 (Cobalt100):

Method	Toolchain	Mean	Error	Ratio
AssignString	Main	10.3296 ns	0.0032 ns	1.00
AssignString	PR	1.8738 ns	0.0006 ns	0.18

SwapElements	Main	2.6604 ns	0.0012 ns	1.00
SwapElements	PR	1.1768 ns	0.0024 ns	0.44

SingleAssignmentCns	Main	1.9655 ns	0.0023 ns	1.00
SingleAssignmentCns	PR	0.3821 ns	0.0003 ns	0.19

SingleAssignmentVar	Main	1.8030 ns	0.0013 ns	1.00
SingleAssignmentVar	PR	0.7219 ns	0.0006 ns	0.40

EgorBo · 2025-07-13T15:04:32Z

@MihuBot

src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/TypeCast.cs

jkotas · 2025-07-13T16:43:21Z

> JIT should still see that the target is on the heap

Right, you would have to fix the JIT to be able to prove this in order to make this change without introducing regressions.

> we might in the future, so the barrier will need to be made checked.

The barrier needs to be made checked when executed against a stack-allocated copy. It would be a tough tradeoff to de-optimize all write barriers to enable more stack allocation.

EgorBo · 2025-07-13T17:09:40Z

> Right, you would have to fix the JIT to be able to prove this in order to make this change without introducing regressions.

I believe I've already done that in this PR. The previous logic was a bit conservative: it tried to handle only knownHeapAddr + cns, while in fact I think we can safely say that for an object assignment to knownHeapAddr + anything we can use the unchecked WB too (it's UB if the destination is not on the heap).
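To make the checked/unchecked distinction concrete, here is a hedged sketch. A store into an array element always targets the GC heap, so an unchecked write barrier is enough. A store through an arbitrary `ref` may target a stack slot, so the JIT generally needs a checked barrier unless it can prove the destination is on the heap. The names below are hypothetical, and which barrier the JIT actually emits depends on the runtime version and on what it can prove.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class BarrierDemo
{
    // Destination is an array element, which always lives on the GC heap.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void StoreToArray(object[] array, object value) => array[0] = value;

    // Destination is an arbitrary byref: it may point into the heap or the stack,
    // and with NoInlining the callee cannot see which one the caller passed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void StoreViaRef(ref object dest, object value) => dest = value;

    public static void Main()
    {
        object[] heap = new object[1];
        object local = "before";

        StoreToArray(heap, "a");
        StoreViaRef(ref heap[0], "b"); // byref into a heap object
        StoreViaRef(ref local, "c");   // byref into the caller's stack frame

        Console.WriteLine($"{heap[0]} {local}"); // prints "b c"
    }
}
```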

EgorBo · 2025-07-13T17:10:43Z

Although, this one can't be made unchecked by the JIT due to NoInlining:

And if I inline it, StelemRef becomes too big, and I was planning to inline it in the JIT (although I'm not sure, it might in fact be inlined).

EgorBo · 2025-07-13T17:43:07Z

Ugh, another problem is that the JIT never tail-calls helper calls 😢

EgorBo · 2025-07-13T18:03:28Z

@MihuBot -arm

EgorBo · 2025-07-13T20:33:57Z

@EgorBot -amd -arm -windows_intel

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    static object[] _strings = new string[4];

    [Benchmark]
    public void AssignString()
    {
        var arr = _strings;
        if (arr.Length >= 4)
        {
            arr[0] = "";
            arr[1] = "";
            arr[2] = "";
            arr[3] = "";
        }
    }

    [Benchmark]
    public void SwapElements()
    {
        var arr = _strings;
        (arr[1], arr[0]) = (arr[0], arr[1]);
    }

    [Benchmark]
    public void SingleAssignment()
    {
        _strings[0] = "";
    }
}

EgorBo · 2025-09-03T13:24:45Z

Now that main is .NET 11, can we take a look at this?
PTAL @jkotas for non-JIT side and @jakobbotsch for JIT

jkotas · 2025-09-03T13:37:05Z

src/coreclr/jit/importercalls.cpp

+                    if (strcmp(className, "TypeCast") == 0)
+                    {
+                        if (strcmp(methodName, "WriteBarrier") == 0)
+                        {
+                            result = NI_System_Runtime_CompilerServices_RuntimeHelpers_WriteBarrier;
+                        }
+                    }
+                    else if (strcmp(namespaceName, "CompilerServices") == 0)


Suggested change

-                    if (strcmp(className, "TypeCast") == 0)
-                    {
-                        if (strcmp(methodName, "WriteBarrier") == 0)
-                        {
-                            result = NI_System_Runtime_CompilerServices_RuntimeHelpers_WriteBarrier;
-                        }
-                    }
-                    else if (strcmp(namespaceName, "CompilerServices") == 0)
+                    if (strcmp(namespaceName, "CompilerServices") == 0)

And move the WriteBarrier into RuntimeHelpers for NAOT instead? We prefer things to be the same between NAOT and non-NAOT where possible.

@jkotas I could not use [Intrinsic] for RuntimeHelpers inside Runtime.Base (and presumably in Test.CoreLib) 🙁 so this was exactly why I ended up with this

Why not? Runtime.Base has its own simplified copy of [Intrinsic]: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CompilerServices/IntrinsicAttribute.cs

Ah, ignore me, I think I fixed it 🙂

jkotas · 2025-09-03T13:44:08Z

> Closes #9159

This issue talks about removing the store covariance check. I do not see it fixed by this change.

jkotas · 2025-09-03T13:45:34Z

> in case if inliner decides that it's profitable

Does this optimization ever kick in for NAOT? Or does it only ever kick in with runtime-collected PGO data?

EgorBo · 2025-09-03T13:59:19Z

> > Closes #9159
>
> This issue talks about removing the store covariance check. I do not see it fixed by this change.

it is fixed by this change, once it inlines, it is smart enough to handle that, e.g. the example from that issue:

object temp = a[i];
a[i] = a[j];        // store check optimized by jit to simple write barrier
a[j] = temp;        // store check not optimized

EgorBo · 2025-09-03T14:02:08Z

> > in case if inliner decides that it's profitable
>
> Does this optimization ever kick in for NAOT? Or does it only ever kick in with runtime-collected PGO data?

This optimization relies on the inliner to decide whether to inline the routine, and it turns out the inliner needs PGO data to convince itself that inlining large methods like this is profitable on hot paths. So it does need a profile on NAOT too. We may consider making the inliner aggressive enough without PGO for PreferSpeed on NAOT.

jkotas · 2025-09-03T17:44:06Z

> it is fixed by this change, once it inlines, it is smart enough to handle that, e.g. the example from that issue:

What's the code for this method with this change?

[MethodImpl(MethodImplOptions.NoInlining)]
void Swap(object[] a, int i, int j)
{
    object temp = a[i];
    a[i] = a[j];        // store check optimized by jit to simple write barrier
    a[j] = temp;        // store check not optimized
}

There are two array store invariance checks in this method. One of them is optimized before this change. Is this change enabling the second one to be optimized (the improvement tracked by #9159)? I do not see what makes it possible.

EgorBo · 2025-09-03T17:53:16Z

> What's the code for this method with this change?

The codegen (win-x64) for this method is:

G_M39739_IG01:  ;; offset=0x0000
       push     rdi
       push     rsi
       push     rbp
       push     rbx
       sub      rsp, 40
       mov      rbx, rcx
       mov      esi, r8d
						;; size=14 bbWeight=1 PerfScore 4.75
G_M39739_IG02:  ;; offset=0x000E
       mov      edi, dword ptr [rbx+0x08]
       cmp      edx, edi
       jae      SHORT G_M39739_IG09
       mov      ecx, edx
       mov      rbp, gword ptr [rbx+8*rcx+0x10]
       cmp      esi, edi
       jae      SHORT G_M39739_IG09
       mov      edx, esi
       mov      rdx, gword ptr [rbx+8*rdx+0x10]
       lea      rcx, bword ptr [rbx+8*rcx+0x10]
       call     CORINFO_HELP_ASSIGN_REF
       mov      ecx, esi
       mov      edx, edi
       cmp      rdx, rcx
       jbe      SHORT G_M39739_IG06
       lea      rcx, bword ptr [rbx+8*rcx+0x10]
       mov      rdx, qword ptr [rbx]
       mov      rdx, qword ptr [rdx+0x30]
       test     rbp, rbp
       je       SHORT G_M39739_IG07
       cmp      rdx, qword ptr [rbp]
       jne      SHORT G_M39739_IG08
						;; size=67 bbWeight=1 PerfScore 23.00
G_M39739_IG03:  ;; offset=0x0051
       mov      rdx, rbp
       call     CORINFO_HELP_ASSIGN_REF
						;; size=8 bbWeight=1 PerfScore 1.25
G_M39739_IG04:  ;; offset=0x0059
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25
G_M39739_IG05:  ;; offset=0x005A
       add      rsp, 40
       pop      rbx
       pop      rbp
       pop      rsi
       pop      rdi
       ret      
						;; size=9 bbWeight=1 PerfScore 3.25
G_M39739_IG06:  ;; offset=0x0063
       call     [System.Runtime.CompilerServices.CastHelpers:ThrowIndexOutOfRangeException()]
       int3     
						;; size=7 bbWeight=0 PerfScore 0.00
G_M39739_IG07:  ;; offset=0x006A
       xor      rax, rax
       mov      gword ptr [rcx], rax
       jmp      SHORT G_M39739_IG04
						;; size=7 bbWeight=0 PerfScore 0.00
G_M39739_IG08:  ;; offset=0x0071
       mov      r8, 0x7FF8FA732308      ; System.Object[]
       cmp      qword ptr [rbx], r8
       je       SHORT G_M39739_IG03
       mov      r8, rbp
       call     [System.Runtime.CompilerServices.CastHelpers:StelemRef_Helper(byref,ptr,System.Object)]
       jmp      SHORT G_M39739_IG04
						;; size=26 bbWeight=0 PerfScore 0.00
G_M39739_IG09:  ;; offset=0x008B
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
						;; size=6 bbWeight=0 PerfScore 0.00

; Total bytes of code 145

EgorBo · 2025-09-03T18:00:47Z

I guess it's not entirely eliminated: the if (elementType != RuntimeHelpers.GetMethodTable(obj)) check is still there, and some things could probably be CSE'd better, but it makes it 2-3x faster in microbenchmarks. Removing the element type check might require teaching the JIT about the GetMethodTable(array)->ElementType field.
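As a reading aid for the disassembly above, the paths it takes for the second store can be modeled in plain C#. This is a simplified sketch that uses System.Type instead of the runtime's MethodTable pointers; StelemModel and the returned path names are hypothetical, and this is not the actual CastHelpers.StelemRef source.

```csharp
#nullable enable
using System;

public static class StelemModel
{
    // Models the order of checks seen in the codegen and returns the path taken.
    public static string Store(object?[] array, int index, object? value)
    {
        // Range check (IG06 in the listing throws on failure).
        if ((uint)index >= (uint)array.Length)
            throw new IndexOutOfRangeException();

        // Null can be stored into any reference array, no barrier needed (IG07).
        if (value is null)
        {
            array[index] = null;
            return "null";
        }

        // Fast path: the value's exact type equals the array's element type (IG03).
        Type elementType = array.GetType().GetElementType()!;
        if (elementType == value.GetType())
        {
            array[index] = value;
            return "exact-match";
        }

        // Second fast path: an exact object[] accepts any reference (IG08 -> IG03).
        if (array.GetType() == typeof(object[]))
        {
            array[index] = value;
            return "object-array";
        }

        // Slow path: full castability check (StelemRef_Helper in the listing).
        if (!elementType.IsInstanceOfType(value))
            throw new ArrayTypeMismatchException();
        array[index] = value;
        return "slow-path";
    }

    public static void Main()
    {
        Console.WriteLine(Store(new string[1], 0, "a"));    // prints exact-match
        Console.WriteLine(Store(new object[1], 0, "a"));    // prints object-array
        Console.WriteLine(Store(new Exception[1], 0, new InvalidOperationException())); // prints slow-path
        Console.WriteLine(Store(new string[1], 0, null));   // prints null
    }
}
```

The model also shows why the slow path can be costly: it is the only branch that needs a general castability test rather than a pointer comparison.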

jkotas · 2025-09-03T18:09:37Z

In order to eliminate the second invariant array store check, the JIT would have to figure out that the value came from the same array. Helper inlining won't help you with that.

I do not think that this PR is fixing #9159, which is specifically about eliminating invariant array store checks in code that swaps elements of an array. Invariant array store checks can be very expensive depending on the type of the array and the type of the item being stored into the array. That is why it is interesting to eliminate them.
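The reason the check is redundant in a swap is that any value read from an array already satisfies that array's element type, so storing it back into the same array cannot fail. A small sketch of that property follows (names are hypothetical); it demonstrates the invariant only, not the JIT optimization tracked by #9159.

```csharp
using System;

public static class SwapDemo
{
    // Both stored values were loaded from 'a', so neither store can
    // ever raise ArrayTypeMismatchException, whatever a's real type is.
    public static void Swap(object[] a, int i, int j)
    {
        object temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void Main()
    {
        object[] a = new string[] { "x", "y" }; // a string[] viewed as object[]
        Swap(a, 0, 1);
        Console.WriteLine(string.Join(",", a)); // prints y,x
    }
}
```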

jkotas · 2025-09-03T18:11:05Z

(The idea behind this change sounds reasonable to me otherwise.)

EgorBo · 2025-09-03T19:14:15Z

> I do not think that this PR is fixing #9159

You're right, I just misremembered that it fixed it when I was working on this. Changed the wording.

jakobbotsch

LGTM


github-actions bot added the area-VM-coreclr label Jul 13, 2025

dotnet-policy-service bot assigned EgorBo Jul 13, 2025

MihuBot mentioned this pull request Jul 13, 2025: [JitDiff X64] [EgorBo] Remove WriteBarrier FCall MihuBot/runtime-utils#1223 (Open)

EgorBot mentioned this pull request Jul 13, 2025: Benchmarks for #117583 (EgorBo) EgorBot/runtime-utils#435 (Open)

am11 reviewed Jul 13, 2025: src/coreclr/vm/ecalllist.h (outdated, resolved)

MihuBot mentioned this pull request Jul 13, 2025: [JitDiff X64] [EgorBo] Remove WriteBarrier FCall MihuBot/runtime-utils#1225 (Open)

jkotas reviewed Jul 13, 2025: src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/TypeCast.cs (resolved)

MihuBot mentioned this pull request Jul 13, 2025: [JitDiff ARM64] [EgorBo] Remove WriteBarrier FCall MihuBot/runtime-utils#1226 (Open)

EgorBo changed the title from "Remove WriteBarrier FCall" to "Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall" Jul 13, 2025

EgorBot mentioned this pull request Jul 13, 2025: Benchmarks for #117583 (EgorBo) EgorBot/runtime-utils#436 (Open)

EgorBo force-pushed the remove-wb-fcall branch from 5a686ab to a10e708 July 14, 2025 00:29

EgorBo mentioned this pull request Jul 14, 2025: Fix side-effect extraction in optVNBasedFoldConstExpr #117599 (Closed)

This was referenced Jul 14, 2025:
LibraryImportGenerator.Unit.Tests crashing on linux-x64 mono interpreter #100800 (Open)
The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008 (Open)
browser-wasm Windows build error #116746 (Closed)

Commit: Inline CORINFO_HELP_ARRADDR_ST helper call (872e969)

EgorBo force-pushed the remove-wb-fcall branch from db666a2 to 872e969 July 16, 2025 01:52

EgorBo added 3 commits July 16, 2025 03:55:
clean up (e93c1cf)
fix ci (7279b86)
clean up (894c0ef)

github-actions bot mentioned this pull request Jul 16, 2025: 117583 MichalStrehovsky/rt-sz#150 (Closed)

Commit: Merge branch 'main' into remove-wb-fcall (fd9a81c)

EgorBo requested a review from jakobbotsch September 3, 2025 13:24

jkotas reviewed Sep 3, 2025

This was referenced Sep 3, 2025:
XHarness message of order error leading to timeout dotnet/dnceng#4823 (Open)
AppHost tests fail with "Failure extracting contents of the application bundle." #119249 (Open)

EgorBo and others added 3 commits September 3, 2025 19:04:
Update src/coreclr/jit/importercalls.cpp (7300f5c), Co-authored-by: Jan Kotas <jkotas@microsoft.com>
FB (8c3900a)
Merge branch 'remove-wb-fcall' of github.com:EgorBo/runtime-1 into remove-wb-fcall (e4d9fde)

jkotas reviewed Sep 3, 2025: src/libraries/Common/tests/System/Runtime/CompilerServices/RuntimeHelpers.cs (outdated, resolved)

Commit: Update src/libraries/Common/tests/System/Runtime/CompilerServices/RuntimeHelpers.cs (5bd0911), Co-authored-by: Jan Kotas <jkotas@microsoft.com>

build-analysis bot mentioned this pull request Sep 3, 2025: /root/helix/work/correlation/scripts/<hash>/execute.sh: Permission denied dotnet/dnceng#3412 (Open, 3 tasks)

jakobbotsch reviewed Sep 5, 2025: src/coreclr/jit/compiler.h (resolved)

jakobbotsch reviewed Sep 5, 2025: src/coreclr/jit/gentree.cpp (resolved)

jakobbotsch reviewed Sep 5, 2025: src/coreclr/jit/importercalls.cpp (resolved)

jakobbotsch reviewed Sep 5, 2025: src/coreclr/jit/importercalls.cpp (outdated, resolved)

jakobbotsch approved these changes Sep 5, 2025

EgorBo and others added 2 commits September 5, 2025 13:21:
Update src/coreclr/jit/importercalls.cpp (2731fb3), Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Merge branch 'main' into remove-wb-fcall (130f490)

build-analysis bot mentioned this pull request Sep 8, 2025: Test failure: Microsoft.Extensions.Configuration.EnvironmentVariables.Test.EnvironmentVariablesTest.BindingDoesNotThrowIfReloadedDuringBinding #109904 (Closed)

Commit: Merge branch 'main' into remove-wb-fcall (5b5b037)
