-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Inline CORINFO_HELP_ARRADDR_ST helper call, remove WriteBarrier FCall #117583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Right, you would have to fix the JIT to be able to prove this in order to make this change without introducing regressions.
The barrier needs to be made checked when executed against stack allocated copy. It would be tough tradeoff to de-optimize all write barriers to enable more stack allocation. |
I believe I've already done that in this PR. The previous logic was a bit conservative - it tried to handle only |
@MihuBot -arm |
@EgorBot -amd -arm -windows_intel using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);
public class Benchmarks
{
static object[] _strings = new string[4];
[Benchmark]
public void AssignString()
{
var arr = _strings;
if (arr.Length >= 4)
{
arr[0] = "";
arr[1] = "";
arr[2] = "";
arr[3] = "";
}
}
[Benchmark]
public void SwapElements()
{
var arr = _strings;
(arr[1], arr[0]) = (arr[0], arr[1]);
}
[Benchmark]
public void SingleAssignment()
{
_strings[0] = "";
}
} |
Now that Main is .net 11, can we take a look at this? |
src/coreclr/jit/importercalls.cpp
Outdated
if (strcmp(className, "TypeCast") == 0) | ||
{ | ||
if (strcmp(methodName, "WriteBarrier") == 0) | ||
{ | ||
result = NI_System_Runtime_CompilerServices_RuntimeHelpers_WriteBarrier; | ||
} | ||
} | ||
else if (strcmp(namespaceName, "CompilerServices") == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (strcmp(className, "TypeCast") == 0) | |
{ | |
if (strcmp(methodName, "WriteBarrier") == 0) | |
{ | |
result = NI_System_Runtime_CompilerServices_RuntimeHelpers_WriteBarrier; | |
} | |
} | |
else if (strcmp(namespaceName, "CompilerServices") == 0) | |
if (strcmp(namespaceName, "CompilerServices") == 0) |
And move the WriteBarrier into RuntimeHelpers for NAOT instead? We prefer things to be the same between NAOT and non-NAOT where possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas I could not use [Intrinsic] for RuntimeHelpers inside Runtime.Base (and presumably in Test.CoreLib) 🙁 so this was exactly why I ended up with this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? Runtime.Base has its own simplified copy of [Intrinsic]
: https://github.com/dotnet/runtime/blob/main/src/coreclr/nativeaot/Runtime.Base/src/System/Runtime/CompilerServices/IntrinsicAttribute.cs
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ignore me, I think I fixed it 🙂
This issue talks about removing store covariance check. I do not see it fixed by this change. |
Does this optimization ever kicks in for NAOT? Or does it only ever kick in with runtime collected PGO data? |
it is fixed by this change, once it inlines, it is smart enough to handle that, e.g example from that issue:
|
this optimization relies on Inliner to decide whether to inline the routine or not, and it turns out it needs a PGO to convince itself inlining large methods like this is profitable on hot paths. So it does need a profile on NAOT too. We may consider making inliner aggressive enough without PGO for PreferSpeed on NAOT. |
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
src/libraries/Common/tests/System/Runtime/CompilerServices/RuntimeHelpers.cs
Outdated
Show resolved
Hide resolved
…timeHelpers.cs Co-authored-by: Jan Kotas <jkotas@microsoft.com>
What's the code for this method with this change?
There are two array store invariance checks in this method. One of them is optimized before this change. Is this change enabling the second one to be optimized (the improvement tracked by #9159)? I do not see what makes it possible. |
The codegen (win-x64) for this method is: G_M39739_IG01: ;; offset=0x0000
push rdi
push rsi
push rbp
push rbx
sub rsp, 40
mov rbx, rcx
mov esi, r8d
;; size=14 bbWeight=1 PerfScore 4.75
G_M39739_IG02: ;; offset=0x000E
mov edi, dword ptr [rbx+0x08]
cmp edx, edi
jae SHORT G_M39739_IG09
mov ecx, edx
mov rbp, gword ptr [rbx+8*rcx+0x10]
cmp esi, edi
jae SHORT G_M39739_IG09
mov edx, esi
mov rdx, gword ptr [rbx+8*rdx+0x10]
lea rcx, bword ptr [rbx+8*rcx+0x10]
call CORINFO_HELP_ASSIGN_REF
mov ecx, esi
mov edx, edi
cmp rdx, rcx
jbe SHORT G_M39739_IG06
lea rcx, bword ptr [rbx+8*rcx+0x10]
mov rdx, qword ptr [rbx]
mov rdx, qword ptr [rdx+0x30]
test rbp, rbp
je SHORT G_M39739_IG07
cmp rdx, qword ptr [rbp]
jne SHORT G_M39739_IG08
;; size=67 bbWeight=1 PerfScore 23.00
G_M39739_IG03: ;; offset=0x0051
mov rdx, rbp
call CORINFO_HELP_ASSIGN_REF
;; size=8 bbWeight=1 PerfScore 1.25
G_M39739_IG04: ;; offset=0x0059
nop
;; size=1 bbWeight=1 PerfScore 0.25
G_M39739_IG05: ;; offset=0x005A
add rsp, 40
pop rbx
pop rbp
pop rsi
pop rdi
ret
;; size=9 bbWeight=1 PerfScore 3.25
G_M39739_IG06: ;; offset=0x0063
call [System.Runtime.CompilerServices.CastHelpers:ThrowIndexOutOfRangeException()]
int3
;; size=7 bbWeight=0 PerfScore 0.00
G_M39739_IG07: ;; offset=0x006A
xor rax, rax
mov gword ptr [rcx], rax
jmp SHORT G_M39739_IG04
;; size=7 bbWeight=0 PerfScore 0.00
G_M39739_IG08: ;; offset=0x0071
mov r8, 0x7FF8FA732308 ; System.Object[]
cmp qword ptr [rbx], r8
je SHORT G_M39739_IG03
mov r8, rbp
call [System.Runtime.CompilerServices.CastHelpers:StelemRef_Helper(byref,ptr,System.Object)]
jmp SHORT G_M39739_IG04
;; size=26 bbWeight=0 PerfScore 0.00
G_M39739_IG09: ;; offset=0x008B
call CORINFO_HELP_RNGCHKFAIL
int3
;; size=6 bbWeight=0 PerfScore 0.00
; Total bytes of code 145 |
I guess it's not entirely eliminated and the |
In order to eliminate the second invariant array store check, the JIT would have to figure out what the value came from the same array. Helper inlining won't help you with that. I do not think that this PR is fixing #9159 that's specifically about eliminating invariant array store checks in code that is swapping elements of an array. Invariant array store checks can be very expensive depending on the type of the array and type of the item being stored into the array. It is why it is interesting to eliminate them. |
(The idea behind this change sounds reasonable to me otherwise.) |
You're right I just misremembered that it did fix it when I was working on this. Changed the wording. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Jakob Botsch Nielsen <Jakob.botsch.nielsen@gmail.com>
Inline write barriers with covariance checks (in case if inliner decides that it's profitable) - this allows to remove redundant write barriers (previously we could only do that in early phases) and range checks (previously they were inside the helper call and now JIT can fold them).
Related: #9159
Benchmark
Linux-AMD (Genoa):
Linux-ARM64 (Cobalt100):