Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Widespread regressions after #71498 (Spans to use ref fields) #71667

Closed
EgorBo opened this issue Jul 5, 2022 · 5 comments · Fixed by #71673
Closed

Widespread regressions after #71498 (Spans to use ref fields) #71667

EgorBo opened this issue Jul 5, 2022 · 5 comments · Fixed by #71673
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Jul 5, 2022

We're seeing a lot of perf regressions (large reports in dotnet/perf-autofiling-issues#6587 and dotnet/perf-autofiling-issues#6588) due to #71498

Example of a regression:
image

@DrewScoggins noticed a similar picture on TE:
image

Codegen example:

Span<byte> Test(byte[] array) => array;

Before #71498:

; Method Program:Test(System.Byte[]):System.Span`1[Byte]:this
G_M42296_IG01:
G_M42296_IG02:
       test     r8, r8
       je       SHORT G_M42296_IG06
G_M42296_IG03:
       lea      rcx, bword ptr [r8+16]
       mov      r8d, dword ptr [r8+8]
G_M42296_IG04:
       mov      bword ptr [rdx], rcx
       mov      dword ptr [rdx+8], r8d
       mov      rax, rdx
G_M42296_IG05:
       ret      
G_M42296_IG06:
       xor      rcx, rcx
       xor      r8d, r8d
       jmp      SHORT G_M42296_IG04
; Total bytes of code: 31

After #71498:

; Method Program:Test(System.Byte[]):System.Span`1[Byte]:this
G_M42296_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 16
       vzeroupper 
       mov      rbx, rdx
G_M42296_IG02:
       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rsp], xmm0
       test     r8, r8
       je       SHORT G_M42296_IG06
G_M42296_IG03:
       lea      rax, bword ptr [r8+16]
       mov      bword ptr [rsp], rax
       mov      eax, dword ptr [r8+8]
       mov      dword ptr [rsp+08H], eax
G_M42296_IG04:
       mov      rdi, rbx
       lea      rsi, bword ptr [rsp]
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq    
       mov      rax, rbx
G_M42296_IG05:
       add      rsp, 16
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
G_M42296_IG06:
       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rsp], xmm0
       jmp      SHORT G_M42296_IG04
; Total bytes of code: 79

Obviously, looks like a JIT issue that wasn't ready for that change, the question is - is it fixable in a short term..

/cc @tannergooding @kunalspathak @DrewScoggins @AaronRobinsonMSFT @dotnet/jit-contrib

@EgorBo EgorBo added the tenet-performance Performance related issue label Jul 5, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 5, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 5, 2022
@ghost
Copy link

ghost commented Jul 5, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

We're seeing a lot of perf regressions (still being generated) due to #71498

Example of a regression:
image

@DrewScoggins noticed a similar picture on TE:
image

Codegen example:

Span<byte> Test(byte[] array) => array;

Was:

; Method Program:Test(System.Byte[]):System.Span`1[Byte]:this
G_M42296_IG01:
G_M42296_IG02:
       test     r8, r8
       je       SHORT G_M42296_IG06
G_M42296_IG03:
       lea      rcx, bword ptr [r8+16]
       mov      r8d, dword ptr [r8+8]
G_M42296_IG04:
       mov      bword ptr [rdx], rcx
       mov      dword ptr [rdx+8], r8d
       mov      rax, rdx
G_M42296_IG05:
       ret      
G_M42296_IG06:
       xor      rcx, rcx
       xor      r8d, r8d
       jmp      SHORT G_M42296_IG04
; Total bytes of code: 31

After #71498:

; Method Program:Test(System.Byte[]):System.Span`1[Byte]:this
G_M42296_IG01:
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 16
       vzeroupper 
       mov      rbx, rdx
G_M42296_IG02:
       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rsp], xmm0
       test     r8, r8
       je       SHORT G_M42296_IG06
G_M42296_IG03:
       lea      rax, bword ptr [r8+16]
       mov      bword ptr [rsp], rax
       mov      eax, dword ptr [r8+8]
       mov      dword ptr [rsp+08H], eax
G_M42296_IG04:
       mov      rdi, rbx
       lea      rsi, bword ptr [rsp]
       call     CORINFO_HELP_ASSIGN_BYREF
       movsq    
       mov      rax, rbx
G_M42296_IG05:
       add      rsp, 16
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
G_M42296_IG06:
       vxorps   xmm0, xmm0
       vmovdqu  xmmword ptr [rsp], xmm0
       jmp      SHORT G_M42296_IG04
; Total bytes of code: 79

Obviously, looks like a JIT issue that wasn't ready for that change, the question is - is it fixable in a short term..

/cc @tannergooding @kunalspathak @DrewScoggins @AaronRobinsonMSFT @dotnet/jit-contrib

Author: EgorBo
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo
Copy link
Member Author

EgorBo commented Jul 5, 2022

JitDump diff: https://www.diffchecker.com/4Tn7vDrw (before vs after)

@EgorBo EgorBo added this to the 7.0.0 milestone Jul 5, 2022
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Jul 5, 2022
@tannergooding
Copy link
Member

I think the PR in general is still fairly critical to properly vetting and shipping ref fields in .NET 7/C# 11 and so it should stay in with a planned fix, unless something egregious limits that.

It looks like this is likely coming down to two main points:

  1. We now have a call call CORINFO_HELP_ASSIGN_BYREF in the midst which will add several cycles, potentially cause spilling/etc
  2. We now have zeroing where we didn't before vxorps xmm0, xmm0; vmovdqu xmmword ptr [rsp], xmm0. This is potentially due to the field being a TYP_BYREF and the normal expectation being that GC tracked types always require zeroing

CC. @AaronRobinsonMSFT as requested

@SingleAccretion
Copy link
Contributor

The bulk of the regression is likely to have been caused by the fact the Jit now sees Span<T> as having "custom layout". This causes a bunch of pessimizations in the transformations morph performs on promoted structs.

For reference, here's Jit code that talks about the problem:

// If we have "Custom Layout" then we might have an explicit Size attribute
// Managed C++ uses this for its structs, such C++ types will not contain GC pointers.
//
// The current VM implementation also incorrectly sets the CORINFO_FLG_CUSTOMLAYOUT
// whenever a managed value class contains any GC pointers.
// (See the comment for VMFLAG_NOT_TIGHTLY_PACKED in class.h)
//
// It is important to struct promote managed value classes that have GC pointers
// So we compute the correct value for "CustomLayout" here
//
if (StructHasCustomLayout(typeFlags) && ((typeFlags & CORINFO_FLG_CONTAINS_GC_PTR) == 0))
{
structPromotionInfo.customLayout = true;
}

Essentially, before the change, Span<T> had no CORINFO_FLG_CUSTOMLAYOUT flag, but now it does, and it also doesn't have the CORINFO_FLG_CONTAINS_GC_PTR flag, so the compensating logic above doesn't kick in.

Some possible fixes:
a) Work around the issue by adding CORINFO_FLG_BYREF_LIKE to the condition above. This will expose ref structs to another kind of bug: the compensating logic is not actually correct because it assumes no types with GC pointers can have "custom" (explicit / with Size metadata) layout, which is not true.
b) Fix the VM to not report "custom layout" for types which don't actually have the Size metadata or explicit layout. See also: #64863.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 5, 2022

For history: we also tried to introduce int __padding; field in Span and changing int _length to nint _length to make spans "without holes" and it even worked 😄 (discussed in the Discord channel)

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 6, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 6, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants