[ARM64] Incorrect HFA/HVA property calculation #35144

AntonLapounov · 2020-04-17T23:09:33Z

While reviewing changes in dotnet/coreclr#23675, I noticed that the code added to EEClass::CheckForHFA does not handle wrapped Vector64, Vector128, and Vector256 intrinsic types correctly. For instance, it does not distinguish a wrapped Vector64 and a double. Moreover, the elemSize check is skipped for wrapped Vector128 and Vector256 types. As a result, the HFA/HVA property may be calculated incorrectly. For instance (here { } denotes a struct):

{{Vector64}, Vector64} is incorrectly treated as non-HVA.
{{Vector64}, double} is incorrectly treated as HFA(double).
{Vector128, {Vector256}} is incorrectly treated as HVA(simd16).

The code in question:

runtime/src/coreclr/src/vm/class.cpp

Lines 1345 to 1381 in ce416f4

 case ELEMENT_TYPE_VALUETYPE: 

 { 

 #ifdef TARGET_ARM64 

 // hfa/hva types are unique by size, except for Vector64 which we can conveniently 

 // treat as if it were a double for ABI purposes. However, it only qualifies as 

 // an HVA if all fields are the same type. This will ensure that we only 

 // consider it an HVA if all the fields are ELEMENT_TYPE_VALUETYPE (which have been 

 // determined above to be vectors) of the same size. 

 MethodTable* pMT; 

 #if defined(FEATURE_HFA) 

 pMT = pByValueClassCache[i]; 

 #else 

 pMT = pFD->LookupApproxFieldTypeHandle().AsMethodTable(); 

 #endif 

 int thisElemSize = pMT->GetVectorSize(); 

 if (thisElemSize != 0) 

 { 

 if (elemSize == 0) 

 { 

 elemSize = thisElemSize; 

 } 

 else if ((thisElemSize != elemSize) || (hfaType != ELEMENT_TYPE_VALUETYPE)) 

 { 

 return false; 

 } 

 } 

 else 

 #endif // TARGET_ARM64 

 { 

 #if defined(FEATURE_HFA) 

 fieldType = pByValueClassCache[i]->GetHFAType(); 

 #else 

 fieldType = pFD->LookupApproxFieldTypeHandle().AsMethodTable()->GetHFAType(); 

 #endif 

 } 

 } 

 break;

The repro program (set COMPlus_JITDisasm=C::* to see HFA/HVA properties and registers used):

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

#pragma warning disable 0169 // warning CS0169: The field '{0}' is never used
struct WrappedVector64  { Vector64<byte> _; }
struct WrappedVector128 { Vector128<byte> _; }
struct WrappedVector256 { Vector256<byte> _; }

// Incorrectly treated as non-HVA: passed in x0, x1
struct S1 { WrappedVector64 x; Vector64<byte> y; }

// Incorrectly treated as HFA(double): passed in d0, d1
struct S2 { WrappedVector64 x; double y; }

// Incorrectly treated as HVA(simd16): passed in q0, q1, q2
struct S3 { Vector128<byte> x; WrappedVector256 y; }

static class C
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Foo<T>(T x) { }

    static void Main()
    {
        Foo(new S1());
        Foo(new S2());
        Foo(new S3());
    }
}

@CarolEidt @echesakovMSFT @sdmaclea

The text was updated successfully, but these errors were encountered:

sdmaclea · 2020-04-17T23:41:13Z

@AntonLapounov and I spent a lot of time discussing this earlier this week. I believe his assertions above are correct.

It should also be noted that @AntonLapounov compared VC++ compiler and our JIT regarding handling doubles and vector types to confirm our analysis.

davidwrighton · 2020-04-17T23:42:41Z

@CarolEidt needs to be aware of this.

sdmaclea · 2020-04-17T23:44:33Z

For the Vector256<T> case, it is a bit unclear. Since it is not a hardware supported arm64 SIMD type, our handling is debatable. Vector256<T> could be treated as if it were an HVA {Vector128<T>, Vector128<T>}. In which case this could be...

// Correctly treated as HVA(simd16): passed in q0, q1, q2
struct S3 { Vector128<byte> x; WrappedVector256 y; }

CarolEidt · 2020-04-18T00:03:47Z

Thanks @davidwrighton - I saw this. This is unfortunate as it will require a more complex interaction between the VM and the JIT. I suspect it needs some VM/JIT collaboration to address.

AntonLapounov · 2020-04-21T00:09:17Z

For reference, the C++ definitions below are equivalent to the S1 and the S2 managed cases above and demonstrate correct behavior of VC++ and GCC compilers regarding the ABI spec. There is no equivalent of the S3 case.

#include <arm_neon.h>

// HVA(__n64): passed in d0, d1 registers
struct S1 { 
    struct { uint8x8_t z; } x;
    uint8x8_t y;
};

// Non-HFA/HVA: passed in x0, x1 registers
struct S2 { 
    struct { uint8x8_t z; } x;
    double y;
};

void Foo(S1 x) { }
void Foo(S2 x) { }

AntonLapounov · 2020-04-21T00:16:49Z

Vector256<T> could be treated as if it were an HVA {Vector128<T>, Vector128<T>}.

Yes, but that might be not future-proof if one day Neon registers are extended to 256 bits.

sdmaclea · 2020-04-21T00:38:21Z

be not future-proof

Possibly true.

Although we have Arm64 feature bits to controll what architecture we are targeting.

if one day Neon registers are extended to 256 bits.

It is unlikely. The next generation SIMD for ARM is the scalable vector extension. It generalizes the handling of vectors. It is designed to allow hardware to add support for longer vectors (up to 2048 bits).

The extending neon registers would require new instructions. As neon encodes the vector length in the instruction.

AntonLapounov · 2020-04-21T16:09:22Z

@sdmaclea Sounds reasonable. Anyway, we still have a bug that the two structs below are handled differently at present. That is caused by ignoring the size of the nested Vector256<byte> field for S3, but taking it into account for S4.

// Treated as HVA(simd16): passed by value in q0, q1, q2
struct S3 { Vector128<byte> x; WrappedVector256 y; }

// Treated as non-HVA: passed by reference in x0
struct S4 { Vector128<byte> x; Vector256<byte> y; }

davidwrighton · 2020-04-21T18:37:14Z

Calling conventions for Vector256 is quite complex. For instance on X64, the appropriate calling convention changes based on whether or not the architecture has AVX support. I'm slowing working on a fix to make the runtime/JIT handle this correctly, but its slow going.

Fix dotnet#35144

* Fix HFA/HVA classification Fix #35144

AntonLapounov added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Apr 17, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 17, 2020

BruceForstall added this to the 5.0 milestone Apr 20, 2020

BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 20, 2020

AntonLapounov mentioned this issue May 1, 2020

Crossgen2: Support HVA for ARM64 #35576

Merged

sandreenko mentioned this issue Jun 3, 2020

HFA/Vector calling convention representation in the Jit on arm64. #37341

Closed

CarolEidt self-assigned this Jun 4, 2020

This was referenced Jun 5, 2020

ARM64: Vector64<T> is converted to HVA TYP_DOUBLE #35976

Closed

Fix HFA/HVA classification #37499

Merged

CarolEidt added a commit to CarolEidt/runtime that referenced this issue Jun 9, 2020

Fix HFA/HVA classification

61b07f0

Fix dotnet#35144

CarolEidt closed this as completed in #37499 Jun 12, 2020

CarolEidt added a commit that referenced this issue Jun 12, 2020

Fix HFA/HVA classification (#37499)

f7458a8

* Fix HFA/HVA classification Fix #35144

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ARM64] Incorrect HFA/HVA property calculation #35144

[ARM64] Incorrect HFA/HVA property calculation #35144

AntonLapounov commented Apr 17, 2020

sdmaclea commented Apr 17, 2020 •

edited

Loading

davidwrighton commented Apr 17, 2020

sdmaclea commented Apr 17, 2020 •

edited

Loading

CarolEidt commented Apr 18, 2020

AntonLapounov commented Apr 21, 2020

AntonLapounov commented Apr 21, 2020

sdmaclea commented Apr 21, 2020

AntonLapounov commented Apr 21, 2020

davidwrighton commented Apr 21, 2020

[ARM64] Incorrect HFA/HVA property calculation #35144

[ARM64] Incorrect HFA/HVA property calculation #35144

Comments

AntonLapounov commented Apr 17, 2020

sdmaclea commented Apr 17, 2020 • edited Loading

davidwrighton commented Apr 17, 2020

sdmaclea commented Apr 17, 2020 • edited Loading

CarolEidt commented Apr 18, 2020

AntonLapounov commented Apr 21, 2020

AntonLapounov commented Apr 21, 2020

sdmaclea commented Apr 21, 2020

AntonLapounov commented Apr 21, 2020

davidwrighton commented Apr 21, 2020

sdmaclea commented Apr 17, 2020 •

edited

Loading

sdmaclea commented Apr 17, 2020 •

edited

Loading