Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM64] Incorrect HFA/HVA property calculation #35144

Closed
AntonLapounov opened this issue Apr 17, 2020 · 9 comments · Fixed by #37499
Closed

[ARM64] Incorrect HFA/HVA property calculation #35144

AntonLapounov opened this issue Apr 17, 2020 · 9 comments · Fixed by #37499
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AntonLapounov
Copy link
Member

While reviewing changes in dotnet/coreclr#23675, I noticed that the code added to EEClass::CheckForHFA does not handle wrapped Vector64, Vector128, and Vector256 intrinsic types correctly. For instance, it does not distinguish a wrapped Vector64 and a double. Moreover, the elemSize check is skipped for wrapped Vector128 and Vector256 types. As a result, the HFA/HVA property may be calculated incorrectly. For instance (here { } denotes a struct):

  • {{Vector64}, Vector64} is incorrectly treated as non-HVA.
  • {{Vector64}, double} is incorrectly treated as HFA(double).
  • {Vector128, {Vector256}} is incorrectly treated as HVA(simd16).

The code in question:

case ELEMENT_TYPE_VALUETYPE:
{
#ifdef TARGET_ARM64
// hfa/hva types are unique by size, except for Vector64 which we can conveniently
// treat as if it were a double for ABI purposes. However, it only qualifies as
// an HVA if all fields are the same type. This will ensure that we only
// consider it an HVA if all the fields are ELEMENT_TYPE_VALUETYPE (which have been
// determined above to be vectors) of the same size.
MethodTable* pMT;
#if defined(FEATURE_HFA)
pMT = pByValueClassCache[i];
#else
pMT = pFD->LookupApproxFieldTypeHandle().AsMethodTable();
#endif
int thisElemSize = pMT->GetVectorSize();
if (thisElemSize != 0)
{
if (elemSize == 0)
{
elemSize = thisElemSize;
}
else if ((thisElemSize != elemSize) || (hfaType != ELEMENT_TYPE_VALUETYPE))
{
return false;
}
}
else
#endif // TARGET_ARM64
{
#if defined(FEATURE_HFA)
fieldType = pByValueClassCache[i]->GetHFAType();
#else
fieldType = pFD->LookupApproxFieldTypeHandle().AsMethodTable()->GetHFAType();
#endif
}
}
break;

The repro program (set COMPlus_JITDisasm=C::* to see HFA/HVA properties and registers used):

using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;

#pragma warning disable 0169 // warning CS0169: The field '{0}' is never used
struct WrappedVector64  { Vector64<byte> _; }
struct WrappedVector128 { Vector128<byte> _; }
struct WrappedVector256 { Vector256<byte> _; }

// Incorrectly treated as non-HVA: passed in x0, x1
struct S1 { WrappedVector64 x; Vector64<byte> y; }

// Incorrectly treated as HFA(double): passed in d0, d1
struct S2 { WrappedVector64 x; double y; }

// Incorrectly treated as HVA(simd16): passed in q0, q1, q2
struct S3 { Vector128<byte> x; WrappedVector256 y; }

static class C
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Foo<T>(T x) { }

    static void Main()
    {
        Foo(new S1());
        Foo(new S2());
        Foo(new S3());
    }
}

@CarolEidt @echesakovMSFT @sdmaclea

@AntonLapounov AntonLapounov added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Apr 17, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 17, 2020
@sdmaclea
Copy link
Contributor

sdmaclea commented Apr 17, 2020

@AntonLapounov and I spent a lot of time discussing this earlier this week. I believe his assertions above are correct.

It should also be noted that @AntonLapounov compared VC++ compiler and our JIT regarding handling doubles and vector types to confirm our analysis.

@davidwrighton
Copy link
Member

@CarolEidt needs to be aware of this.

@sdmaclea
Copy link
Contributor

sdmaclea commented Apr 17, 2020

For the Vector256<T> case, it is a bit unclear. Since it is not a hardware supported arm64 SIMD type, our handling is debatable. Vector256<T> could be treated as if it were an HVA {Vector128<T>, Vector128<T>}. In which case this could be...

// Correctly treated as HVA(simd16): passed in q0, q1, q2
struct S3 { Vector128<byte> x; WrappedVector256 y; }

@CarolEidt
Copy link
Contributor

Thanks @davidwrighton - I saw this. This is unfortunate as it will require a more complex interaction between the VM and the JIT. I suspect it needs some VM/JIT collaboration to address.

@BruceForstall BruceForstall added this to the 5.0 milestone Apr 20, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label Apr 20, 2020
@AntonLapounov
Copy link
Member Author

For reference, the C++ definitions below are equivalent to the S1 and the S2 managed cases above and demonstrate correct behavior of VC++ and GCC compilers regarding the ABI spec. There is no equivalent of the S3 case.

#include <arm_neon.h>

// HVA(__n64): passed in d0, d1 registers
struct S1 { 
    struct { uint8x8_t z; } x;
    uint8x8_t y;
};

// Non-HFA/HVA: passed in x0, x1 registers
struct S2 { 
    struct { uint8x8_t z; } x;
    double y;
};

void Foo(S1 x) { }
void Foo(S2 x) { }

@AntonLapounov
Copy link
Member Author

Vector256<T> could be treated as if it were an HVA {Vector128<T>, Vector128<T>}.

Yes, but that might be not future-proof if one day Neon registers are extended to 256 bits.

@sdmaclea
Copy link
Contributor

be not future-proof

Possibly true.

Although we have Arm64 feature bits to controll what architecture we are targeting.

if one day Neon registers are extended to 256 bits.

It is unlikely. The next generation SIMD for ARM is the scalable vector extension. It generalizes the handling of vectors. It is designed to allow hardware to add support for longer vectors (up to 2048 bits).

The extending neon registers would require new instructions. As neon encodes the vector length in the instruction.

@AntonLapounov
Copy link
Member Author

@sdmaclea Sounds reasonable. Anyway, we still have a bug that the two structs below are handled differently at present. That is caused by ignoring the size of the nested Vector256<byte> field for S3, but taking it into account for S4.

// Treated as HVA(simd16): passed by value in q0, q1, q2
struct S3 { Vector128<byte> x; WrappedVector256 y; }

// Treated as non-HVA: passed by reference in x0
struct S4 { Vector128<byte> x; Vector256<byte> y; }

@davidwrighton
Copy link
Member

Calling conventions for Vector256 is quite complex. For instance on X64, the appropriate calling convention changes based on whether or not the architecture has AVX support. I'm slowing working on a fix to make the runtime/JIT handle this correctly, but its slow going.

@CarolEidt CarolEidt self-assigned this Jun 4, 2020
CarolEidt added a commit to CarolEidt/runtime that referenced this issue Jun 9, 2020
CarolEidt added a commit that referenced this issue Jun 12, 2020
* Fix HFA/HVA classification

Fix #35144
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants