Adding Avx10v1 to the runtime #99784

Merged
merged 10 commits into from
Mar 21, 2024
Changes from 2 commits
4 changes: 4 additions & 0 deletions src/coreclr/inc/clrconfigvalues.h
@@ -760,6 +760,10 @@ RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX512F, W("EnableAVX512F
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX512F_VL, W("EnableAVX512F_VL"), 1, "Allows AVX512F_VL+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX512VBMI, W("EnableAVX512VBMI"), 1, "Allows AVX512VBMI+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX512VBMI_VL, W("EnableAVX512VBMI_VL"), 1, "Allows AVX512VBMI_VL+ hardware intrinsics to be disabled")
+RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX10v1, W("EnableAVX10v1"), 1, "Allows AVX10v1+ hardware intrinsics to be disabled")
+RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX10v1_V256, W("EnableAVX10v1_V256"), 1, "Allows AVX10v1_V256+ hardware intrinsics to be disabled")
+RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVX10v1_V512, W("EnableAVX10v1_V512"), 1, "Allows AVX10v1_V512+ hardware intrinsics to be disabled")
+RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_Avx10MaxVectorLength, W("Avx10MaxVectorLength"), 0, "The max vector length supported", CLRConfig::LookupOptions::ParseIntegerAsBase10)

Member

I don't think we need these three.

In general these DOTNET_Enable{Isa} knobs are primarily there for testing purposes. That is, they exist so that developers with newer hardware can still test downlevel code paths without having to recompile their applications. Such code is not typically meant for use in actual production scenarios.

To that end, we typically expose one knob per CPUID ISA bit, so you can disable Avx but not Avx.X64. Avx512F.VL was something of an exception and, in hindsight, probably a knob we don't need to let users control. I expect the V256 and V512 nested classes fall into the same bucket: users don't need the ability to enable or disable them directly and independently of Avx10 itself.

Instead, users who need to control the vector bit width in use can rely on the DOTNET_PreferredVectorBitWidth knob and the corresponding Vector###.IsHardwareAccelerated checks, which is how existing code paths are expected to detect this.
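
To make the "one knob per CPUID ISA bit" idea concrete, here is a minimal C++ sketch of how a single base knob could gate the nested sets. Everything in it (`CpuidBits`, `Knobs`, `ResolvedIsas`, `Resolve`, `Example`) is a hypothetical model for illustration, not the runtime's actual code, and it assumes 256-bit support is implied whenever AVX10v1 is present.

```cpp
#include <cassert>

// Hypothetical model; these types are illustrative and not taken from the runtime.
struct CpuidBits {
    bool avx10v1;         // CPUID reports AVX10.1
    bool avx10v1_512bit;  // AVX10.1 is available at 512-bit vector length
};

struct Knobs {
    bool enableAVX10v1;   // models DOTNET_EnableAVX10v1 (default 1)
};

struct ResolvedIsas {
    bool avx10v1;
    bool avx10v1_v256;
    bool avx10v1_v512;
};

// One knob gates the base ISA. The nested V256/V512 sets have no knob of
// their own: they follow the base ISA plus what the hardware reports.
inline ResolvedIsas Resolve(const CpuidBits& cpu, const Knobs& knobs) {
    const bool base = cpu.avx10v1 && knobs.enableAVX10v1;
    ResolvedIsas r;
    r.avx10v1 = base;
    r.avx10v1_v256 = base;  // assumption: 256-bit support is implied by AVX10.1
    r.avx10v1_v512 = base && cpu.avx10v1_512bit;
    return r;
}

// Usage: turning the single knob off disables the nested sets as well.
inline void Example() {
    const CpuidBits cpu = {true, true};
    const Knobs off = {false};
    assert(!Resolve(cpu, off).avx10v1);
    assert(!Resolve(cpu, off).avx10v1_v512);
}
```

The point of the design is that testing a downlevel path needs only one switch per CPUID bit; vector width is a separate concern handled by the preferred-width knob.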

Member

-- @BruceForstall, there should be no issue in removing a config knob like the various DOTNET_EnableAvx512*_VL entries, right?

We can simply remove them and document that users should use DOTNET_PreferredVectorBitWidth and DOTNET_EnableAvx512* instead? That would give a consistent user story here and reduce the overall complexity we need to support and consider.

Member

I don't see any problem with removing the DOTNET_EnableAvx512*_VL configs. They are available in Release and have been shipped, and use the EXTERNAL prefix, which indicates (or at least historically indicated) a documented config. (Maybe we should have used the prefix UNSUPPORTED?) So technically removing them is a breaking change, but it doesn't seem like a problem.

Contributor Author (@Ruihan-Yin), Mar 15, 2024

Thanks for the feedback.

Please correct me if I am wrong: so we want to remove DOTNET_EnableAvx10v1_V256/512 and DOTNET_Avx10MaxVectorLength, and keep DOTNET_EnableAvx10v1, letting it together with DOTNET_PreferredVectorBitWidth control the emit behavior of the Vector256/512 APIs. (Which I suppose won't be covered in this PR.)

As for the VL vars, do we want to handle it in this PR, or separately?

Member

> so we want to

Yep!

> Which I suppose won't be covered in this PR

This should already be somewhat implicit based on the existing checks we have in the JIT, so I don't expect we need to do anything special. Users will still see Avx10v1.V512 report as supported if the hardware actually supports it, and they would simply check Vector512.IsHardwareAccelerated if they want to limit 512-bit usage to the user-selected behavior (which is controlled by PreferredVectorBitWidth).
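
The distinction between the two checks can be sketched as follows. This is a simplified C++ model, not the managed API or the JIT's logic: `Machine`, `Avx10v1V512IsSupported`, `Vector512IsHardwareAccelerated` and `Example` are hypothetical names, and treating a preferred width of 0 as "no user limit" is an assumption made for the sketch.

```cpp
#include <cassert>

// Hypothetical model of the two checks; field and function names are illustrative.
struct Machine {
    bool hwAvx10v1_512;           // hardware supports AVX10.1 at 512 bits
    int preferredVectorBitWidth;  // models DOTNET_PreferredVectorBitWidth;
                                  // 0 is treated as "no user limit" (assumption)
};

// Models Avx10v1.V512.IsSupported: reflects what the hardware can do.
inline bool Avx10v1V512IsSupported(const Machine& m) {
    return m.hwAvx10v1_512;
}

// Models Vector512.IsHardwareAccelerated: additionally honors the
// user-selected preferred vector width.
inline bool Vector512IsHardwareAccelerated(const Machine& m) {
    const bool widthAllowed =
        m.preferredVectorBitWidth == 0 || m.preferredVectorBitWidth >= 512;
    return m.hwAvx10v1_512 && widthAllowed;
}

// Usage: a user caps the width at 256 on 512-bit capable hardware.
inline void Example() {
    const Machine capped = {true, 256};
    assert(Avx10v1V512IsSupported(capped));           // ISA still reported
    assert(!Vector512IsHardwareAccelerated(capped));  // width preference respected
}
```

Code that wants to respect the user's preference would branch on the second check; code that only cares whether the instructions exist would branch on the first.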

> As for the VL vars, do we want to handle it in this PR, or separately?

I'll cover it in a separate PR.

RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableAVXVNNI, W("EnableAVXVNNI"), 1, "Allows AVXVNNI+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableBMI1, W("EnableBMI1"), 1, "Allows BMI1+ hardware intrinsics to be disabled")
RETAIL_CONFIG_DWORD_INFO(EXTERNAL_EnableBMI2, W("EnableBMI2"), 1, "Allows BMI2+ hardware intrinsics to be disabled")
152 changes: 88 additions & 64 deletions src/coreclr/inc/corinfoinstructionset.h
@@ -75,38 +75,41 @@ enum CORINFO_InstructionSet
InstructionSet_AVX512DQ_VL=30,
InstructionSet_AVX512VBMI=31,
InstructionSet_AVX512VBMI_VL=32,
-InstructionSet_VectorT128=33,
-InstructionSet_VectorT256=34,
-InstructionSet_VectorT512=35,
-InstructionSet_X86Base_X64=36,
-InstructionSet_SSE_X64=37,
-InstructionSet_SSE2_X64=38,
-InstructionSet_SSE3_X64=39,
-InstructionSet_SSSE3_X64=40,
-InstructionSet_SSE41_X64=41,
-InstructionSet_SSE42_X64=42,
-InstructionSet_AVX_X64=43,
-InstructionSet_AVX2_X64=44,
-InstructionSet_AES_X64=45,
-InstructionSet_BMI1_X64=46,
-InstructionSet_BMI2_X64=47,
-InstructionSet_FMA_X64=48,
-InstructionSet_LZCNT_X64=49,
-InstructionSet_PCLMULQDQ_X64=50,
-InstructionSet_POPCNT_X64=51,
-InstructionSet_AVXVNNI_X64=52,
-InstructionSet_MOVBE_X64=53,
-InstructionSet_X86Serialize_X64=54,
-InstructionSet_AVX512F_X64=55,
-InstructionSet_AVX512F_VL_X64=56,
-InstructionSet_AVX512BW_X64=57,
-InstructionSet_AVX512BW_VL_X64=58,
-InstructionSet_AVX512CD_X64=59,
-InstructionSet_AVX512CD_VL_X64=60,
-InstructionSet_AVX512DQ_X64=61,
-InstructionSet_AVX512DQ_VL_X64=62,
-InstructionSet_AVX512VBMI_X64=63,
-InstructionSet_AVX512VBMI_VL_X64=64,
+InstructionSet_AVX10v1=33,
+InstructionSet_AVX10v1_V256=34,
+InstructionSet_AVX10v1_V512=35,
+InstructionSet_VectorT128=36,
+InstructionSet_VectorT256=37,
+InstructionSet_VectorT512=38,
+InstructionSet_X86Base_X64=39,
+InstructionSet_SSE_X64=40,
+InstructionSet_SSE2_X64=41,
+InstructionSet_SSE3_X64=42,
+InstructionSet_SSSE3_X64=43,
+InstructionSet_SSE41_X64=44,
+InstructionSet_SSE42_X64=45,
+InstructionSet_AVX_X64=46,
+InstructionSet_AVX2_X64=47,
+InstructionSet_AES_X64=48,
+InstructionSet_BMI1_X64=49,
+InstructionSet_BMI2_X64=50,
+InstructionSet_FMA_X64=51,
+InstructionSet_LZCNT_X64=52,
+InstructionSet_PCLMULQDQ_X64=53,
+InstructionSet_POPCNT_X64=54,
+InstructionSet_AVXVNNI_X64=55,
+InstructionSet_MOVBE_X64=56,
+InstructionSet_X86Serialize_X64=57,
+InstructionSet_AVX512F_X64=58,
+InstructionSet_AVX512F_VL_X64=59,
+InstructionSet_AVX512BW_X64=60,
+InstructionSet_AVX512BW_VL_X64=61,
+InstructionSet_AVX512CD_X64=62,
+InstructionSet_AVX512CD_VL_X64=63,
+InstructionSet_AVX512DQ_X64=64,
+InstructionSet_AVX512DQ_VL_X64=65,
+InstructionSet_AVX512VBMI_X64=66,
+InstructionSet_AVX512VBMI_VL_X64=67,
#endif // TARGET_AMD64
#ifdef TARGET_X86
InstructionSet_X86Base=1,
@@ -141,38 +144,41 @@ enum CORINFO_InstructionSet
InstructionSet_AVX512DQ_VL=30,
InstructionSet_AVX512VBMI=31,
InstructionSet_AVX512VBMI_VL=32,
-InstructionSet_VectorT128=33,
-InstructionSet_VectorT256=34,
-InstructionSet_VectorT512=35,
-InstructionSet_X86Base_X64=36,
-InstructionSet_SSE_X64=37,
-InstructionSet_SSE2_X64=38,
-InstructionSet_SSE3_X64=39,
-InstructionSet_SSSE3_X64=40,
-InstructionSet_SSE41_X64=41,
-InstructionSet_SSE42_X64=42,
-InstructionSet_AVX_X64=43,
-InstructionSet_AVX2_X64=44,
-InstructionSet_AES_X64=45,
-InstructionSet_BMI1_X64=46,
-InstructionSet_BMI2_X64=47,
-InstructionSet_FMA_X64=48,
-InstructionSet_LZCNT_X64=49,
-InstructionSet_PCLMULQDQ_X64=50,
-InstructionSet_POPCNT_X64=51,
-InstructionSet_AVXVNNI_X64=52,
-InstructionSet_MOVBE_X64=53,
-InstructionSet_X86Serialize_X64=54,
-InstructionSet_AVX512F_X64=55,
-InstructionSet_AVX512F_VL_X64=56,
-InstructionSet_AVX512BW_X64=57,
-InstructionSet_AVX512BW_VL_X64=58,
-InstructionSet_AVX512CD_X64=59,
-InstructionSet_AVX512CD_VL_X64=60,
-InstructionSet_AVX512DQ_X64=61,
-InstructionSet_AVX512DQ_VL_X64=62,
-InstructionSet_AVX512VBMI_X64=63,
-InstructionSet_AVX512VBMI_VL_X64=64,
+InstructionSet_AVX10v1=33,
+InstructionSet_AVX10v1_V256=34,
+InstructionSet_AVX10v1_V512=35,
+InstructionSet_VectorT128=36,
+InstructionSet_VectorT256=37,
+InstructionSet_VectorT512=38,
+InstructionSet_X86Base_X64=39,
+InstructionSet_SSE_X64=40,
+InstructionSet_SSE2_X64=41,
+InstructionSet_SSE3_X64=42,
+InstructionSet_SSSE3_X64=43,
+InstructionSet_SSE41_X64=44,
+InstructionSet_SSE42_X64=45,
+InstructionSet_AVX_X64=46,
+InstructionSet_AVX2_X64=47,
+InstructionSet_AES_X64=48,
+InstructionSet_BMI1_X64=49,
+InstructionSet_BMI2_X64=50,
+InstructionSet_FMA_X64=51,
+InstructionSet_LZCNT_X64=52,
+InstructionSet_PCLMULQDQ_X64=53,
+InstructionSet_POPCNT_X64=54,
+InstructionSet_AVXVNNI_X64=55,
+InstructionSet_MOVBE_X64=56,
+InstructionSet_X86Serialize_X64=57,
+InstructionSet_AVX512F_X64=58,
+InstructionSet_AVX512F_VL_X64=59,
+InstructionSet_AVX512BW_X64=60,
+InstructionSet_AVX512BW_VL_X64=61,
+InstructionSet_AVX512CD_X64=62,
+InstructionSet_AVX512CD_VL_X64=63,
+InstructionSet_AVX512DQ_X64=64,
+InstructionSet_AVX512DQ_VL_X64=65,
+InstructionSet_AVX512VBMI_X64=66,
+InstructionSet_AVX512VBMI_VL_X64=67,
#endif // TARGET_X86

};
@@ -902,6 +908,12 @@ inline const char *InstructionSetToString(CORINFO_InstructionSet instructionSet)
return "AVX512VBMI_VL";
case InstructionSet_AVX512VBMI_VL_X64 :
return "AVX512VBMI_VL_X64";
+case InstructionSet_AVX10v1 :
+return "AVX10v1";
+case InstructionSet_AVX10v1_V256 :
+return "AVX10v1_V256";
+case InstructionSet_AVX10v1_V512 :
+return "AVX10v1_V512";
case InstructionSet_VectorT128 :
return "VectorT128";
case InstructionSet_VectorT256 :
@@ -974,6 +986,12 @@ inline const char *InstructionSetToString(CORINFO_InstructionSet instructionSet)
return "AVX512VBMI";
case InstructionSet_AVX512VBMI_VL :
return "AVX512VBMI_VL";
+case InstructionSet_AVX10v1 :
+return "AVX10v1";
+case InstructionSet_AVX10v1_V256 :
+return "AVX10v1_V256";
+case InstructionSet_AVX10v1_V512 :
+return "AVX10v1_V512";
case InstructionSet_VectorT128 :
return "VectorT128";
case InstructionSet_VectorT256 :
@@ -1044,6 +1062,9 @@ inline CORINFO_InstructionSet InstructionSetFromR2RInstructionSet(ReadyToRunInst
case READYTORUN_INSTRUCTION_Avx512DQ_VL: return InstructionSet_AVX512DQ_VL;
case READYTORUN_INSTRUCTION_Avx512Vbmi: return InstructionSet_AVX512VBMI;
case READYTORUN_INSTRUCTION_Avx512Vbmi_VL: return InstructionSet_AVX512VBMI_VL;
+case READYTORUN_INSTRUCTION_Avx10v1: return InstructionSet_AVX10v1;
+case READYTORUN_INSTRUCTION_Avx10v1_V256: return InstructionSet_AVX10v1_V256;
+case READYTORUN_INSTRUCTION_Avx10v1_V512: return InstructionSet_AVX10v1_V512;
case READYTORUN_INSTRUCTION_VectorT128: return InstructionSet_VectorT128;
case READYTORUN_INSTRUCTION_VectorT256: return InstructionSet_VectorT256;
case READYTORUN_INSTRUCTION_VectorT512: return InstructionSet_VectorT512;
@@ -1078,6 +1099,9 @@ inline CORINFO_InstructionSet InstructionSetFromR2RInstructionSet(ReadyToRunInst
case READYTORUN_INSTRUCTION_Avx512DQ_VL: return InstructionSet_AVX512DQ_VL;
case READYTORUN_INSTRUCTION_Avx512Vbmi: return InstructionSet_AVX512VBMI;
case READYTORUN_INSTRUCTION_Avx512Vbmi_VL: return InstructionSet_AVX512VBMI_VL;
+case READYTORUN_INSTRUCTION_Avx10v1: return InstructionSet_AVX10v1;
+case READYTORUN_INSTRUCTION_Avx10v1_V256: return InstructionSet_AVX10v1_V256;
+case READYTORUN_INSTRUCTION_Avx10v1_V512: return InstructionSet_AVX10v1_V512;
case READYTORUN_INSTRUCTION_VectorT128: return InstructionSet_VectorT128;
case READYTORUN_INSTRUCTION_VectorT256: return InstructionSet_VectorT256;
case READYTORUN_INSTRUCTION_VectorT512: return InstructionSet_VectorT512;
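
The hunks above extend a translation from ReadyToRun identifiers to JIT-side identifiers. A reduced, self-contained model of that translation is sketched below, limited to the three new entries. The numeric values are copied from the diffs in this PR; the `InstructionSet_ILLEGAL` fallback, the `default` branch and the `Example` helper are assumptions added so the sketch compiles on its own.

```cpp
#include <cassert>

// ReadyToRun identifiers: the new entries are appended at the end of the
// enum (values from the readytoruninstructionset.h hunk in this PR).
enum ReadyToRunInstructionSet {
    READYTORUN_INSTRUCTION_Avx10v1 = 44,
    READYTORUN_INSTRUCTION_Avx10v1_V256 = 45,
    READYTORUN_INSTRUCTION_Avx10v1_V512 = 46,
};

// JIT-side identifiers: the new entries are inserted mid-enum
// (values from the corinfoinstructionset.h hunk in this PR).
enum CORINFO_InstructionSet {
    InstructionSet_ILLEGAL = 0,  // assumed fallback value for this sketch
    InstructionSet_AVX10v1 = 33,
    InstructionSet_AVX10v1_V256 = 34,
    InstructionSet_AVX10v1_V512 = 35,
};

// Reduced model of the mapping: only the AVX10v1 cases are shown.
inline CORINFO_InstructionSet InstructionSetFromR2RInstructionSet(ReadyToRunInstructionSet r2r) {
    switch (r2r) {
        case READYTORUN_INSTRUCTION_Avx10v1:      return InstructionSet_AVX10v1;
        case READYTORUN_INSTRUCTION_Avx10v1_V256: return InstructionSet_AVX10v1_V256;
        case READYTORUN_INSTRUCTION_Avx10v1_V512: return InstructionSet_AVX10v1_V512;
        default:                                  return InstructionSet_ILLEGAL;
    }
}

// Usage: identifier 46 resolves to JIT-side value 35.
inline void Example() {
    assert(InstructionSetFromR2RInstructionSet(READYTORUN_INSTRUCTION_Avx10v1_V512) ==
           InstructionSet_AVX10v1_V512);
}
```

Keeping the two numbering schemes separate is what lets the JIT-side enum be reordered freely while the ReadyToRun values only ever grow.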
10 changes: 5 additions & 5 deletions src/coreclr/inc/jiteeversionguid.h
@@ -43,11 +43,11 @@ typedef const GUID *LPCGUID;
#define GUID_DEFINED
#endif // !GUID_DEFINED

-constexpr GUID JITEEVersionIdentifier = { /* 86eab154-5d93-4fad-bc07-e94fd9268b70 */
-0x86eab154,
-0x5d93,
-0x4fad,
-{0xbc, 0x07, 0xe9, 0x4f, 0xd9, 0x26, 0x8b, 0x70}
+constexpr GUID JITEEVersionIdentifier = { /* 0c094642-1416-492c-a49d-9ababfa6f7d1 */
+0x0c094642,
+0x1416,
+0x492c,
+{0xa4, 0x9d, 0x9a, 0xba, 0xbf, 0xa6, 0xf7, 0xd1}
};

//////////////////////////////////////////////////////////////////////////////////////////////////////////
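
The GUID change goes hand in hand with the enum renumbering in corinfoinstructionset.h: inserting the three AVX10v1 entries at 33 to 35 shifts every later value, so a JIT and a runtime built on different sides of this change would disagree about what a given number means. The sketch below illustrates that with the values from this PR. The `Guid` struct, the `before`/`after` namespaces, and the `SameVersion` and `Example` helpers are hypothetical, and how the runtime actually compares the identifier is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Enum layouts before and after this PR (excerpt; values from the diff).
namespace before { enum InstructionSet { AVX512VBMI_VL = 32, VectorT128 = 33 }; }
namespace after  { enum InstructionSet { AVX512VBMI_VL = 32, AVX10v1 = 33,
                                         AVX10v1_V256 = 34, AVX10v1_V512 = 35,
                                         VectorT128 = 36 }; }

// Hypothetical stand-in for the GUID type used by the header.
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// Old and new JITEEVersionIdentifier values, copied from the hunk above.
const Guid kOldVersion = {0x86eab154, 0x5d93, 0x4fad,
                          {0xbc, 0x07, 0xe9, 0x4f, 0xd9, 0x26, 0x8b, 0x70}};
const Guid kNewVersion = {0x0c094642, 0x1416, 0x492c,
                          {0xa4, 0x9d, 0x9a, 0xba, 0xbf, 0xa6, 0xf7, 0xd1}};

// Hypothetical check: both sides must carry the same identifier.
inline bool SameVersion(const Guid& a, const Guid& b) {
    return a.data1 == b.data1 && a.data2 == b.data2 && a.data3 == b.data3 &&
           std::memcmp(a.data4, b.data4, sizeof(a.data4)) == 0;
}

// Usage: the number 33 names different instruction sets in the two layouts,
// and the identifiers differ so the mismatch can be detected.
inline void Example() {
    assert(static_cast<int>(before::VectorT128) == static_cast<int>(after::AVX10v1));
    assert(!SameVersion(kOldVersion, kNewVersion));
}
```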
3 changes: 3 additions & 0 deletions src/coreclr/inc/readytoruninstructionset.h
@@ -52,6 +52,9 @@ enum ReadyToRunInstructionSet
READYTORUN_INSTRUCTION_VectorT512=41,
READYTORUN_INSTRUCTION_Rcpc2=42,
READYTORUN_INSTRUCTION_Sve=43,
+READYTORUN_INSTRUCTION_Avx10v1=44,
+READYTORUN_INSTRUCTION_Avx10v1_V256=45,
+READYTORUN_INSTRUCTION_Avx10v1_V512=46,

};

15 changes: 15 additions & 0 deletions src/coreclr/jit/compiler.cpp
@@ -6241,6 +6241,21 @@ int Compiler::compCompile(CORINFO_MODULE_HANDLE classPtr,
{
instructionSetFlags.AddInstructionSet(InstructionSet_AVX512VBMI_VL);
}

+if (JitConfig.EnableAVX10v1() != 0)
+{
+    instructionSetFlags.AddInstructionSet(InstructionSet_AVX10v1);
+}
+
+if (JitConfig.EnableAVX10v1_V256() != 0)
+{
+    instructionSetFlags.AddInstructionSet(InstructionSet_AVX10v1_V256);
+}
+
+if (JitConfig.EnableAVX10v1_V512() != 0)
+{
+    instructionSetFlags.AddInstructionSet(InstructionSet_AVX10v1_V512);
+}
#endif

// These calls are important and explicitly ordered to ensure that the flags are correct in
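
The pattern in this hunk, where each config value conditionally adds one instruction set to a flag container, can be modeled in isolation as below. The `InstructionSetFlags` class here is a simplified stand-in backed by `std::bitset`, and `KnobValues`, `BuildFlags` and `Example` are hypothetical replacements for the `JitConfig` accessors; only the enum values come from this PR.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Values copied from the corinfoinstructionset.h hunk in this PR.
enum CORINFO_InstructionSet {
    InstructionSet_AVX10v1 = 33,
    InstructionSet_AVX10v1_V256 = 34,
    InstructionSet_AVX10v1_V512 = 35,
};

// Simplified stand-in for the JIT's flag container (hypothetical layout).
class InstructionSetFlags {
public:
    void AddInstructionSet(CORINFO_InstructionSet isa) {
        bits_.set(static_cast<std::size_t>(isa));
    }
    bool HasInstructionSet(CORINFO_InstructionSet isa) const {
        return bits_.test(static_cast<std::size_t>(isa));
    }
private:
    std::bitset<128> bits_;  // assumption: 128 bits cover every ISA id
};

// Hypothetical plain-struct replacement for the JitConfig accessors.
struct KnobValues {
    int enableAVX10v1;
    int enableAVX10v1_V256;
    int enableAVX10v1_V512;
};

// Mirrors the shape of the hunk: a non-zero knob adds the matching ISA.
inline InstructionSetFlags BuildFlags(const KnobValues& cfg) {
    InstructionSetFlags flags;
    if (cfg.enableAVX10v1 != 0)      flags.AddInstructionSet(InstructionSet_AVX10v1);
    if (cfg.enableAVX10v1_V256 != 0) flags.AddInstructionSet(InstructionSet_AVX10v1_V256);
    if (cfg.enableAVX10v1_V512 != 0) flags.AddInstructionSet(InstructionSet_AVX10v1_V512);
    return flags;
}

// Usage: with every knob at its default of 1, all three sets are present.
inline void Example() {
    const KnobValues defaults = {1, 1, 1};
    const InstructionSetFlags flags = BuildFlags(defaults);
    assert(flags.HasInstructionSet(InstructionSet_AVX10v1));
    assert(flags.HasInstructionSet(InstructionSet_AVX10v1_V512));
}
```

Under the review feedback earlier in this thread, the V256 and V512 knobs would go away, leaving only the first condition knob-driven.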
4 changes: 4 additions & 0 deletions src/coreclr/jit/jitconfigvalues.h
@@ -326,6 +326,10 @@ CONFIG_INTEGER(EnableAVX512F, W("EnableAVX512F"), 1) /
CONFIG_INTEGER(EnableAVX512F_VL, W("EnableAVX512F_VL"), 1) // Allows AVX512F+ AVX512VL+ hardware intrinsics to be disabled
CONFIG_INTEGER(EnableAVX512VBMI, W("EnableAVX512VBMI"), 1) // Allows AVX512VBMI+ hardware intrinsics to be disabled
CONFIG_INTEGER(EnableAVX512VBMI_VL, W("EnableAVX512VBMI_VL"), 1) // Allows AVX512VBMI_VL+ hardware intrinsics to be disabled
+CONFIG_INTEGER(EnableAVX10v1, W("EnableAVX10v1"), 1) // Allows AVX10v1+ hardware intrinsics to be disabled
+CONFIG_INTEGER(EnableAVX10v1_V256, W("EnableAVX10v1_V256"), 1) // Allows AVX10v1_V256+ hardware intrinsics to be disabled
+CONFIG_INTEGER(EnableAVX10v1_V512, W("EnableAVX10v1_V512"), 1) // Allows AVX10v1_V512+ hardware intrinsics to be disabled
+CONFIG_INTEGER(Avx10MaxVectorLength, W("Avx10MaxVectorLength"), 0) // The max vector length supported
CONFIG_INTEGER(EnableAVXVNNI, W("EnableAVXVNNI"), 1) // Allows AVXVNNI+ hardware intrinsics to be disabled
CONFIG_INTEGER(EnableBMI1, W("EnableBMI1"), 1) // Allows BMI1+ hardware intrinsics to be disabled
CONFIG_INTEGER(EnableBMI2, W("EnableBMI2"), 1) // Allows BMI2+ hardware intrinsics to be disabled
@@ -55,6 +55,9 @@ public enum ReadyToRunInstructionSet
VectorT512=41,
Rcpc2=42,
Sve=43,
+Avx10v1=44,
+Avx10v1_V256=45,
+Avx10v1_V512=46,

}
}
@@ -118,6 +118,9 @@ public static class ReadyToRunInstructionSetHelper
case InstructionSet.X64_AVX512VBMI_X64: return ReadyToRunInstructionSet.Avx512Vbmi;
case InstructionSet.X64_AVX512VBMI_VL: return ReadyToRunInstructionSet.Avx512Vbmi_VL;
case InstructionSet.X64_AVX512VBMI_VL_X64: return ReadyToRunInstructionSet.Avx512Vbmi_VL;
+case InstructionSet.X64_AVX10v1: return ReadyToRunInstructionSet.Avx10v1;
+case InstructionSet.X64_AVX10v1_V256: return ReadyToRunInstructionSet.Avx10v1_V256;
+case InstructionSet.X64_AVX10v1_V512: return ReadyToRunInstructionSet.Avx10v1_V512;
case InstructionSet.X64_VectorT128: return ReadyToRunInstructionSet.VectorT128;
case InstructionSet.X64_VectorT256: return ReadyToRunInstructionSet.VectorT256;
case InstructionSet.X64_VectorT512: return ReadyToRunInstructionSet.VectorT512;
@@ -191,6 +194,9 @@ public static class ReadyToRunInstructionSetHelper
case InstructionSet.X86_AVX512VBMI_X64: return null;
case InstructionSet.X86_AVX512VBMI_VL: return ReadyToRunInstructionSet.Avx512Vbmi_VL;
case InstructionSet.X86_AVX512VBMI_VL_X64: return null;
+case InstructionSet.X86_AVX10v1: return ReadyToRunInstructionSet.Avx10v1;
+case InstructionSet.X86_AVX10v1_V256: return ReadyToRunInstructionSet.Avx10v1_V256;
+case InstructionSet.X86_AVX10v1_V512: return ReadyToRunInstructionSet.Avx10v1_V512;
case InstructionSet.X86_VectorT128: return ReadyToRunInstructionSet.VectorT128;
case InstructionSet.X86_VectorT256: return ReadyToRunInstructionSet.VectorT256;
case InstructionSet.X86_VectorT512: return ReadyToRunInstructionSet.VectorT512;