Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GetHardwareIntrinsicId on 32bit platforms #110238

Merged
merged 2 commits into from
Dec 3, 2024

Conversation

MichalStrehovsky
Copy link
Member

AFAIK methods on the nested X64/Arm64 classes shouldn't be considered intrinsics on 32bit platforms since they are as relevant as e.g. WASM intrinsics. This should fix widespread runtime-nativeaot-outerloop failure on x86. I think this regressed in #109137.

Cc @dotnet/ilc-contrib

AFAIK methods on the nested X64/Arm64 classes shouldn't be considered intrinsics on 32bit platforms since they are as relevant as e.g. WASM intrinsics. This should fix widespread runtime-nativeaot-outerloop failure on x86. I think this regressed in dotnet#109137.
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jkotas
Copy link
Member

jkotas commented Nov 28, 2024

cc @saucecontrol

@saucecontrol
Copy link
Member

There was a slight change to the ISA mapping logic in #109137, but I don't believe it could have caused any change in behavior. GetHardwareIntrinsicId is called during IsSupported property IL generation and only looks for a match to items in the lists built from

case TargetArchitecture.X64:
yield return new InstructionSetInfo("base", "X86Base", InstructionSet.X64_X86Base, true);
yield return new InstructionSetInfo("sse", "Sse", InstructionSet.X64_SSE, true);
yield return new InstructionSetInfo("sse2", "Sse2", InstructionSet.X64_SSE2, true);
yield return new InstructionSetInfo("sse3", "Sse3", InstructionSet.X64_SSE3, true);
yield return new InstructionSetInfo("ssse3", "Ssse3", InstructionSet.X64_SSSE3, true);
yield return new InstructionSetInfo("sse4.1", "Sse41", InstructionSet.X64_SSE41, true);
yield return new InstructionSetInfo("sse4.2", "Sse42", InstructionSet.X64_SSE42, true);
yield return new InstructionSetInfo("avx", "Avx", InstructionSet.X64_AVX, true);
yield return new InstructionSetInfo("avx2", "Avx2", InstructionSet.X64_AVX2, true);
yield return new InstructionSetInfo("aes", "Aes", InstructionSet.X64_AES, true);
yield return new InstructionSetInfo("bmi", "Bmi1", InstructionSet.X64_BMI1, true);
yield return new InstructionSetInfo("bmi2", "Bmi2", InstructionSet.X64_BMI2, true);
yield return new InstructionSetInfo("fma", "Fma", InstructionSet.X64_FMA, true);
yield return new InstructionSetInfo("lzcnt", "Lzcnt", InstructionSet.X64_LZCNT, true);
yield return new InstructionSetInfo("pclmul", "Pclmulqdq", InstructionSet.X64_PCLMULQDQ, true);
yield return new InstructionSetInfo("vpclmul", "Pclmulqdq_V256", InstructionSet.X64_PCLMULQDQ_V256, true);
yield return new InstructionSetInfo("vpclmul_v512", "Pclmulqdq_V512", InstructionSet.X64_PCLMULQDQ_V512, true);
yield return new InstructionSetInfo("popcnt", "Popcnt", InstructionSet.X64_POPCNT, true);
yield return new InstructionSetInfo("Vector128", "", InstructionSet.X64_Vector128, false);
yield return new InstructionSetInfo("Vector256", "", InstructionSet.X64_Vector256, false);
yield return new InstructionSetInfo("Vector512", "", InstructionSet.X64_Vector512, false);
yield return new InstructionSetInfo("avxvnni", "AvxVnni", InstructionSet.X64_AVXVNNI, true);
yield return new InstructionSetInfo("movbe", "Movbe", InstructionSet.X64_MOVBE, true);
yield return new InstructionSetInfo("serialize", "X86Serialize", InstructionSet.X64_X86Serialize, true);
yield return new InstructionSetInfo("evex", "EVEX", InstructionSet.X64_EVEX, true);
yield return new InstructionSetInfo("avx512f", "Avx512F", InstructionSet.X64_AVX512F, true);
yield return new InstructionSetInfo("avx512f_vl", "Avx512F_VL", InstructionSet.X64_AVX512F_VL, true);
yield return new InstructionSetInfo("avx512bw", "Avx512BW", InstructionSet.X64_AVX512BW, true);
yield return new InstructionSetInfo("avx512bw_vl", "Avx512BW_VL", InstructionSet.X64_AVX512BW_VL, true);
yield return new InstructionSetInfo("avx512cd", "Avx512CD", InstructionSet.X64_AVX512CD, true);
yield return new InstructionSetInfo("avx512cd_vl", "Avx512CD_VL", InstructionSet.X64_AVX512CD_VL, true);
yield return new InstructionSetInfo("avx512dq", "Avx512DQ", InstructionSet.X64_AVX512DQ, true);
yield return new InstructionSetInfo("avx512dq_vl", "Avx512DQ_VL", InstructionSet.X64_AVX512DQ_VL, true);
yield return new InstructionSetInfo("avx512vbmi", "Avx512Vbmi", InstructionSet.X64_AVX512VBMI, true);
yield return new InstructionSetInfo("avx512vbmi_vl", "Avx512Vbmi_VL", InstructionSet.X64_AVX512VBMI_VL, true);
yield return new InstructionSetInfo("avx10v1", "Avx10v1", InstructionSet.X64_AVX10v1, true);
yield return new InstructionSetInfo("avx10v1_v512", "Avx10v1_V512", InstructionSet.X64_AVX10v1_V512, true);
yield return new InstructionSetInfo("vectort128", "VectorT128", InstructionSet.X64_VectorT128, true);
yield return new InstructionSetInfo("vectort256", "VectorT256", InstructionSet.X64_VectorT256, true);
yield return new InstructionSetInfo("vectort512", "VectorT512", InstructionSet.X64_VectorT512, true);
yield return new InstructionSetInfo("apx", "Apx", InstructionSet.X64_APX, true);
yield return new InstructionSetInfo("avx10v2", "Avx10v2", InstructionSet.X64_AVX10v2, true);
yield return new InstructionSetInfo("avx10v2_v512", "Avx10v2_V512", InstructionSet.X64_AVX10v2_V512, true);
yield return new InstructionSetInfo("gfni", "Gfni", InstructionSet.X64_GFNI, true);
yield return new InstructionSetInfo("gfni_v256", "Gfni_V256", InstructionSet.X64_GFNI_V256, true);
yield return new InstructionSetInfo("gfni_v512", "Gfni_V512", InstructionSet.X64_GFNI_V512, true);
break;
case TargetArchitecture.X86:
yield return new InstructionSetInfo("base", "X86Base", InstructionSet.X86_X86Base, true);
yield return new InstructionSetInfo("sse", "Sse", InstructionSet.X86_SSE, true);
yield return new InstructionSetInfo("sse2", "Sse2", InstructionSet.X86_SSE2, true);
yield return new InstructionSetInfo("sse3", "Sse3", InstructionSet.X86_SSE3, true);
yield return new InstructionSetInfo("ssse3", "Ssse3", InstructionSet.X86_SSSE3, true);
yield return new InstructionSetInfo("sse4.1", "Sse41", InstructionSet.X86_SSE41, true);
yield return new InstructionSetInfo("sse4.2", "Sse42", InstructionSet.X86_SSE42, true);
yield return new InstructionSetInfo("avx", "Avx", InstructionSet.X86_AVX, true);
yield return new InstructionSetInfo("avx2", "Avx2", InstructionSet.X86_AVX2, true);
yield return new InstructionSetInfo("aes", "Aes", InstructionSet.X86_AES, true);
yield return new InstructionSetInfo("bmi", "Bmi1", InstructionSet.X86_BMI1, true);
yield return new InstructionSetInfo("bmi2", "Bmi2", InstructionSet.X86_BMI2, true);
yield return new InstructionSetInfo("fma", "Fma", InstructionSet.X86_FMA, true);
yield return new InstructionSetInfo("lzcnt", "Lzcnt", InstructionSet.X86_LZCNT, true);
yield return new InstructionSetInfo("pclmul", "Pclmulqdq", InstructionSet.X86_PCLMULQDQ, true);
yield return new InstructionSetInfo("vpclmul", "Pclmulqdq_V256", InstructionSet.X86_PCLMULQDQ_V256, true);
yield return new InstructionSetInfo("vpclmul_v512", "Pclmulqdq_V512", InstructionSet.X86_PCLMULQDQ_V512, true);
yield return new InstructionSetInfo("popcnt", "Popcnt", InstructionSet.X86_POPCNT, true);
yield return new InstructionSetInfo("Vector128", "", InstructionSet.X86_Vector128, false);
yield return new InstructionSetInfo("Vector256", "", InstructionSet.X86_Vector256, false);
yield return new InstructionSetInfo("Vector512", "", InstructionSet.X86_Vector512, false);
yield return new InstructionSetInfo("avxvnni", "AvxVnni", InstructionSet.X86_AVXVNNI, true);
yield return new InstructionSetInfo("movbe", "Movbe", InstructionSet.X86_MOVBE, true);
yield return new InstructionSetInfo("serialize", "X86Serialize", InstructionSet.X86_X86Serialize, true);
yield return new InstructionSetInfo("evex", "EVEX", InstructionSet.X86_EVEX, true);
yield return new InstructionSetInfo("avx512f", "Avx512F", InstructionSet.X86_AVX512F, true);
yield return new InstructionSetInfo("avx512f_vl", "Avx512F_VL", InstructionSet.X86_AVX512F_VL, true);
yield return new InstructionSetInfo("avx512bw", "Avx512BW", InstructionSet.X86_AVX512BW, true);
yield return new InstructionSetInfo("avx512bw_vl", "Avx512BW_VL", InstructionSet.X86_AVX512BW_VL, true);
yield return new InstructionSetInfo("avx512cd", "Avx512CD", InstructionSet.X86_AVX512CD, true);
yield return new InstructionSetInfo("avx512cd_vl", "Avx512CD_VL", InstructionSet.X86_AVX512CD_VL, true);
yield return new InstructionSetInfo("avx512dq", "Avx512DQ", InstructionSet.X86_AVX512DQ, true);
yield return new InstructionSetInfo("avx512dq_vl", "Avx512DQ_VL", InstructionSet.X86_AVX512DQ_VL, true);
yield return new InstructionSetInfo("avx512vbmi", "Avx512Vbmi", InstructionSet.X86_AVX512VBMI, true);
yield return new InstructionSetInfo("avx512vbmi_vl", "Avx512Vbmi_VL", InstructionSet.X86_AVX512VBMI_VL, true);
yield return new InstructionSetInfo("avx10v1", "Avx10v1", InstructionSet.X86_AVX10v1, true);
yield return new InstructionSetInfo("avx10v1_v512", "Avx10v1_V512", InstructionSet.X86_AVX10v1_V512, true);
yield return new InstructionSetInfo("vectort128", "VectorT128", InstructionSet.X86_VectorT128, true);
yield return new InstructionSetInfo("vectort256", "VectorT256", InstructionSet.X86_VectorT256, true);
yield return new InstructionSetInfo("vectort512", "VectorT512", InstructionSet.X86_VectorT512, true);
yield return new InstructionSetInfo("apx", "Apx", InstructionSet.X86_APX, true);
yield return new InstructionSetInfo("avx10v2", "Avx10v2", InstructionSet.X86_AVX10v2, true);
yield return new InstructionSetInfo("avx10v2_v512", "Avx10v2_V512", InstructionSet.X86_AVX10v2_V512, true);
yield return new InstructionSetInfo("gfni", "Gfni", InstructionSet.X86_GFNI, true);
yield return new InstructionSetInfo("gfni_v256", "Gfni_V256", InstructionSet.X86_GFNI_V256, true);
yield return new InstructionSetInfo("gfni_v512", "Gfni_V512", InstructionSet.X86_GFNI_V512, true);
break;
}

Previously, a 64-bit nested class, e.g. Sse41.X64 would have passed through the logic for the x86 platform unchanged (i.e. as X64) and would not be found in the list, so it would get a constant return false IL body. The change resulted in the same lookup becoming Sse41_X64, which is also not in the list, so it would also result in constant false.

The fixup for 64-bit platforms is there because the _X64 ISAs are also not in the list for 64-bit platforms, so we map to the parent ISA to look for support.

Anyway, I do see some failures from illegal instruction encoding for GFNI instructions, which landed a day or two after the mentioned PR and was also mine. I'll see what those are and put up a fix.

@saucecontrol
Copy link
Member

Looks like the scenario that's failing is that Gfni.IsSupported is correctly returning false on the CI hardware but then the Gfni methods are still emitting code rather than throwing PNSE.

@saucecontrol
Copy link
Member

And actually, now that I think about it, that's the correct behavior for NAOT. Since we add gfni to the opportunistic instruction set list, we make IsSupported a dynamic check and then emit the instructions wherever they're used so that they can light up.

The easy fix is to remove gfni from the opportunistic list.

The comments indicate that list is meant to be things that are highly likely to exist on most hardware, which GFNI currently is not. The interesting thing is, if we were to test on hardware that lacked support for any of the other opportunistic ISAs, we'd see exactly the same failures.

// We set these hardware features as opportunistically enabled as most of hardware in the wild supports them.
// Note that we do not indicate support for AVX, or any other instruction set which uses the VEX encodings as
// the presence of those makes otherwise acceptable code be unusable on hardware which does not support VEX encodings.
//
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("sse4.2"); // Lower SSE versions included by implication
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("aes");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("pclmul");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("movbe");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("popcnt");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("lzcnt");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("serialize");
optimisticInstructionSetSupportBuilder.AddSupportedInstructionSet("gfni");

@teo-tsirpanis
Copy link
Contributor

serialize is also pretty new BTW (added in Alder Lake).

@saucecontrol
Copy link
Member

serialize is also pretty new BTW (added in Alder Lake).

It looks like serialize is probably interpreted as nop on non-supporting hardware, so that wouldn't have surfaced the issue.

@saucecontrol
Copy link
Member

Ah, I see the bug now. Nested classes don't have a namespace, so the X64 lookup in the original code would bail out on the namespace check and never look up the name in the dictionary. The updated logic walks all the way up to the topmost containing class before checking the namespace, so it does look up the [ISA]_X64 key, which isn't found.

Seems we need both this one and #110250

The special handling for the 64-bit ISA variants is confusing, where they're hidden sometimes and explicit sometimes. If it's ok, I'd like to take a look at cleaning that up, along with all the other implied ISAs that are currently leaking into the --instruction-set list.

@MichalStrehovsky
Copy link
Member Author

Seems we need both this one and #110250

Yes, your PR is likely to fix #110240 (I filed that one as a bug since I didn't know where the fix would be).

The special handling for the 64-bit ISA variants is confusing, where they're hidden sometimes and explicit sometimes. If it's ok, I'd like to take a look at cleaning that up, along with all the other implied ISAs that are currently leaking into the --instruction-set list.

I would welcome anything that simplifies this since I don't understand any of this, but cc @tannergooding on that.

@MichalStrehovsky
Copy link
Member Author

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member Author

@tannergooding could you have a look when you get a chance?

@tannergooding
Copy link
Member

The special handling for the 64-bit ISA variants is confusing, where they're hidden sometimes and explicit sometimes. If it's ok, I'd like to take a look at cleaning that up, along with all the other implied ISAs that are currently leaking into the --instruction-set list.

Also happy for things to be simplified here. I believe we have a tracking issue already discussing some of the leakage and ways we’d like to clean it up

@MichalStrehovsky MichalStrehovsky merged commit 12afded into dotnet:main Dec 3, 2024
95 of 110 checks passed
@MichalStrehovsky MichalStrehovsky deleted the fixhwinstr branch December 3, 2024 08:00
eduardo-vp pushed a commit to eduardo-vp/runtime that referenced this pull request Dec 5, 2024
AFAIK methods on the nested X64/Arm64 classes shouldn't be considered intrinsics on 32bit platforms since they are as relevant as e.g. WASM intrinsics. This should fix widespread runtime-nativeaot-outerloop failure on x86. I think this regressed in dotnet#109137.
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this pull request Dec 10, 2024
AFAIK methods on the nested X64/Arm64 classes shouldn't be considered intrinsics on 32bit platforms since they are as relevant as e.g. WASM intrinsics. This should fix widespread runtime-nativeaot-outerloop failure on x86. I think this regressed in dotnet#109137.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants