Add predefined cpu names for --instruction-set (e.g. haswell) #71911

Merged
merged 30 commits into from
Jul 13, 2022

EgorBo
Member

@EgorBo EgorBo commented Jul 10, 2022

This PR adds a few known CPUs as named groups of instruction sets. Clang/LLVM uses a similar approach (e.g. -march=skylake). It lets us specify instruction sets quickly instead of listing all of them, e.g.:

--instruction-set skylake

which roughly means "generate me a binary for any modern x86 CPU", instead of what we do today:

--instruction-set avx2,+bmi,+fma,+lzcnt,+pclmul,+popcnt,+movbe

and the following functions:

bool _f1, _f2;

void test_Sse41()
{
    _f1 = Sse41.IsSupported;
    _f2 = Sse41.X64.IsSupported;
}
void test_Avx()
{
    _f1 = Avx.IsSupported;
    _f2 = Avx.X64.IsSupported;
}
void test_Avx2()
{
    _f1 = Avx2.IsSupported;
    _f2 = Avx2.X64.IsSupported;
}
void test_Bmi1()
{
    _f1 = Bmi1.IsSupported;
    _f2 = Bmi1.X64.IsSupported;
}
void test_Bmi2()
{
    _f1 = Bmi2.IsSupported;
    _f2 = Bmi2.X64.IsSupported;
}

will be prejitted to:

; Assembly listing for method Program:test_Sse41():this
       C6410801             mov      byte  ptr [rcx+8], 1
       C6410901             mov      byte  ptr [rcx+9], 1
       C3                   ret

; Assembly listing for method Program:test_Avx():this
       C6410801             mov      byte  ptr [rcx+8], 1
       C6410901             mov      byte  ptr [rcx+9], 1
       C3                   ret

; Assembly listing for method Program:test_Avx2():this
       C6410801             mov      byte  ptr [rcx+8], 1
       C6410901             mov      byte  ptr [rcx+9], 1
       C3                   ret

; Assembly listing for method Program:test_Bmi1():this
       C6410801             mov      byte  ptr [rcx+8], 1
       C6410901             mov      byte  ptr [rcx+9], 1
       C3                   ret

; Assembly listing for method Program:test_Bmi2():this
       C6410800             mov      byte  ptr [rcx+8], 0
       C6410900             mov      byte  ptr [rcx+9], 0
       C3                   ret
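
For context on why these folded constants matter: a typical intrinsics call site guards the accelerated path with IsSupported, so once the check is a known constant only one branch has to be compiled. A minimal sketch of such a call site follows. The BitHelpers class and its method names are made up for illustration and are not part of this PR.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not code from this PR.
public static class BitHelpers
{
    // Portable fallback: clears the lowest set bit until none remain.
    public static int PopCountSoftware(uint value)
    {
        int count = 0;
        while (value != 0)
        {
            value &= value - 1;
            count++;
        }
        return count;
    }

    // If Popcnt.IsSupported is known at compile time, the untaken branch can be dropped.
    public static int PopCount(uint value)
    {
        if (Popcnt.IsSupported)
        {
            return (int)Popcnt.PopCount(value);
        }
        return PopCountSoftware(value);
    }
}
```

Both paths return the same result, e.g. BitHelpers.PopCount(0xF0F0) is 8 whether or not the hardware instruction is used.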

TODO: Implement a native group name. That is way more complicated, so it is not part of this PR.

cc @dotnet/crossgen-contrib, @dotnet/ilc-contrib @tannergooding

@MichalStrehovsky
Member

Both ILC and crossgen can print out all allowed values for instruction-set (the ILC one starts around here:

foreach (string arch in ValidArchitectures)
). It would be nice if we could print out the supported macros too.
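
A rough sketch of how named groups could be expanded and listed in help output. Everything here is illustrative: the CpuMacros class, its method names, and the table contents are assumptions (the single skylake entry simply reuses the flag list from the PR description), not the actual tables or code in ILC or crossgen. Stripping a leading '+' is also a simplification.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch; not the ILC/crossgen implementation.
public static class CpuMacros
{
    // Illustrative table only. The entry reuses the list from the PR description.
    private static readonly SortedDictionary<string, string[]> s_macros =
        new SortedDictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["skylake"] = new[] { "avx2", "bmi", "fma", "lzcnt", "pclmul", "popcnt", "movbe" },
        };

    // Expands known macro names in a comma-separated value into their members.
    // Unknown names pass through unchanged; duplicates are dropped.
    public static List<string> Expand(string value)
    {
        var result = new List<string>();
        foreach (string raw in value.Split(',', StringSplitOptions.RemoveEmptyEntries))
        {
            string name = raw.Trim().TrimStart('+');
            if (name.Length == 0)
            {
                continue;
            }
            if (s_macros.TryGetValue(name, out var members))
            {
                foreach (string member in members)
                {
                    if (!result.Contains(member)) result.Add(member);
                }
            }
            else if (!result.Contains(name))
            {
                result.Add(name);
            }
        }
        return result;
    }

    // Writes one line per macro, suitable for appending to help text.
    public static void PrintMacros(TextWriter writer)
    {
        foreach (var pair in s_macros)
        {
            writer.WriteLine($"  {pair.Key}: {string.Join(", ", pair.Value)}");
        }
    }
}
```

Calling CpuMacros.PrintMacros(Console.Out) would list each macro next to the instruction sets it stands for, which is the kind of output the help could show alongside the existing allowed values.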

@am11
Member

am11 commented Jul 11, 2022

How was the set of (only 7) targets selected; based on their popularity or some other metric?

For reference (x86):

Values accepted by -march and -mtune (count in parentheses):

clang-14 -march (56):
nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, rocketlake, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4

clang-14 -mtune (82):
i386, i486, winchip-c6, winchip2, c3, i586, pentium, pentium-mmx, pentiumpro, i686, pentium2, pentium3, pentium3m, pentium-m, c3-2, yonah, pentium4, pentium4m, prescott, nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, rocketlake, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, lakemont, k6, k6-2, k6-3, athlon, athlon-tbird, athlon-xp, athlon-mp, athlon-4, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, geode

gcc-12 -march (97):
i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 btver1 btver2 generic native

gcc-12 -mtune (40):
generic i386 i486 pentium lakemont pentiumpro pentium4 nocona core2 nehalem sandybridge haswell bonnell silvermont goldmont goldmont-plus tremont knl knm skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake intel geode k6 athlon k8 amdfam10 bdver1 bdver2 bdver3 bdver4 btver1 btver2 znver1 znver2

@EgorBo
Member Author

EgorBo commented Jul 11, 2022

How was the set of (only 7) targets selected; based on their popularity or some other metric?

Yeah, just popularity, in my opinion. In the C++ world I've mostly seen generic, ivybridge (for when you don't target avx2), haswell and skylake (which is almost the same as haswell).

But I can add more; I just didn't want to display a lot of irrelevant stuff in the help, such as opteron-sse3 🙂
Users mostly want to target something popular such as Skylake (which, by the way, is the baseline for Windows 11) or use whatever is the default to cover everyone.

Another story is avx512, but we don't support that (except AvxVnni).

@EgorBo
Member Author

EgorBo commented Jul 12, 2022

I would like a macro such as native or host to be introduced that matches the native architecture of the processor on which publishing happens.

@kant2002 I am leaving that up-for-grabs. It requires CPU feature detection logic, which is pretty bulky (for all platforms/archs).
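
For a sense of what a native macro would need, here is a minimal sketch that asks the running .NET runtime which x86 instruction sets it reports through the IsSupported properties. It is only an approximation under stated assumptions: the HostFeatures class is hypothetical, the name strings mirror the spelling used in the PR description, the pairing of Bmi1 with "bmi" is a guess, and real detection inside the compiler would need its own logic per platform and architecture.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch; not the detection logic a native macro would actually use.
public static class HostFeatures
{
    // Returns the names of the probed instruction sets that the current runtime reports as usable.
    public static List<string> Detect()
    {
        // Name strings are assumptions borrowed from the PR description.
        var probes = new (string Name, bool Supported)[]
        {
            ("avx2", Avx2.IsSupported),
            ("bmi", Bmi1.IsSupported),
            ("fma", Fma.IsSupported),
            ("lzcnt", Lzcnt.IsSupported),
            ("pclmul", Pclmulqdq.IsSupported),
            ("popcnt", Popcnt.IsSupported),
        };

        var found = new List<string>();
        foreach (var probe in probes)
        {
            if (probe.Supported)
            {
                found.Add(probe.Name);
            }
        }
        return found;
    }
}
```

string.Join(",", HostFeatures.Detect()) then gives a comma-separated list for the machine the code runs on. On a non-x86 host every probe is false and the list is empty, which hints at why covering all platforms and architectures is the bulky part.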

@kant2002
Contributor

@EgorBo having this up-for-grabs is fine with me. I just want to make sure that this is aligned with the overall goals and would be a valuable addition.

@EgorBo
Member Author

EgorBo commented Jul 12, 2022

@EgorBo having this up-for-grabs is fine with me. I just want to make sure that this is aligned with the overall goals and would be a valuable addition.

native was mentioned in the main description ;-)

Member

@jkotas jkotas left a comment

Thanks

@EgorBo
Member Author

EgorBo commented Jul 12, 2022

@tannergooding could you please re-review, did I address your feedback?

Member

@tannergooding tannergooding left a comment

Thanks for the fixes. The current set looks correct to me according to the combination of official docs and what LLVM uses.

Happy for the crypto sets to be added if we talk with our friends at Intel/AMD/Arm first and get clarification.

@EgorBo EgorBo merged commit d544295 into dotnet:main Jul 13, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 12, 2022
6 participants