Extend use of ps instructions to AVX-1 and SSE functions #9

JimmyLefevre · 2020-11-03T23:52:19Z

This is a follow-up to #7 and #8 in which we started using vxorps instead of vpxor in the AVX functions in order to be compatible with hardware that supports AVX-1, but not AVX-2.

We now do this for SSE and AVX-1 functions. We do not do this for the AVX-512 functions.

Here is the compatibility breakdown of float versus integer vector ops per size:

- 128-bit: `xorps` is SSE, `pxor` is SSE2. Moving to ps here allows us to technically support the Pentium 3, which is clearly the most important hardware target for any application in 2020.
- 256-bit: `vxorps` is AVX-1, `vpxor` is AVX-2. Supporting AVX here actually allows us to support chips from Sandy Bridge (2011) up to, but not including, Haswell (2013), which introduced AVX-2. That is a decent number of processors from the past decade. (I still have a Sandy Bridge chip myself!)
- 512-bit: `vxorps` is AVX512DQ, `vpxord` is AVX512F. AVX512F actually stands for "AVX-512 Foundation" and is the base extension, so here, the si512 intrinsics actually support more hardware than the ps ones.

This is the reason why we did not use ps for the AVX-512 functions, although to be honest I am not quite sure how much hardware supports AVX512F, but not AVX512DQ.
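
As a rough illustration of the 128-bit case in the breakdown, the sketch below XORs the same 16 bytes once through `_mm_xor_ps` (`xorps`, SSE) and once through `_mm_xor_si128` (`pxor`, SSE2), then checks that both paths produce identical bytes. The helper names `xor16_ps`, `xor16_si128` and `xor_paths_agree` are made up for this example and are not taken from x64_blandwidth.c. It assumes an x86/x64 compiler that ships `<xmmintrin.h>` and `<emmintrin.h>`.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE:  __m128,  _mm_xor_ps    (xorps) */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_xor_si128 (pxor)  */

/* Hypothetical helper: XOR two 16-byte blocks using only SSE instructions. */
static void xor16_ps(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128 va = _mm_loadu_ps((const float *)a);  /* unaligned load, bits untouched */
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)out, _mm_xor_ps(va, vb));
}

/* Hypothetical helper: the same operation through the integer (SSE2) path. */
static void xor16_si128(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_xor_si128(va, vb));
}

/* Returns 1 if both paths produce the same 16 bytes for a sample input. */
static int xor_paths_agree(void)
{
    uint8_t a[16], b[16], r_ps[16], r_si[16];
    for (int i = 0; i < 16; ++i) {
        a[i] = (uint8_t)(i * 17);
        b[i] = (uint8_t)(255 - i);
    }
    xor16_ps(a, b, r_ps);
    xor16_si128(a, b, r_si);
    return memcmp(r_ps, r_si, sizeof r_ps) == 0;
}
```

Since XOR is a pure bitwise operation, the float and integer forms are interchangeable as far as the result goes. The only difference that matters here is which instruction set extension each one requires.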


JimmyLefevre · 2020-11-04T11:43:34Z

x64_blandwidth.c

@@ -1,331 +1,330 @@
 /* ========================================================================
   $File: work/tools/blandwidth/x64_blandwidth.c $
   $Date: 2020/06/16 21:46:28 UTC $
-   $Revision: 1 $
+   $Revision: 3 $


I forgot to update the revision in the last pull request, so I incremented it by 2.

JimmyLefevre force-pushed the master branch from c42e18d to 60827ce on November 4, 2020 11:42
