Extend use of ps instructions to AVX-1 and SSE functions #9

Open
wants to merge 1 commit into base: master

JimmyLefevre
Contributor

This is a follow-up to #7 and #8 in which we started using vxorps instead of vpxor in the AVX functions in order to be compatible with hardware that supports AVX-1, but not AVX-2.

We now do this for SSE and AVX-1 functions. We do not do this for the AVX-512 functions.
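
For reference, here is a minimal sketch (not code from this PR) of why swapping the integer XOR for the ps one should be safe: XOR is purely bitwise, so the float-domain and integer-domain instructions are expected to produce the same bytes. The helper names `xor16_ps`, `xor16_si128` and `xor16_variants_agree` are hypothetical, and the sketch assumes an x86 target where SSE2 is enabled (the default on x86-64). Note that an optimizing compiler may be free to choose either encoding regardless of which intrinsic is written.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h> /* SSE:  __m128,  _mm_loadu_ps,    _mm_xor_ps,    _mm_storeu_ps    */
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadu_si128, _mm_xor_si128, _mm_storeu_si128 */

/* Hypothetical helper: XOR 16 bytes through the float domain (xorps, SSE). */
static void xor16_ps(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128 va = _mm_loadu_ps((const float *)a);
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)dst, _mm_xor_ps(va, vb));
}

/* Hypothetical helper: XOR 16 bytes through the integer domain (pxor, SSE2). */
static void xor16_si128(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(va, vb));
}

/* Returns 1 if both variants produce identical bytes for the given inputs. */
static int xor16_variants_agree(const uint8_t *a, const uint8_t *b)
{
    uint8_t r_ps[16], r_si[16];
    xor16_ps(r_ps, a, b);
    xor16_si128(r_si, a, b);
    return memcmp(r_ps, r_si, 16) == 0;
}
```

The same reasoning should carry over to the 256-bit pair (`_mm256_xor_ps` versus `_mm256_xor_si256`), which is left out here because it needs AVX to be enabled at compile time.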

Here is the compatibility breakdown of float versus integer vector ops per size:

  • 128-bit: xorps is SSE, pxor is SSE2.
    Moving to ps here allows us to technically support the Pentium 3, which is clearly the most
    important hardware target for any application in 2020.

  • 256-bit: vxorps is AVX-1, vpxor is AVX-2.
    Supporting AVX-1 here lets us run on Sandy Bridge (2011) and Ivy Bridge (2012), the Intel generations
    that shipped before Haswell (2013) added AVX-2, which is a decent number of processors from the past decade.
    (I still have a Sandy Bridge chip myself!)

  • 512-bit: vxorps is AVX512DQ, vpxord is AVX512F.
    AVX512F stands for "AVX-512 Foundation" and is the base extension, so here the
    si512 intrinsics support more hardware than the ps ones.

This is why we did not use ps for the AVX-512 functions, although to be honest I am not sure how much hardware supports AVX512F but not AVX512DQ.
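
The breakdown above can be written down as a small lookup, sketched here under stated assumptions rather than taken from the PR. The names `xor_domain`, `required_extension`, `widest_reach_domain` and `cpu_has_extension` are hypothetical. The runtime check assumes GCC or Clang on x86, where `__builtin_cpu_supports` accepts these feature strings; on other compilers or targets the sketch simply reports "not supported".

```c
#include <stddef.h>
#include <string.h>

/* Which flavor of the XOR instruction is used: float (ps) or integer. */
typedef enum { XOR_DOMAIN_PS, XOR_DOMAIN_INT } xor_domain;

/* Minimum extension needed for a vector XOR of the given width,
   following the breakdown in the PR description. NULL if unknown. */
static const char *required_extension(int width_bits, xor_domain domain)
{
    switch (width_bits) {
    case 128: return domain == XOR_DOMAIN_PS ? "sse"      : "sse2";
    case 256: return domain == XOR_DOMAIN_PS ? "avx"      : "avx2";
    case 512: return domain == XOR_DOMAIN_PS ? "avx512dq" : "avx512f";
    default:  return NULL;
    }
}

/* Domain that runs on more hardware: ps for 128/256-bit, integer for 512-bit. */
static xor_domain widest_reach_domain(int width_bits)
{
    return (width_bits == 512) ? XOR_DOMAIN_INT : XOR_DOMAIN_PS;
}

/* Returns 1 if the running CPU reports the named extension, 0 otherwise
   (including unknown names and non-x86 or non-GCC/Clang builds). */
static int cpu_has_extension(const char *name)
{
    if (name == NULL) return 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (strcmp(name, "sse") == 0)      return __builtin_cpu_supports("sse") != 0;
    if (strcmp(name, "sse2") == 0)     return __builtin_cpu_supports("sse2") != 0;
    if (strcmp(name, "avx") == 0)      return __builtin_cpu_supports("avx") != 0;
    if (strcmp(name, "avx2") == 0)     return __builtin_cpu_supports("avx2") != 0;
    if (strcmp(name, "avx512f") == 0)  return __builtin_cpu_supports("avx512f") != 0;
    if (strcmp(name, "avx512dq") == 0) return __builtin_cpu_supports("avx512dq") != 0;
#endif
    return 0;
}
```

A caller could then gate a 256-bit test with something like `cpu_has_extension(required_extension(256, widest_reach_domain(256)))`, which checks for AVX-1 rather than AVX-2.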

@@ -1,331 +1,330 @@
/* ========================================================================
$File: work/tools/blandwidth/x64_blandwidth.c $
$Date: 2020/06/16 21:46:28 UTC $
-   $Revision: 1 $
+   $Revision: 3 $
Contributor Author


I forgot to update the revision in the last pull request, so I incremented it by 2.
