Extend use of ps instructions to AVX-1 and SSE functions #9
This is a follow-up to #7 and #8, in which we started using vxorps instead of vpxor in the AVX functions to stay compatible with hardware that supports AVX-1 but not AVX-2.
We now do the same for the SSE and AVX-1 functions, but not for the AVX-512 functions.
Here is the compatibility breakdown of float versus integer vector ops per size:
128-bit: xorps is SSE, pxor is SSE2. Moving to ps here allows us to technically support the Pentium 3, which is clearly the most important hardware target for any application in 2020.
256-bit: vxorps is AVX-1, vpxor is AVX-2. Supporting AVX-1 here allows us to support everything from Sandy Bridge (2011) up to, but not including, Haswell (2013), which introduced AVX-2. That is a decent number of processors from the past decade. (I still have a Sandy Bridge chip myself!)
512-bit: vxorps is AVX512DQ, vpxord is AVX512F. AVX512F stands for "AVX-512 Foundation" and is the base extension, so here the si512 intrinsics actually support more hardware than the ps ones.
This is why we did not use ps for the AVX-512 functions, although to be honest I am not sure how much hardware supports AVX512F but not AVX512DQ.
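To illustrate the 128-bit case from the breakdown above, here is a minimal sketch showing that a ps XOR and an si128 XOR produce the same bytes, so swapping one for the other only changes the required instruction set. The helper names xor16_ps and xor16_si128 are hypothetical and are not taken from this repository; the sketch assumes an x86 target and a compiler that provides the standard intrinsics headers.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE:  _mm_loadu_ps, _mm_xor_ps, _mm_storeu_ps */
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_xor_si128, _mm_storeu_si128 */

/* Hypothetical helper: XOR 16 bytes using only SSE instructions (xorps).
 * The bytes are merely carried in a float register; xorps is a purely
 * bitwise operation, so no floating-point arithmetic takes place. */
static void xor16_ps(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128 va = _mm_loadu_ps((const float *)a);  /* unaligned load */
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)dst, _mm_xor_ps(va, vb));
}

/* Hypothetical helper: the same XOR with the integer intrinsic (pxor),
 * which requires SSE2. */
static void xor16_si128(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_xor_si128(va, vb));
}
```

The 256-bit functions follow the same pattern with _mm256_xor_ps (AVX-1) in place of _mm256_xor_si256 (AVX-2).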