lib/x86/adler32: add an AVX-512 implementation #342

Merged
merged 4 commits into from
Feb 24, 2024

ebiggers
Owner

@ebiggers ebiggers commented Feb 21, 2024

  • ci.yml: ensure MSVC can find zlib
  • lib/x86: disambiguate 512-bit vector from AVX-512F
  • lib/x86: fix XCR0 check for AVX-512VL
  • lib/x86/adler32: add an AVX-512 implementation

@ebiggers ebiggers force-pushed the dev branch 5 times, most recently from 4f5c559 to a327744 (February 22, 2024 08:17)
@ebiggers ebiggers changed the title from "lib/x86/adler32: add back an AVX-512BW implementation" to "lib/x86/adler32: add an AVX-512 implementation" (Feb 22, 2024)
@ebiggers ebiggers force-pushed the dev branch 2 times, most recently from 5bb78a5 to d44bb93 (February 22, 2024 08:24)

lib/x86: disambiguate 512-bit vector from AVX-512F

crc32_x86_vpclmulqdq_avx512vl and crc32_x86_vpclmulqdq_avx512f_avx512vl
actually use the same CPU features, considering that vpternlog always
requires at least avx512f, and compilers consider avx512vl to imply
avx512f.  Rename them to *_avx512_vl256 and *_avx512_vl512 to reflect
that they differ only in vector length, and fix the CPU feature checking
to use a separate flag for whether 512-bit vectors are enabled.
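
To illustrate the idea of a separate vector-length flag, here is a minimal sketch. The flag names, the enum, and `choose_crc32_impl()` are hypothetical and are not taken from libdeflate; the point is only that both variants require the same instruction-set features and that one extra bit decides between 256-bit and 512-bit vectors.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define CPU_FEATURE_ZMM        (1u << 0) /* 512-bit vectors are enabled */
#define CPU_FEATURE_AVX512F    (1u << 1)
#define CPU_FEATURE_AVX512VL   (1u << 2)
#define CPU_FEATURE_VPCLMULQDQ (1u << 3)

enum crc32_impl {
	CRC32_GENERIC,
	CRC32_AVX512_VL256,
	CRC32_AVX512_VL512,
};

/*
 * Both AVX-512 variants need the same instruction-set features.
 * Only the ZMM flag selects between 256-bit and 512-bit vectors.
 */
static enum crc32_impl choose_crc32_impl(uint32_t features)
{
	const uint32_t isa = CPU_FEATURE_VPCLMULQDQ | CPU_FEATURE_AVX512F |
			     CPU_FEATURE_AVX512VL;

	if ((features & isa) != isa)
		return CRC32_GENERIC;
	if (features & CPU_FEATURE_ZMM)
		return CRC32_AVX512_VL512;
	return CRC32_AVX512_VL256;
}
```

With this layout, a CPU or configuration where 512-bit vectors are not enabled still gets the 256-bit AVX-512 code path instead of falling back to a non-AVX-512 one.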

lib/x86: fix XCR0 check for AVX-512VL

According to the Intel manual, the ZMM_Hi256 bit needs to be checked for
all AVX-512 instructions, even if 512-bit vectors aren't being used.
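
A rough sketch of such a check follows. The macro and function names are hypothetical, and the code takes an already-read XCR0 value (for example from `xgetbv`) rather than reading it itself. The commit message only names ZMM_Hi256; requiring the opmask and Hi16_ZMM bits as well follows the detection procedure in the Intel manual and is an assumption about how a complete check would look, not a copy of the PR's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits as documented in the Intel SDM. */
#define XCR0_SSE       (UINT64_C(1) << 1) /* XMM registers */
#define XCR0_AVX       (UINT64_C(1) << 2) /* upper halves of YMM0-15 */
#define XCR0_OPMASK    (UINT64_C(1) << 5) /* k0-k7 mask registers */
#define XCR0_ZMM_HI256 (UINT64_C(1) << 6) /* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM  (UINT64_C(1) << 7) /* ZMM16-31 */

/*
 * Hypothetical helper: returns true if the OS has enabled all register
 * state that AVX-512 instructions depend on.  The same mask applies
 * whether the code uses 128-bit, 256-bit, or 512-bit vectors.
 */
static bool os_enabled_avx512_state(uint64_t xcr0)
{
	const uint64_t required = XCR0_SSE | XCR0_AVX | XCR0_OPMASK |
				  XCR0_ZMM_HI256 | XCR0_HI16_ZMM;

	return (xcr0 & required) == required;
}
```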

lib/x86/adler32: add an AVX-512 implementation

libdeflate used to (before commit 416bac3) have an AVX512BW
implementation of Adler-32, but I removed it due to AVX-512's
downclocking issues.  Since then, newer Intel and AMD CPUs have come out
with better AVX-512 implementations, and these CPUs tend to have
AVX512VNNI, which includes a dot product instruction that is useful for
Adler-32.  Therefore, add an AVX512VNNI/AVX512BW implementation.
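
To show why a dot product fits Adler-32, here is a scalar sketch. It is not the code from this PR, and the function names and `CHUNK_LEN` are made up for illustration. For a chunk of n bytes, the second sum advances by n times the old first sum plus the dot product of the bytes with the weights n, n-1, ..., 1. That weighted sum is the part a vector byte dot product instruction (in AVX512VNNI, presumably `vpdpbusd`) can compute for many bytes at once.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER32_DIVISOR 65521u
#define CHUNK_LEN 64u /* illustrative; one 512-bit vector holds 64 bytes */

/* Reference Adler-32: update both sums after every byte. */
static uint32_t adler32_bytewise(const uint8_t *p, size_t len)
{
	uint32_t s1 = 1, s2 = 0;

	for (size_t i = 0; i < len; i++) {
		s1 = (s1 + p[i]) % ADLER32_DIVISOR;
		s2 = (s2 + s1) % ADLER32_DIVISOR;
	}
	return (s2 << 16) | s1;
}

/*
 * Same checksum, computed one chunk at a time:
 *   s2 += n * s1 + sum((n - i) * p[i])   (weighted dot product)
 *   s1 += sum(p[i])
 * The inner loop is what a SIMD implementation would vectorize.
 */
static uint32_t adler32_chunked(const uint8_t *p, size_t len)
{
	uint32_t s1 = 1, s2 = 0;

	while (len > 0) {
		uint32_t n = len < CHUNK_LEN ? (uint32_t)len : CHUNK_LEN;
		uint32_t byte_sum = 0, dot = 0;

		for (uint32_t i = 0; i < n; i++) {
			byte_sum += p[i];
			dot += (n - i) * p[i];
		}
		/* No 32-bit overflow: n <= 64 and s1, s2 < 65521. */
		s2 = (s2 + n * s1 + dot) % ADLER32_DIVISOR;
		s1 = (s1 + byte_sum) % ADLER32_DIVISOR;
		p += n;
		len -= n;
	}
	return (s2 << 16) | s1;
}
```

Both functions return the same value for any input; the chunked form just defers the modulo reductions and exposes the weighted sum as a single dot product per chunk.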
@ebiggers ebiggers merged commit a026a04 into master Feb 24, 2024
50 checks passed