Commit
- As was recently done in the Adler-32 code, take advantage of the fact
  that on recent x86 processors, vmovdqu with an aligned pointer is just
  as fast as vmovdqa. Don't waste time aligning the pointer unless the
  length is very large, and at the same time, handle all cases of
  len >= 8*VL using the main loop so that the 4*VL wide loop isn't
  needed. (Before, aligning the pointer was tied to whether the main
  loop was used or not, since the main loop used vmovdqa.)

- Handle short lengths more efficiently. Instead of falling back to
  crc32_slice1() for all len < VL, use AVX-512 masking (when available)
  to handle 4 <= len <= 15, and use 128-bit vector instructions to
  handle 16 <= len < VL.

- Document why the main loop uses a width of 8*VL instead of 4*VL.
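The length thresholds above can be summarized in a small, portable sketch. This is not the code from the commit: `choose_path`, `should_align`, the enum names, and the `align_threshold` parameter are hypothetical, introduced only for illustration. The message does not say what happens for len < 4, for 4 <= len <= 15 without AVX-512 masking, or for VL <= len < 8*VL, nor what length counts as "very large", so the sketch assumes a scalar fallback for the first two cases, a generic "narrower vector code" bucket for the third, and leaves the alignment threshold to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the code paths named in the commit message. */
enum crc_path {
    PATH_SCALAR,      /* assumed fallback, e.g. crc32_slice1() */
    PATH_MASKED_LOAD, /* 4 <= len <= 15, AVX-512 masking available */
    PATH_VEC128,      /* 16 <= len < VL, 128-bit vector instructions */
    PATH_NARROW,      /* VL <= len < 8*VL (assumed: narrower vector code) */
    PATH_MAIN_LOOP    /* len >= 8*VL, main loop */
};

/*
 * Pick a path from the length, the vector length vl in bytes
 * (16, 32, or 64), and whether AVX-512 masking can be used.
 * Larger thresholds are checked first, so the 128-bit range is
 * naturally empty when vl == 16.
 */
static enum crc_path choose_path(size_t len, size_t vl, bool have_masking)
{
    if (len >= 8 * vl)
        return PATH_MAIN_LOOP;
    if (len >= vl)
        return PATH_NARROW;
    if (len >= 16)
        return PATH_VEC128;
    if (len >= 4 && have_masking)
        return PATH_MASKED_LOAD;
    return PATH_SCALAR; /* assumption: not stated in the commit message */
}

/*
 * Alignment is decided separately from the choice of loop.
 * align_threshold is a placeholder; the message only says "very large".
 */
static bool should_align(size_t len, size_t align_threshold)
{
    return len >= align_threshold;
}
```

With vl = 64, for example, a 15-byte buffer would take the masked path when masking is available, a 63-byte buffer the 128-bit path, and a 512-byte buffer the main loop. Keeping `should_align` independent of `choose_path` mirrors the point that alignment is no longer tied to whether the main loop is used.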
Showing 5 changed files with 148 additions and 137 deletions.