Performance Improvements #7

emilypi · 2020-02-02T17:36:09Z

This issue is a catch-all for performance improvements, for which help is wanted. I can list some off the top of my head:

- The Text conversion via decodeUtf8 in encodeBase64 is unnecessary. We've already encoded to an alphabet we know is safe to decode, so the conversion can be streamlined.
- Inline the lookup tables as static arrays so we can stop building them with every encode/decode. (Closed in Factor Out Lookup Tables As C-FFI #9: not faster than pure Haskell)
- The inner loop can be optimized in terms of Aklomp's algorithm. We should be able to read chunks of Word32 or Word64 from the input, convert them to big endian, and handle each round with a single read and 4 bit-shift instructions. (Done as of Optimize Loops for 8/16, 32 and 64-bit Words #8)
- Inner loops can pack words (performance pending) and, in the unpadded case, do a single write of one large word to the output pointer, eliminating unnecessary writes per iteration. (Done as of Word-Packing Optimizations #10)
- Data.ByteString.Short-optimized inner loops. See: https://github.com/emilypi/Base16/blob/master/src/Data/ByteString/Base16/Internal/W16/ShortLoop.hs
- SSE/AVX2 vectorization is possible via Aklomp's library. I have no experience with this. Help please!
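
To make the "single read, 4 bit-shifts per round" idea concrete, here is a minimal pure sketch. It is not the library's code: `alphabet`, `pack3`, `encodeRound`, and `encodeUnpadded` are hypothetical names, the input is a plain list rather than a pointer, and only the unpadded case (length a multiple of 3) is handled. A real inner loop would read the word straight from memory and, on little-endian hosts, swap it to big endian (for example with `byteSwap32` from Data.Word) instead of packing bytes by hand.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8, Word32)

-- Standard Base64 alphabet (RFC 4648). A list is used for clarity only;
-- a real implementation would index into a static table.
alphabet :: String
alphabet = ['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "+/"

-- Pack three input bytes into the low 24 bits of one word, in big-endian
-- order. This stands in for "read one word, convert to big endian".
pack3 :: Word8 -> Word8 -> Word8 -> Word32
pack3 a b c =
  (fromIntegral a `shiftL` 16) .|. (fromIntegral b `shiftL` 8) .|. fromIntegral c

-- One round: four right shifts, each masked to 6 bits, give the four
-- alphabet indices for the packed word.
encodeRound :: Word32 -> String
encodeRound w = [sextet 18, sextet 12, sextet 6, sextet 0]
  where
    sextet n = alphabet !! fromIntegral ((w `shiftR` n) .&. 0x3f)

-- Unpadded case only: returns Nothing when the input length is not a
-- multiple of 3, since padding is out of scope for this sketch.
encodeUnpadded :: [Word8] -> Maybe String
encodeUnpadded [] = Just ""
encodeUnpadded (a : b : c : rest) =
  (encodeRound (pack3 a b c) ++) <$> encodeUnpadded rest
encodeUnpadded _ = Nothing
```

For example, `encodeUnpadded [77, 97, 110]` (the bytes of "Man") evaluates to `Just "TWFu"`.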

emilypi added the "help wanted" and "enhancement" labels on Feb 5, 2020
