This issue is a catch-all for performance improvements, for which help is wanted. I can list some off the top of my head:
The Text conversions via decodeUtf8 in encodeBase64 are unnecessary. We've already encoded to an alphabet we know is safe for decoding, so the conversion can be streamlined.
Inline the lookup tables as static arrays so we can stop building them with every encode/decode. (Closed in "Factor Out Lookup Tables As C-FFI" #9: not faster than pure Haskell.)
The inner loop can be optimized in terms of Aklomp's algorithm. We should be able to read off hunks of Word32 or Word64 from the input, convert to big endian, and process each round with a single read and four bit-shift instructions. (Done as of "Optimize Loops for 8/16, 32 and 64-bit Words" #8)
Inner loops can pack words (performance pending) and do a single write in the unpadded case as a large word to the output pointer, eliminating unnecessary writes per iteration. (Done as of Word-Packing Optimizations #10)
Data.ByteString.Short-optimized inner loops. See: https://github.com/emilypi/Base16/blob/master/src/Data/ByteString/Base16/Internal/W16/ShortLoop.hs
SSE/AVX2 vectorization is possible via Aklomp's library. I have no experience with this. Help please!
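To make the inner-loop item concrete, here is a minimal sketch of one encode round: three input bytes are placed in the top 24 bits of a big-endian Word32, and four shift-and-mask steps pull out the 6-bit alphabet indices. The names `alphabet`, `packBE`, `encodeRound` and `encodeUnpadded` are hypothetical and chosen for illustration. The list-based driver is only there to keep the example short; it is not the pointer-based loop the library uses, and it ignores padding.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.Word (Word8, Word32)

-- Standard Base64 alphabet (RFC 4648), built once as a ByteString.
alphabet :: B.ByteString
alphabet =
  B.pack (map (fromIntegral . fromEnum)
              (['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"))

-- Assemble three input bytes into the top 24 bits of a big-endian Word32.
-- (A real loop would read the word with one load and byte-swap it.)
packBE :: Word8 -> Word8 -> Word8 -> Word32
packBE a b c =
      (fromIntegral a `shiftL` 24)
  .|. (fromIntegral b `shiftL` 16)
  .|. (fromIntegral c `shiftL` 8)

-- One round: four shifts, each masked to 6 bits and looked up in the alphabet.
encodeRound :: Word32 -> [Word8]
encodeRound w =
  [ B.index alphabet (fromIntegral ((w `shiftR` s) .&. 0x3f))
  | s <- [26, 20, 14, 8]
  ]

-- Illustration only: encodes full 3-byte groups and silently drops any
-- trailing 1 or 2 bytes, i.e. it covers the unpadded case and nothing else.
encodeUnpadded :: B.ByteString -> B.ByteString
encodeUnpadded = B.pack . go . B.unpack
  where
    go (a:b:c:rest) = encodeRound (packBE a b c) ++ go rest
    go _            = []
```

For example, the bytes of "Man" (77, 97, 110) pack to 0x4d616e00, and the four shifts yield the indices 19, 22, 5 and 46, which is "TWFu".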
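On the decodeUtf8 item: Base64 output is pure ASCII, so UTF-8 validation cannot fail on it. One possible way to skip that validation, offered as an assumption rather than what the library does, is `decodeLatin1` from the `text` package, which agrees with `decodeUtf8` on ASCII input. The helper name `asciiToText` is hypothetical, and whether this is actually faster would need benchmarking.

```haskell
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text.Encoding as T

-- Hypothetical helper: convert bytes already known to be ASCII (such as
-- Base64 output) to Text without UTF-8 validation. Only correct for ASCII
-- input; bytes >= 0x80 would be read as Latin-1 code points instead.
asciiToText :: B.ByteString -> Text
asciiToText = T.decodeLatin1
```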