I just looked at the numba-accelerated Huffman decoder and had an idea to potentially speed it up even more ...
Currently the stream is read bit by bit, traversing the Huffman tree accordingly. How about building a lookup table for all possible sequences of, say, 16 bits (i.e. 65536 entries), where each entry contains (a construction sketch follows the list)
- a list of the values for all complete Huffman keys in those 16 bits
- the number of "leftover" bits
- the tree node where one would be after processing all 16 bits
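A minimal plain-Python sketch of how such a table could be built; the `Node` class and `build_table` are hypothetical stand-ins for the decoder's actual tree types, and MSB-first bit order is assumed:

```python
class Node:
    """Hypothetical Huffman tree node: leaves carry a value, inner nodes have children."""
    def __init__(self, value=None, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def is_leaf(self):
        return self.left is None and self.right is None

def build_table(root, nbits=16):
    """Precompute, for every nbits-wide bit pattern, the list of decoded values,
    the number of leftover bits, and the tree node reached after all nbits bits."""
    table = []
    for pattern in range(1 << nbits):
        node, values, consumed = root, [], 0
        for i in range(nbits):
            bit = (pattern >> (nbits - 1 - i)) & 1   # MSB-first
            node = node.right if bit else node.left
            if node.is_leaf():
                values.append(node.value)
                consumed = i + 1                     # bits used by complete keys so far
                node = root                          # restart at the root for the next key
        table.append((values, nbits - consumed, node))
    return table
```

For numba's nopython mode the per-entry value lists and node references would presumably have to be flattened into fixed-size integer arrays, but the construction logic stays the same.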
If the returned list is not empty, one
- appends the values in the list to the output vector
- shifts the 16-bit window by 16 − leftover, so that only the leftover bits remain
- reads the next 16 − leftover bits from the compressed data stream to refill the window
- starts again
The only exception is a Huffman key longer than 16 bits: in this case the returned list is empty, and one has to fetch the remaining bits individually, continuing the tree walk from the stored node, until the key is complete. (This should happen quite rarely.) Afterwards the whole process starts again; both cases are handled in the decode sketch below.
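A rough sketch of the corresponding decode loop, building on the `build_table` sketch above; `BitReader` and the assumption that the number of output symbols is known up front are hypothetical simplifications, and end-of-stream padding is glossed over by reading zeros past the end:

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read(self, k):
        """Return the next k bits as an integer; reads zeros past the end."""
        val = 0
        for _ in range(k):
            byte = self.data[self.pos >> 3] if (self.pos >> 3) < len(self.data) else 0
            val = (val << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return val

def decode(table, reader, n_symbols, nbits=16):
    """Decode n_symbols values using the precomputed lookup table."""
    out = []
    window = reader.read(nbits)                  # initial 16-bit window
    while len(out) < n_symbols:
        values, leftover, node = table[window]
        out.extend(values)
        if not values:
            # Key longer than nbits: finish it bit by bit from the stored node.
            while not node.is_leaf():
                node = node.right if reader.read(1) else node.left
            out.append(node.value)
            leftover = 0                         # the long key consumed the whole window
        # Keep the leftover bits (a prefix of the next key) and refill the window.
        keep = window & ((1 << leftover) - 1)
        window = (keep << (nbits - leftover)) | reader.read(nbits - leftover)
    return out[:n_symbols]
```

Re-feeding the leftover bits through the table works because they are always the prefix of the next key, so the next lookup picks up exactly where the previous window ended.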
Memory overhead would be fairly small for a 16-bit lookup table (65536 entries at a few dozen bytes each comes to a few MB), and the speedup should be substantial (a factor of a few, since each table lookup replaces up to 16 per-bit tree traversals).
What do you think, @jgslunde, @unfunfunt ?