I just looked at the numba-accelerated Huffman decoder and had an idea to potentially speed it up even more ...
Currently the stream is read bit by bit, traversing the Huffman tree accordingly. How about building a lookup table for all possible sequences of, say, 16 bits (i.e. 65536 entries), where each entry contains (a construction sketch follows the list)
- a list of the values for all complete Huffman keys in those 16 bits
- the number of "leftover" bits
- the tree node where one would be after processing all 16 bits
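A minimal plain-Python sketch of how such a table could be built; the `Node` class and `build_table` are hypothetical stand-ins for the decoder's actual tree types, and MSB-first bit order is assumed:

```python
class Node:
    """Hypothetical Huffman tree node: leaves carry a value, inner nodes have children."""
    def __init__(self, value=None, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def is_leaf(self):
        return self.left is None and self.right is None

def build_table(root, nbits=16):
    """Precompute, for every nbits-wide bit pattern, the list of decoded values,
    the number of leftover bits, and the tree node reached after all nbits bits."""
    table = []
    for pattern in range(1 << nbits):
        node, values, consumed = root, [], 0
        for i in range(nbits):
            bit = (pattern >> (nbits - 1 - i)) & 1   # MSB-first
            node = node.right if bit else node.left
            if node.is_leaf():
                values.append(node.value)
                consumed = i + 1                     # bits used by complete keys so far
                node = root                          # restart at the root for the next key
        table.append((values, nbits - consumed, node))
    return table
```

For numba's nopython mode the per-entry value lists and node references would presumably have to be flattened into fixed-size integer arrays, but the construction logic stays the same.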
If the returned list is not empty, one
- appends the values in the list to the output vector
- shifts the 16-bit window by 16 − leftover, so that only the leftover bits remain
- reads the next 16 − leftover bits from the compressed data stream to refill the window
- starts again
The only exception is a Huffman key longer than 16 bits: in this case the returned list is empty, and one has to fetch the remaining bits individually, continuing the tree walk from the stored node, until the key is complete. (This should happen quite rarely.) Afterwards the whole process starts again; both cases are handled in the decode sketch below.
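A rough sketch of the corresponding decode loop, building on the `build_table` sketch above; `BitReader` and the assumption that the number of output symbols is known up front are hypothetical simplifications, and end-of-stream padding is glossed over by reading zeros past the end:

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read(self, k):
        """Return the next k bits as an integer; reads zeros past the end."""
        val = 0
        for _ in range(k):
            byte = self.data[self.pos >> 3] if (self.pos >> 3) < len(self.data) else 0
            val = (val << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return val

def decode(table, reader, n_symbols, nbits=16):
    """Decode n_symbols values using the precomputed lookup table."""
    out = []
    window = reader.read(nbits)                  # initial 16-bit window
    while len(out) < n_symbols:
        values, leftover, node = table[window]
        out.extend(values)
        if not values:
            # Key longer than nbits: finish it bit by bit from the stored node.
            while not node.is_leaf():
                node = node.right if reader.read(1) else node.left
            out.append(node.value)
            leftover = 0                         # the long key consumed the whole window
        # Keep the leftover bits (a prefix of the next key) and refill the window.
        keep = window & ((1 << leftover) - 1)
        window = (keep << (nbits - leftover)) | reader.read(nbits - leftover)
    return out[:n_symbols]
```

Re-feeding the leftover bits through the table works because they are always the prefix of the next key, so the next lookup picks up exactly where the previous window ended.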
Memory overhead would be fairly small for a 16-bit lookup table (65536 entries at a few dozen bytes each comes to a few MB), and the speedup should be substantial (a factor of a few, since each table lookup replaces up to 16 per-bit tree traversals).
What do you think, @jgslunde, @unfunfunt ?