
Idea for faster Huffman decoding? #28

@mreineck

I just looked at the numba-accelerated Huffman decoder and had an idea to potentially speed it up even more ...

Currently the stream is read bit by bit, traversing the Huffman tree accordingly. How about building a lookup table for all possible sequences of, say, 16 bits (i.e. 65536 entries), where each entry contains

  • a list of the values for all complete Huffman keys in those 16 bits
  • the number of "leftover" bits
  • the tree node where one would be after processing all 16 bits

If the returned list is not empty, one

  • appends the values in the list to the output vector
  • shifts the bit window by 16 − leftover bits
  • fetches the next 16 − leftover bits from the compressed data stream
  • starts again

The only exception would be if a Huffman key is longer than 16 bits: in this case one will need to fetch the remaining bits individually until the key is complete. (This should happen quite rarely.) Afterwards the whole process starts again.
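To make the proposal concrete, here is a minimal sketch in plain Python. All names, the toy tree, and the 8-bit table width are illustrative assumptions, not taken from the actual decoder (the issue proposes 16 bits, i.e. 65536 entries):

```python
# Illustrative sketch of the proposed table-driven Huffman decoder.
# Names and the toy tree are hypothetical; the real decoder would build
# the table from its own Huffman tree and use width = 16.

W = 8  # table width; the issue proposes 16

# Toy Huffman tree: internal nodes are (left, right) tuples, leaves are
# symbols.  Resulting codes: a = 000, b = 001, c = 01, d = 1.
TREE = ((("a", "b"), "c"), "d")

def build_table(tree, width=W):
    """One entry per width-bit pattern: (symbols fully decoded within the
    pattern, number of leftover bits belonging to an incomplete code,
    tree node reached after consuming all width bits)."""
    table = []
    for pattern in range(1 << width):
        node, symbols, leftover = tree, [], 0
        for i in range(width - 1, -1, -1):      # walk bits MSB-first
            node = node[(pattern >> i) & 1]
            leftover += 1
            if not isinstance(node, tuple):     # leaf: code complete
                symbols.append(node)
                node, leftover = tree, 0
        table.append((symbols, leftover, node))
    return table

def decode(table, tree, bits, n_symbols, width=W):
    """bits: string of '0'/'1' characters; returns n_symbols symbols."""
    out, pos = [], 0
    while len(out) < n_symbols:
        chunk = bits[pos:pos + width].ljust(width, "0")  # pad at stream end
        symbols, leftover, node = table[int(chunk, 2)]
        if symbols:
            out.extend(symbols)
            pos += width - leftover   # leftover bits are re-read next round
        else:
            # Exception: a single code is longer than `width` bits;
            # finish it bit by bit from the stored tree node.
            pos += width
            while isinstance(node, tuple):
                node = node[int(bits[pos])]
                pos += 1
            out.append(node)
    return out[:n_symbols]            # drop symbols decoded from padding
```

For example, the message d c a b d encodes to the bit string 1 01 000 001 1, and `decode(build_table(TREE), TREE, "1010000011", 5)` recovers `['d', 'c', 'a', 'b', 'd']`. Rebuilding the table with `width=2` exercises the long-code exception path, since the code for a (000) is then longer than the table width.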

Memory overhead would be fairly small for a 16-bit lookup table (a few MB), and the speedup should be substantial (factor of a few).

What do you think, @jgslunde, @unfunfunt ?
