Just one of the things I'm learning. https://github.com/hchiam/learning
https://hackernoon.com/unixs-lzw-compression-algorithm-how-does-it-work-cp65347h which in turn adapts a version of the code here: https://github.com/adityagupta3006/LZW-Compressor-in-Python?ref=hackernoon.com
"Unlike Huffman Encoding, LZW doesn’t require this dictionary to be included in the compressed data. One of the unique characteristics of this algorithm is that this dictionary can be rebuilt while uncompressing the data, ... with the 256 ASCII codes as a preliminary step." LZW can be implemented with variable-width encoding. A maximum width would then be used to put a cap on memory consumption, so once the dictionary is "filled", no more words can be added. LZW works better on data that has redundancy.
def lzw(input: array):
output = compressed_data
dictionary.pre_filled_with(basic_ascii_chars)
for word in input:
if word in dictionary:
output.add(dictionary[word])
elif dictionary.is_full():
output.add(word)
else: # not in dictionary and dictionary has space
dictionary.add(word)
output.add(dictionary[word])
return output
python encode_lzw.py # example_text.txt -> example_text.lzw
python decode_lzw.py # example_text.lzw -> example_text_decoded.txt
bash check_diffs.sh