Q&A
Provide the transform using the -t (or --transform=) command line option followed by the transform(s), joined with + when there are several.
Example: -t TEXT or -t RLT+TEXT+UTF+LZ
Provide -e (--entropy=) on the command line followed by the codec of your choice:
Example: -e ANS1
By default, kanzi detects the number of CPU cores and uses half of them. The maximum number of parallel jobs is hard-coded to 64.
Providing -j 1 on the command line makes (de)compression use one core.
Providing -j 0 on the command line makes (de)compression use all available cores.
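The default described above can be sketched as a small shell function. This is only an illustration of the stated rule (half of the cores, at most 64), not kanzi's actual code; the function name default_jobs and the floor of one job on a single-core machine are assumptions made for this sketch.

```shell
# Sketch of the default job count described above.
# Assumption: at least one job is always used, even on a single core.
default_jobs() {
    cores=$1
    jobs=$((cores / 2))      # half of the detected cores
    if [ "$jobs" -lt 1 ]; then
        jobs=1               # assumed floor, not stated in the text
    fi
    if [ "$jobs" -gt 64 ]; then
        jobs=64              # hard-coded maximum mentioned above
    fi
    echo "$jobs"
}
```

For instance, default_jobs 8 prints 4 and default_jobs 256 prints 64.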
Yes, if the input source provided on the command line is a directory, all files under that folder are processed recursively.
The files will be processed in parallel if more than one core is available.
To avoid recursion and process only the top-level folder, append a dot to the path:
E.g. -i ~/myfolder/. on Linux
E.g. -i c:\users\programs\. on Windows
Yes, to avoid processing link files, add the --no-link option to the command line.
To avoid processing dot files, add the --no-dot-file option to the command line.
Yes, one way to do it is to use STDIN/STDOUT as input/output on the command line:
gunzip -c /tmp/kanzi.1.gz | java -jar kanzi.jar -c -i stdin -l 2 -o /tmp/kanzi.1.knz
java -jar kanzi.jar -d -i /tmp/silesia.tar.knz -o stdout | tar -xf -
Or, using redirection:
java -jar kanzi.jar -c -f -l 2 < /tmp/enwik8 > /tmp/enwik8.knz
If -i is absent from the command line, the data is assumed to come from STDIN and go to STDOUT. Another example (processing a zero-length pseudo-file!):
cat /proc/stat | java -jar kanzi.jar -c -i stdin -l 0 -o /tmp/stat.knz
java -jar kanzi.jar -d -i /tmp/stat.knz -o stdout
Note that, during compression, kanzi stores the size of the input file (when it is available) so that the decompressor can verify the output size after decompression. The decompressor also uses the original size to optimize internal resources. Thus, providing file names with -i and -o is recommended over redirection.
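Yes, it is possible to decompress a single block or a sequence of consecutive blocks by using the --from and --to options during decompression. In the two runs below, the first decompresses all 12 blocks, while the second, with --from=4 --to=10, decompresses blocks 4 through 9: the --from block is included and the --to block is not.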
java -jar kanzi.jar -d -i /tmp/book1.knz -v 4 -f
Block 1: 34451 => 36530 [0 ms] => 65536 [0 ms]
Block 2: 33295 => 35330 [0 ms] => 65536 [0 ms]
Block 3: 33702 => 35807 [0 ms] => 65536 [0 ms]
Block 4: 33555 => 35502 [0 ms] => 65536 [0 ms]
Block 5: 34057 => 36065 [0 ms] => 65536 [0 ms]
Block 6: 33556 => 35622 [0 ms] => 65536 [0 ms]
Block 7: 33357 => 35167 [0 ms] => 65536 [0 ms]
Block 8: 33460 => 35446 [0 ms] => 65536 [0 ms]
Block 9: 33428 => 35431 [0 ms] => 65536 [0 ms]
Block 10: 33177 => 35180 [0 ms] => 65536 [0 ms]
Block 11: 33218 => 35156 [0 ms] => 65536 [0 ms]
Block 12: 24871 => 26246 [0 ms] => 47875 [0 ms]
Decompressing: 1 ms
Input size: 394176
Output size: 768771
Throughput (KB/s): 750752
java -jar kanzi.jar -d -i /tmp/book1.knz -v 4 -f --from=4 --to=10
Block 4: 33555 => 35502 [0 ms] => 65536 [0 ms]
Block 5: 34057 => 36065 [0 ms] => 65536 [0 ms]
Block 6: 33556 => 35622 [0 ms] => 65536 [0 ms]
Block 7: 33357 => 35167 [0 ms] => 65536 [0 ms]
Block 8: 33460 => 35446 [0 ms] => 65536 [0 ms]
Block 9: 33428 => 35431 [0 ms] => 65536 [0 ms]
Decompressing: 1 ms
Input size: 394176
Output size: 393216
Throughput (KB/s): 384000
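The arithmetic behind the second run can be checked with a short sketch. It assumes, based only on the log above, that --from is inclusive and --to is exclusive, and that every selected block is a full block; the function names selected_blocks and expected_size are hypothetical helpers, not part of kanzi.

```shell
# Block numbers selected by a --from/--to pair, as observed in the log:
# --from is included, --to is excluded.
selected_blocks() {
    i=$1
    while [ "$i" -lt "$2" ]; do
        echo "$i"
        i=$((i + 1))
    done
}

# Expected output size, assuming all selected blocks are full blocks.
expected_size() {
    count=$(($2 - $1))
    if [ "$count" -lt 0 ]; then
        count=0
    fi
    echo $((count * $3))
}

selected_blocks 4 10      # prints 4, 5, 6, 7, 8, 9 (one per line)
expected_size 4 10 65536  # prints 393216
```

The result matches the reported output size: 6 blocks of 65536 bytes each give 393216 bytes.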
Yes, just combine the -v, --from and --to options:
java -jar kanzi.jar -d -i /tmp/silesia.tar.knz -f -v 3 --from=1 --to=1
1 file to decompress
Verbosity: 3
Overwrite: true
Using 4 jobs
Input file name: '/tmp/silesia.tar.knz'
Output file name: '/tmp/silesia.tar.knz.bak'
Decompressing /tmp/silesia.tar.knz ...
Bitstream version: 5
Checksum: false
Block size: 4194304 bytes
Using HUFFMAN entropy codec (stage 1)
Using PACK+LZ transform (stage 2)
Original size: 211957760 bytes
Decompressing: 17 ms
Input size: 68350949
Output size: 0
Throughput (KB/s): 0
- The bitstream header is CRC checked during decompression.
- All transforms sanitize parameters coming from the bitstream during decompression.
- The decompressor checks the size of the output file against the original size stored in the bitstream (when available).
- A 32-bit CRC is stored for each block when the -x/--checksum command line option is provided.
There is no hash for the whole original file in the bitstream. However, adding a hash is possible with the following trick:
# compress and append MD5 of original file
java -jar kanzi.jar -c -i log -l 2
md5sum log | cut -d " " -f 1 >> log.knz
# decompress
java -jar kanzi.jar -d -i log.knz
# check MD5 of decompressed file vs stored one
tail -c 33 log.knz
be8ddef3d35483622f2fab9a4f812040
md5sum log.knz.bak | cut -d " " -f 1
be8ddef3d35483622f2fab9a4f812040
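The manual comparison above can be wrapped in a small helper. This is a minimal sketch using only the tools already shown (tail, md5sum, cut); the function name verify_md5_trailer is hypothetical, and it assumes the hash was appended exactly as in the trick above (32 hex digits followed by a newline, 33 bytes in total).

```shell
# verify_md5_trailer COMPRESSED DECOMPRESSED
# Returns 0 when the MD5 appended to COMPRESSED matches the MD5 of
# DECOMPRESSED, non-zero otherwise.
verify_md5_trailer() {
    # Last 33 bytes: 32 hex digits plus the trailing newline.
    stored=$(tail -c 33 "$1" | tr -d '\n')
    # MD5 of the decompressed file, hash field only.
    actual=$(md5sum "$2" | cut -d " " -f 1)
    [ "$stored" = "$actual" ]
}
```

With the files from the example, it could be called as verify_md5_trailer log.knz log.knz.bak && echo "MD5 matches".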
Yes, the bitstream version is part of the bitstream header and is used during decompression to ensure that old versions can be decompressed.