Possible data loss at stream end in -w4 -l2 #16

silentbicycle · 2015-05-24T13:24:13Z

I got an excellent bug report from @unixdj with detailed, minimal steps to reproduce data loss at the end of the stream:

$ echo -n aaaa | ./heatshrink -e -w4 -l2 | ./heatshrink -d -w4 -l2
a   # should be "aaaa"

A regression test has been added.

This is not currently known to occur under any other combinations of settings except -w4 -l2, the absolute minimum.

A change that fixes this without breaking reverse compatibility would be strongly preferred.

The text was updated successfully, but these errors were encountered:

silentbicycle · 2015-05-24T14:05:08Z

The encoder is correct, the problem is in the decoder. The tag bit (1) plus the window size (4) plus the lookahead size (2) is only 7 bits, and the decoder can end up in a state where it has 7 bits of unprocessed data, but the end of input stream takes precedence and it prematurely indicates that it has completed.

This will not happen under any other configuration because everything else will use at least 8 bits, causing things to be buffered correctly.

silentbicycle · 2015-05-24T14:24:17Z

I'm considering closing this by increasing the minimum lookahead size to 3, rather than adding special cases. Thoughts, @unixdj?

silentbicycle · 2015-05-24T20:25:49Z

Resolved by 15ebadd . The minimum lookahead needs to be increased to 3, because otherwise a state can occur at the end of the input whose interpretation is ambiguous, leading to either a few bytes being discarded or the last byte being repeated incorrectly.

unixdj · 2015-05-24T21:08:44Z

Increasing minimum lookahead size fixes the issue, of course.

The problem with the encoder is that it encodes "a" as [0xb0 0x80], which, if the decoder is fixed to handle the case above, may be interpreted by the decoder as:

1        tag bit: literal follows
01100001 'a'
0        tag bit: backref
0000     backref index 1
00       backref count 1

and decoded as "aa". The encoder, however, would never produce such stream (though it would make sense to). So in order to handle this correctly, there are two solutions:

(a) Make the decoder disregard backreferences shorter than the encoder would produce.
(b) Make the encoder fill the empty bits in the last byte with ones instead of zeroes, so that the decoder would finish in the HSDS_YIELD_LITERAL state.

I would say that option (b) seems most conceptuallly sound.

silentbicycle · 2015-05-24T21:31:16Z

Indeed. I discovered that "aa" issue while investigating the root cause of the original issue, as well.

(b) seems like a good choice, and IIRC you've already done some investigation along those lines. If that covers all the cases, then the minimum length won't need to be increased after all.

Option (a) reminded me that there may be an opportunity to improve the compression ratio slightly (breaking reverse-compatibility), added as issue #17.

unixdj · 2015-05-24T23:14:29Z

Yes, I've tried (b), and I reckon it covers all the corner cases. Here's the code:

master...unixdj:eof42

silentbicycle added the bug label May 24, 2015

silentbicycle closed this as completed May 24, 2015

silentbicycle reopened this May 24, 2015

silentbicycle added this to the v0.4.0 milestone Jun 7, 2015

silentbicycle closed this as completed in de92281 Jun 7, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible data loss at stream end in -w4 -l2 #16

Possible data loss at stream end in -w4 -l2 #16

silentbicycle commented May 24, 2015

silentbicycle commented May 24, 2015

silentbicycle commented May 24, 2015

silentbicycle commented May 24, 2015

unixdj commented May 24, 2015

silentbicycle commented May 24, 2015

unixdj commented May 24, 2015

Possible data loss at stream end in -w4 -l2 #16

Possible data loss at stream end in -w4 -l2 #16

Comments

silentbicycle commented May 24, 2015

silentbicycle commented May 24, 2015

silentbicycle commented May 24, 2015

silentbicycle commented May 24, 2015

unixdj commented May 24, 2015

silentbicycle commented May 24, 2015

unixdj commented May 24, 2015