-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CanonicalCode::read_symbol()'s peeking doesn't work with UnsatisfiableReadBehavior::Reject #25005
Comments
...and then fail only when discard_previously_peeked_bits() more bits than we have. `unzip ~/Downloads/enwik8.zip`: 1.005s -> 1.050s 4.9% slowdown :/ Fixes SerenityOS#25005
The 2nd commit in https://github.com/SerenityOS/serenity/compare/master...nico:serenity:deflate-peek?expand=1 does the first idea now. It's a 5% slow-down, but in return it's correct :/ |
With that change, UnsatisfiableReadBehavior can be removed – turns out webp doesn't need any special handling here. Regular deflate just happened to work before since it the superfluous bits came for peeking came from the checksum that always follows the bitstream. |
#22130 provides additional information about the same issue or a related/similar problem.
The main problem I see with the first one is that something might start to act on the peeked bits already, which is probably to be avoided (even though I most likely wouldn't even catch that during a code review). Zero isn't always the do nothing/end case, as the fuzzer has successfully found. Therefore, I'm currently more inclined towards the second option, but I'll give it some thought. |
Symbols that need <= 8 bits hit a fast path as of SerenityOS#18075, but the slow path has done a full binary search over all symbols ever since this code was added in XXX. Instead of doing a binary search over all codes for every single bit read, this implements the Moffat-Turpin approach described at https://www.hanshq.net/zip.html#huffdec, which only requires a table read per bit. hyperfine 'Build/lagom/bin/unzip ~/Downloads/enwik8.zip' 1.008 s ± 0.016 s => 957.7 ms ± 3.9 ms, 5% faster Due to issue SerenityOS#25005, we can't peek the full 15 bits at once but have to read them one-by-one. This makes the code look a bit different than in the linked article. I also tried not changing CanonicalCode::from_bytes() too much. It does 16 passes over all symbols. I think it could do it in a single pass instead. But that's for a future change. No behavior change (other than slightly faster perf).
Symbols that need <= 8 bits hit a fast path as of SerenityOS#18075, but the slow path has done a full binary search over all symbols ever since this code was added in SerenityOS#2963. (SerenityOS#3405 even added a FIXME for doing this, but SerenityOS#18075 removed it.) Instead of doing a binary search over all codes for every single bit read, this implements the Moffat-Turpin approach described at https://www.hanshq.net/zip.html#huffdec, which only requires a table read per bit. hyperfine 'Build/lagom/bin/unzip ~/Downloads/enwik8.zip' 1.008 s ± 0.016 s => 957.7 ms ± 3.9 ms, 5% faster Due to issue SerenityOS#25005, we can't peek the full 15 bits at once but have to read them one-by-one. This makes the code look a bit different than in the linked article. I also tried not changing CanonicalCode::from_bytes() too much. It does 16 passes over all symbols. I think it could do it in a single pass instead. But that's for a future change. No behavior change (other than slightly faster perf).
For fast deflate implementations, you normally do something like:
The first, fast lookup table stores each entry many times: For example, if one symbol is six bits long, it it stores 4 entries, each consisting of the 6 bits the symbol takes up, combined with all 4 possible two-bit patterns to make up 8 bits (assuming the fast table has 8-bit entries. Ours does, cf If your bit stream keeps 64 bits in a word, then the peek is just a shift and mask, and that and the table lookup are your hot loop. The second option means adding conditionals and a bunch of other work in the hot loop, so I'm a bit worried about perf. (My local patch for the first option however also adds a check when accepting a symbol and slows things down. Maybe we should just always pass a slice that has some trailing padding data as deflate input, so that we never run into this. Often, that's easy to do, but e.g. webp's VP8L basic lossless format has no naturally occurring trailing data.) |
Symbols that need <= 8 bits hit a fast path as of SerenityOS#18075, but the slow path has done a full binary search over all symbols ever since this code was added in SerenityOS#2963. (SerenityOS#3405 even added a FIXME for doing this, but SerenityOS#18075 removed it.) Instead of doing a binary search over all codes for every single bit read, this implements the Moffat-Turpin approach described at https://www.hanshq.net/zip.html#huffdec, which only requires a table read per bit. hyperfine 'Build/lagom/bin/unzip ~/Downloads/enwik8.zip' 1.008 s ± 0.016 s => 957.7 ms ± 3.9 ms, 5% faster Due to issue SerenityOS#25005, we can't peek the full 15 bits at once but have to read them one-by-one. This makes the code look a bit different than in the linked article. I also tried not changing CanonicalCode::from_bytes() too much. It does 15 passes over all symbols. I think it could do it in a single pass instead. But that's for a future change. No behavior change (other than slightly faster perf).
#24567 uses the name |
Symbols that need <= 8 bits hit a fast path as of #18075, but the slow path has done a full binary search over all symbols ever since this code was added in #2963. (#3405 even added a FIXME for doing this, but #18075 removed it.) Instead of doing a binary search over all codes for every single bit read, this implements the Moffat-Turpin approach described at https://www.hanshq.net/zip.html#huffdec, which only requires a table read per bit. hyperfine 'Build/lagom/bin/unzip ~/Downloads/enwik8.zip' 1.008 s ± 0.016 s => 957.7 ms ± 3.9 ms, 5% faster Due to issue #25005, we can't peek the full 15 bits at once but have to read them one-by-one. This makes the code look a bit different than in the linked article. I also tried not changing CanonicalCode::from_bytes() too much. It does 15 passes over all symbols. I think it could do it in a single pass instead. But that's for a future change. No behavior change (other than slightly faster perf).
Sounds like a good idea, let's do that. That's also less prone to error than filling in bits silently. |
Consider the test case added in https://github.com/SerenityOS/serenity/compare/master...nico:serenity:deflate-peek?expand=1
It adds a canonical code that uses 1 bit for symbol 0, 2 bits for symbol 1, 3 bits for symbol 2, and so on, and then 15 bits each for symbols 14 and 15.
(15 is the longest allowed code length in deflate.)
It then uses a bit stream that contains one
0
, one10
, one110
, and so on, to encode each symbol from 0 to 15 once. That's 15 * 16 / 2 + 15 == 135 bits in total. It's one bit short of a multiple of 8, so the bitstream also contains a single trailing 0 bit.The bitstream decodes to the symbols 0, 1, 2, ..., 14, 15 – and then the final trailing
0
bit should decode to 0 again.But instead the final read results in:
That's because
CanonicalCode::read_symbol
does aTRY(stream.peek_bits<size_t>(m_max_prefixed_code_length));
to look up 8 bits into the future, for the fast lookup added in #18075. However, #21890 made it a mistake to look up at more bits than remain in the stream – even when just peeking.If I change LittleEndianInputBitStream's ctor to use UnsatisfiableReadBehavior::FillWithZero as default argument, the test starts passing.
I'm not sure what the nicest fix is:
discard_previously_peeked_bits
with more bits than available?The first seems a bit nicer, probably.
@timschumi @trflynn89
The text was updated successfully, but these errors were encountered: