Failing to inflate a sequence of bytes #174

Closed
photopea opened this issue Sep 24, 2019 · 16 comments

@photopea

Deflate is used in the PDF format for lossless compression.

Someone sent me a PDF file containing the following sequence of 6419 bytes: "file.log".

When I try to inflate them with my own library, UZIP.js, I get 71731 bytes: "file.txt".

The PDF file can also be displayed by pdf.js and Adobe Reader (both seem to have their own Deflate implementations).

pako.js fails on this sequence of bytes with "Uncaught invalid distance too far back".
file.log
file.txt

@puzrin
Member

puzrin commented Sep 25, 2019

I don't understand what to do with this or how it is related to pako. Do you have a small executable code sample proving that the problem is really on pako's side?

@photopea
Author

The problem is that pako.js cannot decompress this Deflate data. Could you fix it?

var buf = new Uint8Array([88, 133, 205, 93, 219, ...]);  // the content of file.log
var bytes = pako.inflate(buf);  // throws an error

@puzrin
Member

puzrin commented Sep 25, 2019

If you are claiming a bug in pako, I need proof, because:

  • the input may be invalid;
  • the wrong function may be used (inflate <=> inflateRaw), and so on.

The current info is not enough. For example, you could create a test repo with a node.js script and use node's built-in zlib methods. Also see #139.
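A minimal sketch of the kind of node.js check suggested above, using only Node's built-in `zlib` module (not pako). It runs the zlib-wrapped decoder (`inflateSync`) and the raw Deflate decoder (`inflateRawSync`) on the same bytes and reports which one succeeds. The helper name `tryInflate` is a hypothetical illustration.

```javascript
// Sketch: try both decoders from Node's built-in zlib on the same bytes.
const zlib = require('zlib');

function tryInflate(buf) {
  const decoders = {
    inflate: (b) => zlib.inflateSync(b),       // expects a 2-byte zlib header + Adler-32 trailer
    inflateRaw: (b) => zlib.inflateRawSync(b), // expects a bare Deflate stream
  };
  const results = {};
  for (const [name, decode] of Object.entries(decoders)) {
    try {
      results[name] = { ok: true, length: decode(buf).length };
    } catch (err) {
      // Record the decoder's error message instead of crashing.
      results[name] = { ok: false, error: err.message };
    }
  }
  return results;
}
```

Usage, assuming the bytes are saved as `file.log`: `console.log(tryInflate(require('fs').readFileSync('file.log')))`. If Node's `inflate` succeeds where `pako.inflate()` throws, that points at a difference between the two decoders rather than at broken input.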

@photopea
Author

photopea commented Sep 25, 2019

I thought you would trust me a bit more :P As I said before, it is a valid ZLIB stream that is accepted by three different ZLIB implementations.

Anyway, I made this demo: http://www.ivank.net/veci/pako_test.html

Look at the source code and the console output: UZIP.js and zlib.js process it correctly, while pako.js crashes.

@puzrin puzrin added the bug label Sep 25, 2019
@puzrin
Member

puzrin commented Sep 25, 2019

Can this be a dupe of #139?

Workarounds were posted in that issue and in a branch. But one strange test fails, and nobody can remember what it does :).

@photopea
Author

It is hard to tell whether the bugs are related.

I am just providing a valid input for which pako.js does not produce the correct output.

But if pako.js is a port of the ZLIB library, then the bug is either in ZLIB itself or in the way you rewrote it in JavaScript (I guess the second is more probable).

@puzrin
Member

puzrin commented Sep 25, 2019

Not necessarily. The referenced bug is in the wrapper; the zlib port itself is correct.

@photopea
Author

You are right! If I remove 2 bytes from the beginning and 4 bytes from the end of the sequence, pako.inflateRaw() works correctly!
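A sketch of that workaround, assuming the stream uses no preset dictionary (FDICT = 0, so the header is exactly 2 bytes). It uses Node's `zlib.inflateRawSync` in place of `pako.inflateRaw()`, and additionally verifies the 4-byte trailer, which RFC 1950 defines as the big-endian Adler-32 checksum of the uncompressed data. The function names are hypothetical.

```javascript
// Sketch: strip the 2-byte zlib header and 4-byte Adler-32 trailer,
// then run a raw Deflate decoder on what remains.
const zlib = require('zlib');

// Adler-32 as defined in RFC 1950: two running sums modulo 65521.
function adler32(data) {
  let a = 1, b = 0;
  for (const byte of data) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0; // force an unsigned 32-bit result
}

function inflateIgnoringHeader(buf) {
  if (buf.length < 6) throw new Error('too short for a zlib stream');
  // Assumes FDICT = 0, i.e. no 4-byte dictionary id after the header.
  const body = buf.subarray(2, buf.length - 4);
  const out = zlib.inflateRawSync(body);
  const n = buf.length;
  const expected = ((buf[n - 4] << 24) | (buf[n - 3] << 16) |
                    (buf[n - 2] << 8) | buf[n - 1]) >>> 0;
  if (adler32(out) !== expected) throw new Error('Adler-32 mismatch');
  return out;
}
```

Because the window size lives only in the skipped header, the raw decoder never sees the 8192-byte limit.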

The first byte, 88 = 0101 1000, defines an LZ77 window size of 8192 bytes. I guess the Deflate stream inside was compressed with a larger window. Could you just ignore these two bytes and always use a window size of 32768 bytes?
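The header arithmetic can be checked with a small decoder for the two header bytes (CMF and FLG) as laid out in RFC 1950: the low nibble of CMF is the compression method, the high nibble CINFO gives the window size as 2^(CINFO + 8), and the 16-bit value CMF * 256 + FLG must be a multiple of 31. `parseZlibHeader` is a hypothetical helper, not part of pako.

```javascript
// Sketch: decode the two-byte zlib header (CMF, FLG) described in RFC 1950.
function parseZlibHeader(cmf, flg) {
  const method = cmf & 0x0f; // CM: 8 means Deflate
  const cinfo = cmf >> 4;    // CINFO: log2(window size) - 8
  return {
    method,
    windowSize: 1 << (cinfo + 8),           // LZ77 window size in bytes
    checkOk: (cmf * 256 + flg) % 31 === 0,  // FCHECK makes this a multiple of 31
    presetDictionary: (flg & 0x20) !== 0,   // FDICT bit
    level: flg >> 6,                        // FLEVEL, informational only
  };
}
```

For the bytes in this issue, `parseZlibHeader(88, 133).windowSize` is 8192, while `parseZlibHeader(120, 156).windowSize` is 32768; both headers pass the check.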

@puzrin
Member

puzrin commented Sep 25, 2019

Sounds reasonable. Could you try adding | 15 at the end of this line? https://github.com/nodeca/pako/blob/master/lib/inflate.js#L137

@photopea
Author

I am sorry, I am quite busy at the moment :(

I don't really need pako.js, as I replaced it with my own library, UZIP.js. I just thought you might want to know about this bug.

@puzrin
Member

puzrin commented Sep 25, 2019

No problem. I will add a fix a bit later. Thank you for your help!

@puzrin
Member

puzrin commented Sep 26, 2019

I dug through the sources: this cannot be fixed without diverging the zlib port from upstream.

  1. https://github.com/nodeca/pako/blob/master/lib/zlib/inflate.js#L482 - here the window length is taken from the header, AND the next line skips setting the window size if the windowBits option was passed before.
  2. https://github.com/nodeca/pako/blob/master/lib/zlib/inflate.js#L491 - but here the max distance is still based on the header value.

https://github.com/madler/zlib/blob/master/inflate.c#L699 - upstream has the same behaviour.
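A simplified model of the two lines referenced above, written only to illustrate why an explicit windowBits option does not help. It is NOT pako's or zlib's actual code: the function names are hypothetical, and it assumes the decoder rejects any match distance greater than `dmax`, as the "invalid distance too far back" error in this thread suggests.

```javascript
// Hypothetical, simplified model of the header handling discussed above.
// Not pako's source: it only mirrors the described behaviour.
function readHeader(cmf, windowBitsOption) {
  const len = (cmf >> 4) + 8; // window bits declared by the header
  if (windowBitsOption && len > windowBitsOption) {
    throw new Error('invalid window size');
  }
  const wbits = windowBitsOption || len; // a passed option is kept as-is
  const dmax = 1 << len;                 // max distance still follows the header
  return { wbits, dmax };
}

// Assumed distance check, matching the error reported in this issue.
function checkDistance(state, dist) {
  if (dist > state.dmax) {
    throw new Error('invalid distance too far back');
  }
}
```

With the header byte 88 and windowBits 15, `wbits` becomes 15 but `dmax` stays 8192, so a back-reference of 10000 bytes is still rejected.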

If the first bytes are replaced with [120, 156] (max window + updated header checksum, FCHECK), inflate works.
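A sketch of that header rewrite, generalized to any valid zlib header: it sets CINFO to 7 (a 32768-byte window), keeps CM, FLEVEL and FDICT, and recomputes FCHECK so that CMF * 256 + FLG is again a multiple of 31, as RFC 1950 requires. The helper name is hypothetical; its output can be passed to a normal zlib inflate.

```javascript
// Sketch: rewrite a zlib header to declare the maximum 32 KiB window
// (CINFO = 7) and recompute FCHECK so the header stays valid.
function patchZlibHeader(buf) {
  const out = Uint8Array.from(buf);       // copy; leave the input untouched
  const cmf = (7 << 4) | (out[0] & 0x0f); // keep CM (8 = Deflate), set CINFO = 7
  let flg = out[1] & 0xe0;                // keep FLEVEL and FDICT, clear FCHECK
  flg |= (31 - ((cmf * 256 + flg) % 31)) % 31;
  out[0] = cmf;
  out[1] = flg;
  return out;
}
```

For the header bytes in this issue (88, 133), the patched bytes are 120, 156, the same replacement described above.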

@puzrin
Member

puzrin commented Sep 26, 2019

Since following upstream is more important than auto-fixing a malformed header, I'd prefer to leave everything as is.

I reported this upstream as madler/zlib#449, just to be sure.

@RenaKunisaki

So what am I to do when I get this problem with data that I know to be valid but don't control?

@puzrin
Member

puzrin commented Nov 6, 2020

@RenaKunisaki you can apply raw inflate manually, for example.

@RenaKunisaki

How do I do that?
