Inconsistency and undefined return in inflate method between versions v1 and v2 #283

Open
microshine opened this issue Jan 23, 2024 · 0 comments


Hello pako team,

I have encountered a couple of issues with the inflate method when working with different versions of the pako library.

Environment:

  • Node.js version: v20.10.0
  • Pako version 1 (alias pako_v1): 1.0.11
  • Pako version 2 (alias pako_v2): 2.1.0

Steps to Reproduce:

  1. Install two versions of pako using the following commands:
    npm install pako_v1@npm:pako@^1.0.0
    npm install pako_v2@npm:pako@^2.0.0
  2. Run the following script:
    const pako1 = require('pako_v1');
    const pako2 = require('pako_v2');
    
    function main() {
      const data = Buffer.from("789c3d4e4b0ac2400cdd07728739c1f465da693b2005c522ba2bcc4e5c886029584aab0b8f6f3a8a0492bcbc0fe19969261fbc7500c43903adba105b56210457257c1b293b8e30fb893a22ee9824dde5c732216d4bcf94b5efd701a67f32eda2c2ed325c1faa8c7775390bd5fddb17fbb258472d623d4cd4b4f3ca6e805c9fcad15c4c3c31b59af601d2ca22900a", "hex");
    
      // Inflate the same buffer (a zlib stream followed by one extra 0x0A byte) with both versions.
      const result1 = pako1.inflate(data);
      const result2 = pako2.inflate(data);
    
      console.log(Buffer.from(result1).toString());
      if (result2 === undefined) {
        console.log("pako2.inflate returns undefined");
      } else {
        console.log(Buffer.from(result2).toString());
      }
    }
    
    main();

Expected Behavior:

The inflate method should return a Uint8Array or throw an error if the inflation is unsuccessful.
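
Until the two versions agree, a caller can enforce this contract on its own side. The following is only a minimal sketch and is not part of pako: `inflateStrict` is a hypothetical helper that accepts any inflate function and turns an `undefined` result into a thrown error.

```javascript
// Hypothetical helper (not part of pako): normalizes the outcome of an
// inflate call so the caller either gets bytes back or gets an exception.
function inflateStrict(inflateFn, data) {
  const result = inflateFn(data);
  if (result === undefined) {
    // Observed with pako 2.x when the input carries extra trailing bytes.
    throw new Error('inflate returned undefined (input may be truncated or have trailing bytes)');
  }
  return result;
}
```

With the reproduction script above, it could be called as `inflateStrict((d) => pako2.inflate(d), data)`, which would raise an error for the sample buffer instead of silently yielding `undefined`.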

Observed Behavior:

  1. In pako version 2 (pako_v2), the inflate method returns undefined instead of returning a Uint8Array or throwing an error.
  2. There seems to be a discrepancy in how extra bytes at the end of the input data are handled between versions. Version 1 (pako_v1) correctly discards the extra byte 0x0A at the end of the data, while version 2 (pako_v2) does not.

Additional Context:

The input data buffer is extracted from a PDF document's stream object. Some PDF documents may incorrectly specify the Length for stream objects and may also set an incorrect EOL character before endstream, resulting in binary data with extra bytes (such as 0x0A or 0x0D), as in this case.

This behavior is problematic because it affects the ability to process certain PDF streams, which might have incorrectly reported lengths or have an additional EOL character due to incorrect PDF generation.
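
One possible caller-side workaround, sketched here under assumptions, is to retry inflation after dropping a trailing EOL byte. `inflateTolerant` and `maxTrim` are hypothetical names, and `inflateFn` stands for any inflate function that returns `undefined` or throws on failure. Because a valid zlib stream can itself end in 0x0A or 0x0D, the sketch tries the untouched input first and trims only after a failure.

```javascript
// Hypothetical helper (not part of pako): tries to inflate `data`, and on
// failure retries with up to `maxTrim` trailing EOL bytes removed.
function inflateTolerant(inflateFn, data, maxTrim = 2) {
  let end = data.length;
  for (let trimmed = 0; trimmed <= maxTrim; trimmed++) {
    try {
      const result = inflateFn(data.subarray(0, end));
      if (result !== undefined) return result;
    } catch (err) {
      // Treat a thrown error like an undefined result and try a shorter input.
    }
    const last = data[end - 1];
    // Stop as soon as the next byte to drop is not LF (0x0A) or CR (0x0D).
    if (end === 0 || (last !== 0x0a && last !== 0x0d)) break;
    end -= 1;
  }
  throw new Error('unable to inflate data, even after trimming trailing EOL bytes');
}
```

For the buffer in the reproduction script this could be called as `inflateTolerant((d) => pako2.inflate(d), data)`. Whether trimming is acceptable depends on the application; it does not replace a fix in the library.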

Could you please look into these issues? The handling of the end bytes is crucial for my use case, where I process PDF files that may not always be correctly formed.

Thank you for your assistance!
