Commit
Attempt to handle decompression error on some broken PDF files
From time to time we come across files where no text is detected, even though readers like evince display the PDF fine. After some digging, it turned out that this happens because the PDF contains badly compressed data (an incorrect checksum). This may be worked around by decompressing byte by byte and ignoring errors raised on the checksum bytes (arbitrarily found to be the last 4, which seems consistent with an int32 checksum). This has been largely inspired by py-pdf/pypdf#422 and the test file has been taken from there, so credits to @zegrep.
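For illustration, the byte-by-byte approach described above could look roughly like the sketch below. It is not necessarily the code in this commit: the function name `inflate_ignoring_checksum` is hypothetical, and the 4-byte tolerance is the assumption stated in the message (it matches the 4-byte Adler-32 trailer of a zlib stream).

```python
import zlib


def inflate_ignoring_checksum(data: bytes) -> bytes:
    """Inflate a zlib stream one byte at a time (hypothetical helper).

    Errors raised while reading the last 4 bytes (assumed to be the
    checksum trailer) are ignored; errors raised earlier are propagated.
    """
    decompressor = zlib.decompressobj()
    result = bytearray()
    for i in range(len(data)):
        try:
            # Feed a single byte so that everything decoded before a
            # failure has already been collected in `result`.
            result += decompressor.decompress(data[i:i + 1])
        except zlib.error:
            if i < len(data) - 4:
                # The failure is in the compressed payload itself.
                raise
            # The failure is within the assumed checksum bytes: keep
            # what was decoded so far and stop.
            break
    return bytes(result)
```

With a stream whose last byte has been altered, `zlib.decompress` raises `zlib.error`, while this helper would still return the decoded payload. The trade-off is that a payload that is genuinely corrupted but decodes without error is no longer detected by the checksum.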