gofpdi fails to correctly parse streams on some pdfs #53
napalu added a commit to napalu/gofpdi that referenced this issue on Jun 5, 2022:
use bytes.Buffer instead of string concatenation in readToken closes phpdave11#53
syamcode added a commit to syamcode/gofpdi that referenced this issue on Aug 9, 2022:
Bugfix/read past end of stream phpdave11#53
When reading some PDFs (typically scanned-in PDFs), gofpdi fails to detect the 'endstream' token and panics with
panic: Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: dstream
When reading a PDF stream, the reader should start reading stream data immediately after the first CRLF sequence that follows the 'stream' keyword. Instead, it skips all leading whitespace, which can result in reading past the 'endstream' token.
Here's a test PDF that shows the described behaviour:
BRW2C6FC94B5488_000827.pdf
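To illustrate the failure mode, here is a minimal sketch in Go. It is not gofpdi's actual code: the helper names (`skipAllWhitespace`, `skipOneEOL`, `nextToken`, `demo`) and the sample payload are hypothetical. It assumes a stream of declared length 6 whose data happens to begin with four whitespace bytes, which is plausible for binary data in scanned PDFs. Skipping every whitespace byte shifts the start of the data by four bytes, so a fixed-length read ends inside 'endstream'.

```go
package main

import "fmt"

// isPDFWhitespace reports whether b is one of the PDF white-space bytes
// (NUL, tab, LF, form feed, CR, space).
func isPDFWhitespace(b byte) bool {
	switch b {
	case 0x00, '\t', '\n', '\f', '\r', ' ':
		return true
	}
	return false
}

// skipAllWhitespace mimics the behaviour described in the issue: every
// leading white-space byte is dropped, including bytes that belong to
// the stream data itself.
func skipAllWhitespace(buf []byte, pos int) int {
	for pos < len(buf) && isPDFWhitespace(buf[pos]) {
		pos++
	}
	return pos
}

// skipOneEOL consumes a single end-of-line marker (CRLF or a lone LF)
// and nothing more.
func skipOneEOL(buf []byte, pos int) int {
	if pos+1 < len(buf) && buf[pos] == '\r' && buf[pos+1] == '\n' {
		return pos + 2
	}
	if pos < len(buf) && buf[pos] == '\n' {
		return pos + 1
	}
	return pos
}

// nextToken skips white space and returns the following run of
// non-white-space bytes.
func nextToken(buf []byte, pos int) string {
	pos = skipAllWhitespace(buf, pos)
	end := pos
	for end < len(buf) && !isPDFWhitespace(buf[end]) {
		end++
	}
	return string(buf[pos:end])
}

// demo returns the token found after the stream data when the data start
// is located by skipping all whitespace (naive) and by skipping exactly
// one end-of-line marker (fixed).
func demo() (naive, fixed string) {
	// Hypothetical payload: 6 bytes, the first four are white space.
	payload := []byte{'\n', ' ', '\t', 0x00, 0xFF, 0xD8}
	raw := append([]byte("stream\r\n"), payload...)
	raw = append(raw, "\r\nendstream\n"...)

	afterKeyword := len("stream")
	length := len(payload) // stands in for the /Length entry

	naive = nextToken(raw, skipAllWhitespace(raw, afterKeyword)+length)
	fixed = nextToken(raw, skipOneEOL(raw, afterKeyword)+length)
	return naive, fixed
}

func main() {
	naive, fixed := demo()
	fmt.Println("skip all whitespace:", naive) // skip all whitespace: dstream
	fmt.Println("skip one EOL:       ", fixed) // skip one EOL:        endstream
}
```

With this particular payload the naive variant reproduces the 'dstream' token from the panic message, while consuming only the single end-of-line marker leaves 'endstream' intact. The sketch also accepts a lone LF after the keyword; that is an addition beyond the CRLF case mentioned in the report.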