PdfReadError: Unexpected end of stream #1090
Comments
The PDF has an inline image with an EMC operator between the EI and the Q. PyPDF2 used to detect the end of the image by requiring a Q to follow the EI. This does not conform to the standard, although the sequence is very common. I've opened a PR that uses [whitespace]EI[whitespace] to detect the end of the image. This is compatible with the presence of EI within the image stream (a test case with such a file exists in test_generic.py).
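To illustrate the detection rule described in this comment, here is a minimal sketch that scans a content stream for an EI delimited by whitespace on both sides. It is not PyPDF2's actual code: the function name `find_inline_image_end`, the `WHITESPACE` constant, and the sample bytes are hypothetical, and the whitespace set is assumed to be the six PDF whitespace bytes.

```python
# Bytes treated as whitespace here (assumption: NUL, tab, LF, FF, CR, space).
WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(data: bytes, start: int = 0) -> int:
    """Return the index of the 'E' in the first [whitespace]EI[whitespace]
    at or after `start`, or -1 if no such terminator is found."""
    pos = start
    while True:
        idx = data.find(b"EI", pos)
        if idx == -1:
            return -1
        # Indexing bytes yields an int, and `int in bytes` tests membership.
        before_ok = idx > start and data[idx - 1] in WHITESPACE
        after_ok = idx + 2 < len(data) and data[idx + 2] in WHITESPACE
        if before_ok and after_ok:
            return idx
        # This "EI" is part of the image data; keep scanning.
        pos = idx + 1


# An "EI" embedded in the data is skipped, and the EMC between EI and Q
# no longer matters because Q is not part of the check.
sample = b"abEIcd EI EMC Q"
end = find_inline_image_end(sample)
```

Binary image data can still contain a whitespace-delimited EI by chance, so a check like this remains a heuristic rather than a guarantee.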
Potentially related PR: #332
Fix reading of some images when operations are inserted between EI and Q.
The end of an image is now detected with [whitespace]EI[whitespace] (4 characters should be sufficient).
Fixes #1090
Agree with you, @MartinThoma.
I wanted to extract text from a PDF
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.4.0-121-generic-x86_64-with-glibc2.31
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.2
Code + PDF
The pdf:
pdf/5c7a7f24459bcb9700d650062e0ab8bb.pdf