'PdfReadError: File has not been decrypted' for unencrypted file #991

MartinThoma · 2022-06-14T16:20:08Z

When trying to extract the text from a PDF, I get an exception.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-113-generic-x86_64-with-glibc2.31

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.2.0

MCVE: Code and PDF

Using this PDF: https://corpora.tika.apache.org/base/docs/govdocs1/976/976028.pdf

from io import BytesIO

from PyPDF2 import PdfReader
from tests import get_pdf_from_url  # download helper from PyPDF2's own test suite (run from a repository checkout)

url = "https://corpora.tika.apache.org/base/docs/govdocs1/976/976028.pdf"
# Emits: PdfReadWarning: incorrect startxref pointer(1)
reader = PdfReader(BytesIO(get_pdf_from_url(url, "tika-976028.pdf")))
reader.pages[0].extract_text()

I get:

Traceback (most recent call last):
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 354, in _get_num_pages
    self.decrypt("")
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1617, in decrypt
    return self._decrypt(password)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1657, in _decrypt
    raise NotImplementedError(
NotImplementedError: only algorithm code 1 and 2 are supported. This PDF uses code 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1462, in __getitem__
    len_self = len(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1453, in __len__
    return self.length_function()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 357, in _get_num_pages
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted
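In the second traceback the informative NotImplementedError is no longer the exception being raised: `_get_num_pages` catches it while trying the empty password and raises a generic PdfReadError from inside the `except` block. Python still keeps the original exception on the new one's `__context__` attribute, which is why it is printed under "During handling of the above exception, another exception occurred". The sketch below mimics that control flow with hypothetical stand-ins (`fake_decrypt`, `fake_num_pages` and the local `PdfReadError` class are not PyPDF2 code) and shows how a caller could walk the chain back to the root cause.

```python
class PdfReadError(Exception):
    """Local stand-in for PyPDF2.errors.PdfReadError (hypothetical, not the real class)."""


def fake_decrypt(password: str, algorithm_code: int) -> int:
    # Mirrors the guard in the first traceback: only codes 1 and 2 are handled.
    if algorithm_code not in (1, 2):
        raise NotImplementedError(
            "only algorithm code 1 and 2 are supported. "
            f"This PDF uses code {algorithm_code}"
        )
    return 1


def fake_num_pages(algorithm_code: int) -> int:
    # Mirrors _get_num_pages: try the empty password, then raise a generic error.
    try:
        fake_decrypt("", algorithm_code)
    except Exception:
        # Raising inside `except` stores the caught exception on __context__.
        raise PdfReadError("File has not been decrypted")
    return 1  # hypothetical page count once "decryption" succeeded


def root_cause(exc: BaseException) -> BaseException:
    """Follow explicit (__cause__) and implicit (__context__) chaining to the innermost exception."""
    while True:
        nxt = exc.__cause__ or exc.__context__
        if nxt is None:
            return exc
        exc = nxt


try:
    fake_num_pages(algorithm_code=4)
except PdfReadError as err:
    cause = root_cause(err)
    print(f"{type(err).__name__}: {err}")
    print(f"root cause -> {type(cause).__name__}: {cause}")
```

Raising with `raise PdfReadError(...) from exc` would set `__cause__` instead and print "The above exception was the direct cause of the following exception", which makes the link explicit; `root_cause` follows either attribute.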

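The first traceback's message appears to refer to the algorithm code stored under /V in the document's encryption dictionary. As a rough guide to what those codes mean, the sketch below looks them up in a table paraphrased from the PDF specification's encryption dictionary entries. `classify_encryption` is a hypothetical helper, the dictionary is modelled as a plain `dict`, and the example values are invented rather than read from 976028.pdf. With PyPDF2 the real dictionary should be reachable as `reader.trailer["/Encrypt"]` when `reader.is_encrypted` is true, but that access path is an assumption not checked here.

```python
from typing import Any, Dict, Mapping, Tuple

# /V values of an encryption dictionary, paraphrased from ISO 32000-1
# (code 5 comes from PDF 2.0 / ISO 32000-2).
V_MEANINGS: Dict[int, str] = {
    0: "undocumented algorithm (must not be used)",
    1: "RC4 with a 40-bit key",
    2: "RC4 with a key longer than 40 bits (PDF 1.4)",
    3: "unpublished algorithm (not allowed in conforming files)",
    4: "crypt filters from /CF, /StmF, /StrF (PDF 1.5; RC4 or AES-128)",
    5: "crypt filters with a 256-bit AES key (PDF 2.0)",
}


def classify_encryption(
    encrypt: Mapping[str, Any], supported: Tuple[int, ...] = (1, 2)
) -> Tuple[int, str, bool]:
    """Return (/V code, meaning, whether the code is in `supported`).

    Hypothetical helper: `encrypt` is any mapping keyed by PDF names such as "/V".
    """
    v = int(encrypt.get("/V", 0))  # treating a missing /V as 0 is an assumption of this sketch
    meaning = V_MEANINGS.get(v, "unknown algorithm code")
    return v, meaning, v in supported


# Invented example values, NOT taken from 976028.pdf.
example = {"/Filter": "/Standard", "/V": 4, "/R": 4, "/Length": 128}
code, meaning, handled = classify_encryption(example)
print(f"/V {code}: {meaning}; handled by a codes-1-and-2 reader: {handled}")
```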

MartinThoma · 2022-06-14T16:21:08Z

Might be related to #416

MartinThoma · 2022-06-14T16:23:22Z

Might change with #749

MartinThoma · 2022-06-19T09:55:13Z

This issue no longer occurs 🎉

MartinThoma added the is-bug label Jun 14, 2022

MartinThoma self-assigned this Jun 14, 2022

MartinThoma added the workflow-text-extraction and Has MCVE labels Jun 14, 2022

MartinThoma closed this as completed Jun 19, 2022

MartinThoma mentioned this issue Jul 10, 2022 in "PdfReadError: File has not been decrypted" #1088 (closed)
