PdfReadError: File has not been decrypted #1088

Closed
MartinThoma opened this issue Jul 10, 2022 · 7 comments · Fixed by #1170
Labels
is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF MCVE in Tests The MCVE was added to PyPDF2 test suite PdfReader The PdfReader component is affected workflow-encryption From a users perspective, encryption is the affected feature/workflow

Comments

@MartinThoma
Member

MartinThoma commented Jul 10, 2022

I was trying to read metadata from a PDF that does not appear to be encrypted. The file is in fact encrypted, with an empty password:

$ pdfinfo example.pdf

...
Encrypted:      yes (print:yes copy:no change:no addNotes:no algorithm:RC4)
...

Environment

$ python -m platform
Linux-5.4.0-121-generic-x86_64-with-glibc2.31

# Seen first (inclusive)
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.2

# Seen last (inclusive)
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.8.0

Code + PDF

The PDF: pdf/9bc3765eb6426bb34139d419a6e1f79e.pdf

>>> from PyPDF2 import PdfReader
>>> reader = PdfReader("pdf/9bc3765eb6426bb34139d419a6e1f79e.pdf")

>>> len(reader.pages)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1469, in __len__
    return self.length_function()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 389, in _get_num_pages
    return self.trailer[TK.ROOT]["/Pages"]["/Count"]  # type: ignore
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 680, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 252, in get_object
    obj = self.pdf.get_object(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1090, in get_object
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted

>>> reader.metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 319, in metadata
    obj = self.trailer[TK.INFO]
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 680, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 252, in get_object
    obj = self.pdf.get_object(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 1090, in get_object
    raise PdfReadError("File has not been decrypted")
PyPDF2.errors.PdfReadError: File has not been decrypted
@MartinThoma MartinThoma added is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF workflow-encryption From a users perspective, encryption is the affected feature/workflow PdfReader The PdfReader component is affected labels Jul 10, 2022
@MartinThoma
Member Author

Related to #416 and #991

@MartinThoma MartinThoma added the Has MCVE A minimal, complete and verifiable example helps a lot to debug / understand feature requests label Jul 10, 2022
@MartinThoma
Member Author

MartinThoma commented Jul 10, 2022

@MatthiasValvekens

That file is encrypted, but with an empty user password :) (AKA "mild obfuscation" rather than encryption, but hey). Seems to be 128-bit RC4.

I don't remember offhand how PyPDF2 handles this case, but it could be that a .decrypt('') call on the reader is all that is needed.

@MartinThoma
Member Author

@MatthiasValvekens When I specify PdfReader(stream, password=""), I get PyPDF2.errors.PdfReadError: Wrong password.

@MartinThoma
Member Author

Seeing that basically every PDF viewer automatically tries the empty password, I think PyPDF2 should do the same. From a user's perspective, the current behavior is very confusing.
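
The behavior proposed here can be sketched as a small helper. This is only an illustration, not PyPDF2's implementation: `unlock_reader` is a hypothetical name, and it assumes the reader object exposes `is_encrypted` and a `decrypt(password)` method that returns a falsy value (0) on failure, as PyPDF2's `PdfReader` does. Note that, as reported above, the empty password was rejected for this particular file in the affected versions, so the sketch shows the intended behavior rather than a workaround.

```python
from typing import Any, Optional


def unlock_reader(reader: Any, password: Optional[str] = None) -> bool:
    """Try to unlock `reader`, attempting the empty password first.

    Hypothetical helper: `reader` is assumed to provide `is_encrypted`
    and `decrypt(password)`, where decrypt() returns 0 on failure.
    Returns True if the reader's content should now be accessible.
    """
    if not reader.is_encrypted:
        return True  # unencrypted file, nothing to do

    # Always try "" first, the way PDF viewers do, then the caller's password.
    candidates = [""] if password is None else ["", password]
    for candidate in candidates:
        if reader.decrypt(candidate):  # non-zero result means success
            return True
    return False
```

With PyPDF2 installed, this would be called as `unlock_reader(PdfReader("example.pdf"))` before touching `reader.pages` or `reader.metadata`.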

MartinThoma added a commit that referenced this issue Jul 17, 2022
@MartinThoma MartinThoma added MCVE in Tests The MCVE was added to PyPDF2 test suite and removed Has MCVE A minimal, complete and verifiable example helps a lot to debug / understand feature requests labels Jul 17, 2022
@pubpub-zz
Collaborator

> Seeing that basically every PDF viewer automatically tries the empty password, I think PyPDF2 should do the same. From a user's perspective, this is very confusing.

I agree with your proposal. However, there definitely seems to be a problem with the decoder: I ran some tests with pdfminer.six, and the empty password does work there.

@xilopaint
Contributor

> Seeing that basically every PDF viewer automatically tries the empty password, I think PyPDF2 should do the same.

Hope to see this implemented soon!

4 participants