PyCryptodome is required for some PDFs, but is not installed automatically as a dependency #1525

Closed
samuelbradshaw opened this issue Dec 31, 2022 · 12 comments


@samuelbradshaw

When pycryptodome is not installed, pypdf fails to read some PDFs, and gives this error:

pypdf.errors.DependencyError: PyCryptodome is required for AES algorithm

Because I wasn't familiar with pycryptodome, I wasn't sure what I needed to do to get it working. Eventually I figured out that pycryptodome was a Python library, and all I had to do was run pip3 install pycryptodome to fix the error.

If possible, it would be nice if pypdf could 1) install pycryptodome as a dependency as part of the installation process for pypdf, OR 2) provide more information in the error, letting the user know that pycryptodome is a Python library that can be installed via pip.
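
To illustrate option 2, here is a minimal sketch of an error message that names the package and the install command. This is not pypdf's actual code: `dependency_hint` is a hypothetical helper, and the only assumption about PyCryptodome is that it is imported under the module name `Crypto`.

```python
from importlib.util import find_spec
from typing import Optional


def dependency_hint(module_name: str, pip_name: str, feature: str) -> Optional[str]:
    """Return an actionable message if module_name cannot be imported, else None.

    Hypothetical helper for illustration only; not part of pypdf.
    """
    # find_spec returns None for a top-level module that is not installed.
    if find_spec(module_name) is not None:
        return None
    return (
        f"{pip_name} is required for {feature}. "
        f"It is a Python package; install it with: pip install {pip_name}"
    )


# PyCryptodome is installed as "pycryptodome" but imported as "Crypto".
hint = dependency_hint("Crypto", "pycryptodome", "the AES algorithm")
print(hint or "PyCryptodome is available")
```

A message like this would have told me right away that the fix was a pip install.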

Environment

Which environment were you using when you encountered the problem?

$ python3 -m platform
macOS-13.1-arm64-arm-64bit

$ python3 -c "import pypdf;print(pypdf.__version__)"
3.1.0

Code + PDF

This is a minimal, complete example that shows the issue:

  1. Install pypdf (pip3 install pypdf).
  2. Make sure pycryptodome is not installed (pip3 uninstall pycryptodome).
  3. Run the following Python script:
from pypdf import PdfReader
from urllib.request import urlopen
from io import BytesIO

# Get the PDF and convert it into a byte stream
pdf_url = 'https://web.archive.org/web/30000101000000if_/http://www.latterdaytruth.org/pdf/100846.pdf'
pdf_file = urlopen(pdf_url).read()
pdf_bytes_stream = BytesIO(pdf_file)

# Load the file with pypdf
pdf_reader = PdfReader(pdf_bytes_stream)

# Print the number of pages
pages_count = len(pdf_reader.pages)
print('Number of pages: {0}'.format(pages_count))

This is the PDF I'm attempting to read: https://web.archive.org/web/30000101000000if_/http://www.latterdaytruth.org/pdf/100846.pdf

Traceback

Traceback (most recent call last):
  File "/Users/sbradshaw/Desktop/test-pypdf-pages.py", line 14, in <module>
    pages_count = len(pdf_reader.pages)
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_page.py", line 2063, in __len__
    return self.length_function()
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 445, in _get_num_pages
    return self.trailer[TK.ROOT]["/Pages"]["/Count"]  # type: ignore
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 266, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/generic/_base.py", line 259, in get_object
    obj = self.pdf.get_object(self)
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 1205, in get_object
    retval = self._get_object_from_stream(indirect_reference)  # type: ignore
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 1136, in _get_object_from_stream
    obj_stm: EncodedStreamObject = IndirectObject(stmnum, 0, self).get_object()  # type: ignore
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/generic/_base.py", line 259, in get_object
    obj = self.pdf.get_object(self)
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_reader.py", line 1269, in get_object
    retval = self._encryption.decrypt_object(
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_encryption.py", line 761, in decrypt_object
    return cf.decrypt_object(obj)
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_encryption.py", line 185, in decrypt_object
    obj._data = self.stmCrypt.decrypt(obj._data)
  File "/Users/sbradshaw/.pyenv/versions/3.10.2/lib/python3.10/site-packages/pypdf/_encryption.py", line 147, in decrypt
    raise DependencyError("PyCryptodome is required for AES algorithm")
pypdf.errors.DependencyError: PyCryptodome is required for AES algorithm
@MartinThoma
Member

If you want it, you need to install the extra as documented

https://pypdf.readthedocs.io/en/latest/user/encryption-decryption.html

@samuelbradshaw
Author

> If you want it, you need to install the extra as documented
>
> https://pypdf.readthedocs.io/en/latest/user/encryption-decryption.html

Thanks, it looks like the link to the installation guide at the top of that page is broken; however, I was able to get to the installation guide using the link in the table of contents on the left.

I tried running pip3 install pypdf[full] and got this error:

zsh: no matches found: pypdf[full]

Maybe it's because I'm using zsh instead of bash?

In any case, I was able to install pypdf[full] by adding quote marks: pip3 install "pypdf[full]"

@MartinThoma
Member

Yes, that is a zsh issue: zsh treats square brackets as glob patterns, so you need to quote or escape them.

Thanks for pointing out that the link is broken. I'll fix that tomorrow.

@MartinThoma MartinThoma reopened this Dec 31, 2022
@mogocat

mogocat commented Jan 20, 2023

The documentation page at https://pypdf.readthedocs.io/en/latest/user/installation.md is down and returns a 404. Can anyone help me with this?

@MartinThoma
Member

Where did you get that URL from?
You're probably looking for https://pypdf.readthedocs.io/en/latest/user/installation.html

@abyesilyurt
Contributor

@MartinThoma the links have been broken on https://pypdf.readthedocs.io. There was a PR to fix that, #1537. Can you re-deploy the docs?

@MartinThoma
Member

I have no clue what you mean @abyesilyurt . When I go on the page, the links are not broken.

Also, this has nothing to do with the PyCryptodome issue, right? It would help me and others if you opened another issue instead of writing it into a random issue

@MartinThoma
Member

aaah, now I get it 💡

#1537 was undone because it broke the build process completely

@MartinThoma
Member

Let me open another issue for this :-)

@abyesilyurt
Contributor

> I have no clue what you mean @abyesilyurt . When I go on the page, the links are not broken.
>
> Also, this has nothing to do with the PyCryptodome issue, right? It would help me and others if you opened another issue instead of writing it into a random issue

I am not writing this into a random issue; @mogocat mentioned above that he cannot access the links.

@rkshaon

rkshaon commented Jan 22, 2025

Just run pip3 install pycryptodome and restart the program, and the problem will be solved. But this package should be installed automatically, since it's a dependency.

@stefan6419846
Collaborator

There is nothing wrong on the pypdf side here - we support running without any cryptography library and without Pillow installed. If you need support for one of them, you should use the provided extras:

pypdf/pyproject.toml

Lines 42 to 51 in f1b471b

full = [
"cryptography",
"Pillow>=8.0.0",
]
crypto = [
"cryptography",
]
cryptodome = [
"PyCryptodome",
]
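
As a rough sketch, you can check which of these extras your environment already satisfies by testing whether the corresponding modules are importable. The `EXTRAS` mapping and the `satisfied_extras` helper are hypothetical and only mirror the snippet above; the assumed import names are `cryptography`, `PIL` (for Pillow), and `Crypto` (for PyCryptodome). Version constraints such as `Pillow>=8.0.0` are not checked here.

```python
from importlib.util import find_spec
from typing import Callable, Dict, List

# Extra name -> import names of the packages it pulls in (illustrative mapping).
EXTRAS: Dict[str, List[str]] = {
    "full": ["cryptography", "PIL"],
    "crypto": ["cryptography"],
    "cryptodome": ["Crypto"],
}


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None


def satisfied_extras(
    extras: Dict[str, List[str]] = EXTRAS,
    available: Callable[[str], bool] = is_importable,
) -> List[str]:
    """Return the sorted names of extras whose modules are all importable."""
    return sorted(
        name
        for name, modules in extras.items()
        if all(available(module) for module in modules)
    )


print("Extras already satisfied:", satisfied_extras())
```

If the list is empty, installing one of the extras (for example `pip install "pypdf[crypto]"`) should be enough for AES-encrypted files.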
