This issue was moved to a discussion.

Encryption and Decryption of PDFs examples in documentation remove attachments, links, ... #2543

Closed
redfast00 opened this issue Mar 26, 2024 · 3 comments

Comments

@redfast00
Contributor

I want to decrypt a PDF document while keeping the document intact. The documentation gives an example for this (https://pypdf.readthedocs.io/en/stable/user/encryption-decryption.html), but that example decrypts the document and then reconstructs it by copying it page by page into a new document.

This drops the attachments, the table of contents, the links that point to other pages in the document, and so on.

I would like an approach where the encrypted data is decrypted 'in-place', keeping the document structure intact.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.5.0-26-generic-x86_64-with-glibc2.35

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('cryptography', '42.0.5'), PIL=none
(modified by me to add support for the pubsec decryptor, but this doesn't affect anything)

Code + PDF

This is a minimal, complete example that shows the issue; it does not use encryption, but it has the same problem as the sample that does use encryption.

from pypdf import PdfReader, PdfWriter

reader = PdfReader("PN7160_PN7161.pdf")
writer = PdfWriter()

for idx, page in enumerate(reader.pages):
    writer.add_page(page)

print("Saving to file...")
# Save the new PDF to a file
with open("out.pdf", "wb") as f:
    writer.write(f)

PN7160_PN7161.pdf

@j-t-1
Contributor

j-t-1 commented Mar 26, 2024

Create a backup of the PDF, then replace this:

writer = PdfWriter()

for idx, page in enumerate(reader.pages):
    writer.add_page(page)

with this:

writer = PdfWriter(clone_from=reader)

@redfast00
Contributor Author

@j-t-1 that works! Should I make a PR to replace that in the documentation?

@j-t-1
Contributor

j-t-1 commented Mar 26, 2024

You could. Currently it is not obvious that adding each page is not enough to produce an equivalent document; I had a similar problem in #2485.

@py-pdf py-pdf locked and limited conversation to collaborators Mar 26, 2024
@stefan6419846 stefan6419846 converted this issue into discussion #2544 Mar 26, 2024
