Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the ability to copy a pdf #986

Conversation

mohamedsalem401
Copy link
Contributor

closes #951
Some badly-made PDF files can be automatically repaired by Adobe Reader, but can't be edited by pdf-lib.
Sometimes, working with a copy of these corrupted pdf can fix those issues.

Examples:
1- #951
https://github.com/Hopding/pdf-lib/files/6912122/test.pdf

opening and saving the original file:

image

opening and saving with PDFDocument.load(......, {returnCopy: true})

image

2- #140
Having the ability to load a copy directly allows you to remove all of the deleted pages' objects from the document. (thus reducing file size):

A file with deleted pages that weren't removed from the document

image

the exact same copy but after loading a copy and deleting all of the deleted pages' objects from the document.

image

3-Other undamaged pdfs can be viewed better when copied to pdf-lib first:

copy

after6

original

before6

@mohamedsalem401
Copy link
Contributor Author

@Hopding
Hi there,
just wondering if this PR requires any modification to any documentation 🤔

Copy link
Owner

@Hopding Hopding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohamedsalem401 thanks for working on this! It's intriguing to me that building a copy seems to fix so many different issues. I wonder why that is? 🤔

This is, of course, a workaround for the problems rather than a full solution of the root cause, because this requires users to know when they need to use .copy(). Also, .copy() won't always copy all information over to the new doc (acroforms, etc...).

Regardless, I'm happy to have a feature like this built into the lib. I've requested a slight refactor. Please also be sure to update the unit tests and integration tests to exercise this logic. 🙂

Comment on lines 153 to 186
if (returnCopy) {
const pdfDoc = new PDFDocument(context, ignoreEncryption, updateMetadata);
const pdfCopy = await PDFDocument.create();
const contentPages = await pdfCopy.copyPages(
pdfDoc,
pdfDoc.getPageIndices(),
);

for (const page of contentPages) {
pdfCopy.addPage(page);
}
if (pdfDoc.getAuthor() !== undefined) {
pdfCopy.setAuthor(pdfDoc.getAuthor()!);
}
if (pdfDoc.getCreationDate() !== undefined) {
pdfCopy.setCreationDate(pdfDoc.getCreationDate()!);
}
if (pdfDoc.getCreator() !== undefined) {
pdfCopy.setCreator(pdfDoc.getCreator()!);
}
if (pdfDoc.getModificationDate() !== undefined) {
pdfCopy.setModificationDate(pdfDoc.getModificationDate()!);
}
if (pdfDoc.getProducer() !== undefined) {
pdfCopy.setProducer(pdfDoc.getProducer()!);
}
if (pdfDoc.getSubject() !== undefined) {
pdfCopy.setSubject(pdfDoc.getSubject()!);
}
if (pdfDoc.getTitle() !== undefined) pdfCopy.setTitle(pdfDoc.getTitle()!);
pdfCopy.defaultWordBreaks = pdfDoc.defaultWordBreaks;

return pdfCopy;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are related to, but still distinct from, loading a document. So I think this would fit better as a .copy() method on PDFDocument. Then, instead of:

PDFDocument.load(..., { returnCopy: true })

we could do:

PDFDocument.load(...).copy()

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once you've done this, be sure to add a proper doc comment to the .copy() method

@mohamedsalem401 mohamedsalem401 changed the title Add the ability to load a copy of a pdf Add the ability to copy a pdf Sep 27, 2021
@mohamedsalem401
Copy link
Contributor Author

I have made the changes you asked for 😃
I have also added documentation comment and tests for the new method.

Copy link
Owner

@Hopding Hopding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks again @mohamedsalem401! This feature will go out in the next release 🎉

@Hopding Hopding changed the base branch from master to load-a-copy-of-a-pdf-polish October 1, 2021 20:11
@Hopding Hopding merged commit 1e9a8e4 into Hopding:load-a-copy-of-a-pdf-polish Oct 1, 2021
@Hopding Hopding mentioned this pull request Oct 1, 2021
Hopding added a commit that referenced this pull request Oct 1, 2021
@mohamedsalem401
Copy link
Contributor Author

Thanks for accepting my PR @Hopding

I have some questions if you have the time:

  1. I will try to add integration tests for the .copy() method. which pdf from the assets should I try to copy ? or should I create a new pdf ? 🤔
  2. My PR should have closed Corrupted PDF #951 automatically when merged but since it was merged into a different branch the issue wasn't closed, should it be closed now or when the next release is here 🤔
  3. which issue in pdf-lib you would suggest I should try to work on next ?

Looking forward to making more contributions to this wonderful repository 😃

@mohamedsalem401 mohamedsalem401 mentioned this pull request Oct 7, 2021
@Hopding
Copy link
Owner

Hopding commented Oct 7, 2021

@mohamedsalem401

  1. I do request integration tests for most new features. But in this case, since the copy method packages up functionality that is exercised in other integration tests, I think just the unit tests will suffice.
  2. Good point. I hadn't considered that. I think I'll drop a comment in that issue when the next release is cut, make sure it works for the OP, and close the issue if so.
  3. Great question! I'm glad you're interested in making more contributions to pdf-lib! I don't have a good answer at the moment. But I'm working on a roadmap for pdf-lib and will be able to provide a better answer once it's done 🙂.

@emilsedgh
Copy link

emilsedgh commented Oct 8, 2021

@mohamedsalem401 Thank you so much for working on this!
I'm the original author of #951 and I have been looking for your email address but couldn't find it. So, I'm sorry for messaging you here.

I had originally set a bounty on #951 and I'd love to discuss it with you.

Is it possible for you to email me at emil@rechat.com ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Corrupted PDF
3 participants