Corrupted PDF #951

Open
emilsedgh opened this issue Aug 1, 2021 · 13 comments · Fixed by #986

Comments

@emilsedgh

Hi.

This is an amazing library. Thanks a lot @Hopding. I know you've been inactive for a while but the quality of the code and the support you gave for this during your active time has been absolutely phenomenal. You don't see such fantastic support even for paid products. Good luck whatever you're up to.

My issue is this: I have a PDF file that looks like this:

[Screenshot: Screen Shot 2021-07-31 at 9 06 17 PM]

But when I open and save it using pdf-lib, it looks like this:

[Screenshot: Screen Shot 2021-07-31 at 9 10 36 PM]

Has anyone ever had a similar experience?

@emilsedgh
Author

test.pdf

Here is the PDF file for reference so this could be easily reproduced.

@emilsedgh
Author

emilsedgh commented Aug 1, 2021

I have added a $500 bounty for anyone who can fix this.

Note that I will consider this fixed (for the bounty) only if it is fixed in pdf-lib itself, not by changing the PDF file (e.g. saving or compressing it with other programs).

@PhakornKiong

@emilsedgh

[image]

I believe there is some non-critical error in the provided PDF file, since I'm not able to run it through iText RUPS to investigate the structure:

com.itextpdf.kernel.PdfException: Invalid indirect reference {0}.
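
One way to look for the kind of problem this message points at, without RUPS, is to scan the raw file for references to objects that are never defined. The TypeScript sketch below is only a heuristic under stated assumptions: `collectObjectIds` and `findDanglingReferences` are hypothetical helpers (not part of pdf-lib or iText), the scan cannot see objects packed into compressed object streams, and bytes inside strings or streams may produce false hits.

```typescript
// Hypothetical helper: collects "<objNum> <genNum>" pairs that are followed
// by the given keyword, e.g. "105 0 obj" (keyword "obj") or "114 0 R" (keyword "R").
function collectObjectIds(pdfText: string, keyword: string): string[] {
  // The leading group keeps the match from starting in the middle of a number
  // such as "39.5289".
  const pattern = new RegExp("(^|[^\\d.])(\\d+)\\s+(\\d+)\\s+" + keyword + "\\b", "g");
  const ids: string[] = [];
  let match = pattern.exec(pdfText);
  while (match !== null) {
    ids.push(match[2] + " " + match[3]);
    match = pattern.exec(pdfText);
  }
  return ids;
}

// Hypothetical helper: returns every referenced object ID that has no
// matching "N G obj" definition anywhere in the scanned text.
function findDanglingReferences(pdfText: string): string[] {
  const defined: { [id: string]: boolean } = {};
  const definedIds = collectObjectIds(pdfText, "obj");
  for (let i = 0; i < definedIds.length; i++) {
    defined[definedIds[i]] = true;
  }

  const dangling: string[] = [];
  const referencedIds = collectObjectIds(pdfText, "R");
  for (let i = 0; i < referencedIds.length; i++) {
    const id = referencedIds[i];
    if (!defined[id] && dangling.indexOf(id) === -1) {
      dangling.push(id);
    }
  }
  return dangling;
}
```

On a real file the text could be obtained with something like `fs.readFileSync('test.pdf').toString('latin1')`. Any IDs reported are candidates to inspect by hand, not proof of corruption.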

I suspect that the custom font is not properly embedded.

For example, the following field uses ArialBold:

105 0 obj
<</V (��) /DA (/ArialBold 0 Tf 0 0 0.501961 rg) /DR 114 0 R /F 4 /FT /Tx /Rect [39.5289 469.115 139.84 480.419 ] /Subtype /Widget /T (Lease MLS) /TU (Lease MLS) /Type /Annot /MK 118 0 R /Ff 0 /M (D:20210728200742Z) /AP <</N 19 0 R >> >> 
endobj
107 1 obj
<</Length 0 /Subtype /Form /BBox [0 0 99.64 11.479 ] >> stream

endstream

endobj

FYI, ArialBold is not one of the standard fonts:
https://pdf-lib.js.org/docs/api/enums/standardfonts
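
The check described above can be sketched in TypeScript: pull the font name out of a field's `/DA` string and compare it with the 14 standard PDF font names. `fontNameFromDA` and `isStandardFontName` are hypothetical helpers, not pdf-lib APIs. The name in `/DA` is strictly a resource key that should be resolved through the `/DR` dictionary, so comparing it directly against the standard names is only a rough first test.

```typescript
// The 14 standard PDF font names.
const STANDARD_FONT_NAMES: string[] = [
  "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
  "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
  "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
  "Symbol", "ZapfDingbats",
];

// Hypothetical helper: extracts the font resource name from a /DA string
// such as "/ArialBold 0 Tf 0 0 0.501961 rg".
// Returns null when the string contains no "Tf" (set font) operator.
function fontNameFromDA(da: string): string | null {
  const match = /\/([^\s\/]+)\s+[\d.]+\s+Tf\b/.exec(da);
  return match ? match[1] : null;
}

// Hypothetical helper: true if the name is one of the 14 standard font names.
function isStandardFontName(name: string): boolean {
  return STANDARD_FONT_NAMES.indexOf(name) !== -1;
}

// The /DA string from object 105 in the dump above.
const exampleFontName = fontNameFromDA("/ArialBold 0 Tf 0 0 0.501961 rg");
const exampleIsStandard = exampleFontName !== null && isStandardFontName(exampleFontName);
```

For the field shown above, `exampleFontName` is `"ArialBold"` and `exampleIsStandard` is `false`, so a viewer needs either an embedded font program or a fallback font to render the field.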

Are you in control of the generation of that particular PDF file, or do you just want to modify it?

I've repaired your PDF file and provided it in the following repo:

https://github.com/PhakornKiong/pdfLoadError

@emilsedgh
Author

emilsedgh commented Aug 7, 2021

Hi @PhakornKiong. Good job investigating. Since other PDF software is able to recover from this situation, I'd love to see a patch that makes pdf-lib recover from it as well. For example, other PDF software is able to fall back to other fonts.

Unfortunately, I have a series of PDFs that have already been generated. My intention is to be able to use them with pdf-lib.

Thanks.

@Hopding
Owner

Hopding commented Sep 22, 2021

@emilsedgh does this happen if you save the document with pdfDoc.save({ useObjectStreams: false })?

@emilsedgh
Author

Yes. The same thing happens, although the results look slightly different.

@sparticvs

> test.pdf
>
> Here is the PDF file for reference so this could be easily reproduced.

Is this the corrupted PDF or the original PDF?

@emilsedgh
Author

emilsedgh commented Sep 28, 2021 via email

@dcsline

dcsline commented Oct 7, 2021

I also ran into this cryptic issue. I then tested with the pdfLoadError tool, but the individual lines didn't convince me as error handling. So I split the PDF document into individual pages (https://www.ilovepdf.com/split_pdf) and, curiously, the split-off first page is now displayed correctly. So just by splitting, the problem is gone. I hope this helps you, @Hopding.

@mohamedsalem401
Contributor

@dcsline
There is an easier solution to your problem that will come with the new release of pdf-lib (see PR #986).
Hopefully, that means splitting your PDF will no longer be needed before working with pdf-lib 😃

@gpugems

gpugems commented Feb 6, 2024

Is this issue solved yet?

@nvlled

nvlled commented Aug 22, 2024

@gpugems Looks like it's already fixed, @emilsedgh and @Hopding didn't bother updating the issue?

As mentioned in #986, copying the PDF fixes the issue:

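// Run as an ES module (e.g. a .mjs file) so that top-level await works.
import fs from 'fs';
import { PDFDocument } from 'pdf-lib';

// Load the problematic PDF, then copy it into a fresh document.
const buffer = fs.readFileSync('test.pdf');
const pdfDoc = await (await PDFDocument.load(buffer)).copy();

// Serialize the copy and write it to disk.
const pdfBytes = await pdfDoc.save();
fs.writeFileSync('output.pdf', pdfBytes);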

Although I guess that's more of a workaround than an actual fix.

@nvlled

nvlled commented Aug 22, 2024

Or not, there's a large size difference:

5.7M    test.pdf
1.5M    output.pdf

Copying the PDF also strips away this embedded js from test.pdf:

Calculate Field:
{"type":"Context","context":"list_price","format":"MMMM DD, YYYY"}

Calculate Field:
{"type":"Context","context":"list_date","format":"MMMM DD, YYYY"}

Calculate Field:
{"type":"Context","context":"expiration_date","format":"MMMM DD, YYYY"}

Calculate Field:
{"type":"Context","context":"year_built","format":"MMMM DD, YYYY"}

Calculate Field:
{"type":"Context","context":"street_number"}

Calculate Field:
{"type":"Context","context":"street_dir_prefix"}

Calculate Field:
{"type":"Context","context":"street_name"}

Calculate Field:
{"type":"Context","context":"street_suffix"}

Calculate Field:
{"type":"Context","context":"unit_number"}

Calculate Field:
{"type":"Context","context":"state"}

Calculate Field:
{"type":"Context","context":"county"}

Calculate Field:
{"type":"Context","context":"city"}

Calculate Field:
{"type":"Context","context":"postal_code"}

Calculate Field:
{"type":"Context","context":"lot_number"}

Calculate Field:
{"type":"Context","context":"block_number"}

Calculate Field:
{"type":"Context","context":"subdivision"}

Calculate Field:
{"type":"Context","context":"mls_area_major"}

Calculate Field:
{"type":"Context","context":"mls_area_minor"}

Calculate Field:
{"type":"Date","format":"MMMM DD, YYYY"}

Calculate Field:
{"type":"Role","role":["SellerAgent"],"number":0,"attributes":["agent.mlsid"]}

Calculate Field:
{"type":"Roles","role":["Landlord"],"attributes":["legal_full_name"]}

Calculate Field:
{"type":"Context","context":"building_number"}

Calculate Field:
{"type":"Assignment","role":["SellerAgent"],"number":0,"assignment":"Signature"}

Calculate Field:
{"type":"Assignment","role":["Seller","SellerPowerOfAttorney"],"number":1,"assignment":"Signature"}

Calculate Field:
{"type":"Assignment","role":["Seller","SellerPowerOfAttorney"],"number":0,"assignment":"Signature"}
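
To check whether a copy dropped script-related entries like the ones listed above, one option is to count the relevant PDF name tokens in both files. This is a rough TypeScript sketch: `countPdfNames` and `namesLostInCopy` are hypothetical helpers (not pdf-lib APIs), and the choice of `/AA`, `/JavaScript` and `/JS` as the names to look for is an assumption about where calculate scripts are stored. The scan only sees uncompressed objects, so the output would need to be saved with `useObjectStreams: false` (the option mentioned earlier in this thread) for the comparison to mean anything.

```typescript
// Hypothetical helper: counts occurrences of PDF name tokens such as "/JS"
// in raw PDF text. Names are assumed to contain only letters and digits.
function countPdfNames(pdfText: string, names: string[]): { [name: string]: number } {
  const counts: { [name: string]: number } = {};
  for (let i = 0; i < names.length; i++) {
    // The lookahead stops "/JS" from also matching longer names like "/JSON".
    const pattern = new RegExp("/" + names[i] + "(?![A-Za-z0-9])", "g");
    const found = pdfText.match(pattern);
    counts[names[i]] = found ? found.length : 0;
  }
  return counts;
}

// Hypothetical helper: lists the names that occur less often after the copy
// than before it.
function namesLostInCopy(beforeText: string, afterText: string, names: string[]): string[] {
  const before = countPdfNames(beforeText, names);
  const after = countPdfNames(afterText, names);
  const lost: string[] = [];
  for (let i = 0; i < names.length; i++) {
    if (after[names[i]] < before[names[i]]) {
      lost.push(names[i]);
    }
  }
  return lost;
}
```

Called as `namesLostInCopy(originalText, copiedText, ["AA", "JavaScript", "JS"])`, a non-empty result suggests the copy removed actions that the original carried. An empty result does not prove the scripts survived, since they may sit inside compressed streams.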


10 participants