Showing squares with hex values instead of text in some PDFs #15289

jeremyn · 2022-08-08T00:28:07Z

Attach (recommended) or Link to PDF file here:

Any O'Reilly book preview PDF at Humble Bundle seems to have the problem. You can find a current bundle here with many direct links. Please note the bundles only last for a limited time, and direct links to preview PDFs have a "ttl" query string parameter so I suppose they expire. I'm unwilling to upload an example here since I don't own the content. It would be ideal if a pdf.js dev could download test cases themselves while the bundle is still active.

Configuration:

Web browser and its version: Firefox 103.0.1 (64-bit).
Operating system and its version: Windows 10 Pro 10.0.19044.
PDF.js version: 2.15.129 according to the developer tools console.
Is a browser extension: No.

Steps to reproduce the problem:

Open one of the O'Reilly book preview PDFs on Humble Bundle and see the problem. It also happens if I save the preview PDF locally and then open it. Others have reported the problem on Reddit here. It only started recently, in the past month or so. If I copy and paste text from the broken preview PDF into Notepad, the text looks fine.
The problem doesn't happen if I:

open a complete O'Reilly non-preview PDF that is local on my system
open an O'Reilly preview PDF in Firefox on Linux
open an O'Reilly preview PDF in Chrome on Windows
open a Humble Bundle preview PDF from another publisher (Packt)

What is the expected behavior? (add screenshot)

PDF is readable.

What went wrong? (add screenshot)

PDF is unreadable because all the text is replaced by squares with hex values. On some but not all broken PDFs I see errors like this in the console:

Warning: Failed to load font 'g_d0_f3': 'SyntaxError: An invalid or illegal string was specified'. pdf.js:446:13
downloadable font: CFF : Failed to parse Global Subrs INDEX (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: CFF : Failed to parse table (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: rejected by sanitizer (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: font load failed (font-family: "g_d0_f3" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):

N/A.

jeremyn · 2022-08-08T01:06:10Z

Also it seems that some but not all of the preview PDFs in the current Essential Classic Fantasy RPG Collection bundle, not from O'Reilly, have the problem. An interesting example is the preview PDF for Let's Get Kraken where only the header text which is supposed to be "PART ONE: ADVENTURE OVERVIEW" is squares-with-hex-values, with the rest of the text readable.

Snuffleupagus · 2022-08-08T10:05:59Z

Attaching a preview here, which is hopefully OK, since this isn't really easily actionable otherwise: issue15289.pdf

PDF f8951e869f298f8d2652f80b3c02c490 [1.7 GPL Ghostscript 9.56.1 / AH CSS Formatter V6.0 MR2 for Linux64 : 6.0.2.5372 (2012/05/16 18:26JST)] (PDF.js: 2.15.322) [viewer.js:1531:13](resource://pdf.js/web/viewer.js)
Warning: Failed to load font 'g_d0_f2': 'SyntaxError: An invalid or illegal string was specified'. [pdf.js:456:13](resource://pdf.js/build/pdf.js)
downloadable font: CFF : Failed to parse Global Subrs INDEX (font-family: "g_d0_f2" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: CFF : Failed to parse table (font-family: "g_d0_f2" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: rejected by sanitizer (font-family: "g_d0_f2" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
downloadable font: font load failed (font-family: "g_d0_f2" style:normal weight:400 stretch:100 src index:0) source: (invalid URI)
Warning: Out of bounds subrIndex for callgsubr 30 [pdf.worker.js:1123:13](resource://pdf.js/build/pdf.worker.js)

All of the affected fonts are Type1/Type1C, i.e. CFF fonts.
Unfortunately there appear to be multiple different issues with the font data in that PDF document:

The regular font, i.e. "JZKJAS+MinionPro-Regular", which renders with hex-outlines. There are no error or warning messages for this one; however, it renders correctly with disableFontFace=true set.
The italic font, i.e. "JCVSVF+MinionPro-It", which renders only partially. This font is rejected by OTS, see above, and it's not helped by the disableFontFace option.
The bold font, i.e. "YZOBJZ+MyriadPro-SemiboldCond", which doesn't render at all. There's a warning message for this one, however it renders correctly with disableFontFace=true set.

jeremyn · 2022-08-08T13:46:33Z

@Snuffleupagus I can confirm that your uploaded example reproduces the issue for me.

Please note that I said some but not all of the examples have errors in the console. Both the preview PDFs for Robust Python in the O'Reilly bundle I first linked, and Let's Get Kraken from the RPG bundle I linked in my first comment, have hex squares but no console errors. Also I looked more at the current Packt bundle here and contrary to my initial description where I said the Packt books were okay, the preview for the book Machine Learning with PyTorch and Scikit-Learn does have hex squares (and no console errors).

calixteman · 2022-08-08T14:50:55Z

About the sanitizer issue, it's very likely because of:

pdf.js/src/core/cff_parser.js

Line 1870 in 40f9f7e

return [0, 0, 0];

The specs say:

An empty INDEX is represented by a count field with a 0 value
and no additional fields. Thus, the total size of an empty INDEX
is 2 bytes.

and a bug was fixed in the sanitizer 13 years ago:
khaledhosny/ots@9c33fbf
while the patch for the CFFParser is 11 years old:
ce53b1b#diff-20c7b0dcebfcf76bb7b0da0ea5fd10dd0a9b146507a7751dc16d16944dbf7cbc

The fix for the OTS rejection is likely to replace [0, 0, 0] with [0, 0].

That said I have no idea for the other issues.
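
For illustration, the empty-INDEX rule quoted above can be sketched as follows. This is only a minimal sketch of a CFF INDEX serializer, assuming each object is already an array of bytes; `compileIndex` is a hypothetical helper written for this example and is not the actual pdf.js code in cff_parser.js.

```javascript
// Hypothetical helper (not the pdf.js implementation): serializes a list of
// byte arrays as a CFF INDEX laid out as count, offSize, offsets, data.
function compileIndex(objects) {
  const count = objects.length;
  // An empty INDEX is just the 2-byte count field with value 0,
  // with no offSize byte and no offsets, per the spec text quoted above.
  if (count === 0) {
    return [0, 0];
  }
  // Offsets are 1-based, so the final offset is 1 + total data length.
  let lastOffset = 1;
  for (const obj of objects) {
    lastOffset += obj.length;
  }
  // Pick the smallest offset size (1 to 4 bytes) that can hold lastOffset.
  let offSize = 1;
  while (offSize < 4 && lastOffset >= 2 ** (8 * offSize)) {
    offSize++;
  }
  const out = [(count >> 8) & 0xff, count & 0xff, offSize];
  // Writes one offset as a big-endian integer of offSize bytes.
  const pushOffset = value => {
    for (let i = offSize - 1; i >= 0; i--) {
      out.push(Math.floor(value / 2 ** (8 * i)) & 0xff);
    }
  };
  let offset = 1;
  pushOffset(offset);
  for (const obj of objects) {
    offset += obj.length;
    pushOffset(offset);
  }
  // The object data follows the offset array.
  for (const obj of objects) {
    out.push(...obj);
  }
  return out;
}
```

With this sketch, `compileIndex([])` yields the two bytes `[0, 0]`, whereas the three-byte form `[0, 0, 0]` carries an extra byte that the quoted spec text does not allow for an empty INDEX.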

calixteman · 2022-08-08T17:26:37Z

On macOS, the rendering of page 1 is OK except for the italic font.
And the italic font issue is fixed thanks to the patch shown in my previous comment.
As @Snuffleupagus said, everything is ok with disableFontFace=true, so it's very likely an issue with the font rendering engine.

jeremyn · 2022-08-08T18:43:35Z

I can confirm the hex squares become text with pdfjs.disableFontFace set to true in about:config in Firefox, so that's good.

I'm not sure what disableFontFace does but if in your experience there is some common change PDF creators can make to avoid this problem with the default false setting, I can open a support issue with Humble Bundle about it. It would need to be specific advice though, not "Firefox can't display your PDFs, here's a bug report" but "the developers think you should embed fonts/change some list of numbers/etc". Even then I don't know if it will help but I'd like to try multiple approaches, especially if this will take a while on the pdf.js side (talking about 11 year old fixes).

calixteman · 2022-08-08T20:14:46Z

I filed a bug for Firefox:
https://bugzilla.mozilla.org/show_bug.cgi?id=1783740

jeremyn · 2022-08-08T22:02:33Z

Thanks for the quick turnaround! For what it's worth however, though I can reproduce the hex square ("tofu", I guess) with the plop.html and plop.ttf attached to the Firefox bug, changing pdfjs.disableFontFace to true does not fix the problem with those files. If that's expected then all right, I just wanted to report that here.

jeremyn · 2022-08-19T17:05:22Z

@calixteman I'm not really sure of the status of this issue. You submitted a PR (thanks!) to this repository about two weeks ago and closed this issue. The related Firefox bug is still open. That bug is marked as "Version: Firefox 105". The current stable version of Firefox is v103. Does that mean this pdf.js bug should definitely be fixed in Firefox v105?

marco-c · 2022-08-22T11:10:17Z

The fix for a subset of this issue (#15290) landed in Firefox Nightly in https://bugzilla.mozilla.org/show_bug.cgi?id=1784537.
The rest of the issues you mentioned are covered by https://bugzilla.mozilla.org/show_bug.cgi?id=1783740; until that bug is fixed, you'll still see them.

jeremyn · 2022-08-22T14:32:00Z

@marco-c I don't know any of the details of how pdf.js works or how it plugs into Firefox. As an end user the issue I reported is that when I open a subset of PDFs from a specific publisher (Humble Bundle), some of the text in those PDFs is unreadable.

Looking at all these issues and bugs what I see as an end user is:

this GitHub issue we're on is closed but the PDFs are still broken in Firefox
there is a discrepancy that I mentioned above between the test case in the Firefox issue and the behavior I'm seeing in this GitHub issue, so I'm unclear if the Firefox bug really matches this GitHub issue
the Firefox issue appears to have stalled out almost two weeks ago with some open questions which might be answered by pointing the asker back to this GitHub issue
there's no clear sign when this should be fixed in Firefox, so even if I ignore all this other stuff as strictly internal, I don't have any specific point when I could say "the bug tracker said this was fixed but it is not"

To be clear I'm not trying to rush anybody. If this were at a point of "yes, we have all the info we need, we'll get to it when we get to it" then that's fine. At the moment though there are still some open questions and uncertainty on my side whether the correct problem is being tracked, so I'd like to resolve those before leaving things alone. In fact it feels a little like the various devs here have been sidetracked on different problems but the core problem of "I can't read these PDFs" has gotten lost.

marco-c · 2022-08-22T16:20:37Z

> @marco-c I don't know any of the details of how pdf.js works or how it plugs into Firefox. As an end user the issue I reported is that when I open a subset of PDFs from a specific publisher (Humble Bundle), some of the text in those PDFs is unreadable.

@jeremyn there were actually several different root issues affecting the PDFs you shared with us. One class of issues has been fixed as part of #15290 (which closed this issue). The remaining issues are unrelated to pdf.js itself and are due to Firefox's internal graphics engine; they are tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1783740.

> there's no clear sign when this should be fixed in Firefox, so even if I ignore all this other stuff as strictly internal, I don't have any specific point when I could say "the bug tracker said this was fixed but it is not"

Until https://bugzilla.mozilla.org/show_bug.cgi?id=1783740 is fixed, you will still be able to reproduce some (if not all) of the issues you mentioned initially.

> the Firefox issue appears to have stalled out almost two weeks ago with some open questions which might be answered by pointing the asker back to this GitHub issue

Thanks, I'll point Jonathan to this issue. @calixteman is away, or he would have answered him.

jeremyn · 2022-08-22T19:04:11Z

@marco-c Thanks.

Do you have thoughts on the workaround of setting pdfjs.disableFontFace to true in Firefox? I've read several issues and discussions about this setting and still don't understand it. I think one setting has the OS render fonts, and the other setting keeps font rendering in PDF.js/Firefox but I'm not sure which is which. Also some people say they get different breakages depending on the setting.

Is PDF.js/Firefox treating this as a specific problem for a few fonts or as a systemic problem? As my earlier comments say this is widespread across Humble Bundle PDFs from a variety of publishers. It would be unfortunate for me for this issue to take a long time to resolve only to find out it was some hyper-specific fix for the one sample PDF uploaded here.

Also about Humble Bundle, in an earlier comment #15289 (comment) I asked here if there is some useful request I can make to their support group. If they are generating PDFs in some bad way then I can just ask them to stop. Do you have any advice about that?

marco-c · 2022-08-23T16:43:56Z

@jeremyn it seems to be related to these specific PDFs rather than a widespread problem. It could be useful to ask them to answer all of @jfkthame's questions from https://bugzilla.mozilla.org/show_bug.cgi?id=1783740 (and maybe he has more after reading this thread).

jeremyn · 2022-08-23T23:47:11Z

@marco-c I created an issue with Humble Bundle support and directed them to the Bugzilla issue. I can't say what the escalation process is between their support people and whoever deals with this sort of problem on their side. I want to get out of the middle here so as far as that all goes, this is not my issue anymore.

Do you have any info about pdfjs.disableFontFace? I'll let it go after this, but since it makes the problem go away I'm curious if it's something that I can just set to true and forget about, or what.

marco-c · 2022-08-25T23:10:44Z

@jeremyn I'm not familiar with that option, but if it isn't the default it must mean that it has downsides that exceed the improvements, so I would keep it to false.

humble-cburnham · 2022-09-08T16:50:42Z

Chris from Humble Bundle here.

It looks like our process for making PDF previews is to take the full PDF and run it through Ghostscript to truncate it.
We use -sDEVICE=pdfwrite, and only include a few pages starting from the first chapter (skipping the title page and table of contents).
We also potentially reduce the resolution in some cases.

The full book PDFs do render just fine in Firefox. It's clear something in this process is triggering the bug in Firefox, but I'm not sure what. If I get some free time, I can try some alternate arguments for Ghostscript to work around this issue going forward. Maybe something useful in the stripped out pages is getting lost?

I also looked at Bugzilla, and they've got a pretty minimal test case to demonstrate the issue as well.

jfkthame · 2022-09-09T22:28:38Z

Given that the "full book PDFs do render just fine in Firefox", it appears that GhostScript is damaging the font in some way, perhaps during the process of subsetting to include only the characters present in the selected pages.

I don't think this is really a Firefox bug as such; note that https://bugzilla.mozilla.org/show_bug.cgi?id=1783740 indicates that the font similarly fails to load in Edge.

Maybe -dSubsetFonts=false would make a difference to how it behaves? It's pretty hard to diagnose exactly why the Windows API is rejecting the subsetted font, when OTS is happy with it and macOS seems to accept it fine, but my best guess is that GS's subsetting operation is doing something slightly questionable, and DirectWrite doesn't like it.

I suppose if you can share a "full" PDF that works, along with a truncated preview (created from the same document) where the font fails, we can try to extract the corresponding font resources from each and compare them, though CFF is a fearsomely complex format and it may be hard to identify exactly what is triggering the failure.

sergei-harbour · 2022-10-05T20:13:11Z

My issue was linked to this one, so I tried to review all the comments and links here, but I'm not sure that it's the same. @Snuffleupagus pointed me to this comment in this thread, but my issue is not Firefox-specific; also, it is still reproducible even in the latest version of Firefox.
Sorry if I missed something obvious here.

jeremyn · 2022-10-06T12:45:35Z

@sergei-harbour As I understand it, this issue is only partially fixed, with the rest moved to the Bugzilla tracker. See #15289 (comment).

Also, your test file is broken for me in Firefox 105.0.2 with pdfjs.disableFontFace set to false (the default) but works if I set that to true, a workaround discussed in earlier comments, which suggests the two issues are related. However note #15289 (comment) which says this is probably not a good permanent workaround.

sergei-harbour · 2022-10-12T10:59:10Z

> but works if I set that to true

True, it works, but after some research it looks like it stops working when a PDF has a non-standard font that is not embedded in the doc: it doesn't fall back to the system font. Maybe some retry logic can be a workaround here, something like:

Render with disableFontFace: true, stopAtErrors: true
If the previous step fails fall back to disableFontFace: false
Hope no one on earth uses a doc with a set of non-standard embedded and unembedded fonts at the same time

The thing is that I deal with tons of PDF docs in my system and I can't control how they are created. Maybe I need to preprocess the docs somehow and replace/embed the fonts that don't play nice with pdf.js.
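
The retry idea listed above could look roughly like the sketch below. It is a sketch under stated assumptions, not a tested recipe: `renderWithFontFallback` and the caller-supplied `renderFn` are hypothetical names, and it assumes that a failed render with `disableFontFace: true` and `stopAtErrors: true` actually surfaces as a rejected promise.

```javascript
// Hypothetical helper: tries each option set in order and resolves with the
// first render that succeeds. `renderFn(source, options)` is supplied by the
// caller and is assumed to reject when rendering fails.
async function renderWithFontFallback(renderFn, source) {
  const attempts = [
    // Step 1: render without @font-face and stop at the first error.
    { disableFontFace: true, stopAtErrors: true },
    // Step 2: fall back to the default font loading path.
    { disableFontFace: false },
  ];
  let lastError;
  for (const options of attempts) {
    try {
      return await renderFn(source, options);
    } catch (error) {
      // Remember the failure and try the next option set.
      lastError = error;
    }
  }
  // Every attempt failed: report the most recent error.
  throw lastError;
}
```

In practice `renderFn` might wrap a call to `pdfjsLib.getDocument({ url: source, ...options })` followed by page rendering; both `disableFontFace` and `stopAtErrors` are document-loading parameters there. Whether the first attempt reliably fails for documents with unembedded fonts would need to be checked against real files.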

calixteman · 2023-01-07T12:38:28Z

After digging into the font, I finally found that it's because ExpansionFactor is set to 0 in the private font dict.
It seems to be fixed by just removing this entry.
I have no idea what this parameter is supposed to mean; there is nothing helpful in the CFF specification:
https://adobe-type-tools.github.io/font-tech-notes/pdfs/5176.CFF.pdf

jfkthame · 2023-01-07T13:30:00Z

Oh, interesting! Congratulations on tracking this down.

Neither the Adobe CFF specification nor the OpenType spec for CFF2 seems to give any clue what this means; they just mention a default value of 0.06, but not a word about what other values would be valid or what effect it's supposed to have. shrug

Removing it in pdf.js should fix the immediate issue with rendering PDFs that contain such fonts, but we could also consider removing it in OTS, so that if a "bad" font is used as a webfont (independently of PDF embedding) it would also resolve that case. Though maybe such fonts only arise as a result of some (faulty?) PDF-generating workflows.

jfkthame · 2023-01-07T13:36:18Z

Ah - looks like this is inherited from the old Type 1 spec. See page 45 in https://adobe-type-tools.github.io/font-tech-notes/pdfs/T1_SPEC.pdf for information.

@calixteman I'm just wondering, if you reset the value to 0.06 (the default) instead of removing it, does that also resolve the failure? If so, maybe that would be the lowest-risk approach, just in case any rendering engine expects the entry to be present.

calixteman · 2023-01-07T15:08:09Z

I updated my PR to set the property to 0.06.
I'd say that OTS should do the job because I won't bet one euro that nobody uses such a font as a webfont.
Or, if 0 is a legal value, we should ask some MS people to fix the bug on Windows, since it appears not to be a problem on macOS and Linux.
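
As a rough illustration of the approach discussed above (resetting a zero ExpansionFactor to the 0.06 default instead of removing the entry), here is a minimal sketch. The plain object standing in for the parsed private dict and the name `normalizeExpansionFactor` are assumptions made for this example; the real pdf.js data structures differ.

```javascript
// Default value of ExpansionFactor mentioned in the thread.
const DEFAULT_EXPANSION_FACTOR = 0.06;

// Hypothetical helper: takes a plain object standing in for a parsed CFF
// private dict and returns a copy with a zero ExpansionFactor replaced by
// the default. Other entries, and dicts without the key, are left alone.
function normalizeExpansionFactor(privateDict) {
  if (privateDict.ExpansionFactor === 0) {
    return { ...privateDict, ExpansionFactor: DEFAULT_EXPANSION_FACTOR };
  }
  return privateDict;
}
```

Keeping the entry present with its default value, rather than deleting it, follows the lower-risk suggestion in the thread in case a rendering engine expects the key to exist.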

Avoid null ExpansionFactor in type1 fonts (follow-up of #15289)

Snuffleupagus added font-conversion font-type1 font-wont-sanitize labels Aug 8, 2022

calixteman linked a pull request Aug 8, 2022 that will close this issue

Fix OTS issue with empty index (#15289) #15290

Merged

calixteman closed this as completed in #15290 Aug 8, 2022

rousek pushed a commit to signosoft/pdf.js that referenced this issue Aug 10, 2022

Fix OTS issue with empty index (mozilla#15289)

222c109

Snuffleupagus mentioned this issue Aug 28, 2022

Characters in Unicode Private Use Area fail to render #15358

Closed

Snuffleupagus mentioned this issue Sep 30, 2022

Nimbus Sans font is not displayed properly #15528

Closed

Snuffleupagus mentioned this issue Oct 7, 2022

PDF.js Viewer does not display PDF correctly. #15545

Closed

denschub mentioned this issue Oct 17, 2022

dl.humble.com - PDF rendering is broken webcompat/web-bugs#111387

Closed

Snuffleupagus mentioned this issue Oct 25, 2022

Blank PDF on Firefox (Windows) but displays correctly on Firefox (Mac) #15621

Closed

Snuffleupagus mentioned this issue Dec 12, 2022

OTS Parsing error: Unable to instantiate font face from font data. Warning: Failed to load font 'g_d0_f8' : 'SyntaxError': Invalid font data in ArrayBuffer || Issue only for Windows, in MacOS it is working as expected #15190

Closed

Snuffleupagus mentioned this issue Jan 5, 2023

Strange characters in some pdfs #15895

Closed

calixteman reopened this Jan 7, 2023

calixteman self-assigned this Jan 7, 2023

calixteman linked a pull request Jan 7, 2023 that will close this issue

Set ExpansionFactor to 0.06 when it's equals to 0 in the private dict of CFF fonts #15900

Merged

calixteman mentioned this issue Jan 7, 2023

Set ExpansionFactor to 0.06 when it's equals to 0 in the private dict of CFF fonts #15900

Merged

calixteman closed this as completed in #15900 Jan 7, 2023

calixteman added a commit to calixteman/pdf.js that referenced this issue Jan 7, 2023

Avoid null ExpansionFactor in type1 fonts (follow-up of mozilla#15289)

698d4d5

calixteman added a commit to calixteman/pdf.js that referenced this issue Jan 7, 2023

Avoid null ExpansionFactor in type1 fonts (follow-up of mozilla#15289)

c170245

calixteman added a commit that referenced this issue Jan 7, 2023

Merge pull request #15901 from calixteman/15289_followup

fcaeb5d

Avoid null ExpansionFactor in type1 fonts (follow-up of #15289)

jfkthame mentioned this issue Jan 25, 2023

Supplied Windows build v9.0.0 behaves differently than custom build khaledhosny/ots#254

Closed

Snuffleupagus mentioned this issue Feb 10, 2023

inline fonts not shown correctly in pdf.js web-viewer #16039

Closed

ZeroXClem mentioned this issue Aug 12, 2024

[Snyk] Upgrade pdfjs-dist from 2.9.359 to 2.16.105 ZeroXClem/metamesa#3

Closed

earthywh mentioned this issue Sep 24, 2024

[Snyk] Upgrade pdfjs-dist from 2.6.347 to 2.16.105 earthywh/filestash#1

Open
