Saving a PDF (Print>Save as PDF) turns it unsearchable #14277

maverick74 · 2021-11-15T17:05:59Z

Saving a PDF (Print>Save as PDF) turns it unsearchable!

Attach (recommended) or Link to PDF file here:
https://africau.edu/images/default/sample.pdf (https://web.archive.org/web/20220531122837/http://www.africau.edu/images/default/sample.pdf)

Configuration:

Web browser and its version: Firefox 94.0
Operating system and its version: Neon Linux

Steps to reproduce the problem:

Go to https://africau.edu/images/default/sample.pdf
Click the print button
Select "save as PDF"
Save the file
Reopen it in Firefox
Try to select the text or search for any text

What is the expected behavior?
Text should be selectable and searchable

What went wrong?
Text is not selectable or searchable

marco-c · 2021-12-07T22:36:15Z

Is this the same as https://bugzilla.mozilla.org/show_bug.cgi?id=1274502?

maverick74 · 2021-12-08T00:06:05Z

> Is this the same as https://bugzilla.mozilla.org/show_bug.cgi?id=1274502?

Yes, they seem to report the same problem.

But now I wonder:
is this a Firefox problem (which should then be reported on the 6-year-old (!) bug you linked), or is it a PDF.js problem (which should be reported here)?

I'm OK with closing this bug as long as it is submitted in the right place.

(I think it has better chances here... But then again... I have bugs ignored for years too...)

Snuffleupagus · 2021-12-08T09:45:15Z

> Go to https://africau.edu/images/default/sample.pdf
>
> Click the print button
>
> Select "save as PDF"
>
> save the file

This, at least to me, sounds like a roundabout way of saving a PDF document that's opened with the Firefox PDF Viewer.

Why not directly use e.g. the download button (in the viewer), the Cmd/Ctrl+S keyboard shortcut, or the "Save Page As..." entry in the "File" menu (of the browser), rather than going through the printing process?
By invoking the download directly you'd get the original PDF document, and it'd be faster too.

maverick74 · 2021-12-08T12:25:52Z

> Why not directly use e.g. the download button (in the viewer), the Cmd/Ctrl+S keyboard shortcut, or the "Save Page As..." entry in the "File" menu (of the browser), rather than going through the printing process? By invoking the download directly you'd get the original PDF document, and it'd be faster too.

You're right, except when you have a multi-page PDF and you want just one page.

One example is invoices: there is software that prints 4 copies of the same invoice in the same document. Now imagine you want to send your client just the original...

Another example: repair orders. Imagine you receive a piece of equipment to repair and the software generates two pages with the equipment "ID/profile", one for you and one for the client. When we have to send the equipment ID sheet for a brand cross-check and there are restrictions on the file size, we need to reduce the file to just one sheet.

At the moment, with Firefox as your PDF reader, the only way to get just the first sheet is through the print option, but that renders it unsearchable, which is a problem!

marco-c · 2021-12-09T10:21:42Z

Basically, what you want is a feature to split a PDF.

maverick74 · 2021-12-09T12:24:11Z

> Basically what you want is a feature to split a PDF

Not exactly, because Firefox already allows me to save only the pages I need, just not in a searchable format!

What I want is to be able to save only the pages I need in a searchable format (because it's a requirement my job imposes)!

I really hate to say this but, in Microsoft Edge, for example, I can do this.

I don't know how they do that, but it just works...

Snuffleupagus · 2021-12-09T12:35:37Z

> > Basically what you want is a feature to split a PDF
>
> No exactly! What i want is to be able to save just the pages i need in a searchable format (because it's a requirement my job imposes)!

Well, technically speaking that's essentially what this would amount to :-)

Supporting such a use-case would require adding (more) arbitrary editing of PDF documents (currently we only support saving of form data), which is not really a small/simple thing to implement in general (and was never a goal of the project).

maverick74 · 2021-12-09T12:41:46Z

So, let me clarify, @Snuffleupagus:

You are saying that being able to save just the pages the user needs would involve a lot of work?
I thought that, since you already have the original file, which is searchable, this would be a straightforward detail to implement...

(I was also under the impression that PDF.js was a lot more powerful and feature-rich than Google's PDFium.)

Snuffleupagus · 2021-12-09T13:54:33Z

> You are saying that being able to save just the pages the user needs would involve a lot of work?

Yes, it'd require creating a new PDF document from the specified pages.

Given how PDF documents are structured internally (it's a fairly old format), there's in general no easy way to just "pick" a couple of pages and directly create a new valid PDF document from that. First of all, you'd probably need to remove e.g. font and graphics resources no longer needed in order to reduce the file size of the new PDF document. Secondly, you'd need to create a valid XRef (i.e. cross reference) table such that the new PDF document can be successfully opened in viewers.

Please note that, as mentioned above, arbitrary PDF editing has (thus far) never been a goal of this library, since it's a fairly complex topic given e.g. all the weird/corrupt data-structures found in real-world PDF documents.
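
To illustrate the cross-reference table mentioned above, here is a minimal sketch in Python (not PDF.js code, and not how Firefox does it). It writes a one-page PDF by hand and records the byte offset of every object so that a valid XRef table can be emitted at the end. The function name `build_minimal_pdf` and the fixed five-object layout are assumptions made for the example; a real page extractor would additionally have to copy and renumber the objects the chosen pages depend on and drop unused resources, which is where most of the complexity lies.

```python
def build_minimal_pdf(page_text: str = "Dummy PDF file") -> bytes:
    """Assemble a one-page PDF, including its cross-reference table.

    Assumes page_text contains no parentheses or backslashes, since
    this sketch does not escape PDF string literals.
    """
    content = f"BT /F1 24 Tf 72 720 Td ({page_text}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the returned bytes to a file in binary mode should give a small document with selectable text. Every offset in the table has to match the final byte layout exactly, so removing or rewriting any object means regenerating the whole table.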

maverick74 · 2021-12-09T14:17:29Z

OK.

This basically means "no quick solution on the horizon (if ever)".

I would still like to leave this open, if you agree... (as I believe it is an important feature to businesses)

marco-c · 2021-12-09T14:27:14Z

@maverick74 we are giving some thought to printing issues, so this might change soon.

maverick74 · 2021-12-09T14:43:46Z

@marco-c That's great news!
We're running into a couple of printing-related problems.

The other issues have already been reported, however.

We intend to use Firefox not only as our default browser but also as our only PDF reader.

marco-c · 2022-07-06T07:58:08Z

This is fixed in the latest Firefox Nightly.

marco-c · 2022-07-06T07:59:58Z

Thanks to https://hg.mozilla.org/mozilla-central/rev/7d9376649d6d (https://bugzilla.mozilla.org/show_bug.cgi?id=1777209).

maverick74 · 2022-07-07T12:25:21Z

@marco-c I found a bug in the implementation!

Easy steps to reproduce:

Open: PDF example
Get to Print Dialog (CTRL+P) and Save as PDF
Open Saved File
Select and copy Text (Dummy PDF file) from the saved pdf file
Paste it somewhere (notepad, kate, whatever)

Result: Unrecognized characters
Expected: "Dummy PDF file" text

If you prefer, I can file a separate bug report.

marco-c · 2022-07-07T12:36:40Z

Thanks, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1778484.

Did you see this with other PDFs too?

calixteman · 2022-07-07T12:44:17Z

It's very likely caused by #9340.

maverick74 · 2022-07-07T14:35:55Z

> Did you see this with other PDFs too?

Yes.
I originally noticed it on an "internal" receipt PDF.
Because the receipt uses a "weird" font, I went on to try other, more ordinary PDFs to be sure it wasn't a document-specific problem.

But in all the documents I've tried, the result was the same.

marco-c · 2022-07-14T13:17:30Z

@maverick74 the issue is fixed in the latest Nightly, please let us know if you see other problems.

maverick74 · 2022-07-15T10:22:56Z

I can confirm it's working OK now.
As soon as I have a bit of free time, I'll do some extra tests with more complex PDFs.
If I find anything worth mentioning, I'll post it back here.

Thank you all :)

cksgh1224 · 2022-08-24T01:21:10Z

It said it was fixed in the latest Firefox Nightly, but when I tested it on Nightly 106.0a1 (2022-08-23), the problem didn't seem to be resolved.

https://mozilla.github.io/pdf.js/web/viewer.html
From there, I used "Print > Save as PDF" and then opened the saved PDF file again.

I searched for the text "Trace", but it could not be found, and the text could not be selected by dragging.

Has the Nightly build you mentioned not been released yet?

maverick74 · 2022-08-24T09:20:36Z

@cksgh1224 works for me.

I've tested it on the latest release (104) and on the latest Nightly (106), and in both cases it worked as expected.

Prior to this, I had always tested in the official Nightly.

marco-c · 2022-08-25T23:03:47Z

@cksgh1224 on what PDF could you still reproduce the problem?

cksgh1224 · 2022-08-26T07:09:47Z

@marco-c

https://mozilla.github.io/pdf.js/web/viewer.html

When I tested with the PDF file shown there, searching and selecting by dragging did not work.

When I tested with the PDF from the original report (https://africau.edu/images/default/sample.pdf) and with a sample PDF of my own, the text could be searched and selected...

Is this a problem with the PDF at https://mozilla.github.io/pdf.js/web/viewer.html?

marco-c · 2022-08-26T08:50:37Z

What you're using when you load https://mozilla.github.io/pdf.js/web/viewer.html is the web viewer of pdf.js, not the version included in Firefox itself. The version included in Firefox is using some internal Firefox APIs to be able to print correctly.
You can test by loading https://raw.githubusercontent.com/mozilla/pdf.js/master/test/pdfs/tracemonkey.pdf directly in Firefox and printing it to PDF.

cksgh1224 · 2022-08-29T03:59:03Z

I tested it as you said and it works fine!!

Is the reason it doesn't work at https://mozilla.github.io/pdf.js/web/viewer.html that the version of pdf.js used there is older?

marco-c · 2022-08-29T09:42:29Z

No, the reason is what I mentioned above: the PDF reader in Firefox itself is using internal Firefox APIs to print in a better way, while the viewer you see on the page is just a normal web page and so unable to use Firefox internal APIs.

cksgh1224 · 2022-09-01T00:27:47Z

@marco-c Thanks for the kind explanation~!!!

timvandermeij added the printing label Nov 17, 2021

marco-c closed this as completed Jul 6, 2022

marco-c added this to PDF.js quality Mar 26, 2024

marco-c moved this to Closed in PDF.js quality Mar 26, 2024
