Saving a PDF (Print>Save as PDF) turns it unsearchable #14277
Comments
Is this the same as https://bugzilla.mozilla.org/show_bug.cgi?id=1274502?
Yes, they seem to report the same problem. But now I question: I'm OK with closing this bug as long as it is submitted in the right place. (I think it has better chances here... But then again... I have had bugs ignored for years too...)
This, at least to me, sounds like a roundabout way of saving a PDF document that's opened with the Firefox PDF Viewer. Why not directly use e.g. the download button (in the viewer), the Cmd/Ctrl+S keyboard shortcut, or the "Save Page As..." entry in the "File" menu (of the browser), rather than going through the printing process?
You're right, except when you have a multi-page PDF and want just one page. One example is invoices: some software prints 4 copies of the same invoice in the same document. Now imagine you want to send your client just the original... Another example: repair orders. Imagine you receive a piece of equipment to repair and the software generates 2 pages of the equipment "ID/profile": one for you, another for the client. But when we have to send the equipment ID sheet for brand cross-check and there are restrictions on the file size, you need to reduce the file to just one sheet. At the moment, using Firefox as your PDF reader, the only way to get just the first sheet is through the print option, but that renders it unsearchable, which is a problem!
Basically what you want is a feature to split a PDF.
Not exactly, because Firefox already allows me to save only the pages I need, just not in a searchable format! What I want is to be able to save only the pages I need in a searchable format (because it's a requirement my job imposes)! I really hate to say this, but in M$ Edge, for example, I can do this. I don't know how they do that, but it just works...
Well, technically speaking that's essentially what this would amount to :-) Supporting such a use-case would require adding (more) arbitrary editing of PDF documents (currently we only support saving of form data), which is not really a small/simple thing to implement in general (and was never a goal of the project).
So, let me clarify, @Snuffleupagus: you are saying that being able to save just the pages the user needs would involve a lot of work? (I was also under the impression that PDF.js was a lot more powerful and feature-rich than Google's PDFium.)
Yes, it'd require creating a new PDF document from the specified pages. Given how PDF documents are structured internally (it's a fairly old format), there's in general no easy way to just "pick" a couple of pages and directly create a new valid PDF document from that. First of all, you'd probably need to remove e.g. font and graphics resources no longer needed in order to reduce the file size of the new PDF document. Secondly, you'd need to create a valid XRef (i.e. cross reference) table such that the new PDF document can be successfully opened in viewers. Please note that, as mentioned above, arbitrary PDF editing has (thus far) never been a goal of this library, since it's a fairly complex topic given e.g. all the weird/corrupt data-structures found in real-world PDF documents.
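To make the XRef point concrete, here is a minimal sketch (not pdf.js code, and not how Firefox implements anything) of what writing a new PDF involves at the byte level. The helper `build_pdf` and the `ONE_PAGE` objects are hypothetical names made up for this illustration. It assumes the objects for the kept pages have already been selected and renumbered 1..N, which is the hard part described above and is deliberately left out. The only thing it shows is that every object's byte offset has to be recorded and written into the cross-reference table, so changing the set of objects means regenerating the whole table.

```python
def build_pdf(bodies):
    """Serialize object bodies (object 1 first) into a PDF with a classic xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    size = len(bodies) + 1  # +1 for the mandatory free entry of object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        # Each entry is exactly 20 bytes: 10-digit offset, 5-digit generation, flag, EOL.
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# Hypothetical single-page document: catalog -> page tree -> one blank page.
ONE_PAGE = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 200 200] >>",
]

pdf_bytes = build_pdf(ONE_PAGE)
```

Writing `pdf_bytes` to a file should give a blank one-page document. A real page extractor would additionally have to follow every reference from the kept pages (fonts, images, content streams), drop what is unreachable, and rewrite the object numbers inside each body before this step.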
OK. This basically means "no solution on the horizon any time soon (if ever)". I would still like to leave this open, if you agree... (as I believe it is an important feature for businesses)
@maverick74 we are giving some thought to printing issues, so this might change soon.
@marco-c That's great news! The other issues were already reported, however. We intend to use Firefox not only as our default browser but also as our only PDF reader.
This is fixed in the latest Firefox Nightly.
@marco-c I found a bug in the implementation! Easy steps to reproduce:
Result: Unrecognized characters. If you prefer, I can file a separate bug report.
Thanks, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1778484. Did you see this with other PDFs too?
It's very likely caused by |
Yes. But in the documents I've tried, the result was always the same.
@maverick74 the issue is fixed in the latest Nightly, please let us know if you see other problems.
I can confirm it's working OK now. Thank you all :)
It said it was fixed in the latest Firefox Nightly, but when I tested it on Nightly 106.0a1 (2022-08-23), the problem didn't seem to be resolved. I tested with https://mozilla.github.io/pdf.js/web/viewer.html and searched for the text "Trace", but searching and drag-selecting don't work. Has the Nightly mentioned above not been released yet?
@cksgh1224 works for me. I've tested it on the latest 104 (release) and on the latest Nightly 106, and in both cases it worked as it was supposed to. Prior to this, I've always tested in the official Nightly.
@cksgh1224 on which PDF can you still reproduce the problem?
When I tested with the PDF file at https://mozilla.github.io/pdf.js/web/viewer.html, searching and dragging did not work. Is this a problem with the PDF there?
What you're using when you load https://mozilla.github.io/pdf.js/web/viewer.html is the web viewer of pdf.js, not the version included in Firefox itself. The version included in Firefox is using some internal Firefox APIs to be able to print correctly.
I tested it as you said and it works fine!! Is the reason it doesn't work at https://mozilla.github.io/pdf.js/web/viewer.html that the version of pdf.js used there is older?
No, the reason is what I mentioned above: the PDF reader in Firefox itself is using internal Firefox APIs to print in a better way, while the viewer you see on the page is just a normal web page and so unable to use Firefox internal APIs.
@marco-c Thanks for the kind explanation~!!!
Saving a PDF (Print>Save as PDF) turns it unsearchable!
Attach (recommended) or Link to PDF file here:
https://africau.edu/images/default/sample.pdf (https://web.archive.org/web/20220531122837/http://www.africau.edu/images/default/sample.pdf)
Configuration:
Steps to reproduce the problem:
What is the expected behavior?
Text should be selectable and searchable
What went wrong?
Text is not selectable or searchable