Export to pdf doesn't include pdfs #5943

Open
Mr-Kanister opened this issue Jan 2, 2022 · 33 comments
Labels
bug It's a bug desktop All desktop platforms medium Medium priority issues

Comments

@Mr-Kanister
Contributor

Environment

Joplin version: 2.6.10
Platform: Windows 10
OS specifics: 21H1

Steps to reproduce

  1. Import a PDF file into a Joplin note
  2. Export this note as a PDF file
  3. Open the PDF file
  4. Click on the linked PDF file

Describe what you expected to happen

The linked PDF file can't be opened. I expected the linked PDF to open. If you look at the file size of the exported PDF, the linked PDF is there! I also read that the PDF is created from the HTML, and when I export to HTML instead, I am able to open the linked PDF.

@Mr-Kanister Mr-Kanister added the bug It's a bug label Jan 2, 2022
@laurent22 laurent22 added desktop All desktop platforms high High priority issues labels Jan 2, 2022
@laurent22 laurent22 added medium Medium priority issues and removed high High priority issues labels Jan 9, 2022
@sidkhuntia

sidkhuntia commented Jan 25, 2022

@Mr-Kanister @laurent22 Can I work on it?

@Mr-Kanister
Contributor Author

Sorry for the late answer. Yes, you can work on it, I can't commit since I do not know this language yet :)

@nk183

nk183 commented Feb 19, 2022

Hi @Mr-Kanister, I am not able to reproduce the bug. I am using Okular (PDF viewer) on Ubuntu, though, and it is working fine.
Which PDF viewer are you using?
cc @laurent22

@kshitij86

I am able to successfully reproduce this on Ubuntu 20.04. Copying the link I can see that the PDF is there, but the viewer is not able to open it (offline/online). I'll try to dig deeper into it.

@kshitij86

@Mr-Kanister It'd be really helpful if you could mention the PDF Viewer you are using.

@kshitij86

@Mr-Kanister @laurent22 This looks like an issue with how Electron implements printing to PDF: contents.printToPDF() is called under the hood and everything is handled there, so the result depends on the pdfData returned in the InterOpServiceHelper. Assuming the returned pdfData is not modified afterwards, this seems to be an Electron issue more than a Joplin issue. I'll try to find out more and report back soon, maybe with a possible fix (tweaking the options may help?).

@nk183

nk183 commented Feb 19, 2022

Hi @kshitij86, can you specify the PDF viewer you are using?
I really don't think there is any issue with contents.printToPDF(). I might be wrong, but it behaves differently in different PDF viewers, e.g. in Okular it works fine. I guess it depends on whether the PDF viewer is capable of handling an embedded PDF or not.

@Mr-Kanister
Contributor Author

Okay, Hi!

I have now tested it again with three pdf viewers and got partly different results: on Windows 10 with Sumatra PDF v3.3.3 I cannot open the pdf by clicking, but I can right-click and copy a link, enter it in Firefox and so read the file. Exactly the same happens in Okular v20.12.3 under Debian Bullseye. Again on Windows 10, I can click the link directly in the PDF Annotator and am redirected to Firefox. However, when I open the parent pdf directly in Firefox, there is no link to click or copy.

This pdf link to copy is virtually the plain pdf file and in my individual case is about three million characters long. If I want to paste this somewhere, my PC first has to calculate for a very long time and Firefox becomes altogether very jerky. In Kate, I can't even highlight any of this string without my PC having to calculate for ten seconds.
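
To put the three-million-character figure in perspective: base64 encodes every 3 bytes as 4 characters, so a `data:` link is roughly a third longer than the file it carries. A minimal sketch of that arithmetic follows. The function names and the `application/pdf` MIME type are illustrative assumptions, not anything taken from Joplin.

```typescript
// Length of a base64 data: URI carrying a payload of `byteLength` bytes.
// Base64 emits 4 characters per 3-byte group and pads the final group.
function dataUriLength(byteLength: number, mimeType: string): number {
  const prefix = `data:${mimeType};base64,`;
  const base64Chars = 4 * Math.ceil(byteLength / 3);
  return prefix.length + base64Chars;
}

// Inverse estimate: approximate payload size for a link of a given length.
// Ignores padding, so the result can overshoot by up to 2 bytes.
function approxBytesFromUriLength(uriLength: number, mimeType: string): number {
  const prefix = `data:${mimeType};base64,`;
  return Math.floor(((uriLength - prefix.length) * 3) / 4);
}
```

Under these assumptions, a link of three million characters corresponds to a PDF of about 2.25 million bytes, which fits the observation that the exported file size already includes the attachment.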

I think it's very likely that this is not the fault of Joplin but of Electron, but this behaviour is still annoying at the very least. How should we proceed? Is this bug report out of place here?

Greetings!

@kshitij86

@Mr-Kanister I think @laurent22 might be better able to guide us on this issue. Since it works in some viewers, it may also come down to how the PDF viewers handle opening the file.

@adi-uchiha

Any updates on this issue?

@MukeshKaswan1

MukeshKaswan1 commented Mar 6, 2023

Can I work on this project to find and remove some bugs?

@roman-r-m
Collaborator

> Can I work on this project to find and remove some bugs?

You can, of course, but try to understand and replicate the issue first.

@7adidaz

7adidaz commented Mar 14, 2024

I was able to reproduce this issue. IMO the most reasonable solution is to make the links appear as hyperlinks but not be clickable, and of course not to include the linked media (files, PDFs, images, etc.) in the PDF package. What do you think?

@laurent22
Owner

> IMO the most reasonable solution is to make the links appear as hyperlinks, but not clickable

The PDFs are embedded in the document so ideally if you click on the link it would open that embedded PDF. But I don't know if that's even possible. Maybe the task is to investigate first if it can be done at all. If it cannot, then we shouldn't embed the PDFs to begin with, and indeed disable the links.

@7adidaz

7adidaz commented Mar 14, 2024

> Maybe the task is to investigate first if it can be done at all

I'll be working on investigating this, and when I'm done, I'll do a PR, thanks for your time!

@7adidaz

7adidaz commented Mar 15, 2024

Here's what I came up with:

  1. The issue doesn’t originate from Electron’s contents.printToPDF() function. This function simply prints the content of a web page. The real problem arises during the conversion of a note to HTML, which is then used by printToPDF(). This conversion takes a note object and transforms it into HTML, including any attached files as raw data.

  2. Example of Embedded Link in a Note:

    [fileName.txt](:/xxx)

    converted to:

     <a data-from-md="" title="_resources/xxx.txt" href="data:text/plain;base64,/*some data*/" download="xxx.txt">fileName.txt</a>

    The href attribute contains the file data itself, so opening it in a new browser tab opens the original file. This causes problems when exporting notes with large attachments to HTML or PDF, because the output files become very large.

  3. One approach to address this is to remove the href attribute if it starts with data:. This won’t affect images since they are rendered using the <img> tag.
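
The idea in point 3 can be sketched as a small string transformation. This is only an illustration under simplifying assumptions: `stripDataHrefs` is a hypothetical name, it works on the rendered HTML string with regular expressions, and it assumes attribute values contain no `>` character. The actual change in the referenced commit may be implemented differently.

```typescript
// Hypothetical helper: drop href attributes whose value starts with "data:"
// from <a> tags, leaving the link text and all other attributes in place.
// <img src="data:..."> is untouched because only <a ...> tags are matched.
function stripDataHrefs(html: string): string {
  return html.replace(/<a\b[^>]*>/gi, (tag) =>
    tag.replace(/\s+href\s*=\s*(["'])data:[^"']*\1/i, "")
  );
}
```

Applied to the anchor shown in point 2, this keeps the `title` and `download` attributes and the visible file name, but the link no longer carries the file, which is exactly the trade-off discussed in the replies.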

7adidaz pushed a commit to 7adidaz/joplin that referenced this issue Mar 15, 2024
@Mr-Kanister
Contributor Author

But this approach then completely removes every attachment (excluding images). I expect PDFs to be included instead! This is supported by PDFs (https://community.adobe.com/t5/acrobat-discussions/embedding-pdf-files-documents-inside-a-adobe-acrobat-pdf/m-p/4674928).

@7adidaz

7adidaz commented Mar 15, 2024

It doesn't! Here is a PDF result of my approach. I included a normal link, a link to a file, a PDF, and an image:

https://drive.google.com/file/d/1eZjRWzpFKmWsoACxVM3yP-_RdN7-2PWp/view?usp=sharing

the original note contained this:

[abdalah_elhdad_resume_go.pdf](:/b7f40e4e8ce646ceb2bc1d12fb3d2a88)


[summary of lasttime debugging.txt](:/29bfdbf0640c41f28893e2e9952b9777)

fsdfsdfsdffff

![mermaid-1710357604361.png](:/37d15b6759b34fa8802a63afa6a7cf96)

[normalLink](https://www.youtube.com/) 


large file 
[userguide.pdf](:/613a3e6e370047acbe41875a0da91b24)

@Mr-Kanister
Contributor Author

...yes and everything but the image and the "normal link" got removed. As a user, I'd want to have the rest included, too.

There must be a way to do so, as it is supported by the pdf format.

@laurent22
Owner

> It doesn't

But we already know that it doesn't - that's the point of this issue. Now, what can we do about it? What did you try to make embedded PDFs work?

If Adobe Acrobat can do it, maybe there's a way to format the HTML or set up Electron to make it work. Or maybe not, but from your comments it sounds like you tried the existing feature, saw that it doesn't work and didn't try much else.

@Mr-Kanister
Contributor Author

> from your comments it sounds like you tried the existing feature, saw that it doesn't work and didn't try much else.

If you mean me, then yes, I haven't tried anything else. If there really is no alternative, then of course it's also a bug fix to remove the feature.

@laurent22
Owner

> If you mean me, then yes, I haven't tried anything else

I was actually answering 7adidaz since he's interested in working on this issue.

@7adidaz

7adidaz commented Mar 15, 2024

> If Adobe Acrobat can do it, maybe there's a way to format the HTML or set up Electron to make it work. Or maybe not, but from your comments it sounds like you tried the existing feature, saw that it doesn't work and didn't try much else.

I have researched whether this is possible, i.e. a single PDF file with a hyperlink that, when clicked, opens another PDF file. IMO, based on the research I did, it's not possible outside an environment like Adobe Acrobat.

I have seen the attached Adobe guide on this. When the region or the link is clicked inside Adobe Acrobat, it opens the attachment; outside it, it doesn't, as I show in the demo.

[Screen recording: 2024-03-15 demo]

What do you think? Should we go with the safe route and just disable attachment links as I did in the PR?

@Mr-Kanister
Contributor Author

Just curious: Does it work in Firefox?

With the PDF Annotator (that's paid software; I'm happy to test things and report them so you don't have to buy it...) I can add attachments to PDFs. Those aren't clickable links, but attachments like email attachments. In Firefox those get displayed in the sidebar:
[Screenshot]

In Dolphin a pop-up appears:
[Screenshot]

But in SumatraPDF they aren't viewable and Edge isn't displaying them either: https://answers.microsoft.com/en-us/microsoftedge/forum/all/edge-and-pdfs-with-attachements/0d9f4536-6dd7-400c-83f8-1d2066648930

This is the file:
Test.pdf

@7adidaz

7adidaz commented Mar 15, 2024

The files currently output by Joplin don't expose the attached media as extractable entities in Adobe, Firefox, or Evince. Here is an example file: export_w_media.pdf. Try to extract the attached data!

But the files output by Adobe show their attachments in both Firefox and Evince. Here is a test file: output_from_adobe.pdf

  • Evince is the default document viewer that comes with Ubuntu

The attached Test.pdf shows the attachment in Firefox, Adobe, and Evince.
[Screenshot]

@7adidaz

7adidaz commented Mar 15, 2024

@Mr-Kanister I hate mentions, but I updated my comment.. sorry I misunderstood you! :)

@Mr-Kanister
Contributor Author

Mr-Kanister commented Mar 15, 2024

So quick summary of the current situation:

The first allows positioning an area which, when clicked, may (depending on the viewer) lead to the attachment, while the second only displays attachments in an "attachment window" without a positional reference. Neither is supported by all viewers (I expected this, as not all viewers display comments/annotations either).

@7adidaz

7adidaz commented Mar 15, 2024

I was researching this and found a lib that can be used to attach files to a PDF. What is the policy here on using another package?

@roman-r-m
Collaborator

I think ideally we should not simply attach files to pdf but make links to those files from within the document work.

Doing so may require rewriting the whole pdf export logic.

@7adidaz

7adidaz commented Mar 15, 2024

Can you explain what you mean by links to those files? Like the case of Adobe, where the linked PDF opens when a region is clicked?

@roman-r-m
Collaborator

In Joplin you create a link in your note, and clicking on the link opens the document. From a quick glance at the lib that you linked above, it seems to attach a file to the PDF without creating a link (most likely there is a way to do that as well; that lib seems pretty good).

@7adidaz

7adidaz commented Mar 15, 2024

I get you... yep, this will require more work on extracting the PDFs for sure :)
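
As a rough illustration of what that extraction step could look like, the sketch below collects the attachments that the exported HTML already carries as base64 `data:` links, so they could later be handed to whatever PDF library ends up attaching them. Everything here is an assumption for illustration: `extractAttachments` and `EmbeddedAttachment` are hypothetical names, the anchor format follows the example quoted earlier in the thread, and the regular expressions only handle double-quoted attributes.

```typescript
interface EmbeddedAttachment {
  fileName: string; // taken from the `download` attribute when present
  mimeType: string; // taken from the data: URI header
  base64: string;   // still-encoded payload
}

// Hypothetical helper: find <a> tags whose href is a base64 data: URI
// and return one record per attachment. Ordinary links are skipped.
function extractAttachments(html: string): EmbeddedAttachment[] {
  const found: EmbeddedAttachment[] = [];
  const anchors = html.match(/<a\b[^>]*>/gi) ?? [];
  for (const tag of anchors) {
    const href = /\shref="data:([^;,"]*);base64,([^"]*)"/i.exec(tag);
    if (!href) continue;
    const download = /\sdownload="([^"]*)"/i.exec(tag);
    found.push({
      fileName: download ? download[1] : "attachment",
      mimeType: href[1] || "application/octet-stream",
      base64: href[2],
    });
  }
  return found;
}
```

Making the link position in the document open the attachment, as discussed above, would still need separate work on the PDF side; this only covers recovering the files.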

@7adidaz

7adidaz commented Mar 15, 2024

I will be applying to GSoC this year. I was interested in "PDF annotations", so I'll include this in my research, and if I get accepted and have time at the end of the season, I'll work on it!

7adidaz pushed a commit to 7adidaz/joplin that referenced this issue Apr 1, 2024
9 participants