Option to convert to PDF/A #3215

Closed
fbotelho opened this issue Nov 4, 2016 · 22 comments

Comments

@fbotelho

fbotelho commented Nov 4, 2016

The PDF/A standard is designed for the archival of information, i.e. it ensures that the file has no dependencies on anything outside itself (such as non-embedded fonts) that would prevent it from remaining viewable, over time, exactly as it was when it was created. It is especially important for libraries, governments, and any entity interested in ensuring that information remains accessible over long periods of time. A description of the PDF/A standard is available here:
https://en.wikipedia.org/wiki/PDF/A

It would be great to have an optional parameter in pandoc that creates PDF/A files without requiring the end user to understand the standard's technical requirements.

@adunning
Contributor

adunning commented Nov 4, 2016

This could be done straightforwardly by adding a variable to the LaTeX and ConTeXt templates. Our method of loading hyperref would also need to change slightly (see p. 12 of the manual at https://www.ctan.org/pkg/pdfx). Note that it only works with XeLaTeX and LuaLaTeX from TeX Live 2016, and that the pdfx package isn't included in a slim installation of TeX Live (e.g. BasicTeX), so it certainly shouldn't be a default.

For ConTeXt, see http://wiki.contextgarden.net/PDF/A.
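Because pdfx may be missing from a slim TeX Live install such as BasicTeX, it can help to check for it before trying any of this. A minimal sketch, assuming a TeX Live installation whose `kpsewhich` is on the PATH; the function name `check_pdfx` is hypothetical and only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: report whether the TeX installation can find pdfx.sty.
# Exit status: 0 = found, 1 = not found, 2 = no kpsewhich (no TeX Live on PATH).
check_pdfx() {
  if ! command -v kpsewhich >/dev/null 2>&1; then
    echo "kpsewhich not found: no TeX Live installation on PATH"
    return 2
  fi
  # kpsewhich prints the full path of the file if TeX can find it,
  # and prints nothing when it cannot.
  sty_path=$(kpsewhich pdfx.sty 2>/dev/null || true)
  if [ -n "$sty_path" ]; then
    echo "pdfx found at $sty_path"
    return 0
  fi
  # On TeX Live (including BasicTeX) a missing package is usually added with tlmgr.
  echo "pdfx not found; try: tlmgr install pdfx"
  return 1
}
```

Running `check_pdfx` prints one of three messages. `tlmgr install pdfx` is the usual TeX Live way to add the package, though it may need administrator rights depending on how TeX Live was installed.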

@jgm
Owner

jgm commented Nov 5, 2016

Do you want to do a PR to the template along these lines, as a starting point?

@adunning
Contributor

adunning commented Jan 4, 2017

What would be the best way to implement this? I see two options:

  1. A variable called pdfformat that directly feeds options to the pdfx package, e.g. a-3b.
  2. Boolean variables such as pdfa, pdfx, etc. that would configure these broader standards using the most recent available version.

The first option would be a simpler implementation in the template, and provide maximum flexibility and forward compatibility; but it would mean a different interface for LaTeX and ConTeXt.

The second option could allow the same variables to apply to both systems, and take the bother of figuring out the exact version of the standard out of the hands of the user; but then if someone wanted something specific, it would necessitate a custom template.

I do wonder how broadly useful this option would actually be, given that tagging isn't supported by the pdfx package.
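To make the trade-off between the two options concrete, here is a hypothetical sketch (in shell rather than pandoc's template language) of how each interface would be resolved to a single pdfx package option. `pdfformat` and `pdfa` are the variable names floated above, not existing pandoc variables at this point, and the `a-3b` fallback merely stands in for "the most recent available" version:

```shell
#!/bin/sh
# Hypothetical: resolve either proposed interface to one pdfx package option.
#   $1 - value of a "pdfformat" variable (option 1), possibly empty
#   $2 - "true" if a boolean "pdfa" variable is set (option 2)
pdfx_option() {
  if [ -n "$1" ]; then
    # Option 1: pass the user's choice straight through to pdfx, e.g. a-2u.
    printf '%s\n' "$1"
  elif [ "$2" = "true" ]; then
    # Option 2: the template picks a default part/conformance for the user.
    printf '%s\n' "a-3b"
  fi
  # Neither variable set: print nothing, i.e. pdfx would not be loaded at all.
}

# Either way, the template would then emit something like:
#   \usepackage[<option>]{pdfx}
```

`pdfx_option a-2u false` prints `a-2u`, while `pdfx_option '' true` prints `a-3b`. Option 1 leaves the choice entirely to the user; option 2 hides it, but hard-codes it in the template.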

@jgm
Owner

jgm commented Jan 4, 2017 via email

@mb21
Collaborator

mb21 commented Jan 4, 2017

Option two might make sense if we had more PDF generators that could output PDF/A, but wkhtmltopdf won't support PDF/A anytime soon...

@adunning
Contributor

Has it been determined whether this is desirable? I am wondering whether I should simply close https://github.com/jgm/pandoc-templates/pull/231 or put up a new PR in this repository. The limitations of pdfx expressed there, meaning that it's not really practical with XeLaTeX, are still present.

@jgm
Owner

jgm commented Aug 14, 2017

I remember looking into this and finding that the requirements for fully conforming PDF/A were very extensive, and not addressed simply by the template changes -- this is probably why I didn't merge the PR, but I honestly can't recall.

I don't think there's much point in making the template changes unless we can combine these with the other changes that would be needed for conforming PDF/A output.

@adunning
Contributor

Yes, I think that makes sense, especially since the changes do make the template rather more complex. Perhaps there will be a better route some day …

@hmenke
Contributor

hmenke commented Jan 23, 2018

I could provide an adapted ConTeXt template which generates PDF/A-1b:2005. This requires some setup on the user side though, because ICC profiles have to be installed manually due to licensing reasons.
Furthermore it will not ✨ magically 💫 generate PDF/A. All included images have to conform to PDF/A, the fonts must be embeddable, colours must be compatible, etc.

@mb21
Collaborator

mb21 commented Jan 23, 2018

As much as I like the idea of PDF/A: I guess there are applications that can automatically post-process PDFs to generate PDF/As (by converting images, removing unsupported features etc.), right? So if the user would have to run one of those to make sure all her images/fonts are compliant anyway, I'm not sure it helps a lot if pandoc gets you halfway there?

On the other hand, if there are simple changes that can be done to the existing ConTeXt template that help PDF/A production, that would certainly be welcome. But I dread the added complexity of a second ConTeXt template that needs to be maintained...

@fbotelho
Author

fbotelho commented Jan 23, 2018 via email

@fbotelho
Author

fbotelho commented Jan 23, 2018 via email

@mb21
Collaborator

mb21 commented Jan 23, 2018

It seems that you'd usually use ghostscript for PDF -> PDF/A.
I guess @KurtPfeifle is our resident PDF expert :)
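For reference, a Ghostscript PDF to PDF/A conversion along those lines might look like the sketch below. It assumes Ghostscript 9.x or later with the `pdfwrite` device; the function name `to_pdfa` and the file names are placeholders. Strict conformance usually also requires an output-intent ICC profile, which Ghostscript's documentation handles with a customised `PDFA_def.ps`; this minimal version leaves that out, so the result still has to be validated.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an existing PDF as PDF/A-2 with Ghostscript.
#   $1 - input PDF, $2 - output PDF
# -dPDFA=2                       request PDF/A-2 output
# -sColorConversionStrategy=RGB  convert colours to a single RGB colour space
# -dPDFACompatibilityPolicy=1    drop features PDF/A forbids (with a warning)
#                                instead of writing a non-conforming file
to_pdfa() {
  gs -dPDFA=2 -dBATCH -dNOPAUSE -dNOOUTERSAVE \
     -sColorConversionStrategy=RGB \
     -dPDFACompatibilityPolicy=1 \
     -sDEVICE=pdfwrite \
     -sOutputFile="$2" \
     "$1"
}
```

Usage: `to_pdfa output.pdf output-pdfa.pdf`. Because policy 1 silently removes offending features, it is worth reading Ghostscript's warnings as well as validating the output.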

@fbotelho
Author

fbotelho commented Jan 23, 2018 via email

@hmenke
Contributor

hmenke commented Jan 23, 2018

@mb21

But I dread the added complexity of a second ConTeXt template that needs to be maintained...

Actually, the settings required for PDF/A creation can be added to the existing template without breaking anything. If ICC color profiles are not found there will simply be a warning in the log file but processing will continue just fine. Also if the resulting PDF is not actually PDF/A nothing unforeseen will happen. Of course the user has to verify that the produced file is PDF/A before distributing it as such, using e.g. the Apache PDFBox preflight app. There is no way around this verification step no matter how the PDF/A was produced.
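As a sketch of that verification step using veraPDF (the validator mentioned later in this thread) instead of the PDFBox preflight app: it assumes the `verapdf` command-line launcher is installed and on the PATH, and `validate_pdfa` is a hypothetical wrapper name. The flavour argument selects the PDF/A part and conformance level to check against, e.g. `1b`, `2u` or `3b`.

```shell
#!/bin/sh
# Hypothetical wrapper around the veraPDF command-line validator.
#   $1 - PDF/A flavour to validate against (e.g. 1b, 2b, 2u, 3b)
#   $2 - PDF file to check
validate_pdfa() {
  if ! command -v verapdf >/dev/null 2>&1; then
    echo "verapdf not found on PATH" >&2
    return 2
  fi
  # veraPDF writes its validation report to standard output.
  verapdf --flavour "$1" "$2"
}
```

Usage: `validate_pdfa 1b output.pdf` for the PDF/A-1b output of the ConTeXt route. Only once the report shows the file as compliant should it be distributed as PDF/A.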

@mb21
Collaborator

mb21 commented Jan 23, 2018

the settings required for PDF/A creation can be added to the existing template without breaking anything

@hmenke that sounds good! I think it would be great if you made a pull request so that @jgm can decide whether to merge the proposed changes... or are there a lot of changes?

@hmenke
Contributor

hmenke commented Jan 23, 2018

@mb21 The change is essentially a one-liner (plus adapting the tests). See #4294 for details.

@mb21
Collaborator

mb21 commented May 6, 2018

Thanks to @hmenke this was implemented in #4294 by going through context. (If someone knows of an easy way to generate PDF/A using another --pdf-engine, suggestions are welcome.)

Usage (search also the MANUAL for pdfa):

pandoc input.md -o output.pdf --pdf-engine=context -V pdfa

@Atrate

Atrate commented Jan 17, 2023

I have managed to generate a compliant PDF/A-2u (also 2b) file (tested with veraPDF) by adding

\usepackage[a-2u,mathxmp]{pdfx}
\usepackage[pdfa]{hyperref}

to the header-includes in the YAML metadata at the top of my Markdown file, without changing the default engine (pdflatex). @mb21
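A self-contained way to try this: the sketch below writes a small Markdown file whose YAML metadata block carries the two preamble lines from the comment above. The file name and document contents are placeholders, and building the PDF assumes pdfx is installed.

```shell
#!/bin/sh
# Write a test document whose YAML metadata adds the pdfx/hyperref lines
# to the LaTeX preamble through pandoc's header-includes variable.
cat > input.md <<'EOF'
---
title: PDF/A test
header-includes: |
  \usepackage[a-2u,mathxmp]{pdfx}
  \usepackage[pdfa]{hyperref}
---

Some body text.
EOF
```

Then `pandoc input.md -o output.pdf` builds it with the default pdflatex engine, and the result can be checked as PDF/A-2u with veraPDF.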

@jgm
Owner

jgm commented Jan 17, 2023

This is great: maybe we can modify the default template so that using -Vpdfa will be enough with pdflatex too?

Have you tried the same approach with xelatex and lualatex engines?

@tarleb
Collaborator

tarleb commented Jan 17, 2023

I've used the below with LuaLaTeX to produce PDF/A-3b.

\usepackage{hyperxmp}
\hypersetup{pdfapart=3,pdfaconformance=B}
\immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
\pdfcatalog{/OutputIntents [<<
/Type /OutputIntent /S /GTS_PDFA1
/DestOutputProfile \the\pdflastobj\space 0 R
/OutputConditionIdentifier (sRGB) /Info (sRGB)
>>]}

Taken from here.
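One way to feed that snippet to pandoc without editing the document itself: save it as a header file and pass it with `-H`/`--include-in-header`. A minimal sketch, assuming the snippet works as posted and that an sRGB ICC profile named `sRGB.icc` (not supplied by pandoc) sits in the directory pandoc is run from; `pdfa-header.tex` and `build_pdfa3b` are hypothetical names.

```shell
#!/bin/sh
# Save the LuaLaTeX PDF/A-3b snippet from the comment above as a header file.
# /S /GTS_PDFA1 is the output-intent subtype PDF/A uses for every part,
# so it is correct for PDF/A-3 as well.
cat > pdfa-header.tex <<'EOF'
\usepackage{hyperxmp}
\hypersetup{pdfapart=3,pdfaconformance=B}
\immediate\pdfobj stream attr{/N 3} file{sRGB.icc}
\pdfcatalog{/OutputIntents [<<
/Type /OutputIntent /S /GTS_PDFA1
/DestOutputProfile \the\pdflastobj\space 0 R
/OutputConditionIdentifier (sRGB) /Info (sRGB)
>>]}
EOF

# Hypothetical helper: build a PDF with LuaLaTeX, injecting the header above.
#   $1 - input Markdown, $2 - output PDF
build_pdfa3b() {
  pandoc "$1" -o "$2" --pdf-engine=lualatex --include-in-header=pdfa-header.tex
}
```

Usage: `build_pdfa3b input.md output.pdf`, then validate the result against PDF/A-3b.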

jgm added a commit that referenced this issue Jan 18, 2023
@jgm
Owner

jgm commented Jan 18, 2023

I've added these incantations to a FAQ.

liruqi pushed a commit to chinapedia/pandoc that referenced this issue Mar 3, 2023
7 participants