PDF/A-4 (ISO 19005-4): handling of embedded, associated files which are not PDF themselves #385
Labels: documentation, Parked, PDF/A-3, PDF/A-4
We are producing tagged PDF 2.0 files that attach MathML and TeX files as associated files (AF) to Formula structure elements. When we also validate these files against PDF/A-4, we get failures, and we are unsure how the spec intends such files to be handled.
In our files, the AF use the registered media type application/mathml+xml and the unregistered (but widely used, see e.g. Wikipedia) media type application/x-tex. Both are plain-text formats.
Some of the AF are currently listed in the EmbeddedFiles name tree, but we can (and would like to) produce files in which none of the AF are listed.
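In the file itself, each of these media types is stored as a PDF name object (the /Subtype of the embedded file stream), so the "/" inside a media type has to be written with the hex escape #2F. The following minimal sketch, assuming only the name-object escaping rules of ISO 32000-2, shows how the two media types above would be serialized; pdf_name is a hypothetical helper, not part of any library.

```python
# Delimiter characters that must not appear unescaped inside a PDF name.
DELIMITERS = set("()<>[]{}/%")


def pdf_name(value: str) -> str:
    """Serialize a string as a PDF name object, escaping bytes as #XX where required."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        # Escape '#', delimiters, and any byte outside the printable range '!'..'~'.
        if ch == "#" or ch in DELIMITERS or not (0x21 <= byte <= 0x7E):
            out.append(f"#{byte:02X}")
        else:
            out.append(ch)
    return "/" + "".join(out)


for media_type in ("application/mathml+xml", "application/x-tex"):
    print(media_type, "->", pdf_name(media_type))
# application/mathml+xml -> /application#2Fmathml+xml
# application/x-tex -> /application#2Fx-tex
```

Whether a validator treats the registered and unregistered type differently is exactly the open question here; the escaping itself is the same for both.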
An example document is mathml-AF-ex1
Remark: the following quotes from ISO 19005-4 are from a draft and should be verified against the official version.
PDF/A-4 requirements
Question 1
Section 6.9 (Embedded files) states:
What does "as part of a file specification dictionary" mean? Does it cover all files whose stream is referenced from a /Filespec dictionary, or only files listed in the EmbeddedFiles name tree?
What does "shall conform with" mean for plain-text files like our MathML and TeX files? Can they conform to these standards at all? And if so, how can one tell a validator that they do? Currently, when validating against PDF/A-4, veraPDF reports that none of our AF conforms to one of these standards, regardless of whether they are in the EmbeddedFiles name tree and regardless of whether they use a registered media type.
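The distinction in the first question can be made concrete: a file specification can be reachable through an /AF array without appearing in the EmbeddedFiles name tree. The sketch below is a simplification that models PDF dictionaries as plain Python dicts and lets shared dict objects stand in for indirect references (a real tool would resolve object numbers through a PDF library). It walks a name tree via its /Kids and /Names arrays and reports which AF file specifications are not listed; the file names are hypothetical.

```python
from typing import Any

# Simplified model: a PDF dictionary is a Python dict; an indirect reference is
# represented by sharing the same dict object, so identity stands in for object numbers.
PdfDict = dict[str, Any]


def name_tree_values(node: PdfDict) -> list[Any]:
    """Collect every value stored in a name tree, recursing through /Kids."""
    values: list[Any] = []
    # A leaf's /Names array alternates keys and values: [key1 value1 key2 value2 ...].
    values.extend(node.get("Names", [])[1::2])
    # Intermediate nodes (and possibly the root) point to children via /Kids.
    for kid in node.get("Kids", []):
        values.extend(name_tree_values(kid))
    return values


def unlisted_associated_files(af_filespecs: list[PdfDict], embedded_files: PdfDict) -> list[PdfDict]:
    """Return the AF file specifications that do not appear in the EmbeddedFiles name tree."""
    listed = {id(value) for value in name_tree_values(embedded_files)}
    return [fs for fs in af_filespecs if id(fs) not in listed]


# Hypothetical example: two AF on a Formula element, only the MathML one is listed.
mathml_fs = {"Type": "Filespec", "UF": "formula1.mml"}
tex_fs = {"Type": "Filespec", "UF": "formula1.tex"}
embedded_files = {"Kids": [{"Names": ["formula1.mml", mathml_fs]}]}
print([fs["UF"] for fs in unlisted_associated_files([mathml_fs, tex_fs], embedded_files)])
# prints ['formula1.tex']
```

Whether clause 6.9 applies to the file returned here, or only to the listed one, is what the question asks the spec to settle.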
Question 2
Section 6.9 (Embedded files) continues:
In ISO 32000-2:2020, Table 43 ("Entries in a file specification dictionary") does not list a Subtype key for the file specification dictionary. The Subtype key is instead listed in Table 44 ("Additional entries in an embedded file stream dictionary"). This looks like an error in the spec.
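Following ISO 32000-2 as described above, the media type sits one level deeper than the file specification dictionary: the file specification points via /EF to the embedded file stream, and it is that stream's dictionary that carries /Subtype. The sketch below models PDF dictionaries as plain Python dicts (an assumption for illustration, not a real PDF library API) and shows name values already decoded, i.e. application/mathml+xml rather than application#2Fmathml+xml. embedded_media_type is a hypothetical helper, and checking /UF before /F is our own choice.

```python
from typing import Any, Optional

# Simplified model: PDF dictionaries as Python dicts, name values shown decoded.
PdfDict = dict[str, Any]


def embedded_media_type(filespec: PdfDict) -> Optional[str]:
    """Return the /Subtype of the embedded file stream referenced by a file specification.

    /Subtype is read from the embedded file stream dictionary (ISO 32000-2, Table 44),
    not from the file specification dictionary itself (Table 43).
    """
    ef = filespec.get("EF", {})
    # Check the stream under /UF first and fall back to /F (our choice of order).
    stream_dict = ef.get("UF") or ef.get("F")
    if stream_dict is None:
        return None
    return stream_dict.get("Subtype")


mathml_stream = {"Type": "EmbeddedFile", "Subtype": "application/mathml+xml"}
filespec = {"Type": "Filespec", "UF": "formula1.mml", "EF": {"F": mathml_stream, "UF": mathml_stream}}
print(embedded_media_type(filespec))  # application/mathml+xml
print(filespec.get("Subtype"))        # None: the file specification itself has no /Subtype
```

A check that looks for /Subtype on the file specification dictionary, as the quoted wording suggests, would find nothing even when the embedded file stream declares a media type.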
Question 3
This relates to Question 1: does this apply to every embedded file, including those not listed in the EmbeddedFiles name tree?
PDF/A-4f
Because of the failure, we tried validating against PDF/A-4f, and the document passed. But it is not clear whether this is actually the correct way to handle these files. The spec says:
EmbeddedFiles entry?
The exception "of any type" is rather vague. Does it refer only to the registered media type requirement mentioned in Question 2 above, or does it also lift the requirement that the files conform to ISO 19005-1, ISO 19005-2, or this International Standard?
What does this mean for associated files intended for accessibility support, such as our MathML files? Would a reader have to ask the user before passing such a MathML file to assistive technology (AT) software?