Why name trees for EmbeddedFiles? #502
Comments
I agree that using an arbitrary name for an embedded file in the EmbeddedFiles name tree is "against the spirit" of the name tree concept, which is meant to ensure that "objects in a PDF file can be referred to by name rather than by object reference" (as specified in the first sentence of 7.7.4 Name dictionary). I agree that this group should decide whether the PDF Spec should recommend that the strings in the name tree be meaningful where possible, or that the strings for EmbeddedFiles be identical to the names of the embedded files. (It already says that this should not be the case for unencrypted wrapper documents.) If this group decides against such a recommendation and you feel it would help interoperability, this "issue" could be brought to the ZUGFeRD committee (ferd.de).
The keys in a name tree should be unique strings. Nothing prevents users from including files whose names contain e.g. Unicode characters, and it is not forbidden to include more than one file with the same name (see the attached PDF, which embeds two grüße.txt files with different content), so I don't quite see how one could enforce a requirement to keep the keys in sync with the file names. (In LaTeX we also came across the problem that some implementation assumes that the key
ISO 32000 never defines which string is to be used as a filename for embedded files - this includes both embedded files listed in the document catalog Names/EmbeddedFiles name tree and those NOT listed in the EmbeddedFiles name tree. Every embedded file stream must have F and/or UF entries in their EF dictionary - but there is no requirement that any of these match anything in the document catalog Names/EmbeddedFiles name tree, since embedded files are not mandated to always be listed in the DocCatalog name tree.

In the case of PDF portable collections, embedded file matching and folder support are formally defined to use the document catalog Names/EmbeddedFiles name tree, but only as a matching byte-string index.

To support long-term preservation requirements, PDF/A (ISO 19005) does make certain things more explicit for filename display: "A conforming interactive reader shall provide a mechanism to display the name strings from the value of the EmbeddedFiles key in the names dictionary of a conforming file." - but this should NOT be extrapolated to "general PDF" since PDF/A also imposes many other constraints. And since ZUGFeRD e-invoices build on PDF/A, this is what ZUGFeRD-capable viewers MUST do.

See also my previous PDF Association article, which discusses the inverse situation where an embedded file stream is referenced multiple times and thus may or may not need de-duplication...

Thus, treating EmbeddedFiles name tree strings as always being the "correct" filename for display in "general PDF" is an assumption made by some implementations - and implementations DO vary! There are other errata also related to handling embedded files (e.g. #481, #385) and, more recently, related updates to the latest edition of the dated revision of PDF/A-4 (ISO 19005-4:202x).
During the Prague PDF Week meeting it was agreed to revise the PDF Association's "PDF 2.0 Application Note for Associated Files" (see here), so informative text and possible recommendations for filename display might be considered as part of that broader work.
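To illustrate the distinction drawn above, here is a minimal sketch of locating an embedded file by the filename in its file specification rather than by its name tree key. It does not use any PDF library: the name tree is modelled as a hypothetical flat list of `(key, file specification)` pairs, the helper names `display_name` and `find_by_filename` are made up for this example, and preferring UF over F when both are present is an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: a file specification is a plain dict with
# optional "F" and "UF" entries; a name tree is a list of (key, filespec).
FileSpec = Dict[str, str]
Entry = Tuple[str, FileSpec]


def display_name(filespec: FileSpec) -> str:
    """Filename taken from the file specification (assumption: UF wins over F)."""
    return filespec.get("UF") or filespec.get("F", "")


def find_by_filename(entries: List[Entry], filename: str) -> List[str]:
    """Return every name tree key whose file specification carries this filename."""
    return [key for key, spec in entries if display_name(spec) == filename]


entries = [
    ("a1b2c3", {"F": "factur-x.xml", "UF": "factur-x.xml"}),  # arbitrary key
    ("grüße.txt", {"F": "grüße.txt"}),
    ("grüße.txt-2", {"UF": "grüße.txt"}),  # same filename, distinct key
]
```

With this model, `find_by_filename(entries, "factur-x.xml")` finds the entry even though its key is arbitrary, and a duplicated filename simply yields more than one key, so the caller has to decide how to handle the ambiguity.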
While checking interoperability for ZUGFeRD implementations I stumbled over an implementation where the embedded file specification was registered with a randomized/unspecified name in the name tree but the file specification had the correct file name in its file specification property (F):
Object 12 holds the correct file specification (`/F (factur-x.xml)`).

While most implementations use the filename as the key in the name tree, too:
In the ZUGFeRD specification I also cannot find any requirement that the file has to be registered under the name "factur-x.xml" in the name tree (e.g. here), only that the file name in the file specification has to be that name.
Until now I thought we could use the `EmbeddedFiles` name tree to find a file by its name (in the end, a name tree is made for searching) - but it seems to be optional to keep the names in sync with the file specifications, so I am wondering why there is a name tree at all. What would I search for in this tree?

I also cannot find anything about the naming of the keys for the `EmbeddedFiles` name tree in the PDF specification:

The only thing I can find about naming in this tree is for collections in case of folders:
What are your thoughts?
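For context on what "searching" the tree means, here is a rough sketch of a name tree lookup by key. It assumes the usual name tree layout (intermediate nodes with `Kids` whose children carry `Limits`, leaf nodes with a flat `Names` array of alternating keys and values), modelled as plain Python dicts rather than real PDF objects. The function name `lookup` and the sample tree are hypothetical.

```python
from typing import Any, Dict, Optional

Node = Dict[str, Any]


def lookup(node: Node, key: bytes) -> Optional[Any]:
    """Return the value stored under key in the name tree, or None if absent."""
    if "Names" in node:
        names = node["Names"]
        # Names is a flat array: [key1, value1, key2, value2, ...]
        for i in range(0, len(names), 2):
            if names[i] == key:
                return names[i + 1]
        return None
    for kid in node.get("Kids", []):
        least, greatest = kid["Limits"]
        # Keys are compared as byte strings, so only one kid can match.
        if least <= key <= greatest:
            return lookup(kid, key)
    return None


tree: Node = {
    "Kids": [
        {"Limits": [b"a1b2c3", b"a1b2c3"],
         "Names": [b"a1b2c3", {"F": "factur-x.xml"}]},
        {"Limits": [b"notes.txt", b"readme.txt"],
         "Names": [b"notes.txt", {"F": "notes.txt"},
                   b"readme.txt", {"F": "readme.txt"}]},
    ]
}
```

In this sample, `lookup(tree, b"factur-x.xml")` returns `None` because the file was registered under the arbitrary key `a1b2c3`, which is exactly the situation described above: the tree can only be searched by key, not by the filename in the file specification.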