9.8.1 Table 120: FontFamily should be a text-string not byte-string #486
Comments
Hmm, in the PDF Reference versions 1.5 and 1.6 that entry was a mere "string". In version 1.7 it became a "byte string". I would assume there was a reason for that change. Possibly the name here was meant to be identical at byte level to the corresponding entry in the 'name' table of the font. PS: Ah, I just realized that the term "byte string" was only introduced in Reference 1.7. Thus, someone had to read the reference and decide for each string which exact type of string it was. So it is possible that there is no such well-thought-out reason for making the font family a byte string as I assumed above... ;)
@lrosenthol - do you have any PDF archaeological records as to this change?
Some more context. This information comes from the We're obviously not embedding it as a raw array of UTF-16BE bytes in the PDF. When creating a PDF and wanting to set this field, a PDF creator is going to receive it from whichever Font API they're using as their programming language's version of a "text string", because that's how the APIs present it - see eg
When consuming a PDF this field is optional, but if it were used, the most likely context would be trying to match an unembedded font in the PDF with one installed on the OS. And again, this involves interacting with a Font API, which will expect the font family as a string.
But a key point I take away from the UTF-16BE definition at https://learn.microsoft.com/en-us/typography/opentype/spec/name is that it supports full BCP-47 language codes, whereas PDF only supports the 2-character codes. It even quotes "zh-Hant" in an example, which is illegal in a PDF Unicode string, hence the need for something more flexible. I also assume that the UTF-16BE BOM is not present in the OpenType strings, so again, if the entry is meant to match at byte level, the encoding is not compatible with PDF UTF-16BE...
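To make the BOM point concrete, here is a minimal Python sketch. It assumes, as the comment above does, that OpenType 'name' table strings are stored as UTF-16BE without a BOM, while a PDF text string in UTF-16BE starts with the bytes FE FF. The function names are hypothetical, and the fallback branch only approximates PDFDocEncoding by its ASCII subset.

```python
import codecs


def name_table_bytes(family: str) -> bytes:
    """Bytes as assumed for an OpenType 'name' record: UTF-16BE, no BOM."""
    return family.encode("utf-16-be")


def pdf_text_string_bytes(family: str) -> bytes:
    """Bytes of a PDF text string in UTF-16BE: BOM (FE FF) plus the payload."""
    return codecs.BOM_UTF16_BE + family.encode("utf-16-be")


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string by inspecting its leading BOM."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if raw.startswith(codecs.BOM_UTF8):
        # UTF-8 text strings with this marker were added in PDF 2.0.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # Simplification: treat unmarked strings as the ASCII subset of
    # PDFDocEncoding; a full decoder needs the complete mapping table.
    return raw.decode("ascii")


family = "明朝"
pdf_bytes = pdf_text_string_bytes(family)
font_bytes = name_table_bytes(family)
```

Under these assumptions `pdf_bytes` and `font_bytes` differ by exactly the two BOM bytes, so a byte-level comparison fails even though both decode to the same family name.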
The language code is a bit of a red herring. Yes, OpenType has very different language codes to PDF, and very different language codes to BCP-47 too - see https://learn.microsoft.com/en-us/typography/opentype/spec/languagetags But they're not really applicable here. If this field is used for anything, it's used for matching a font on the OS, and there's no expectation that this is done in a language-dependent way. If a font designer creates a font called "Foo" and decides its translation in French is actually "Arial", it shouldn't be chosen over normal "Arial" if the document (or OS) happens to be set to French. CSS, for example, doesn't do this, and font matching is done in CSS many, many orders of magnitude more often than it will ever be done in PDF. Luckily, Font Family names aren't generally localised like this: "Times New Roman" is the same in French. The only time this is really going to come up is with non-Latin alphabets, and that's precisely why "byte string" is inappropriate.
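A small Python sketch of why non-Latin names are the problem case: the same family name produces different bytes under different encodings, so comparing byte strings only works if both sides happen to agree on an encoding, whereas comparing text does not have that problem. The helper names are hypothetical, and the NFC normalization step is my own assumption, not something the spec or any particular Font API prescribes.

```python
import unicodedata


def same_family_bytes(a: bytes, b: bytes) -> bool:
    """Byte-string comparison: only equal if the encodings also agree."""
    return a == b


def same_family_text(a: str, b: str) -> bool:
    """Text comparison after NFC normalization (an assumed policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


family = "明朝"
as_utf8 = family.encode("utf-8")
as_utf16be = family.encode("utf-16-be")

# The same name, two encodings: different bytes, identical text.
bytes_match = same_family_bytes(as_utf8, as_utf16be)
text_match = same_family_text(as_utf8.decode("utf-8"),
                              as_utf16be.decode("utf-16-be"))
```

For a pure ASCII name such as "Arial", the UTF-8 and PDFDocEncoding bytes coincide, which is why the distinction rarely shows up with Latin family names.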
Describe the bug
Table 120 has FontFamily defined as a byte string. It is an optional field and informative only, but it should certainly be "text string" not "byte string" - it's clearly human-readable and it could certainly be non-ASCII too.