9.8.1 Table 120: FontFamily should be a text-string not byte-string #486

Open
faceless2 opened this issue Oct 31, 2024 · 6 comments
Labels
bug Something isn't correct

Comments

@faceless2

Describe the bug
Table 120 has FontFamily:

(Optional; PDF 1.5) A byte string specifying the preferred font family name. EXAMPLE 1 For the font Times Bold Italic, the FontFamily is Times

An optional field and informative only, but it should certainly be "text string" not "byte string" - it's clearly human-readable and it could certainly be non-ASCII too.

faceless2 added the "bug" (Something isn't correct) label on Oct 31, 2024
faceless2 changed the title from "FontFamily should be a text-string not byte-string" to "9.8.1 Table 120: FontFamily should be a text-string not byte-string" on Oct 31, 2024
@mkl-public

mkl-public commented Oct 31, 2024

Hhmmm, in the PDF Reference versions 1.5 and 1.6 that entry was a mere "string". In version 1.7 it became a "byte string".

I would assume there was a reason for that change. Possibly the name here was meant to be identical at byte-level to the corresponding entry in the 'name' table of the font.

PS: Ah, I just realized that the term "byte string" was only introduced in Reference 1.7. Thus, someone had to read the reference and decide for each string which exact type of string it was. So it is possible that there is no such well-thought-out reason for making the font family a byte string as I assumed above... ;)

@petervwyatt
Member

This specific change was made somewhere between ISO 32000-1:2008 and the 2017 first edition of PDF 2.0.

The introduction of "byte strings" was done by Adobe (not ISO) in their version of the PDF 1.7 reference, prior to submission to ISO. See Table 3.32 in their edition:
[image: excerpt of Table 3.32]

I have vague memories of discussing this many years ago but will need to research all the comments submitted against ISO 32K over about 9 years. I'd guess that it's because there is no intended or specific encoding of the data defined, and FontFamily is not defined to be displayed anywhere, so any sequence of bytes is valid. Of course, a lot of water has also gone under the bridge since then...

petervwyatt added this to the "Font and text related" milestone on Nov 2, 2024
@petervwyatt
Member

@lrosenthol - do you have any PDF archaeological records as to this change?

@faceless2
Author

Some more context. This information comes from the name table of an OpenType font, where it is normally a UTF-16BE string (https://learn.microsoft.com/en-us/typography/opentype/spec/name). However, there are some legacy exceptions to that (see the last Note on that page).

We're obviously not embedding it as a raw array of UTF-16BE bytes in the PDF.

When creating a PDF and wanting to set this field, a PDF creator is going to receive it from whichever Font API they're using as their programming language's version of a "text string", because that's how the APIs present it - see eg

When consuming a PDF, this field is optional, but if it were used, the most likely context would be trying to find a match for an unembedded font in the PDF with one installed on the OS. And again, this involves interacting with a Font API, which will expect the font family as a string.

@petervwyatt
Member

But a key point I take away from the UTF-16BE definition at https://learn.microsoft.com/en-us/typography/opentype/spec/name is that they support full BCP-47 language codes, whereas PDF only supports the 2-char codes. It even quotes "zh-Hant" in an example, which is illegal in a PDF Unicode string - hence the need for something more flexible. I also assume that the UTF-16BE BOM is not present in the OpenType strings, so again, if it's meant to byte-match, the encoding is not PDF UTF-16BE compatible...
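
To make the byte-level point concrete, here is a minimal Python sketch. It assumes the family name sits in the name table as UTF-16BE with no BOM (as guessed above) and that a PDF UTF-16BE text string starts with the bytes FE FF. The helper names are hypothetical and not taken from any library.

```python
import codecs

def opentype_name_bytes(family: str) -> bytes:
    """Assumed storage in an OpenType 'name' record: UTF-16BE, no BOM."""
    return family.encode("utf-16-be")

def pdf_text_string_bytes(family: str) -> bytes:
    """A PDF UTF-16BE text string: BOM FE FF followed by UTF-16BE data."""
    return codecs.BOM_UTF16_BE + family.encode("utf-16-be")

name = "Times"
raw = opentype_name_bytes(name)
pdf = pdf_text_string_bytes(name)

print(raw.hex())   # bytes as assumed in the font
print(pdf.hex())   # bytes as they would appear in a PDF text string
print(raw == pdf)  # the two never match byte-for-byte because of the BOM
```

Under these assumptions the two forms differ only by the two leading BOM bytes, so a byte-for-byte match with the font would rule out a PDF text string, while a text string would need that prefix added by the producer.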

@faceless2
Author

The language code is a bit of a red herring. Yes, OpenType has very different language codes to PDF, but it has very different language codes to BCP-47 too - see https://learn.microsoft.com/en-us/typography/opentype/spec/languagetags

But they're not really applicable here. If this field is used for anything, it's used for matching a font on the OS, and there's no expectation that this is done in a language-dependent way. If a font designer creates a font called "Foo" and decides its translation in French is actually "Arial", it shouldn't be chosen over normal "Arial" if the document (or OS) happens to be set to French. CSS, for example, doesn't do this, and font matching is done in CSS many, many orders of magnitude more often than it will ever be done in PDF.

Luckily, Font Family names aren't generally localised like this: "Times New Roman" is the same in French. The only time this is really going to come up is with non-Latin alphabets, and that's precisely why "byte string" is inappropriate.

/FontFamily <7d30660e9ad4>

If FontFamily is a byte string, what do I do with that? Turn it into an ISO 8859-1 string and try to match it to a font on the OS? As a byte string, this has no value.

/FontFamily <feff7d30660e9ad4>

If FontFamily is a text string, I know exactly what to do with this - that's 細明體, the Chinese name for MingLiU. I can pass that to whatever API I use to get my fonts to find it.
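
The two readings can be sketched in Python. This is only an illustration under stated assumptions: a consumer that treats a leading FE FF as marking UTF-16BE and otherwise falls back to a single-byte decoding. ISO 8859-1 stands in for that fallback here (a real text-string reader would use PDFDocEncoding, for which Python has no built-in codec), and the function name is hypothetical.

```python
import codecs

def decode_font_family(raw: bytes) -> str:
    """Decode FontFamily bytes: UTF-16BE if BOM-prefixed, else ISO 8859-1 (assumed fallback)."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Skip the two BOM bytes and decode the rest as UTF-16BE.
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: all we can do is map each byte to one character.
    return raw.decode("latin-1")

as_byte_string = decode_font_family(bytes.fromhex("7d30660e9ad4"))
as_text_string = decode_font_family(bytes.fromhex("feff7d30660e9ad4"))

print(repr(as_byte_string))  # six unrelated characters, including control codes
print(as_text_string)        # the three-character family name
```

Without the BOM, the same six bytes come out as punctuation and control characters that no font API would match. With it, the result is the family name that can be handed to a font lookup directly.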
