9.8.1 Table 120: FontFamily should be a text-string not byte-string #486

faceless2 · 2024-10-31T11:39:13Z

Describe the bug
Table 120 has FontFamily:

(Optional; PDF 1.5) A byte string specifying the preferred font family name. EXAMPLE 1 For the font Times Bold Italic, the FontFamily is Times

An optional field and informative only, but it should certainly be "text string" not "byte string" - it's clearly human-readable and it could certainly be non-ASCII too.

mkl-public · 2024-10-31T13:20:28Z

Hhmmm, in the PDF Reference versions 1.5 and 1.6 that entry was a mere "string". In version 1.7 it became a "byte string".

I would assume there was a reason for that change. Possibly the name here was meant to be identical at byte-level to the corresponding entry in the 'name' table of the font.

PS: Ah, I just realized that the term "byte string" was only introduced in Reference 1.7. Thus, someone had to read the reference and decide for each string which exact type of string it was. So it is possible that there is no such well-thought-out reason for making the font family a byte string as I assumed above... ;)

petervwyatt · 2024-11-02T10:36:07Z

This specific change was made somewhere between ISO 32000-1:2008 and the 2017 first edition of PDF 2.0.

The introduction of "byte strings" was done by Adobe (not ISO) in their version of the PDF 1.7 reference, prior to submission to ISO. See Table 3.32 in their edition:

I have vague memories of discussing this many years ago but will need to research all the comments submitted against ISO 32K over about 9 years. I'd be guessing that it's because there is no intended or specific encoding of the data defined, and FontFamily is not defined to be displayed anywhere, so any sequence of bytes is valid. Of course, a lot of water has also gone under the bridge since then...

petervwyatt · 2024-11-02T10:36:50Z

@lrosenthol - do you have any PDF archaeological records as to this change?

faceless2 · 2024-11-03T09:18:10Z

Some more context. This information comes from the name table of an OpenType font, where it is normally a UTF-16BE string (https://learn.microsoft.com/en-us/typography/opentype/spec/name); however, there are some legacy exceptions to that (see the last Note on that page).

We're obviously not embedding it as a raw array of UTF-16BE bytes in the PDF.

When creating a PDF and wanting to set this field, a PDF creator is going to receive it from whichever Font API they're using as their programming language's version of a "text string", because that's how the APIs present it - see eg

When consuming a PDF this field is optional, but if it were used, the most likely context would be trying to find a match for an unembedded font in the PDF with one installed on the OS. And again, this involves interacting with a Font API, which will expect the font family as a string.

petervwyatt · 2024-11-03T17:42:59Z

But a key point I take away from the UTF-16BE definition at https://learn.microsoft.com/en-us/typography/opentype/spec/name is that they support full BCP-47 language codes whereas PDF only supports the 2-char codes, e.g. it even quotes "zh-Hant" in an example, which is illegal in a PDF Unicode string - hence the need for something more flexible. I also assume that the UTF-16BE BOM is not present in the OpenType strings, so again, if it's to byte-match, the encoding is not PDF UTF-16BE compatible...
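To illustrate the BOM point, here is a minimal Python sketch. It assumes, as this comment does, that the name record holds UTF-16BE bytes with no byte order mark; the function name is hypothetical and not taken from any PDF or font library. It shows that the raw record bytes and a PDF UTF-16BE text string differ only by the leading FE FF, so an exact byte match between the two is not possible but the conversion is mechanical.

```python
def name_record_to_pdf_text_string(name_bytes: bytes) -> bytes:
    """Turn raw UTF-16BE name-record bytes (assumed BOM-less) into the
    byte sequence of a PDF UTF-16BE text string.

    Hypothetical helper for illustration only.
    """
    # Decode first so that malformed UTF-16BE input raises an error
    # instead of being copied through silently.
    text = name_bytes.decode("utf-16-be")
    # A PDF UTF-16BE text string starts with the byte order mark FE FF.
    return b"\xfe\xff" + text.encode("utf-16-be")


raw_record = bytes.fromhex("7d30660e9ad4")
pdf_string = name_record_to_pdf_text_string(raw_record)
print(pdf_string.hex())
```

The language-tag question raised above is a separate matter and is not addressed by this sketch.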

faceless2 · 2024-11-14T10:06:08Z

The language code is a bit of a red herring. Yes, OpenType has very different language codes to PDF, and they are very different from BCP-47 too - see https://learn.microsoft.com/en-us/typography/opentype/spec/languagetags

But they're not really applicable here. If this field is used for anything, it's used for matching a font on the OS, and there's no expectation that is done in a language-dependent way. If a font designer creates a font called "Foo" and decides its translation in French is actually "Arial", it shouldn't be chosen over normal "Arial" if the document (or OS) happens to be set to French. CSS, for example, doesn't do this, and font matching is done in CSS many, many orders of magnitude more often than it will ever be done in PDF.

Luckily, Font Family names aren't generally localised like this: "Times New Roman" is the same in French. The only time this is really going to come up is with non-Latin alphabets, and that's precisely why "byte string" is inappropriate.

/FontFamily <7d30660e9ad4>

If FontFamily is a byte string, what do I do with that? Turn it into an ISO8859-1 string and try and match it to a font on the OS? As a byte string, this has no value.

/FontFamily <feff7d30660e9ad4>

If FontFamily is a text string, I know exactly what to do with this - that's 細明體, the Chinese name for MingLiU. I can pass that to whatever API I use to get my fonts to find it.
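The contrast between the two values can be sketched in a few lines of Python. This is a simplified illustration, not a full text-string decoder: the function name is hypothetical, PDFDocEncoding is approximated with Latin-1 (the two differ for some byte values), and language escape sequences are not handled.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the bytes of a PDF text string (simplified sketch)."""
    if raw.startswith(b"\xfe\xff"):
        # UTF-16BE text string, identified by its byte order mark.
        return raw[2:].decode("utf-16-be")
    if raw.startswith(b"\xef\xbb\xbf"):
        # UTF-8 text string, as permitted in PDF 2.0.
        return raw[3:].decode("utf-8")
    # Assumption: Latin-1 stands in for PDFDocEncoding here.
    return raw.decode("latin-1")


with_bom = bytes.fromhex("feff7d30660e9ad4")
without_bom = bytes.fromhex("7d30660e9ad4")

# With the BOM the bytes decode to the family name.
print(decode_pdf_text_string(with_bom))
# Without it, the same payload falls through to the single-byte branch
# and yields six unrelated characters, some of them control codes.
print(repr(decode_pdf_text_string(without_bom)))
```

With the BOM, the result is a string that can be handed to a font-matching API; without it, nothing in the bytes says how they should be interpreted.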

faceless2 added the bug Something isn't correct label Oct 31, 2024

faceless2 changed the title ~~FontFamily should be a text-string not byte-string~~ 9.8.1 Table 120: FontFamily should be a text-string not byte-string Oct 31, 2024

petervwyatt added this to the Font and text related milestone Nov 2, 2024
