[css-fonts] @font-face src: url() format() keywords vs. strings ambiguous in spec #6328

Open
drott opened this issue May 31, 2021 · 30 comments

@drott
Collaborator

drott commented May 31, 2021

Context: Against the background of our plans to introduce COLR v1 color fonts, we have requests from partners for feature detection of font formats.

One obstacle to reliable feature detection is inconsistent support for format() specifiers in engines, but even the spec is ambiguous on the syntax of the format specifier of the @font-face at-rule's src: line:

<url> [ format(<font-format> [supports <font-technology>#]?)]? | local(<font-face-name>)
<font-format> = [<string> | woff | truetype | opentype | woff2 | embedded-opentype | collection | svg]

Here, the font-format production allows a string and keywords.
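
To make concrete what the production quoted above admits, here is a minimal Python sketch that classifies a format() argument as a string or a keyword. The function name classify_font_format and the simplified string handling (plain quoted strings, no CSS escapes) are assumptions for illustration only, not spec text or any engine's parser.

```python
# Keywords listed in the <font-format> production quoted above.
FORMAT_KEYWORDS = {
    "woff", "truetype", "opentype", "woff2",
    "embedded-opentype", "collection", "svg",
}


def classify_font_format(arg: str):
    """Return ("string", value), ("keyword", value), or None if neither.

    Simplified on purpose: only plain quoted strings without CSS escapes.
    """
    arg = arg.strip()
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "\"'":
        # Any <string> is grammatically allowed; what it means is the part
        # this issue says is left unspecified.
        return ("string", arg[1:-1])
    if arg.lower() in FORMAT_KEYWORDS:
        # CSS keywords are ASCII case-insensitive.
        return ("keyword", arg.lower())
    return None


print(classify_font_format('"woff2"'))  # a string
print(classify_font_format("woff2"))    # a keyword
print(classify_font_format('"zebra"'))  # still a grammatically valid string
print(classify_font_format("zebra"))    # not a listed keyword
```

Note that "zebra" passes as a string but fails as a keyword, which is exactly the asymmetry the production leaves open.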

Why does it allow a string when the parsing and meaning of that string are mostly left unspecified, as far as I can see?

Engine behaviour

Engine behaviour is currently also inconsistent:

Firefox accepts only strings inside the format() function; compare https://bugzilla.mozilla.org/show_bug.cgi?id=650372

WebKit accepts strings or keywords and, when you inspect the parsed style rules (using something like document.styleSheets[0].cssRules[0].cssText), outputs the parsed at-rule in keyword style.

Blink also expects a string here, consumes the string as a whole, and internally compares it against a list of supported formats (compare code). The CSSOM output is also generated as strings. The format values are validated against the supported list; if validation fails, the src: line gets dropped.

Inconsistent Examples

In addition, the current spec has examples (example 22, 23) with " quoted strings for format() as well as an example (example 24) that uses keywords.

Suggested Changes

I suggest

  • to freeze in the <font-format> production to be equal to an explicitly listed limited set of fixed strings (perhaps those at the time of when this was changed to keywords?) and explain how these are supposed to be interpreted (i.e. parse them analogously to the non-string parts of the <font-format> production).
  • to specify that for the parsed CSSOM cssText value of the parsed at-rule keyword representations should be returned.
  • to harmonise the examples in the spec to use the keyword syntax, i.e. change example 22 and 23 to use keyword syntax.
@drott
Collaborator Author

drott commented Jun 2, 2021

Original issue and IRC log tracked in issue #633.

@drott
Collaborator Author

drott commented Jun 2, 2021

Also, AFAICS, the spec text does not explicitly explain how <string> should be interpreted as opposed to the keyword arguments.

@svgeesus
Contributor

svgeesus commented Jun 2, 2021

Fonts 3 only allowed strings and restricted them to a given list, so the Gecko and Blink behavior is consistent with that. I'm not sure what the benefit is of also allowing keywords.

A format string which is unrecognized should result in that whole comma-delimited section being dropped, right?

src: url(ideal-sans-serif.woff2) format("woff2"),
     url(ideal-sans-serif.zeb) format("zebra"),
     url(basic-sans-serif.ttf) format("opentype");

So a UA that does not support woff2 would skip over the zebra and download the opentype.
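
A minimal Python sketch of that fallback, assuming the only check is the format() hint. The function pick_source and the list-of-tuples representation are hypothetical. A real UA would also try to load the font and fall through on failure, which is not modelled here.

```python
def pick_source(entries, supported_formats):
    """Return the URL of the first src entry whose format hint is supported.

    entries: list of (url, format_string_or_None) in src order.
    An entry without a format() hint is always treated as a candidate.
    """
    for url, fmt in entries:
        if fmt is None or fmt in supported_formats:
            return url
    return None  # every comma-delimited section was dropped


src = [
    ("ideal-sans-serif.woff2", "woff2"),
    ("ideal-sans-serif.zeb", "zebra"),
    ("basic-sans-serif.ttf", "opentype"),
]

# A UA without woff2 support skips the first two entries.
print(pick_source(src, {"woff", "opentype", "truetype"}))
```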

(side note, example in the spec is missing a comma)

@svgeesus
Contributor

svgeesus commented Jun 2, 2021

The serialization should also be clearly specified

@svgeesus
Contributor

svgeesus commented Jun 2, 2021

I suspect the format keywords were added for consistency when we added the supports keywords, with strings still allowed for Web compat? supports has no such constraint.

@svgeesus
Contributor

svgeesus commented Jun 2, 2021

(side note, example in the spec is missing a comma)

Fixed

@drott
Collaborator Author

drott commented Jun 7, 2021

Since we did have "woff-variations" etc. for a while (between commits 5c914a8 and 4d36841 to be precise), for the serialisation, I suggest the following steps:

  1. If the specified format was a valid keyword, output the keyword/identifier.
  2. If the specified format was of string type and matched one of the format keywords, output the keyword/identifier.
  3. If the specified format was of string type and was one of "woff-variations", "woff2-variations", "opentype-variations", "truetype-variations", output as string.
  4. If supported technologies were listed, output "supports", otherwise serialisation is complete here.
  5. For each supported technology, output as keyword/identifier/function, omitting duplicates.

Examples:

  1. src: url(...) format("woff2-variations" supports variations variations) is serialized as src: url(...) format("woff2-variations" supports variations) - duplicate dropped, "woff2-variations" stays string.
  2. src: url(...) format("woff2" supports features) is serialized as src: url(...) format(woff2 supports features) - preference to return keyword for format where possible.
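
The five steps can be sketched in Python as follows. This is only an illustration of the proposal in this comment: serialize_format and its arguments are made-up names, the ValueError for inputs outside steps 1-3 is an assumption (the steps do not say what happens there), and joining technologies with commas follows the # in the grammar rather than the space-separated examples.

```python
FORMAT_KEYWORDS = {
    "woff", "truetype", "opentype", "woff2",
    "embedded-opentype", "collection", "svg",
}
LEGACY_VARIATION_STRINGS = {
    "woff-variations", "woff2-variations",
    "opentype-variations", "truetype-variations",
}


def serialize_format(fmt, is_string, technologies=()):
    """Serialize a parsed format() following steps 1-5 of the proposal."""
    if fmt in FORMAT_KEYWORDS:
        # Steps 1 and 2: a keyword, or a string matching a keyword,
        # is output as the keyword/identifier.
        out = fmt
    elif is_string and fmt in LEGACY_VARIATION_STRINGS:
        # Step 3: legacy "-variations" values stay quoted strings.
        out = f'"{fmt}"'
    else:
        # Assumption: anything else is outside the proposal.
        raise ValueError(f"unsupported format: {fmt!r}")

    # Steps 4 and 5: append technologies, omitting duplicates, order kept.
    unique = list(dict.fromkeys(technologies))
    if unique:
        out += " supports " + ", ".join(unique)
    return f"format({out})"


print(serialize_format("woff2-variations", True, ["variations", "variations"]))
print(serialize_format("woff2", True, ["features"]))
```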

@svgeesus svgeesus self-assigned this Jul 14, 2021
@fantasai
Collaborator

fantasai commented Nov 3, 2021

Thoughts:

  • If strings are the most backwards-compatible syntax, then we must serialize as strings, not as keywords.
  • If keywords aren't widely supported yet, let's just stick with strings only. If for some reason we think parsing in keywords is a good idea, we can do that, but they should still serialize out as strings.
  • The spec should be defined so that any IANA-registered font subtype is a valid argument to format(). We shouldn't need to maintain this registry ourselves now that there is a standard one.

@svgeesus
Contributor

svgeesus commented Nov 9, 2021

@fantasai wrote

The spec should be defined so that any IANA-registered font subtype is a valid argument to format(). We shouldn't need to maintain this registry ourselves now that there is a standard one.

On the one hand, as editor of RFC 8081 I agree that would be nice and that registry already exists.

On the other hand, 4.4. Subtype Registrations says:

For each subtype, an @font-face format identifier is listed. This is
for use with the @font-face src descriptor, defined by the Cascading
Style Sheets Level 3 (CSS3) Fonts specification
[W3C.CR-css-fonts-3-20131003]. That specification is normative; the
identifiers here are informative.

so we have a bit of a catch-22 situation. But I think we can fix that by having Fonts 4 (which will supersede Fonts 3) normatively say that

a) the IANA font subtypes registry is normative :)
b) the format identifier is identical to the font subtype

@svgeesus
Contributor

svgeesus commented Nov 9, 2021

Might want to errata RFC 8081 in that case

@css-meeting-bot
Member

The CSS Working Group just discussed [css-fonts] @font-face src: url() format() keywords vs. strings ambiguous in spec, and agreed to the following:

  • RESOLVED: format() serializes with strings
  • RESOLVED: format() serializes with strings
  • RESOLVED: font/ MIME type registry manages valid values of format()
The full IRC log of that discussion
<fantasai> Topic: [css-fonts] @font-face src: url() format() keywords vs. strings ambiguous in spec
<fantasai> github: https://github.com//issues/6328
<fantasai> chris_: Grammar here is complicated due to allowing strings and keywords
<fantasai> chris_: but most compatible form is string
<fantasai> chris_: so we should support string and serialize as string
<fantasai> astearns: do we want to accept keywords as well?
<fantasai> fantasai: let's split into 2 questions, first is resolving on serializing as strings because they are the most backwards-compatible syntax
<fantasai> astearns: does drott have any concerns?
<fantasai> ...
<fantasai> astearns: most backwards-compat usually winning argument
<fantasai> RESOLVED: format() serializes with strings
<fantasai> jfkthame: What are we intending to do with font-technology(), will that take keywords or strings, and will that be confusing?
<fantasai> chris_: That will take keywords as currently specced. Though that can change.
<fantasai> chris_: There's no back-compat concern with technology()
<fantasai> astearns: Since we have back-compat issue with one, maybe all of them should serialize as strings
<fantasai> jfkthame: I'm not sure what I favor
<fantasai> jfkthame: I recognize the back-compat issue
<fantasai> astearns: let's take that as a separate issue
<fantasai> astearns: and resolve to use strings for format() and then find out whether drott agrees, or whether we can do something more complicated, and if string serialization format sticks can decide on consistency for rest
<fantasai> astearns: any other concerns about serializing format() with strings?
<fantasai> RESOLVED: format() serializes with strings
<fantasai> chris_: There weren't font MIME types at IANA, so we faked it by creating our own names
<fantasai> chris_: now there is a fonts/ registry for MIME types
<fantasai> chris_: fantasai suggested that we just use that registry
<fantasai> chris_: but the RFC says that it is informative, and css-fonts-3 is normative
<fantasai> chris_: I think we'd like to change that so that they are normative, and we are informative, and they can handle registration of new formats so we don't have to
<fantasai> astearns: So we would normatively refer to their spec?
<fantasai> chris_: And then errata the RFC so that it no longer says our spec is normative
<fantasai> astearns: ...
<fantasai> fantasai: Shouldn't have different keywords allowed between string or keyword in format()
<fantasai> RESOLVED: font/ MIME type registry manages valid values of format()
<fantasai> ??: What's involved in getting IETF updated?
<fantasai> chris_: I will contact the chair and ask if this is in scope of errata or not
<fantasai> chris_: Unsure if publish a new RFC or not
<fantasai> chris_: if errata, then errata submission form
<astearns> s/??/PeterConstable/
<fantasai> chris_: if not then will need to spin up a very small wg to make the change
<fantasai> chris_: but anyway I'll deal with it
<fantasai> astearns: Last thing is whether we accept unquoted keywords
<fantasai> chris_: Does any implementation currently accept keywords for format()?
<fantasai> jfkthame: IIRC webkit does, but haven't checked
<fantasai> chris_: And do we want to continue to allow that and allow other browsers to do so, or get them to fix that?
<fantasai> astearns: out of time, so let's leave issue open on this question and ask for feedback

@astearns
Member

The remaining question is whether format() should accept keywords in addition to strings, as WebKit and one of the spec examples does. I am seeing suggestions here to only allow strings. @litherum would WebKit be OK with making this change?

@svgeesus
Contributor

svgeesus commented Nov 17, 2021

So I had forgotten that the strings in "format" are different than the font/ subtypes.

Name                            format()             type         /subtype
OpenType Collection             "collection"         font         collection
Embedded OpenType               "embedded-opentype"  application  vnd.ms-fontobject
OpenType                        "opentype"           font         otf
SVG Font                        "svg"                image        svg+xml
TrueType                        "truetype"           font         ttf
WOFF 1.0                        "woff"               font         woff
WOFF 2.0                        "woff2"              font         woff2
Generic Spline Font             (none)               font         sfnt
TrueDoc Portable Font Resource  (none)               application  font-tdpfr

Yes, I had to look up application/vnd.ms-fontobject
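
For illustration, the same table can be written as data and checked mechanically for the rows where the format() string and the registered type/subtype disagree. This is a rough Python sketch. The rows are transcribed from the table above, and the helper name mismatches is hypothetical.

```python
# (name, format() string or None, top-level type, subtype), from the table above.
FORMATS = [
    ("OpenType Collection", "collection", "font", "collection"),
    ("Embedded OpenType", "embedded-opentype", "application", "vnd.ms-fontobject"),
    ("OpenType", "opentype", "font", "otf"),
    ("SVG Font", "svg", "image", "svg+xml"),
    ("TrueType", "truetype", "font", "ttf"),
    ("WOFF 1.0", "woff", "font", "woff"),
    ("WOFF 2.0", "woff2", "font", "woff2"),
    ("Generic Spline Font", None, "font", "sfnt"),
    ("TrueDoc Portable Font Resource", None, "application", "font-tdpfr"),
]


def mismatches(rows):
    """Names of formats whose format() string is not simply a font/ subtype."""
    return [
        name
        for name, fmt, top, sub in rows
        if fmt is not None and (top != "font" or fmt != sub)
    ]


print(mismatches(FORMATS))
```

Only "collection", "woff" and "woff2" line up with the registry as-is.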

@astearns
Member

Right, that was the point of my muddled question on the call about whether we would still need to normatively define our old strings, or whether those would be covered in the other spec

@svgeesus
Contributor

svgeesus commented Nov 17, 2021

TDPFR from Bitstream dates to the old Netscape Navigator 4.x and is no longer used. EOT is still used for the likes of IE6 and older. I could certainly make the case that we can ignore vendor-prefix types. Re-registering EOT would be possible but a bit of a waste of time.

The main incompatibilities are "opentype" vs. otf, "truetype" vs. ttf, and "svg" vs. "svg+xml". The last of these is also in the image tree, not the font tree, because it is an image format.

@svgeesus
Contributor

whether we would still need to normatively define our old strings

Yes because there is a huge Web-compat need to define those.

@svgeesus
Contributor

svgeesus commented Nov 17, 2021

From HTTP Archive Web Almanac 2020 usage of font MIME types

[Image: font-mime-types]

text/plain and application/octet-stream together mean "server not set up correctly".

@svgeesus
Contributor

Oh and from 2019:

[Image: font-mime]

@drott
Collaborator Author

drott commented Nov 18, 2021

The remaining question is whether format() should accept keywords in addition to strings, as WebKit and one of the spec examples does. I am seeing suggestions here to only allow strings. @litherum would WebKit be OK with making this change?

Yes I am interested in clarifying that, too.

With

RESOLVED: font/ MIME type registry manages valid values of format()

  1. Is the intention that the subtype string in quotes is the valid value for format(), or the full string, e.g. "font/woff2"?

  2. Is the following the right understanding? With @svgeesus's table above, it looks like we can't immediately spec this resolution given the inconsistencies between the current strings/keywords in the spec and the registry, at least not without specifying additional ones outside of the registry for backwards compat?

@svgeesus
Contributor

svgeesus commented Nov 19, 2021

The inconsistencies certainly give me pause in implementing this resolution. Specifically we would at minimum need to add ttf and otf as aliases, which has no author benefit.

I'm also unsure what to do with legacy formats. The application/vnd.ms-fontobject registration explicitly says "specification: none", which is not strictly true, but we certainly don't want to encourage use of EOT. TrueDoc Portable Font Resource is also super-legacy (and is in the standards tree, not the vendor tree; I think some MPEG system required it in the distant past). And SVG fonts (actual SVG, not SVG-in-OpenType) also don't fit the pattern. In general I think, based on zero or near-zero usage, it is fine for Fonts 4 to be silent on those formats.

Adding an Internet Media Type column to the table of font formats in Fonts 4 would at least help with alignment and might be useful.

But probably we should revisit that resolution. Apologies for forgetting about the inconsistencies.

@astearns
Member

I suppose we could change RFC 8081 to say that the subtypes defined there are normative (and optionally that the identifiers are also normative) and then change CSS Fonts to refer to a subset of those subtypes with some of our own identifiers as full substitutes (not aliases)

@svgeesus
Contributor

Not sure of the difference between full substitutes and aliases

@astearns
Member

If we say that we support some /subtypes as specified in RFC 8081 as format() string parameters, there would be two exceptions where we substitute "opentype" for otf and "truetype" for ttf. There is no aliasing (we do not support "otf" or "ttf" as strings), just a legacy substitution on our part.
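
A small Python sketch of the difference, under the assumption that CSS Fonts would reference the five font/ subtypes below. The subset chosen and all names here are hypothetical. With substitution the legacy string replaces the subtype. With aliasing both would be accepted.

```python
# Assumed subset of RFC 8081 font/ subtypes that CSS Fonts would reference.
REFERENCED_SUBTYPES = ["collection", "otf", "ttf", "woff", "woff2"]

# Legacy substitutions: the CSS string is used instead of the subtype.
LEGACY_SUBSTITUTIONS = {"otf": "opentype", "ttf": "truetype"}


def valid_format_strings(subtypes, substitutions):
    """Set of strings accepted by format() under the substitution model."""
    return {substitutions.get(s, s) for s in subtypes}


valid = valid_format_strings(REFERENCED_SUBTYPES, LEGACY_SUBSTITUTIONS)
print(sorted(valid))
print("opentype" in valid)  # accepted: legacy substitute
print("otf" in valid)       # rejected: no aliasing
```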

@svgeesus
Contributor

Aha, I see

@svgeesus
Contributor

svgeesus commented Dec 3, 2021

Ping @litherum

The remaining question is whether format() should accept keywords in addition to strings, as WebKit and one of the spec examples does. I am seeing suggestions here to only allow strings. @litherum would WebKit be OK with making this change?

@litherum
Contributor

litherum commented Dec 3, 2021

I don't see any reason to forbid the idents, and they seem very natural from an authoring point-of-view...

"format(opentype)" seems natural...

@svgeesus
Contributor

svgeesus commented Mar 21, 2024

@drott said:

I suggest

  • to freeze in the <font-format> production to be equal to an explicitly listed limited set
    of fixed strings (perhaps those at the time of when this was changed to keywords?) and explain
    how these are supposed to be interpreted (i.e. parse them analogously to the non-string
    parts of the <font-format> production).

  • to specify that for the parsed CSSOM cssText value of the parsed at-rule keyword
    representations should be returned.

  • to harmonise the examples in the spec to use the keyword syntax, i.e. change
    example 22 and 23 to use keyword syntax.

@astearns said:

The remaining question is whether format() should accept keywords in addition to strings, as WebKit and one of the spec examples does. I am seeing suggestions here to only allow strings. @litherum would WebKit be OK with making this change?

to which @litherum responded

I don't see any reason to forbid the idents, and they seem very natural from an authoring point-of-view...

"format(opentype)" seems natural...

So it seems that the current proposal is to allow setting with either a string or a keyword?

And then @fantasai said

  • If strings are the most backwards-compatible syntax, then we must serialize as strings, not as keywords.

Which is also what we resolved on

And this implies we need to mint new strings for any formats that don't already have them.

Before I go making edits, does that all seem correct?

@drott
Collaborator Author

drott commented Apr 11, 2024

I think the summary is correct, but how do we bring the following in agreement?

@drott wrote:

I suggest ... to specify that for the parsed CSSOM cssText value of the parsed at-rule keyword
representations should be returned.

@fantasai wrote:

If strings are the most backwards-compatible syntax, then we must serialize as strings, not as keywords.

Can we have a mixed serialization type? Such as every format that has a keyword equivalent is serialized as a keyword, and the rest as strings? I think it's beneficial to allow a round-trip of keywords to keywords - and I would hope at some point to phase out the strings here. I think serialization to strings is in the way of that.

But if we can't have a mixed serialization as keyword or string, I am okay with always serializing to strings, even if that downgrades a more specific type of keyword, to a more generic type, string.
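
To compare the two options side by side, here is a hedged Python sketch. Both function names are invented for this illustration, and "woff2-variations" stands in for a format that has no keyword equivalent.

```python
FORMAT_KEYWORDS = {
    "woff", "truetype", "opentype", "woff2",
    "embedded-opentype", "collection", "svg",
}


def serialize_mixed(fmt):
    """Mixed option: keyword when one exists, quoted string otherwise."""
    if fmt in FORMAT_KEYWORDS:
        return f"format({fmt})"
    return f'format("{fmt}")'


def serialize_as_string(fmt):
    """Resolved option: always serialize as a quoted string."""
    return f'format("{fmt}")'


for fmt in ("woff2", "woff2-variations"):
    print(fmt, "->", serialize_mixed(fmt), "|", serialize_as_string(fmt))
```

Under the mixed option a keyword such as woff2 round-trips as a keyword. Under the always-string option it comes back quoted.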

@svgeesus
Contributor

svgeesus commented Jan 9, 2025

@fantasai I think we are blocked on an answer to the question from @drott here. In other words, do we add a table to say which format serializes as keyword and which as string? Or do we mint new strings for formats that don't have one (which would be odd if the main reason for the design choice to use strings is backwards compat)?

@svgeesus
Contributor

svgeesus commented Jan 9, 2025

Related (same "do we use all strings for font formats" question):


6 participants