UTF-8 and declaration of the encoding #520

heidivanparys · 2024-11-26T15:57:50Z

I recently came across W3C's Encoding Standard. Section 4.2, "Names and labels", specifies:

Authors must use the UTF-8 encoding and must use its (ASCII case-insensitive) "utf-8" label to identify it.

New protocols and formats, as well as existing formats deployed in new contexts, must use the UTF-8 encoding exclusively. If these protocols and formats need to expose the encoding’s name or label, they must expose it as "utf-8".

That subclause is referenced from, for example, the HTML specification, section 4.2.5.4 "Specifying the document's character encoding":

The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8. [ENCODING]

So the requirement in the Encoding Standard actually overrides the recommendation in the XML specification, section 4.3.3 "Character Encoding in Entities", which states:

In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, [...]

How does this affect TC 211's standards and resources? I guess it would mainly affect the XMG resources (the encoding declaration would have to be <?xml version="1.0" encoding="utf-8"?> instead of <?xml version="1.0" encoding="UTF-8"?>). The affected standards probably originate mainly from OGC.
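One practical way to gauge the impact is to check whether a parser treats the two declarations differently. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree` (backed by Expat) is representative of the tooling used with the XMG resources; the helper `parse_with_label`, the element `name` and its content are made up for illustration. The same UTF-8 bytes are parsed once with each spelling of the label.

```python
import xml.etree.ElementTree as ET


def parse_with_label(label: str) -> str:
    """Hypothetical helper: parse a tiny document whose XML declaration uses `label`."""
    # The document body is always encoded as UTF-8; only the label text differs.
    doc = f'<?xml version="1.0" encoding="{label}"?><name>Ærø</name>'.encode("utf-8")
    root = ET.fromstring(doc)
    return root.text


for label in ("UTF-8", "utf-8"):
    print(label, "->", parse_with_label(label))
```

Both spellings yield the same element text, which is consistent with XML 1.0 section 4.3.3 also saying that XML processors SHOULD match encoding names case-insensitively.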


PeterParslow · 2024-11-29T08:54:15Z

Heidi,
given that both W3C sources you cite are explicit that the label is case-insensitive, I see no reason to change from UTF-8 to utf-8 or vice versa.
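To illustrate what "ASCII case-insensitive" means for the label, here is a minimal Python sketch loosely following the Encoding Standard's "get an encoding" steps (strip leading and trailing ASCII whitespace, then compare ASCII case-insensitively). The helpers `ascii_case_insensitive_match` and `is_utf8_label` are hypothetical, and the sketch only tests the "utf-8" label, not the other labels the standard also maps to UTF-8.

```python
# ASCII whitespace as defined by the WHATWG Infra Standard: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "
# Map only A-Z to a-z; non-ASCII characters are left untouched.
ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def ascii_case_insensitive_match(a: str, b: str) -> bool:
    """True if a and b are equal after lowercasing ASCII letters only."""
    return a.translate(ASCII_LOWER) == b.translate(ASCII_LOWER)


def is_utf8_label(label: str) -> bool:
    """Hypothetical, simplified check: does `label` match the "utf-8" label?"""
    return ascii_case_insensitive_match(label.strip(ASCII_WHITESPACE), "utf-8")


for label in ("UTF-8", "utf-8", "Utf-8", " utf-8 ", "UTF-16"):
    print(repr(label), is_utf8_label(label))
```

The sketch lowercases with an explicit A-Z table rather than `str.lower()`, because `str.lower()` also folds non-ASCII characters, which an ASCII case-insensitive match does not. Under this comparison "UTF-8" and "utf-8" are the same label.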

ReesePlews · 2025-01-14T03:36:33Z

@heidivanparys @PeterParslow could you please attach a label to the issue so it can be filtered for easy identification? Thank you.

heidivanparys · 2025-01-14T08:40:12Z

I will just close the issue (as I am also the one who opened it). I think the key part for TC 211 is the phrase “new protocols and formats”. XML is not new, and we don't “deploy” it “in a new context”, so as far as I can see the requirement does not apply.

heidivanparys closed this as completed Jan 14, 2025
