How are rule identifiers matched to one another? #572

bert-github · 2024-10-17T13:13:49Z

(This is part of the review by the Internationalization WG. Sorry for being late – it's entirely my fault.)

4.1. Rule Identifier
https://www.w3.org/TR/2024/WD-act-rules-format-1.1-20240618/#rule-identifier

This identifier must be unique when the rule is part of a ruleset. The identifier can be any text [...]

To know if an identifier is unique (and to be able to use it in one rule to point to another), you need to know when two identifiers are the same. E.g., are capital letters (ABC) the same as lowercase letters (abc)? If a letter can be encoded in Unicode in two ways (e.g., ‘é’ as single character vs separate ‘e’ + acute accent) are those the same?
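To make the two questions above concrete, here is a minimal sketch in Python. It is only an illustration: the sample strings and the helper names (`same_code_points`, `same_nfc`, `same_canonical_caseless`) are assumptions made for this example, not anything the ACT Rules Format defines.

```python
import unicodedata

# Two encodings of the same visible text (illustrative values only).
composed = "caf\u00e9"     # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"  # 'e' followed by combining acute accent, U+0301


def same_code_points(a: str, b: str) -> bool:
    """Strict comparison: equal only if the code point sequences are equal."""
    return a == b


def same_nfc(a: str, b: str) -> bool:
    """Case-sensitive comparison after Unicode canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def _canonical_caseless_key(s: str) -> str:
    # Unicode canonical caseless matching: NFD(toCasefold(NFD(s))).
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def same_canonical_caseless(a: str, b: str) -> bool:
    """Case-insensitive comparison that also ignores canonical encoding differences."""
    return _canonical_caseless_key(a) == _canonical_caseless_key(b)


print(same_code_points(composed, decomposed))      # False
print(same_nfc(composed, decomposed))              # True
print(same_nfc("ABC", "abc"))                      # False
print(same_canonical_caseless("ABC", "abc"))       # True
```

Each function gives a different answer to "are these two identifiers the same?", which is why the choice of comparison needs to be stated somewhere.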

‘Character Model for the World Wide Web: String Matching’ explains the issues with comparing two strings of text and has recommendations for choosing an algorithm, including for text strings used as identifiers.

daniel-montalvo · 2024-11-25T15:58:47Z

Hi @bert-github

Sorry, didn't mean to close before.

The Format does require that the identifiers be unique but we never wanted to prescribe how "unique" must be measured. Different rule writers may have different mechanisms to ensure their identifiers are unique.

For example, the CG always chooses identifiers made up of lowercase ASCII characters, which avoids the situations you describe above.

daniel-montalvo · 2025-01-13T11:45:08Z

Hi @bert-github
Has the group had a chance to review my comment above? Would this explanation be sufficient to mark this as resolved?

aphillips · 2025-02-06T16:25:42Z

The I18N Working Group discussed this in our call of 2025-02-06. Our feeling is that we're not quite satisfied yet.

We understand that you do not explicitly define a particular identifier regime and thus cannot normatively specify what constitutes equality/identity. However, the problems inherent in identifier matching are easily overlooked. Since you require uniqueness, you should also require that how uniqueness is determined be called out.

We'd suggest something like:

Rulesets (??) MUST document the rules they use for determining if an identifier is unique. For a detailed explanation of the problem, see ‘Character Model for the World Wide Web: String Matching’.

Note that, if a namespace (such as lowercase ASCII) is used to avoid the problem, the documentation can be very simple. If "any text" is allowed, then other considerations have to be applied.
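As a rough sketch of how simple the restricted-namespace case can be, the following Python check enforces a documented identifier pattern and then tests uniqueness by plain string equality. The pattern (lowercase ASCII letters, digits, and hyphens) and the names `ID_PATTERN`, `is_valid_id`, and `check_ruleset` are hypothetical, chosen for this example only; a real ruleset would document its own rule.

```python
import re

# Hypothetical documented rule: identifiers use only lowercase ASCII
# letters, digits, and hyphens. This pattern is an assumption for the sketch.
ID_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_id(identifier: str) -> bool:
    """Return True if the whole identifier fits the documented namespace."""
    return ID_PATTERN.fullmatch(identifier) is not None


def check_ruleset(identifiers):
    """Return (invalid, duplicates) for a list of rule identifiers.

    Within this namespace there is no case or normalization ambiguity,
    so uniqueness can be tested with exact string equality.
    """
    invalid = [i for i in identifiers if not is_valid_id(i)]
    seen = set()
    duplicates = []
    for i in identifiers:
        if i in seen:
            duplicates.append(i)
        seen.add(i)
    return invalid, duplicates


invalid, duplicates = check_ruleset(["abc-1", "ABC", "abc-1", "caf\u00e9"])
print(invalid)     # ['ABC', 'café']
print(duplicates)  # ['abc-1']
```

If "any text" is allowed instead, the validity check disappears and the equality test has to be replaced by whatever matching rule the ruleset documents.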

bert-github added the i18n-needs-resolution label (Issue the Internationalization Group has raised and looks for a response on.) Oct 17, 2024

bert-github mentioned this issue Oct 17, 2024

How are rule identifiers matched to one another? w3c/i18n-activity#1910

Open

daniel-montalvo closed this as completed Nov 25, 2024

daniel-montalvo reopened this Nov 25, 2024
