improve definition of Literal #162

pchampin · 2025-02-26T16:09:36Z

This PR was motivated by the problem raised here, aiming to fix the definition of "literal term equality",
but it ended up as a more involved refactoring of the definition of Literal.

Below is a summary of the changes:

- reorganizing the content, putting some parts in separate subsections ("Representation of literals", "Literal value")
- simplifying some parts (lexical value now references 'RDF string', removed some redundancies)
- insisting on the fact that the (upper/lower) case is not part of the language tag in the abstract syntax (so "chat"@fr and "chat"@FR are not just equal, they are really the same literal)


hartig · 2025-02-26T16:29:04Z

spec/index.html

    </ul>
-    <p>Comparison is performed using 
+    <p>Comparison of the [=lexical forms=] and of the [=datatype IRIs=] is performed using


For the datatype IRIs, shouldn't this better be covered by IRI equality?

pchampin · 2025-02-27

Fair point. I reused existing language, which didn't mention IRI equality. The result is equivalent, because IRI equality is also based on string comparison, but referring to IRI equality would be clearer.
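
To make the equivalence concrete, here is a minimal sketch, assuming that IRI equality means simple string comparison with no further normalization (as in RDF 1.1 Concepts). The function name `iri_equal` is hypothetical and not part of the spec text.

```python
def iri_equal(a: str, b: str) -> bool:
    """Compare two IRIs code point by code point, with no normalization."""
    return a == b


XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Same sequence of code points: equal.
same = iri_equal(XSD_INTEGER, "http://www.w3.org/2001/XMLSchema#integer")

# A different scheme case is not normalized away, so these are not equal.
different_case = iri_equal(XSD_INTEGER, "HTTP://www.w3.org/2001/XMLSchema#integer")
```

Under this assumption, comparing datatype IRIs "as strings" and comparing them "by IRI equality" give the same answer; only the wording in the spec differs.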

spec/index.html

gkellogg

Minor points.

spec/index.html

Co-authored-by: Olaf Hartig <olaf.hartig@liu.se> Co-authored-by: Gregg Kellogg <gregg@greggkellogg.net>

afs · 2025-02-26T16:55:37Z

spec/index.html

+        In RDF 1.1, `"chat"@fr` and `"chat"@FR` were representing two distinct terms, but implementations had license to replace one with the other (which most did).
+        In RDF 1.2, they are now representing the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.


Reword: RDF 1.1 still exists:

Suggested change

-    In RDF 1.1, `"chat"@fr` and `"chat"@FR` were representing two distinct terms, but implementations had license to replace one with the other (which most did).
-    In RDF 1.2, they are now representing the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.
+    In RDF 1.1, `"chat"@fr` and `"chat"@FR` represent two distinct terms, but implementations may replace one with the other (which many did).
+    In RDF 1.2, they represent the same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.

spec/index.html

Co-authored-by: Andy Seaborne <andy@apache.org>

spec/index.html

Co-authored-by: Andy Seaborne <andy@apache.org>

pfps

A language tag is not a string. BCP 47 does not provide a good foundation for RDF language tags.

RDF Concepts could say that a language tag is a lowercase string that meets the requirements of BCP 47 or it could say that a language tag is a sequence of ASCII case-insensitive characters where the string constructed by taking any of the members of the equivalence sets in sequence meets the requirements of BCP 47. ASCII case-insensitive characters are then equivalence sets of characters under the equivalence relation that treats two characters as equivalent if they are both the same when converted to lower case using ASCII case conversion. The former is simpler but the latter provides guidance on how to treat surface syntax language tags.

Saying that language tags are strings and then going on to define an equality over them is like saying that language tags are cats and then going on to say that two language tags are the same if they have the same colour - the right way here is to say either that language tag strings are cat colours or that they are equivalence classes of cats under the same-colour equivalence.
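
The two options can be sketched side by side. This is only an illustration under stated assumptions: the helper names (`ascii_lower`, `canonical_tag`, `same_language_tag`) are hypothetical, and the sketch does not check BCP 47 well-formedness.

```python
import string

# ASCII-only case conversion: maps A-Z to a-z and leaves every other character alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(text: str) -> str:
    return text.translate(_ASCII_LOWER)


def canonical_tag(surface: str) -> str:
    """Option 1: the language tag *is* the lowercase string."""
    return ascii_lower(surface)


def same_language_tag(a: str, b: str) -> bool:
    """Option 2: surface strings are members of an equivalence class;
    two strings denote the same tag if they agree after ASCII lowercasing."""
    return ascii_lower(a) == ascii_lower(b)
```

With option 1, `canonical_tag("FR")` and `canonical_tag("fr")` are literally the same string; with option 2, `"en-US"` and `"EN-us"` are different strings that fall in the same class.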

afs · 2025-02-28T11:53:33Z

In what way does RDF Concepts not say that?

A change might be saying that language tags are represented by strings conforming to RFC 5646.

pfps · 2025-02-28T12:20:23Z

> In what way does RDF Concepts not say that?

Not say what?

afs · 2025-02-28T13:40:41Z

>> In what way does RDF Concepts not say that?

> Not say what?

What you describe.

What is the concrete proposal (PR, or suggested change to this PR) for changing RDF Concepts?

pfps · 2025-02-28T16:27:50Z

RDF Concepts says this, as far as I can tell:

> language tags are strings and then [goes] on to define an equality over them

RDF Concepts does not say either of these, as far as I can tell:

> RDF Concepts could say that a language tag is a lowercase string that meets the requirements of BCP 47 or it could say that a language tag is a sequence of ASCII case-insensitive characters where the string constructed by taking any of the members of the equivalence sets in sequence meets the requirements of BCP 47. ASCII case-insensitive characters are then equivalence sets of characters under the equivalence relation that treats two characters as equivalent if they are both the same when converted to lower case using ASCII case conversion. The former is simpler but the latter provides guidance on how to treat surface syntax language tags.

pchampin · 2025-02-28T18:09:27Z

> RDF Concepts says this, as far as I can tell:

>> language tags are strings and then [goes] on to define an equality over them

Not exactly. The text in this PR says

> a non-empty language tag as defined by [BCP47]. [...] Two [BCP47]-complying strings that differ only by case represent the same language tag.

The goal is to convey the idea that RDF language tags are an abstraction of the strings complying with BCP 47, without using such scary language. But OK, maybe it's too hand-wavy.

I would be happy with changing the definition of language tags to lower-case BCP47-compliant strings (as proposed by @pfps). The 3rd paragraph of 3.4.1, in my opinion, explains clearly enough that concrete syntaxes and implementations are free to use the case they want (as long as they ignore it when comparing language tags).
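
As a rough sketch of what that freedom could look like in an implementation: the hypothetical `LanguageTag` class below keeps whatever case the concrete syntax used, but compares and hashes on the lowercase form. The class and attribute names are assumptions made for illustration only.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class LanguageTag:
    """Hypothetical wrapper: remembers the surface form, compares on lower case."""

    def __init__(self, surface: str) -> None:
        self.surface = surface                       # case as written in the concrete syntax
        self._key = surface.translate(_ASCII_LOWER)  # identity in the abstract syntax

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, LanguageTag):
            return NotImplemented
        return self._key == other._key

    def __hash__(self) -> int:
        return hash(self._key)

    def __str__(self) -> str:
        return self.surface


# "en-US" and "EN-us" collapse to one entry; "fr" is a second one.
tags = {LanguageTag("en-US"), LanguageTag("EN-us"), LanguageTag("fr")}
```

Serializing such a tag with `str()` gives back the original case, while set membership and equality ignore it.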

Note that I'm off for 1 week starting 1h ago, so this will not progress unless another editor takes custody of this PR.

TallTed

Some tweaks for clarity, grammar, and consistency.

TallTed · 2025-03-01T00:16:37Z

spec/index.html

@@ -733,125 +733,140 @@ <h3>Literals</h3>
    <p>Literals are used for values such as strings, numbers, and dates.</p>

    <p>A <dfn data-local-lt="RDF literal">literal</dfn> in an <a>RDF graph</a> consists of
-      two, three, or four elements, as follow:</p>
+      two, three, or four elements, as follow.</p>


This was intentionally a colon. A full-stop puts a bit too much break.

Suggested change

-    two, three, or four elements, as follow.
+    two, three, or four elements, as follow:

TallTed · 2025-03-01T00:20:11Z

spec/index.html

+        to a <a>literal value</a>.</li>
+      <li>If and only if the <a>datatype IRI</a> is
+        <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#langString</code> or
+        <code>http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString</code>, a
        non-empty <dfn>language tag</dfn> as defined by [[!BCP47]]. The
        language tag MUST be well-formed according to
        <a data-cite="bcp47#section-2.2.9">section 2.2.9</a>
        of [[!BCP47]],
        and MUST be treated consistently, that is, in a case insensitive manner.


Suggested change

-    and MUST be treated consistently, that is, in a case insensitive manner.
+    and MUST be treated consistently in a case insensitive manner.

TallTed · 2025-03-01T00:21:50Z

spec/index.html

+        a <dfn>base direction</dfn> that MUST be either<ul>
+          <li>`ltr`, indicating that the initial text direction is set to left-to-right, or</li>
+          <li>`rtl`, indicating that the initial text direction is set to right-to-left.</li>


Suggested change

-    a <dfn>base direction</dfn> that MUST be either<ul>
-    <li>`ltr`, indicating that the initial text direction is set to left-to-right, or</li>
-    <li>`rtl`, indicating that the initial text direction is set to right-to-left.</li>
+    a <dfn>base direction</dfn> that MUST be one of the following:<ul>
+    <li>`ltr`, indicating that the initial text direction is set to left-to-right</li>
+    <li>`rtl`, indicating that the initial text direction is set to right-to-left</li>

TallTed · 2025-03-01T00:23:09Z

spec/index.html


    <p><dfn data-local-lt="term-equal">Literal term equality</dfn>:
-      Two literals are term-equal (the same <a>RDF literal</a>)
+      two literals are term-equal (the same <a>RDF term</a>)
      if and only if:</p>


Suggested change

-    if and only if:
+    if and only if the following are all true:

TallTed · 2025-03-01T00:23:59Z

spec/index.html

+      <li>the two <a>lexical forms</a> compare equal,</li>
+      <li>the two <a>datatype IRIs</a> compare equal,</li>
+      <li>the two <a>language tags</a> are either both absent, or both present and compare equal,</li>
+      <li>the two <a>base directions</a> are either both absent, both `ltr`, or both `rtl`.</li>


Suggested change

-    <li>the two <a>lexical forms</a> compare equal,</li>
-    <li>the two <a>datatype IRIs</a> compare equal,</li>
-    <li>the two <a>language tags</a> are either both absent, or both present and compare equal,</li>
-    <li>the two <a>base directions</a> are either both absent, both `ltr`, or both `rtl`.</li>
+    <li>The two <a>lexical forms</a> compare equal.</li>
+    <li>The two <a>datatype IRIs</a> compare equal.</li>
+    <li>The two <a>language tags</a> are either both absent, or both present and compare equal.</li>
+    <li>The two <a>base directions</a> are either both absent, both `ltr`, or both `rtl`.</li>
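
The four conditions map fairly directly onto code. The following is a minimal sketch, not the spec's definition: the `Literal` dataclass, its field names, and `term_equal` are hypothetical, and language tags are assumed to be ASCII so that `str.lower()` acts as an ASCII case fold.

```python
from dataclasses import dataclass
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None    # present only for langString / dirLangString
    base_direction: Optional[str] = None  # "ltr", "rtl", or absent


def _tag_key(tag: Optional[str]) -> Optional[str]:
    # Assumption: tags are ASCII, so lower() only affects A-Z here.
    return None if tag is None else tag.lower()


def term_equal(a: Literal, b: Literal) -> bool:
    return (
        a.lexical_form == b.lexical_form                          # lexical forms compare equal
        and a.datatype_iri == b.datatype_iri                      # datatype IRIs compare equal
        and _tag_key(a.language_tag) == _tag_key(b.language_tag)  # both absent, or equal ignoring case
        and a.base_direction == b.base_direction                  # both absent, both ltr, or both rtl
    )


chat_fr = Literal("chat", RDF + "langString", "fr")
chat_FR = Literal("chat", RDF + "langString", "FR")
```

Here `term_equal(chat_fr, chat_FR)` holds, which is the `"chat"@fr` / `"chat"@FR` case discussed earlier in this thread.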

TallTed · 2025-03-01T00:33:33Z

spec/index.html

+        In RDF 1.1, `"chat"@fr` and `"chat"@FR` theoretically represent two distinct terms, but implementations may replace one with the other via some form of normalization.
+        In RDF 1.2, they represent the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.


Suggested change

-    In RDF 1.1, `"chat"@fr` and `"chat"@FR` theoretically represent two distinct terms, but implementations may replace one with the other via some form of normalization.
-    In RDF 1.2, they represent the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.
+    In RDF 1.1, `"chat"@fr` and `"chat"@FR` represent two distinct terms,
+    but implementations may replace either with the other via some form of normalization.
+    In RDF 1.2, they represent the exact same literal,
+    i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.

TallTed · 2025-03-01T00:35:06Z

spec/index.html

+        <li>If the literal is a <a>directional language-tagged string</a>, then the literal value is
+          a tuple of its <a>lexical form</a>, its <a>language tag</a>, and its <a>base direction</a>,
+          likewise in that order.</li>
+        <li>If the literal's <a>datatype</a> is handled by an RDF implementation,


Suggested change

-    <li>If the literal's <a>datatype</a> is handled by an RDF implementation,
+    <li>If the literal's <a>datatype</a> is handled by an RDF implementation, then one of the following applies:

TallTed · 2025-03-01T00:36:03Z

spec/index.html

+            <li>if the literal's <a>lexical form</a> is in the <a>lexical space</a>
+              of the <a>datatype</a>, then the literal value is the result of applying
+              the <a>lexical-to-value mapping</a> of the datatype to the
+              <a>lexical form</a>.</li>
+            <li>otherwise, the literal is <dfn data-lt-no-plural>ill-typed</dfn> and no literal value can be
+               associated with the literal. Such a case produces a semantic
+               inconsistency but is not <em>syntactically</em> ill-formed.
+               Implementations SHOULD accept [=ill-typed=] literals and produce RDF
+               graphs from them. Implementations MAY produce warnings when
+               encountering [=ill-typed=] literals.</li>


Suggested change

-    <li>if the literal's <a>lexical form</a> is in the <a>lexical space</a>
-    of the <a>datatype</a>, then the literal value is the result of applying
-    the <a>lexical-to-value mapping</a> of the datatype to the
-    <a>lexical form</a>.</li>
-    <li>otherwise, the literal is <dfn data-lt-no-plural>ill-typed</dfn> and no literal value can be
-    associated with the literal. Such a case produces a semantic
-    inconsistency but is not syntactically ill-formed.
-    Implementations SHOULD accept [=ill-typed=] literals and produce RDF
-    graphs from them. Implementations MAY produce warnings when
-    encountering [=ill-typed=] literals.</li>
+    <li>If the literal's <a>lexical form</a> is in the <a>lexical space</a>
+    of the <a>datatype</a>, then the literal value is the result of applying
+    the <a>lexical-to-value mapping</a> of the datatype to the
+    <a>lexical form</a>.</li>
+    <li>Otherwise, the literal is <dfn data-lt-no-plural>ill-typed</dfn> and no literal value can be
+    associated with the literal. Such a case produces a semantic
+    inconsistency, but it is not syntactically ill-formed.
+    Implementations SHOULD accept [=ill-typed=] literals and produce RDF
+    graphs from them. Implementations MAY produce warnings when
+    encountering [=ill-typed=] literals.</li>
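
A small sketch of how an implementation might apply these two rules, under assumptions: the `HANDLED` table, the `literal_value` function, and the choice of `xsd:integer` and `xsd:boolean` as the handled datatypes are all illustrative. An ill-typed literal is accepted, yields no value, and triggers a warning rather than an error.

```python
import re
import warnings
from typing import Callable, Dict, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# For each handled datatype: (lexical-space test, lexical-to-value mapping).
Handler = Tuple[Callable[[str], bool], Callable[[str], object]]

HANDLED: Dict[str, Handler] = {
    XSD + "integer": (lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None, int),
    XSD + "boolean": (lambda s: s in {"true", "false", "1", "0"},
                      lambda s: s in {"true", "1"}),
}


def literal_value(lexical_form: str, datatype_iri: str) -> Optional[object]:
    handler = HANDLED.get(datatype_iri)
    if handler is None:
        return None  # datatype not handled: outside the two rules sketched here
    in_lexical_space, to_value = handler
    if in_lexical_space(lexical_form):
        # Well-typed: apply the lexical-to-value mapping.
        return to_value(lexical_form)
    # Ill-typed: still accepted, but no value can be associated with it.
    warnings.warn(f"ill-typed literal: {lexical_form!r}^^<{datatype_iri}>")
    return None
```

For instance, `literal_value("042", XSD + "integer")` gives the integer 42, while `literal_value("abc", XSD + "integer")` returns `None` and emits a warning; in both cases the literal itself can still be stored in a graph.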

TallTed · 2025-03-01T00:36:44Z

spec/index.html

+      </ul>
+
+      <p>
+        Thus, two literals can have the same value


Suggested change

-    Thus, two literals can have the same value
+    It follows from the above that two literals can have the same value

TallTed · 2025-03-01T00:37:10Z

spec/index.html

+      </pre>
+
+      <p>denote the same <a data-lt="literal value">value</a>, but are not the
+        same literal <a>RDF terms</a> because their


Suggested change

-    same literal <a>RDF terms</a> because their
+    same literal <a>RDF term</a> because their
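
The distinction between "same value" and "same term" can be shown in a few lines. The pair of literals below is assumed for illustration, since the content of the `<pre>` block is not visible in this diff excerpt; literals are modelled as plain (lexical form, datatype IRI) pairs.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Two literals with the same datatype but different lexical forms.
one = ("1", XSD_INTEGER)
zero_one = ("01", XSD_INTEGER)

# Term equality looks at the components: the lexical forms differ.
same_term = one == zero_one

# Value equality looks at the result of the lexical-to-value mapping.
same_value = int(one[0]) == int(zero_one[0])
```

Here `same_value` is true and `same_term` is false, which is the situation the reworded sentence describes.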

improve definition of Literal

0e00fa6

pchampin requested review from TallTed, hartig, gkellogg and afs February 26, 2025 16:13

pchampin added the spec:enhancement label (change to enhance the spec without affecting conformance, class 2; see also spec:editorial) Feb 26, 2025

hartig reviewed Feb 26, 2025

gkellogg reviewed Feb 26, 2025

hartig mentioned this pull request Feb 27, 2025

Adds RDF term equality definitions #161

Open

Apply suggestions from code review

77cd780

Co-authored-by: Olaf Hartig <olaf.hartig@liu.se> Co-authored-by: Gregg Kellogg <gregg@greggkellogg.net>

afs reviewed Feb 27, 2025

pchampin and others added 2 commits February 28, 2025 01:16

Update spec/index.html

74f4ead

Apply suggestions from code review

8b6fbb5

Co-authored-by: Andy Seaborne <andy@apache.org>

pchampin commented Feb 28, 2025

Update spec/index.html

e406376

Co-authored-by: Andy Seaborne <andy@apache.org>

pchampin mentioned this pull request Feb 28, 2025

Do not normalize language tags in D-interpretations w3c/rdf-semantics#96

Open

pfps requested changes Feb 28, 2025

TallTed suggested changes Mar 1, 2025

