improve definition of Literal #162

Open
wants to merge 5 commits into
base: main

Contributor

@pchampin pchampin commented Feb 26, 2025

This PR was motivated by the problem raised here and originally aimed only to fix the definition of "literal term equality",
but it ended up as a more involved refactoring of the definition of Literal.

Below is a summary of the changes:

  • reorganizing the content, moving some parts into separate subsections ("Representation of literals", "Literal value")
  • simplifying some parts (lexical value now references 'RDF string'; some redundancies removed)
  • emphasizing that letter case is not part of the language tag in the abstract syntax (so "chat"@fr and "chat"@FR are not just equal, they are the same literal)


@pchampin added the spec:enhancement label ("Change to enhance the spec without affecting conformance (class 2)"; see also spec:editorial) on Feb 26, 2025
</ul>
-<p>Comparison is performed using
+<p>Comparison of the [=lexical forms=] and of the [=datatype IRIs=] is performed using
Contributor

For the datatype IRIs, wouldn't this be better covered by IRI equality?

Contributor Author

Fair point. I reused the existing language, which didn't mention IRI equality. The two are equivalent, because IRI equality is also based on string comparison, but referencing IRI equality would be clearer.
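To make the equivalence concrete, here is a minimal sketch, assuming datatype IRIs are held as plain Python strings. The helper name `iri_equal` is hypothetical and only illustrates comparison by simple string equality, with no normalization step.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two IRIs code point by code point."""
    # No case folding, percent-decoding, or other normalization is applied.
    return a == b


# Identical sequences of code points: equal.
same = iri_equal(XSD_INTEGER, "http://www.w3.org/2001/XMLSchema#integer")

# Differs only by case in the host part: still a different string.
different = iri_equal(XSD_INTEGER, "http://WWW.w3.org/2001/XMLSchema#integer")
```

Under this reading, comparing datatype IRIs as strings and comparing them by IRI equality give the same answer; the second wording just names the relation explicitly.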

Member

@gkellogg gkellogg left a comment

Minor points.

Co-authored-by: Olaf Hartig <olaf.hartig@liu.se>
Co-authored-by: Gregg Kellogg <gregg@greggkellogg.net>
spec/index.html Outdated
Comment on lines 816 to 817
In RDF 1.1, `"chat"@fr` and `"chat"@FR` were representing two distinct terms, but implementations had license to replace one with the other (which most did).
In RDF 1.2, they are now representing the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.
Contributor

@afs afs Feb 26, 2025

Reword: RDF 1.1 still exists:

Suggested change
In RDF 1.1, `"chat"@fr` and `"chat"@FR` were representing two distinct terms, but implementations had license to replace one with the other (which most did).
In RDF 1.2, they are now representing the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.
In RDF 1.1, `"chat"@fr` and `"chat"@FR` represent two distinct terms, but implementations may replace one with the other (which many did).
In RDF 1.2, they represent the same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.

pchampin and others added 2 commits February 28, 2025 01:16
Co-authored-by: Andy Seaborne <andy@apache.org>
Co-authored-by: Andy Seaborne <andy@apache.org>
Contributor

@pfps pfps left a comment

A language tag is not a string. BCP 47 does not provide a good foundation for RDF language tags.

RDF Concepts could say that a language tag is a lowercase string that meets the requirements of BCP 47 or it could say that a language tag is a sequence of ASCII case-insensitive characters where the string constructed by taking any member of each equivalence set in sequence meets the requirements of BCP 47. ASCII case-insensitive characters are then equivalence sets of characters under the equivalence relation that treats two characters as equivalent if they are both the same when converted to lower case using ASCII case conversion. The former is simpler but the latter provides guidance on how to treat surface syntax language tags.

Saying that language tags are strings and then going on to define an equality over them is like saying that language tags are cats and then going on to say that two language tags are the same if they have the same colour - the right way here is to say either that language tag strings are cat colours or that they are equivalence classes of cats under the same-colour equivalence.
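The two modelling options described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `canonical_tag` and `tag_class` are hypothetical, the input is assumed to be an ASCII string that is already well-formed per BCP 47, and no well-formedness check is performed.

```python
from __future__ import annotations

import string


def _ascii_lower(c: str) -> str:
    # ASCII-only case conversion: non-ASCII characters are left untouched.
    return c.lower() if c in string.ascii_uppercase else c


def _ascii_upper(c: str) -> str:
    return c.upper() if c in string.ascii_lowercase else c


def canonical_tag(tag: str) -> str:
    """Option 1: the language tag *is* the lowercase string."""
    return "".join(_ascii_lower(c) for c in tag)


def tag_class(tag: str) -> tuple[frozenset[str], ...]:
    """Option 2: the language tag is a sequence of case-equivalence sets."""
    return tuple(frozenset({_ascii_lower(c), _ascii_upper(c)}) for c in tag)
```

With option 1, `canonical_tag("FR")` and `canonical_tag("fr")` are the same string. With option 2, `tag_class("FR")` and `tag_class("fr")` are the same sequence of sets, and any surface string drawn from those sets is an acceptable spelling of the tag.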

Contributor

afs commented Feb 28, 2025

In what way does RDF Concepts not say that?

A change might be saying that language tags are represented by strings conforming to RFC 5646.

Contributor

pfps commented Feb 28, 2025

> In what way does RDF Concepts not say that?

Not say what?

Contributor

afs commented Feb 28, 2025

> > In what way does RDF Concepts not say that?
>
> Not say what?

What you describe.

What is the concrete proposal (PR, or suggested change to this PR) for changing RDF Concepts?

Contributor

pfps commented Feb 28, 2025

RDF Concepts says this, as far as I can tell:

> language tags are strings and then [goes] on to define an equality over them

RDF Concepts does not say either of these, as far as I can tell:

> RDF Concepts could say that a language tag is a lowercase string that meets the requirements of BCP 47 or it could say that a language tag is a sequence of ASCII case-insensitive characters where the string constructed by taking any member of each equivalence set in sequence meets the requirements of BCP 47. ASCII case-insensitive characters are then equivalence sets of characters under the equivalence relation that treats two characters as equivalent if they are both the same when converted to lower case using ASCII case conversion. The former is simpler but the latter provides guidance on how to treat surface syntax language tags.

@pchampin
Contributor Author

> RDF Concepts says this, as far as I can tell:
>
> > language tags are strings and then [goes] on to define an equality over them

Not exactly. The text in this PR says

> a non-empty language tag as defined by [BCP47]. [...] Two [BCP47]-complying strings that differ only by case represent the same language tag.

The goal is to convey the idea that RDF language tags are an abstraction over the strings complying with BCP 47, without using such scary language. But OK, maybe it's too hand-wavy.

I would be happy with changing the definition of language tags to lower-case BCP47-compliant strings (as proposed by @pfps). The 3rd paragraph of 3.4.1, in my opinion, explains clearly enough that concrete syntaxes and implementations are free to use the case they want (as long as they ignore it when comparing language tags).

Note that I'm off for 1 week starting 1h ago, so this will not progress unless another editor takes custody of this PR.

Member

@TallTed TallTed left a comment

Some tweaks for clarity, grammar, and consistency.

@@ -733,125 +733,140 @@ <h3>Literals</h3>
<p>Literals are used for values such as strings, numbers, and dates.</p>

<p>A <dfn data-local-lt="RDF literal">literal</dfn> in an <a>RDF graph</a> consists of
-two, three, or four elements, as follow:</p>
+two, three, or four elements, as follow.</p>
Member

This was intentionally a colon. A full stop introduces too much of a break.

Suggested change
two, three, or four elements, as follow.</p>
two, three, or four elements, as follow:</p>

to a <a>literal value</a>.</li>
<li>If and only if the <a>datatype IRI</a> is
<code>http://www.w3.org/1999/02/22-rdf-syntax-ns#langString</code> or
<code>http://www.w3.org/1999/02/22-rdf-syntax-ns#dirLangString</code>, a
non-empty <dfn>language tag</dfn> as defined by [[!BCP47]]. The
language tag MUST be well-formed according to
<a data-cite="bcp47#section-2.2.9">section 2.2.9</a>
of [[!BCP47]],
and MUST be treated consistently, that is, in a case insensitive manner.
Member

Suggested change
and MUST be treated consistently, that is, in a case insensitive manner.
and MUST be treated consistently in a case insensitive manner.

Comment on lines +754 to +756
a <dfn>base direction</dfn> that MUST be either<ul>
<li>`ltr`, indicating that the initial text direction is set to left-to-right, or</li>
<li>`rtl`, indicating that the initial text direction is set to right-to-left.</li>
Member

Suggested change
a <dfn>base direction</dfn> that MUST be either<ul>
<li>`ltr`, indicating that the initial text direction is set to left-to-right, or</li>
<li>`rtl`, indicating that the initial text direction is set to right-to-left.</li>
a <dfn>base direction</dfn> that MUST be one of the following:<ul>
<li>`ltr`, indicating that the initial text direction is set to left-to-right</li>
<li>`rtl`, indicating that the initial text direction is set to right-to-left</li>


<p><dfn data-local-lt="term-equal">Literal term equality</dfn>:
-Two literals are term-equal (the same <a>RDF literal</a>)
+two literals are term-equal (the same <a>RDF term</a>)
if and only if:</p>
Member

Suggested change
if and only if:</p>
if and only if the following are all true:</p>

Comment on lines +771 to +774
<li>the two <a>lexical forms</a> compare equal,</li>
<li>the two <a>datatype IRIs</a> compare equal,</li>
<li>the two <a>language tags</a> are either both absent, or both present and compare equal,</li>
<li>the two <a>base directions</a> are either both absent, both `ltr`, or both `rtl`.</li>
Member

Suggested change
<li>the two <a>lexical forms</a> compare equal,</li>
<li>the two <a>datatype IRIs</a> compare equal,</li>
<li>the two <a>language tags</a> are either both absent, or both present and compare equal,</li>
<li>the two <a>base directions</a> are either both absent, both `ltr`, or both `rtl`.</li>
<li>The two <a>lexical forms</a> compare equal.</li>
<li>The two <a>datatype IRIs</a> compare equal.</li>
<li>The two <a>language tags</a> are either both absent, or both present and compare equal.</li>
<li>The two <a>base directions</a> are either both absent, both `ltr`, or both `rtl`.</li>
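The four conditions listed above can be sketched as a single predicate. This is a minimal illustration, not the spec's normative definition: the `Literal` dataclass and `term_equal` function are hypothetical names, language tags are assumed to be well-formed ASCII strings, and "compare equal" for language tags is read as case-insensitive comparison, as proposed in this PR.

```python
from dataclasses import dataclass
from typing import Optional

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    datatype_iri: str
    language_tag: Optional[str] = None     # absent unless langString/dirLangString
    base_direction: Optional[str] = None   # "ltr", "rtl", or absent


def _fold(tag: Optional[str]) -> Optional[str]:
    # Assumes ASCII-only tags, so str.lower() is an ASCII case fold here.
    return None if tag is None else tag.lower()


def term_equal(a: Literal, b: Literal) -> bool:
    """Hypothetical check mirroring the four list items."""
    return (
        a.lexical_form == b.lexical_form                      # lexical forms
        and a.datatype_iri == b.datatype_iri                  # datatype IRIs
        and _fold(a.language_tag) == _fold(b.language_tag)    # both absent, or equal ignoring case
        and a.base_direction == b.base_direction              # both absent, both ltr, or both rtl
    )


chat_fr = Literal("chat", RDF + "langString", "fr")
chat_FR = Literal("chat", RDF + "langString", "FR")
```

Here `term_equal(chat_fr, chat_FR)` holds, while two literals that differ only in base direction (`ltr` versus `rtl`) are not term-equal.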

Comment on lines +816 to +817
In RDF 1.1, `"chat"@fr` and `"chat"@FR` theoretically represent two distinct terms, but implementations may replace one with the other via some form of normalization.
In RDF 1.2, they represent the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.
Member

Suggested change
In RDF 1.1, `"chat"@fr` and `"chat"@FR` theoretically represent two distinct terms, but implementations may replace one with the other via some form of normalization.
In RDF 1.2, they represent the exact same literal, i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.
In RDF 1.1, `"chat"@fr` and `"chat"@FR` represent two distinct terms,
but implementations may replace either with the other via some form of normalization.
In RDF 1.2, they represent the exact same literal,
i.e., the case difference in the concrete syntax does not propagate into the abstract syntax.

<li>If the literal is a <a>directional language-tagged string</a>, then the literal value is
a tuple of its <a>lexical form</a>, its <a>language tag</a>, and its <a>base direction</a>,
likewise in that order.</li>
<li>If the literal's <a>datatype</a> is handled by an RDF implementation,
Member

Suggested change
<li>If the literal's <a>datatype</a> is handled by an RDF implementation,
<li>If the literal's <a>datatype</a> is handled by an RDF implementation, then one of the following applies:

Comment on lines +837 to +846
<li>if the literal's <a>lexical form</a> is in the <a>lexical space</a>
of the <a>datatype</a>, then the literal value is the result of applying
the <a>lexical-to-value mapping</a> of the datatype to the
<a>lexical form</a>.</li>
<li>otherwise, the literal is <dfn data-lt-no-plural>ill-typed</dfn> and no literal value can be
associated with the literal. Such a case produces a semantic
inconsistency but is not <em>syntactically</em> ill-formed.
Implementations SHOULD accept [=ill-typed=] literals and produce RDF
graphs from them. Implementations MAY produce warnings when
encountering [=ill-typed=] literals.</li>
Member

Suggested change
<li>if the literal's <a>lexical form</a> is in the <a>lexical space</a>
of the <a>datatype</a>, then the literal value is the result of applying
the <a>lexical-to-value mapping</a> of the datatype to the
<a>lexical form</a>.</li>
<li>otherwise, the literal is <dfn data-lt-no-plural>ill-typed</dfn> and no literal value can be
associated with the literal. Such a case produces a semantic
inconsistency but is not <em>syntactically</em> ill-formed.
Implementations SHOULD accept [=ill-typed=] literals and produce RDF
graphs from them. Implementations MAY produce warnings when
encountering [=ill-typed=] literals.</li>
<li>If the literal's <a>lexical form</a> is in the <a>lexical space</a>
of the <a>datatype</a>, then the literal value is the result of applying
the <a>lexical-to-value mapping</a> of the datatype to the
<a>lexical form</a>.</li>
<li>Otherwise, the literal is <dfn data-lt-no-plural>ill-typed</dfn> and no literal value can be
associated with the literal. Such a case produces a semantic
inconsistency, but it is not <em>syntactically</em> ill-formed.
Implementations SHOULD accept [=ill-typed=] literals and produce RDF
graphs from them. Implementations MAY produce warnings when
encountering [=ill-typed=] literals.</li>
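The two cases for a handled datatype can be sketched as below. This is an illustration under assumptions, not an implementation of any particular RDF library: the `HANDLED` table, the `NO_VALUE` sentinel, and the `literal_value` function are hypothetical, only `xsd:integer` and `xsd:boolean` are covered, and returning `NO_VALUE` for an unhandled datatype is a simplification chosen for this sketch.

```python
import re
import warnings

XSD = "http://www.w3.org/2001/XMLSchema#"

# Sentinel meaning "no literal value can be associated with the literal".
NO_VALUE = object()

# datatype IRI -> (pattern describing the lexical space, lexical-to-value mapping)
HANDLED = {
    XSD + "integer": (re.compile(r"[+-]?[0-9]+"), int),
    XSD + "boolean": (re.compile(r"true|false|1|0"), lambda s: s in ("true", "1")),
}


def literal_value(lexical_form: str, datatype_iri: str):
    if datatype_iri not in HANDLED:
        # Datatype not handled by this sketch: no value is computed.
        return NO_VALUE
    lexical_space, lexical_to_value = HANDLED[datatype_iri]
    if lexical_space.fullmatch(lexical_form):
        # Lexical form is in the lexical space: apply the mapping.
        return lexical_to_value(lexical_form)
    # Ill-typed: accept the literal, optionally warn, associate no value.
    warnings.warn(f"ill-typed literal: {lexical_form!r}^^<{datatype_iri}>")
    return NO_VALUE
```

Note that the ill-typed branch does not raise: the literal is still accepted, which matches the SHOULD/MAY wording in the list items, and the warning is optional.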

</ul>

<p>
Thus, two literals can have the same value
Member

Suggested change
Thus, two literals can have the same value
It follows from the above that two literals can have the same value

</pre>

<p>denote the same <a data-lt="literal value">value</a>, but are not the
same literal <a>RDF terms</a> because their
Member

Suggested change
same literal <a>RDF terms</a> because their
same literal <a>RDF term</a> because their
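The distinction between "same value" and "same term" can be illustrated with a small sketch. The two literals below are chosen for illustration only (the example in the spec text is not reproduced in this thread), and representing a literal as a `(lexical form, datatype IRI)` pair is an assumption of the sketch.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Two literals, each modelled as (lexical form, datatype IRI).
a = ("1", XSD_INTEGER)
b = ("01", XSD_INTEGER)

# Same value: the xsd:integer lexical-to-value mapping sends both to 1.
same_value = int(a[0]) == int(b[0])

# Not the same term: the lexical forms differ as strings.
same_term = a == b
```

Here `same_value` is true while `same_term` is false, which is the situation the paragraph describes.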

Labels
spec:enhancement Change to enhance the spec without affecting conformance (class 2) –see also spec:editorial
6 participants