Normalize literals? #15

gkellogg · 2022-10-17T19:25:24Z

The CG spec has the following issue:

It seems that quads must be normalized,
so that literals with different syntactic representations
but the same semantic representations are merged,
and that two graphs differing in the syntactic representation
of a literal will produce the same set of blank node identifiers.

While this may be possible for certain well-known datatypes (e.g., XSD), datasets may use literals with datatypes
from vocabularies with no defined normalization, or even L2V operation.
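
To make the concern concrete, here is a minimal sketch, assuming a literal is modelled as a `(lexical form, datatype IRI)` pair. The names `CANONICALIZERS` and `normalize_literal` and the `http://example.org/dt#custom` datatype are hypothetical, introduced only for illustration; none of them come from the spec. A normalizer can only rewrite literals whose datatype has a known canonical mapping, so anything else passes through unchanged.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

def _canonical_integer(lexical: str) -> str:
    # Check the xsd:integer lexical space first: Python's int() alone is
    # more permissive (it accepts surrounding whitespace and underscores).
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"not an xsd:integer lexical form: {lexical!r}")
    # str(int(...)) drops a leading "+" and any leading zeros.
    return str(int(lexical))

def _canonical_boolean(lexical: str) -> str:
    if lexical in ("true", "1"):
        return "true"
    if lexical in ("false", "0"):
        return "false"
    raise ValueError(f"not an xsd:boolean lexical form: {lexical!r}")

# Hypothetical registry: only datatypes listed here can be normalized.
CANONICALIZERS = {
    XSD + "integer": _canonical_integer,
    XSD + "boolean": _canonical_boolean,
}

def normalize_literal(lexical: str, datatype: str) -> tuple[str, str]:
    """Return the literal in canonical form if a mapping is known,
    otherwise return it unchanged."""
    canonicalize = CANONICALIZERS.get(datatype)
    if canonicalize is None:
        return (lexical, datatype)  # no defined normalization
    return (canonicalize(lexical), datatype)

known = normalize_literal("+1", XSD + "integer")
unknown = normalize_literal("+1", "http://example.org/dt#custom")
```

Under these assumptions `known` has the lexical form `1`, while `unknown` keeps `+1`, so two datasets using the custom datatype would still differ after "normalization".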

afs · 2022-10-18T10:42:19Z

@gkellogg - Thanks for pointing this out.

Even taking XSD as an example, not all systems provide complete coverage for XSD and so don't normalize for all datatypes. And some systems don't faithfully preserve all datatypes (xsd:positiveInteger vs xsd:integer).

And if we choose some datatypes and not others (e.g. XSD integers but not XSD duration), the impact on supporting systems may be significant because they don't support such choices. Implementing RDC would need changes lower down, which is an impediment to adoption.

The algorithm could be defined on RDF terms, and two graphs that differ by non-canonical syntactic form are different.
That might be desirable at least as a base level because they are different graphs. (Another "we are where we are" situation.)

A graph may have both

:s :p +1 .
:s :p 1 .

Two triples. Another case is trailing zeroes in decimals and doubles - sometimes used to informally indicate precision.
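
The two cases above can be sketched as follows. This is an illustration only, assuming triples are plain tuples and a literal is a `(lexical form, datatype IRI)` pair; `value_of` is a hypothetical helper, not part of any RDF library.

```python
from decimal import Decimal

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Term-level view: literals are compared by lexical form and datatype.
graph = {
    (":s", ":p", ("+1", XSD_INTEGER)),
    (":s", ":p", ("1", XSD_INTEGER)),
}
term_level_count = len(graph)  # both triples are kept

def value_of(literal):
    """Map a literal to a Python value for the two datatypes handled here
    (hypothetical helper); other datatypes fall back to the lexical form."""
    lexical, datatype = literal
    if datatype == XSD_INTEGER:
        return int(lexical)
    if datatype == XSD_DECIMAL:
        return Decimal(lexical)
    return lexical

# Value-level view: the same graph after mapping each literal to its value.
value_level = {(s, p, (value_of(o), o[1])) for (s, p, o) in graph}
value_level_count = len(value_level)  # the two triples collapse into one

# Trailing zeros: equal values, different terms.
a = ("1.50", XSD_DECIMAL)
b = ("1.5", XSD_DECIMAL)
same_term = a == b                       # False
same_value = value_of(a) == value_of(b)  # True
```

Merging on value would silently drop the `1.50` spelling, which is exactly the informal precision hint some data relies on.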

Often it's not what most users expect (but not all users). Parameters to the algorithm will need to be propagated with the results (#11).

gkellogg · 2022-11-24T20:58:51Z

I propose that we close this, and remove the discussion from the spec. It is flawed, as @afs noted.

TallTed · 2022-11-28T17:50:58Z

I think there needs to be some statement along the lines of (but not necessarily exactly), "Literals are not normalized, for a number of reasons (possibly with a link to this issue, or a brief summary of some of those reasons). For purposes of RCH, literals with different syntactic representations but the same semantic representations are not merged, and two graphs differing only in the syntactic representation of one or more literals may produce different sets of blank node identifiers."

gkellogg added the question ("Further information is requested") label Oct 17, 2022

gkellogg added the propose closing label Nov 24, 2022

gkellogg added a commit that referenced this issue Dec 5, 2022

Add an explanation on step 2.1 of the canonicalization algorithm

a711fcf

... with a note explaining why literals are not normalized. Fixes #15.

gkellogg mentioned this issue Dec 5, 2022

Add an explanation on step 2.1 of the canonicalization algorithm #52

Merged

gkellogg closed this as completed in #52 Dec 6, 2022

gkellogg added a commit that referenced this issue Dec 6, 2022

Add an explanation on step 2.1 of the canonicalization algorithm

1779525

... with a note explaining why literals are not normalized. Fixes #15.

philarcher mentioned this issue Oct 18, 2023

CR Request for RDF Dataset Canonicalization w3c/transitions#571

Closed
