
Normalize literals? #15

Closed
gkellogg opened this issue Oct 17, 2022 · 3 comments · Fixed by #52
Labels: propose closing, question (Further information is requested)

Comments

@gkellogg
Member

The CG spec has the following issue:

It seems that quads must be normalized,
so that literals with different syntactic representations
but the same semantic representations are merged,
and that two graphs differing in the syntactic representation
of a literal will produce the same set of blank node identifiers.

While this may be possible for certain well-known datatypes (e.g., XSD), datasets may use literals with datatypes from vocabularies that define no normalization, or not even a lexical-to-value (L2V) mapping.
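To illustrate the concern, here is a minimal Python sketch, assuming a hypothetical `canonical_lexical_form` helper backed by a hypothetical `CANONICALIZERS` table that only knows canonical mappings for two XSD datatypes. A literal with any other datatype (the IRI `http://example.org/ns#celsius` below is made up) has no L2V mapping available, so it can only be passed through unchanged.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def _canon_integer(lex: str) -> str:
    # int() accepts an optional sign and leading zeros: "+007" -> 7 -> "7".
    # This sketch does not validate the lexical form against XSD rules.
    return str(int(lex))


def _canon_boolean(lex: str) -> str:
    # xsd:boolean lexical space is {"true", "false", "1", "0"};
    # the canonical forms are "true" and "false".
    return {"1": "true", "0": "false"}.get(lex, lex)


# Hypothetical table: only datatypes with a known L2V mapping can be normalized.
CANONICALIZERS = {
    XSD + "integer": _canon_integer,
    XSD + "boolean": _canon_boolean,
}


def canonical_lexical_form(lex: str, datatype: str) -> str:
    fn = CANONICALIZERS.get(datatype)
    if fn is None:
        # Unknown datatype: no L2V mapping, so the lexical form is left as-is.
        return lex
    return fn(lex)


print(canonical_lexical_form("+007", XSD + "integer"))
print(canonical_lexical_form("0", XSD + "boolean"))
print(canonical_lexical_form("+21.0", "http://example.org/ns#celsius"))
```

Any table like this is necessarily partial, which is the point: two values that are equal under a vocabulary's own semantics stay distinct whenever the table has no entry for that datatype.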

gkellogg added the question label (Further information is requested) on Oct 17, 2022
@afs

afs commented Oct 18, 2022

@gkellogg - Thanks for pointing this out.

Even taking XSD as an example, not all systems provide complete coverage of XSD, so they don't normalize all datatypes. And some systems don't faithfully preserve all datatypes (e.g. xsd:positiveInteger vs xsd:integer).

And if we choose some datatypes and not others (e.g. XSD integers but not XSD durations), the impact on supporting systems may be significant, because they don't support making such choices. Implementing RDC would need changes lower down the stack, which is an impediment to adoption.

The algorithm could be defined on RDF terms, so that two graphs that differ only in the non-canonical syntactic form of a literal are treated as different.
That might be desirable, at least as a base level, because they are different graphs. (Another "we are where we are" situation.)

A graph may have both:

:s :p +1 .
:s :p 1 .

These are two distinct triples. Another case is trailing zeroes in decimals and doubles, which are sometimes used to informally indicate precision.
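A minimal Python sketch of the distinction, assuming a toy `Literal` type (lexical form plus datatype IRI) and a hypothetical `value_of` L2V function covering only xsd:integer and xsd:decimal. Under RDF term equality the literals differ, so a graph keeps both triples; only a value-level comparison would merge them.

```python
from decimal import Decimal
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    # An RDF literal as a term: lexical form plus datatype IRI.
    lexical: str
    datatype: str


def value_of(lit: Literal):
    # Hypothetical L2V mapping covering just two XSD datatypes.
    if lit.datatype == XSD + "integer":
        return int(lit.lexical)
    if lit.datatype == XSD + "decimal":
        return Decimal(lit.lexical)
    raise ValueError(f"no L2V mapping for {lit.datatype}")


plus_one = Literal("+1", XSD + "integer")
one = Literal("1", XSD + "integer")
d1 = Literal("1.50", XSD + "decimal")  # trailing zero, e.g. hinting at precision
d2 = Literal("1.5", XSD + "decimal")

# Term-level comparison: the literals are distinct terms.
print(plus_one == one, d1 == d2)
# Value-level comparison: the literals denote the same values.
print(value_of(plus_one) == value_of(one), value_of(d1) == value_of(d2))

# A graph is a set of triples, so both :s :p +1 and :s :p 1 are kept.
graph = {("s", "p", plus_one), ("s", "p", one)}
print(len(graph))
```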

This is often not what users expect (though not all users). Parameters to the algorithm will need to be propagated with the results (#11).

@gkellogg
Member Author

I propose that we close this and remove the discussion from the spec. It is flawed, as @afs noted.

@TallTed
Member

TallTed commented Nov 28, 2022

I think there needs to be some statement along the lines of (but not necessarily exactly), "Literals are not normalized, for a number of reasons (possibly with a link to this issue, or a brief summary of some of those reasons). For purposes of RCH, literals with different syntactic representations but the same semantic representations are not merged, and two graphs differing only in the syntactic representation of one or more literals may produce different sets of blank node identifiers."
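A simplified Python sketch of why un-normalized literals can lead to different blank node identifiers. It assumes the first-degree hash idea from the canonicalization algorithm: SHA-256 over the sorted N-Quads that mention a blank node, with that node written as `_:a`. This is not the full algorithm, and the predicate `http://example.org/p` is made up. Because the literal's lexical form is part of the hashed text, `"+1"` and `"1"` give different hashes. When a dataset has more than one blank node, identifiers are issued in hash order, so a changed hash may change which identifier each node receives.

```python
import hashlib

XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"


def first_degree_hash(nquads_lines):
    # Simplified first-degree hash: the lines are assumed to already have the
    # reference blank node written as _:a (and any other blank node as _:z).
    # Each quad is newline-terminated, sorted, concatenated, then hashed.
    data = "".join(sorted(line + "\n" for line in nquads_lines))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# Two datasets differing only in the lexical form of one literal.
quads_a = [f'_:a <http://example.org/p> "+1"^^{XSD_INT} .']
quads_b = [f'_:a <http://example.org/p> "1"^^{XSD_INT} .']

h_a = first_degree_hash(quads_a)
h_b = first_degree_hash(quads_b)
print(h_a == h_b)
```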

gkellogg added a commit that referenced this issue Dec 5, 2022
... with a note explaining why literals are not normalized.

Fixes #15.
gkellogg added a commit that referenced this issue Dec 6, 2022
... with a note explaining why literals are not normalized.

Fixes #15.

3 participants