Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonicalization #17

Merged
merged 20 commits into from
Mar 30, 2023
Merged

Canonicalization #17

merged 20 commits into from
Mar 30, 2023

Conversation

gkellogg
Copy link
Member

@gkellogg gkellogg commented Feb 14, 2023

This will ultimately make its way into N-Triples, as well.

Note separate of Security Considerations and a phrase about the potential for unescaped characters to obfuscate a string presentation.

For #2.


Preview | Diff

@gkellogg gkellogg requested review from afs, domel and pchampin February 14, 2023 23:38
Copy link
Contributor

@domel domel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keywords, from RFC 2119, in this section are in capital letters but in other parts are in small letters. I think that it should unified.

@gkellogg
Copy link
Member Author

Keywords, from RFC 2119, in this section are in capital letters but in other parts are in small letters. I think that it should unified.

In non-normative sections the lower case "may" and "must" words are typically used so that they don't invoke RFC 2119; this would be meaningless in non-normative sections anyway, but it can be confusing.

Lower case versions are avoided in normative sections so they don't look like they are not, in fact, normative.

Copy link
Member

@TallTed TallTed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits

spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
@domel domel self-requested a review February 15, 2023 22:18
Copy link
Contributor

@domel domel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After @TallTed editions, looks good!

Copy link
Contributor

@afs afs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commentary as well as changes. Maybe we should have an issue for canonical form because RCH has a much stronger requirement on the cannoical form.

spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
@afs
Copy link
Contributor

afs commented Feb 16, 2023

If the goals of the canonical form is to be strictly canonical (for RCH and other signing, hashing uses), we should remove the text

  <p class="note">Even when not explicitly serializing
    canonical N-Quads, implementers are encouraged to produce this form.</p>

The choice of escape rules for the canonical form will likely be chooses for the strict canonical goal.

This is different to the original motivation for the canonical NT (text tools, specifically so a test suite can test the outcome of NT processing without itself being an NT processor). There, for example, raw control characters are more useful than escaped.

Change to a paragraph for the motivation being a canonical form of a quad being unique for a given choice of blank node labels and a unique document except for the order of quads.

@domel domel self-requested a review February 16, 2023 16:05
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
@gkellogg gkellogg requested a review from afs February 17, 2023 21:27
Copy link
Member

@TallTed TallTed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small tweaks for clarity, and one key question that will likely lead to another small change.

spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated
<p class="note">Even when not explicitly serializing
canonical N-Quads, implementers are encouraged to produce this form.</p>
<p class="note">A canonical form for N-Quads can be used to ensure
that the form of a quad is unique for a given choice of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the form of a quad doesn't seem right, if my understanding is correct, that this form of N-Quads should result in reduction of semantically identical but syntactically different quads to a single canonical quad...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"syntactically different" needs some work. We don't want to imply the writing of the RDF terms themselves is affected.

"presentation"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the provision that this can't really be fully accomplished until #16 is also considered, the intention of this note is to limit choice of representing code points in the resulting RDF term (literal, in this case).

How about something like the following:

A canonical form of N-Quads can be used to ensure
that variations in the syntactic representation of terms
within that quad is determined; each code point
can be represented by only one of
UCHAR, ECHAR and unencoded character
where the relevant production allows for a choice in representation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See updated note.

spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
spec/index.html Outdated Show resolved Hide resolved
@pfps
Copy link
Contributor

pfps commented Feb 20, 2023

I don't believe that the working group has decided to take up this technical work.

@afs
Copy link
Contributor

afs commented Feb 20, 2023

@pfps - yes and no. These non-editorial errata are hard to gauge.

There is an errata which leads to https://lists.w3.org/Archives/Public/public-rdf-comments/2022Nov/0000.html

rdf-canon has a need for a more canonical "canonical form" which isn't editorial errata.

One thing that would help is for the WG to say which ones can proceed to the point of proposal, and which need WG discussion on whether to address at all and whether it is ready for a proposal.

The problem as I see it is that FPWDs, and also the continuous publishing of working drafts, do not distinguish "proposal" from statements that suggest intended direction. (I thought we were going to use feature branches but I may have misremembered.)

cc @rdfguy, @ktk

spec/index.html Outdated Show resolved Hide resolved
@gkellogg
Copy link
Member Author

This PR is languishing and as some of the changes to the Security Considerations, at least, are gating w3c/rdf-concepts#16 as well as other repos, I'd like to get consensus. If you've provided review comments previously, please either ask for changes or approve. Otherwise, I suggest we just merge and deal with any other changes in subsequent PRs. (Note #16 still to be considered).

@pfps
Copy link
Contributor

pfps commented Feb 21, 2023

My view is that this PR should not be merged until the WG has determined that it will take up the notion of a canonical form for N-quads.

@gkellogg
Copy link
Member Author

@rdfguy and @ktk. Don't want to take time on Thursday's call, perhaps if either of you are on the Editor's call tomorrow we can discuss how to move forward on N-Quads canonicalization, which is gating for the RDF Dataset and Canonicalization WG. It is also an erratum against N-Quads.

Next step would be to split the PR between Security Considerations and Canonicalization so at least the Security Considerations part can move forward. (Generally a good idea, but these PRs have a way of snowballing).

@pfps
Copy link
Contributor

pfps commented Feb 21, 2023

Can you describe how the gate you mention works? I don't see how publishing a WD will help.

Copy link
Member

@TallTed TallTed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this will look good, after #19 is merged. But they're working on the same large blocks of text, so I'm not sure.

@gkellogg
Copy link
Member Author

RDF Canonicalization has an algorithm for creating canonical blank node identifiers which depends on using a canonical form of N-Quads. It’s described in w3.org/TR/rdf-canon. Note the issue markers on required updates to N-Quads.

@gkellogg
Copy link
Member Author

@TallTed the changes should be disjoint now.

@domel
Copy link
Contributor

domel commented Feb 21, 2023

What is the reason for the following sentence?

Literals with the datatype http://www.w3.org/2001/XMLSchema#string MUST NOT use the datatype IRI part of the literal, and are represented using only STRING_LITERAL_QUOTE.

Maybe instead of exception, let all (typed) literals have datatype IRI (including string).

Copy link
Contributor

@yamdan yamdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support this request, with a suggested change for a very minor typo.

spec/index.html Outdated Show resolved Hide resolved
@gkellogg gkellogg added the spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature label Mar 16, 2023
@pfps pfps added the needs discussion Proposed for discussion in an upcoming meeting label Mar 23, 2023
@pfps pfps removed the needs discussion Proposed for discussion in an upcoming meeting label Mar 30, 2023
gkellogg and others added 20 commits March 30, 2023 13:44
…y Considerations stub.

Reference issue #16 as a future direction for canonicalization.
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Andy Seaborne <andy@apache.org>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Dan Yamamoto <yamdan@gmail.com>
@gkellogg
Copy link
Member Author

Forced merge after rebase.

@gkellogg
Copy link
Member Author

It was Resolved in the 30-03-2023 meeting to work on C14N.

RESOLUTION: the WG will work on c14n

gkellogg_: c14n. done for some time. ready to go. related to erratum.
… RCH group depends on it.
… I do think all issues have been discussed.
… the only thing that didn't happen was that it needs discussion on "label"

(?)
… adding section to n-quads. minor changes from what was done in n-triples.
… more changes anticipated.
… that would be the subject of future PRs.

ora: there has been substantial discsion on issues page. see no reason not to move forward.
… anyone need any time to still look at this before it gets merged?

<pfps> I'm happy with the workihg group taking up canonicalization issues. I would like to see a resolution that the working group is going to support this.

ora: hearing no objections.

pfps: I would like to see a resolution that we can point back to that we decided to do this.
… it is a substantial change.

ora: you're saying we should make a resolution that we will work on c14n, then we can merge?

pfps: yes.

ora: any objections to that?

gkellogg_: make a proposed resolution.

<ora> PROPOSAL: the WG will work on c14n

<gkellogg_> +1

<ora> +1

<TallTed> +1

<ktk> +1

<afs> +1

+1

<pfps> +1

<AZ> +1

<doerthe> +1

@gkellogg gkellogg merged commit ac9ad81 into main Mar 30, 2023
@gkellogg gkellogg deleted the c14n-document branch March 30, 2023 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants