Canonicalization #17
Keywords from RFC 2119 are in capital letters in this section but in lower case in other parts. I think they should be unified.
In non-normative sections the lower case "may" and "must" words are typically used so that they don't invoke RFC 2119; this would be meaningless in non-normative sections anyway, but it can be confusing. Lower case versions are avoided in normative sections so they don't look like they are not, in fact, normative.
Minor nits
After @TallTed's edits, looks good!
Commentary as well as changes. Maybe we should have an issue for canonical form because RCH has a much stronger requirement on the canonical form.
If the goal of the canonical form is to be strictly canonical (for RCH and other signing, hashing uses), we should remove the text
The escape rules for the canonical form will likely be chosen for the strict canonical goal. This is different from the original motivation for the canonical NT (text tools, specifically so a test suite can test the outcome of NT processing without itself being an NT processor). There, for example, raw control characters are more useful than escaped ones. Change to a paragraph for the motivation being a canonical form of a quad being unique for a given choice of blank node labels and a unique document except for the order of quads.
Small tweaks for clarity, and one key question that will likely lead to another small change.
spec/index.html
Outdated
<p class="note">Even when not explicitly serializing
canonical N-Quads, implementers are encouraged to produce this form.</p>
<p class="note">A canonical form for N-Quads can be used to ensure
that the form of a quad is unique for a given choice of
the form of a quad
doesn't seem right, if my understanding is correct, that this form of N-Quads
should result in reduction of semantically identical but syntactically different quads to a single canonical quad...
"syntactically different" needs some work. We don't want to imply the writing of the RDF terms themselves is affected.
"presentation"?
With the provision that this can't really be fully accomplished until #16 is also considered, the intention of this note is to limit choice of representing code points in the resulting RDF term (literal, in this case).
How about something like the following:
A canonical form of N-Quads can be used to ensure
that variations in the syntactic representation of terms
within that quad is determined; each code point
can be represented by only one of UCHAR, ECHAR,
and unencoded character
where the relevant production allows for a choice in representation.
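
To make the "only one representation per code point" idea concrete, here is a minimal Python sketch. The helper name `escape_literal` is hypothetical, and the particular split between ECHAR, UCHAR, and unencoded characters (including the uppercase hex digits) is an assumption for illustration, not the rule set this PR settles on:

```python
# Hypothetical helper; the partition below is an illustrative assumption,
# not the normative canonical N-Quads rule set.

# Code points assumed to use the short ECHAR escapes.
ECHAR = {
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    '"': '\\"',
    "\\": "\\\\",
}


def escape_literal(value: str) -> str:
    """Return the lexical form with exactly one representation per code point."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch in ECHAR:
            # Assumed rule 1: characters with an ECHAR form always use it.
            out.append(ECHAR[ch])
        elif cp < 0x20 or cp == 0x7F:
            # Assumed rule 2: remaining control characters use a 4-digit UCHAR.
            out.append(f"\\u{cp:04X}")
        else:
            # Assumed rule 3: everything else is written unencoded.
            out.append(ch)
    return "".join(out)
```

With these assumed rules, a literal containing a double quote can only be written with `\"` and never as `\u0022`, so two serializers that follow the same table produce byte-identical output for the same term.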
See updated note.
I don't believe that the working group has decided to take up this technical work.
@pfps - yes and no. These non-editorial errata are hard to gauge. There is an erratum which leads to https://lists.w3.org/Archives/Public/public-rdf-comments/2022Nov/0000.html. rdf-canon has a need for a more canonical "canonical form", which isn't an editorial erratum. One thing that would help is for the WG to say which ones can proceed to the point of proposal, and which need WG discussion on whether to address at all and whether it is ready for a proposal. The problem as I see it is that FPWDs, and also the continuous publishing of working drafts, do not distinguish "proposal" from statements that suggest intended direction. (I thought we were going to use feature branches but I may have misremembered.)
This PR is languishing and as some of the changes to the Security Considerations, at least, are gating w3c/rdf-concepts#16 as well as other repos, I'd like to get consensus. If you've provided review comments previously, please either ask for changes or approve. Otherwise, I suggest we just merge and deal with any other changes in subsequent PRs. (Note: #16 is still to be considered.)
My view is that this PR should not be merged until the WG has determined that it will take up the notion of a canonical form for N-Quads.
@rdfguy and @ktk: I don't want to take time on Thursday's call; perhaps if either of you is on the Editor's call tomorrow, we can discuss how to move forward on N-Quads canonicalization, which is gating for the RDF Dataset and Canonicalization WG. It is also an erratum against N-Quads. The next step would be to split the PR between Security Considerations and Canonicalization so at least the Security Considerations part can move forward. (Generally a good idea, but these PRs have a way of snowballing.)
Can you describe how the gate you mention works? I don't see how publishing a WD will help.
I think that this will look good, after #19 is merged. But they're working on the same large blocks of text, so I'm not sure.
RDF Canonicalization has an algorithm for creating canonical blank node identifiers which depends on using a canonical form of N-Quads. It’s described in w3.org/TR/rdf-canon. Note the issue markers on required updates to N-Quads.
@TallTed the changes should be disjoint now.
What is the reason for the following sentence?
Maybe instead of an exception, let all (typed) literals have a datatype IRI (including string).
I support this request, with a suggested change for a very minor typo.
…y Considerations stub. Reference issue #16 as a future direction for canonicalization.
… when canonicalizing.
Co-authored-by: Ted Thibodeau Jr <tthibodeau@openlinksw.com>
Co-authored-by: Andy Seaborne <andy@apache.org>
Co-authored-by: Dan Yamamoto <yamdan@gmail.com>
Forced merge after rebase.
It was resolved in the 30-03-2023 meeting to work on C14N.
RESOLUTION: the WG will work on c14n
gkellogg_: c14n. done for some time. ready to go. related to erratum. (?)
ora: there has been substantial discussion on issues page. see no reason not to move forward.
<pfps> I'm happy with the working group taking up canonicalization issues. I would like to see a resolution that the working group is going to support this.
ora: hearing no objections.
pfps: I would like to see a resolution that we can point back to that we decided to do this.
ora: you're saying we should make a resolution that we will work on c14n, then we can merge?
pfps: yes.
ora: any objections to that?
gkellogg_: make a proposed resolution.
<ora> PROPOSAL: the WG will work on c14n
<gkellogg_> +1
<ora> +1
<TallTed> +1
<ktk> +1
<afs> +1
+1
<pfps> +1
<AZ> +1
<doerthe> +1
This will ultimately make its way into N-Triples, as well.
Note the separation of Security Considerations and a phrase about the potential for unescaped characters to obfuscate a string presentation.
For #2.