Various errata from 1.1 #2

Closed
gkellogg opened this issue Jan 21, 2023 · 5 comments
Labels: propose closing (Proposed for closing); spec:substantive (Change in the spec affecting its normative content (class 3), see also spec:bug, spec:new-feature)

Comments

@gkellogg
Member Author

In an email relevant to erratum 32, @kasei writes:

Within STRING_LITERAL_QUOTE, only the characters U+0022, U+005C, U+000A, U+000D are encoded using ECHAR. ECHAR must not be used for characters that are allowed directly in STRING_LITERAL_QUOTE.

Does this really mean that control characters must be written directly without escaping or encoding (e.g. NULL, BELL, BACKSPACE, etc.)? While their use probably isn’t common in N-Triples documents, the idea of a canonical representation requiring these to be written directly strikes me as ill-advised, as it makes handling of this data more difficult (e.g. having to carefully handle NULL characters vs. NULL terminators, not being able to copy-paste data containing unprintable control characters, etc.).

The ECHAR production is '\' [tbnrf"'\], which covers only a handful of control characters (TAB, BACKSPACE, LF, CR and FF); the remaining control characters would have to be represented using UCHAR, which the canonical form explicitly prohibits. To be consistent, we would need to add UCHAR back and require it for control characters, but this may be going too far. Such a change would presumably cover \u0000 through \u001F, excluding the characters that ECHAR already covers.
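To make the concern concrete, here is a minimal Python sketch of the RDF 1.1 canonical rule quoted above, assuming the literal's lexical form is already a Python `str`. The names `escape_literal_11` and `_ECHAR_11` are hypothetical, chosen only for illustration. Only the four listed characters become ECHAR; everything else, including NULL and BELL, is passed through unchanged.

```python
# Hypothetical sketch of the RDF 1.1 canonical N-Triples rule for
# STRING_LITERAL_QUOTE: only U+0022, U+005C, U+000A and U+000D are
# written as ECHAR; every other character is emitted directly,
# including control characters such as U+0000 (NULL) and U+0007 (BELL).

_ECHAR_11 = {
    "\u0022": '\\"',   # quotation mark -> \"
    "\u005C": "\\\\",  # backslash      -> \\
    "\u000A": "\\n",   # line feed      -> \n
    "\u000D": "\\r",   # carriage ret.  -> \r
}


def escape_literal_11(value: str) -> str:
    """Wrap a lexical form in double quotes, escaping per the 1.1 rule."""
    return '"' + "".join(_ECHAR_11.get(ch, ch) for ch in value) + '"'
```

For example, `escape_literal_11("x\x00y")` returns a quoted string whose middle character is still a raw NULL, which is exactly the kind of unprintable content the email objects to.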

I suggest we not consider this, unless there is a demonstrated need, as it was considered and resolved in 1.1.

@afs
Contributor

afs commented Feb 14, 2023

We should consider this.

The original motivation for a canonical form was simple processing by text tools, e.g. matching an N-Triples line with a regex.
Now it is used for RDF canonicalization and signing. Anything that can be used to confuse a processor makes this a security issue.

The RDF 1.1 N-Triples text includes "Implementers are encouraged to produce this form.", so the format was not mandatory. We have some room for improvement.

A change to consider is

  • Characters in the codepoint range U+0020 to U+10FFFF MUST NOT be represented by UCHAR.
  • Characters in the codepoint range U+0000 to U+001F MUST be represented by ECHAR or represented by UCHAR where ECHAR is not available.
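A rough Python sketch of how the two proposed rules could combine, assuming U+0022 and U+005C keep their RDF 1.1 ECHAR forms. The names `escape_literal_proposed` and `_ECHAR` are hypothetical, and emitting uppercase hex digits in UCHAR is an assumption, since the outline above does not fix the case.

```python
# Hypothetical sketch of the proposed (non-normative) escaping rules:
# - U+0022 and U+005C keep their ECHAR forms, as in RDF 1.1;
# - U+0000..U+001F use ECHAR where one exists (\b \t \n \f \r),
#   otherwise UCHAR (\uXXXX);
# - no character at or above U+0020 is ever written as UCHAR.

_ECHAR = {
    "\u0008": "\\b",   # backspace
    "\u0009": "\\t",   # tab
    "\u000A": "\\n",   # line feed
    "\u000C": "\\f",   # form feed
    "\u000D": "\\r",   # carriage return
    "\u0022": '\\"',   # quotation mark
    "\u005C": "\\\\",  # backslash
}


def escape_literal_proposed(value: str) -> str:
    """Wrap a lexical form in double quotes, escaping per the proposal."""
    parts = []
    for ch in value:
        if ch in _ECHAR:
            parts.append(_ECHAR[ch])
        elif ord(ch) <= 0x1F:
            # No ECHAR exists for this control character, so use UCHAR.
            # Uppercase hex digits are an assumption, not part of the proposal.
            parts.append("\\u%04X" % ord(ch))
        else:
            # U+0020 and above (other than " and \) are written directly.
            parts.append(ch)
    return '"' + "".join(parts) + '"'
```

With this rule, `escape_literal_proposed("\x00")` yields `"\u0000"` (with the surrounding quotes), so the serialized literal no longer contains any raw character below U+0020.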

Using UCHAR for all of them would be better, but we are where we are.

This is an outline to show something is possible - the text needs refining.

Process-wise:

I suggest creating an issue for this, labelling it security, and closing the erratum.
There should be something in the security section.

(We need a better way to track cross-document concerns.)

@kasei

kasei commented Feb 14, 2023

I suggest we not consider this, unless there is a demonstrated need, as it was considered and resolved in 1.1.

@gkellogg – Do you have any pointers to the previous discussions? From outside the WG, the handling of the canonical form seemed a bit rushed, and I wasn't left with the feeling that it got much consideration. I would like to look into the reasoning used during 1.1 that led to the decisions that were made.

@gkellogg
Member Author

The discussion in the RDF WG was before my time. Looking through the RDF WG mail archives doesn't provide much, either.

@ericprud was likely involved in the C14N discussions. But, @afs's points about security certainly make a case for revisiting this. @dlongley may have a view on the implications for https://github.com/w3c/rdf-canon, but I suspect that there won't be any tests that overlap with the problem areas.

See #2 (comment) for a suggested change to using ECHAR and UCHAR for canonical N-Quads/Triples.

@gkellogg
Member Author

Chatted with @ericprud on Skype. The main motivation for canonicalization in N-Triples was testing. The best approach is to create an issue specific to escaping in literals, and to note it as an issue in the C14N section and in a new Security Considerations section.

gkellogg added the spec:substantive label (Change in the spec affecting its normative content (class 3), see also spec:bug, spec:new-feature) on Mar 22, 2023
gkellogg added the propose closing label (Proposed for closing) on Apr 5, 2023